上个星期翻的那篇文章里貌似提了下parsec,照着文档试玩了下。之前貌似玩过ruby的那个代码生成工具racc,它的名气不是很大而且文档不怎么全,感觉不是很爽。写了个helloworld就再也没碰。再写个parsec的helloworld。
语法分析的helloword就是表达式计算了。parsec跟*acc貌似不怎么像。不过无非都是上下文无关文法嘛,这个有名的ebnf:
expr ::= expr ’+’ term | term term ::= term ’*’ factor | factor factor ::= ’(’ expr ’)’ | digit+ digit ::= ’0’ | ’1’ | ... | ’9’
葫芦画瓢地写出parsec版:
import Text.ParserCombinators.Parsec import qualified Text.ParserCombinators.Parsec.Token as P import Text.ParserCombinators.Parsec.Language (haskellDef) do_parse p input = case (parse p "" input) of Left err -> do { putStr "parse error at:"; print err; } Right x -> do { print x; } expr :: Parser Int expr = do { a < - expr; char '+'; b <- term; return (a + b); } <|> term term :: Parser Int term = do { a < - term; char '*'; b <- factor; return (a*b); } factor :: Parser Int factor = do { char '('; a <- expr; char ')'; return a; } <|> number number = do { a < - many1 digit; return (read a); }
parsec, Parser Combinators嘛。当然跟*acc那套不一样了。函数在这里就都是个组合子,可以像零件那样方便地组合。
do_parse这个函数取两个参数,一个是parser,一个是表示表达式的字符串。haskell的do-notation除了允许像python那样的缩进风格之外,还有这里用的c-style。parsec就用monad来表示sequence,< |>来表示choice,某种意义上讲,貌似只要它俩就足够构造出复杂的parser了。咋一看可能要比*acc那伪bnf的观感麻烦得多,但别忘了组合子的优势:可以方便地组合。用几个简单的组合子可以方便地构造出复杂的组合子,复杂的组合子继续组合成更复杂的组合子,而使用起来可是极为简洁。
嗯,废话真多。
进ghci测试一下
*Main> do_parse expr "1+1" *** Exception: stack overflow
查手册,发现这么一句“Unfortunately, left-recursive grammars can not be specified directly in a combinator library. If you accidently write a left recursive program, the parser will go into an infinite loop! ”
哑然。
组合子还是函数嘛,想想在c中要是定义这样的函数会怎样。
Paser expr(){ a = expr(); ... }
不过“However, every left-recursive grammar can be rewritten to a non-left- recursive grammar. The library provides combinators which do this automatically for you (chainl and chainl1). ”
嗯。手册里貌似写了个表达式计算的例子,就直接用了Parsec内置的ParsecExpr:
import ParsecExpr expr :: Parser Integer expr = buildExpressionParser table factor < ?> "expression" table = [[op "*" (*) AssocLeft , op "/" div AssocLeft] ,[op "+" (+) AssocLeft, op "-" (-) AssocLeft] ] where op s f assoc = Infix (do{ string s; return f}) assoc factor = do{ char ’(’ ; x < - expr ; char ’)’ ; return x } <|> number < ?> "simple expression" number :: Parser Integer number = do{ ds < - many1 digit ; return (read ds) } <?> "number"
呃,运算符的优先级,左结合都考虑了。不过这样也太没意思了,该我写的都让它给内置了,不过那句话不是提到有个chainl1么,就用chainl1重写:
module Main where import Text.ParserCombinators.Parsec import qualified Text.ParserCombinators.Parsec.Token as P import Text.ParserCombinators.Parsec.Language (haskellDef) main = do { input < - getLine; do_parse expr input; } do_parse p input = case (parse p "" input) of Left err -> do { putStr "parse error at:"; print err; } Right x -> do { print x; } lexer = P.makeTokenParser haskellDef parens = P.parens lexer symbol = P.symbol lexer naturalOrFloat = P.naturalOrFloat lexer expr = term `chainl1` addop term = factor `chainl1` mulop factor = parens expr < |> number number = do { norf < - naturalOrFloat; case norf of Left n -> return (read $ show n); --so sucks Right f -> return f; } mulop = do { symbol "*" ; return (*); } < |> do { symbol "/" ; return (/); } addop = do { symbol "+" ; return (+); } < |> do { symbol "-" ; return (-); }
用到了ParsecToken。它就直接使用了haskell的token定义,也就是说,你可以在这个程序里使用haskell的注释
*Main> do_parse expr "1+{-- it's a comment --}(1/2)" 1.5

传说中的沙发……
“Unfortunately, left-recursive grammars can not be specified directly in a combinator library. If you accidently write a left recursive program, the parser will go into an infinite loop!”
@wangweinoo1 这点多少有些不爽,不能照着抄bnf