试玩了下parsec

上个星期翻的那篇文章里貌似提了下parsec,照着文档试玩了下。之前貌似玩过ruby的那个代码生成工具racc,它的名气不是很大而且文档不怎么全,感觉不是很爽。写了个helloworld就再也没碰。再写个parsec的helloworld。

语法分析的helloword就是表达式计算了。parsec跟*acc貌似不怎么像。不过无非都是上下文无关文法嘛,这个有名的ebnf:

expr     ::=   expr  ’+’   term  |  term
term     ::=   term  ’*’   factor   |  factor
factor   ::=(’  expr   ’)’  |  digit+
 
digit    ::=0’  |  ’1’  |  ...  |  ’9

葫芦画瓢地写出parsec版:

import Text.ParserCombinators.Parsec
import qualified Text.ParserCombinators.Parsec.Token as P
import Text.ParserCombinators.Parsec.Language (haskellDef)
 
do_parse p input
	= case (parse p "" input) of
		Left err -> do {
			putStr "parse error at:";
			print err;
		}
		Right x -> do {
			print x;
		}
 
expr :: Parser Int
expr = do {
	a < - expr;
	char '+';
	b <- term;
	return (a + b);
} <|> term
 
term :: Parser Int
term = do {
	a < - term;
	char '*';
	b <- factor;
	return (a*b);
}
 
factor :: Parser Int
factor = do {
	char '(';
	a <- expr;
	char ')';
	return a;
} <|> number
 
number = do {
	a < - many1 digit;
	return (read a);
}

parsec, Parser Combinators嘛。当然跟*acc那套不一样了。函数在这里就都是个组合子,可以像零件那样方便地组合。

do_parse这个函数取两个参数,一个是parser,一个是表示表达式的字符串。haskell的do-notation除了允许像python那样的缩进风格之外,还有这里用的c-style。parsec就用monad来表示sequence,< |>来表示choice,某种意义上讲,貌似只要它俩就足够构造出复杂的parser了。咋一看可能要比*acc那伪bnf的观感麻烦得多,但别忘了组合子的优势:可以方便地组合。用几个简单的组合子可以方便地构造出复杂的组合子,复杂的组合子继续组合成更复杂的组合子,而使用起来可是极为简洁。

嗯,废话真多。
进ghci测试一下

*Main> do_parse expr "1+1"
*** Exception: stack overflow

查手册,发现这么一句“Unfortunately, left-recursive grammars can not be specified directly in a combinator library. If you accidently write a left recursive program, the parser will go into an infinite loop! ”

哑然。
组合子还是函数嘛,想想在c中要是定义这样的函数会怎样。

Paser expr(){
	a = expr();
	...
}

不过“However, every left-recursive grammar can be rewritten to a non-left- recursive grammar. The library provides combinators which do this automatically for you (chainl and chainl1). ”

嗯。手册里貌似写了个表达式计算的例子,就直接用了Parsec内置的ParsecExpr:

import   ParsecExpr
 
expr      ::  Parser   Integer
expr      =  buildExpressionParser      table   factor
          < ?>  "expression"
 
table     =  [[op  "*"  (*)  AssocLeft ,   op  "/"  div  AssocLeft]
             ,[op  "+"  (+)  AssocLeft,    op  "-"  (-)  AssocLeft]
             ]
          where
             op  s f  assoc
                =  Infix   (do{  string   s;  return  f})   assoc
 
factor    =  do{  char  ’(’
                ; x  < - expr
                ; char  ’)’
                ; return   x
               }
          <|>  number
          < ?>  "simple   expression"
 
number    ::  Parser   Integer
number    =  do{  ds < -  many1   digit
                ; return   (read   ds)
               }
          <?>  "number"

呃,运算符的优先级,左结合都考虑了。不过这样也太没意思了,该我写的都让它给内置了,不过那句话不是提到有个chainl1么,就用chainl1重写:

module Main where
 
import Text.ParserCombinators.Parsec
import qualified Text.ParserCombinators.Parsec.Token as P
import Text.ParserCombinators.Parsec.Language (haskellDef)
 
main = do {
	input < - getLine;
	do_parse expr input;
}
 
do_parse p input
	= case (parse p "" input) of
		Left err -> do {
			putStr "parse error at:";
			print err;
		}
		Right x -> do {
			print x;
		}
 
lexer = P.makeTokenParser haskellDef
parens = P.parens lexer
symbol = P.symbol lexer
naturalOrFloat = P.naturalOrFloat lexer
 
expr = term `chainl1` addop
term = factor `chainl1` mulop
factor = parens expr < |> number
number = do {
	norf < - naturalOrFloat;
   	case norf of
		Left n -> return (read $ show n);  --so sucks
	   	Right f	-> return f;
}
 
mulop = do { symbol "*" ; return (*); } < |> do { symbol "/" ; return (/); }
addop = do { symbol "+" ; return (+); } < |> do { symbol "-" ; return (-); }

用到了ParsecToken。它就直接使用了haskell的token定义,也就是说,你可以在这个程序里使用haskell的注释

*Main> do_parse expr "1+{-- it's a comment --}(1/2)"
1.5

2 Responses to “试玩了下parsec”


Leave a Reply