top | item 41422815

(no title)

jasone | 1 year ago

Hard disagree. Yacc has unnecessary footguns, in particular the fallout from using LALR(1), but more modern parser generators like bison provide LR(1) and IELR(1). Hand-rolled recursive descent parsers as well as parser combinators can easily obscure implicit resolution of grammar ambiguities. A good LR(1) parser generator enables a level of grammar consistency that is very difficult to achieve otherwise.

discuss

order

thomasmg|1 year ago

> Hand-rolled recursive descent parsers as well as parser combinators can easily obscure implicit resolution of grammar ambiguities.

Could you give a concrete, real-life example of this? I have written many recursive-descent parsers and never ran into this problem (Apache Jackrabbit Oak SQL and XPath parser, H2 database engine, PointBase Micro database engine, HypersonicSQL, NewSQL, Regex parsers, GraphQL parsers, and currently the Bau programming language).

I have often heard that Bison / Yacc / ANTLR etc are "superior", but mostly from people that didn't actually have to write and maintain production-quality parsers. I do have experience with the above parser generators, eg. for university projects, and Apache Jackrabbit (2.x). I remember that in each case, the parser generators had some "limitations" that caused problems down the line. Then I had to spend more time trying to work around the parser generator limitations than actually doing productive work.

This may sound harsh, but well that's my experience... I would love to hear from people that had a different experience for non-trivial projects...

masfuerte|1 year ago

If you start with an unambiguous grammar then you aren't going to introduce ambiguities by implementing it with a recursive descent parser.

If you are developing a new grammar it is quite easy to accidentally create ambiguities and a recursive descent parser won't highlight them. This becomes painful when you try to evolve the grammar.

tgv|1 year ago

The original comment says that using yacc/bison is "fundamentally misguided." But parser generators make it easy to add a correct parser to your project. It's obviously not the only way. Hand-rolling has a bunch of pitfalls, and easily leads to apparently correct behavior that does weird things on untested input. Your comment then is a bit like: I've never had memory corruption in C, so Rust/Java/etc. is for toy projects only.

tgv|1 year ago

Same. LR(k) and LL(k) are readable and completely unambiguous, in contrast to PEG, where ambiguity is resolved ad hoc: PEG doesn't have a single definition, so implementations may differ, and the original PEG uses the order of the rules and backtracking to resolve ambiguity, which may lead to different resolutions in different contexts. Ambiguity does not leap out to the programmer.

OTOH, an LL(1) grammar can be used to generate a top-down/recursive descent parser, and will always be correct.

HelloNurse|1 year ago

A large portion of this consistency is not making executive decisions about parsing ambiguities. The difference between "the language is implicitly defined by what the parser does" and "the grammar for the language has been refined one failed test at a time" is large and practically important.