Chumsky, a Rust parser-combinator library with error recovery

115 points | brundolf | 3 years ago | github.com

23 comments

[+] jitl|3 years ago|reply
I haven't written a parser with Chumsky, but I've played a bit with a small one, if you want to see an example of the syntax. The error reporting for this project is implemented with `ariadne`, which is also really slick.

Parser: https://github.com/ekzhang/percival/blob/main/crates/perciva...

Error reporting: https://github.com/ekzhang/percival/blob/main/crates/perciva...

Datalog playground: https://percival.ink/

To see an error report, delete some punctuation from one of the Datalog code blocks then press shift-return.

[+] brundolf|3 years ago|reply
I'm probably going to convert my compiler project to Rust just so I can use Chumsky and Ariadne. The error feedback and recovery are so much better than anything I could have written by hand.
[+] cercatrova|3 years ago|reply
The creator of Chumsky is also quite a big proponent of getting generic associated types stabilized in Rust [0], interestingly enough. They have several comments talking about how GATs were very helpful for Chumsky to model their parsing and combinators.

[0] https://github.com/rust-lang/rust/pull/96709
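
For readers unfamiliar with generic associated types, here is a minimal sketch of what the feature allows. It is not taken from Chumsky's source: the `LendingIterator` trait, the `WindowsMut` struct, and the `running_sum` helper are all hypothetical names chosen for illustration, and the code assumes Rust 1.65 or later, where GATs are stable.

```rust
// Hypothetical trait (not in std): an iterator whose items may borrow
// from the iterator itself.
trait LendingIterator {
    // The generic associated type: `Item` takes a lifetime parameter.
    type Item<'a>
    where
        Self: 'a;

    fn next<'a>(&'a mut self) -> Option<Self::Item<'a>>;
}

// Yields overlapping mutable windows over a slice. The std `Iterator`
// trait cannot express this, because each item borrows from `self`.
struct WindowsMut<'s, T> {
    slice: &'s mut [T],
    start: usize,
    size: usize,
}

impl<'s, T> LendingIterator for WindowsMut<'s, T> {
    type Item<'a> = &'a mut [T] where Self: 'a;

    fn next<'a>(&'a mut self) -> Option<Self::Item<'a>> {
        let end = self.start.checked_add(self.size)?;
        let window = self.slice.get_mut(self.start..end)?;
        self.start += 1;
        Some(window)
    }
}

// Illustrative helper: turns a slice into its running sums in place by
// adding the first element of each 2-wide window to the last one.
fn running_sum(data: &mut [i32]) {
    let mut windows = WindowsMut { slice: data, start: 0, size: 2 };
    while let Some(window) = windows.next() {
        if let [first, .., last] = window {
            *last += *first;
        }
    }
}
```

Without the lifetime parameter on `Item`, the returned window could not be tied to the `&mut self` borrow of each individual `next` call.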

[+] dataangel|3 years ago|reply
It's weird to me how much effort is being put in on that thread to find examples of crates needing it when people wanted templated typedefs that they could put inside classes in C++ for YEARS before C++11. The use cases for this stuff were around 15+ years ago!
[+] fanf2|3 years ago|reply
My main question is how this compares to nom which has long been a solid choice for parser combinators in Rust. But no mention in the readme?
[+] mullr|3 years ago|reply
Error recovery in nom is left as a very obtuse exercise to the reader. Custom error reporting is difficult at best. That stuff is supposed to be better in chumsky; I don’t know if it actually is.

However, for my own parser which is currently written in nom, my current plan is to port it over to tree-sitter. Its error recovery is completely automatic, and a fair sight better than anything I have time to do by hand.

[+] IshKebab|3 years ago|reply
I've used Nom. It isn't really that suited to parsing languages with things like precedence. It also doesn't have any error recovery, and error messages are very basic. It's ok if you are designing your own language because you can design the language to be easy to parse with Nom but I'm not sure I'd recommend it for parsing an existing language.

I've also used Tree Sitter. It has error recovery and a powerful grammar system but the downsides are that the grammar system is quite confusing compared to parser combinators, it's written in C which makes cross compilation a pain, and it doesn't actually do the whole job. You get a stringly typed tree of nodes that you have to do a second parse over. Quite tedious. Acceptable if you don't need a full AST though, e.g. you're just searching for specific nodes.

I haven't tried Chumsky yet but I definitely will. Looks very promising.

[+] charleskinbote|3 years ago|reply
I was going to ask the same thing. I've used nom for a library of mine but wasn't totally satisfied with it, so I think I'll give this a try.
[+] voxl|3 years ago|reply
Actually trying to write a parser with this was something else for me; the kinds of types I was looking at seemed impenetrable. It looks very nice, but the usability, at least for me, made me wash my hands of it and just roll my own parser.

I've used nom successfully in the past, even when it was macro-hell. Part of that might have been the greater amount of available combinators though, making getting really into the weeds less likely.

[+] brundolf|3 years ago|reply
Yeah the types themselves are fairly impenetrable, but if you follow the tutorial it's not too hard to learn to actually use. I just did the tutorial and it made a lot of sense

In fairness, the same can be said of Rust's iterator types; they drive autocomplete and they surface errors when you do something wrong, but they're not really directly readable. This sort of thing is the reason `impl` types exist
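
To make the iterator comparison concrete, here is a small sketch using only the standard library (the function name `even_squares` is made up for illustration). The concrete return type is a nested `Map<Filter<Copied<...>>>` containing closure types that cannot be written out, so `impl Iterator` stands in for it.

```rust
// Returns the squares of the even numbers in `input`.
// The real type is roughly Map<Filter<Copied<slice::Iter<'_, u32>>, _>, _>,
// where each `_` is an anonymous closure type; `impl Iterator` hides all of it.
fn even_squares(input: &[u32]) -> impl Iterator<Item = u32> + '_ {
    input
        .iter()
        .copied()
        .filter(|n| n % 2 == 0)
        .map(|n| n * n)
}
```

Callers only see "some iterator of `u32`", which is usually all they need, while the compiler still checks the full concrete type.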

[+] de_keyboard|3 years ago|reply
I'm interested in how this library handles recursion, e.g.

   Expr = '(' Expr ')'
        | Expr '+' Expr

It's very easy to get stuck in infinite loops when handling recursion in parser-combinator libraries.

Does this library improve on that?
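
For context on why this bites: a direct translation of `Expr = Expr '+' Expr` calls the `Expr` parser again before consuming any input, so it never terminates. The sketch below shows the usual workaround in a hand-rolled parser with no libraries, rewriting the rule as a loop. It says nothing about how Chumsky itself handles this, and the `Num` base case is an added assumption, since the grammar above has no terminal.

```rust
// Grammar after removing left recursion (`Num` is an assumed base case):
//   Expr = Atom ('+' Atom)*
//   Atom = '(' Expr ')' | Num

#[derive(Debug, PartialEq)]
enum Expr {
    Num(u64),
    Add(Box<Expr>, Box<Expr>),
}

struct Parser<'a> {
    chars: std::iter::Peekable<std::str::Chars<'a>>,
}

impl<'a> Parser<'a> {
    fn new(src: &'a str) -> Self {
        Parser { chars: src.chars().peekable() }
    }

    fn skip_ws(&mut self) {
        while self.chars.next_if(|c| c.is_whitespace()).is_some() {}
    }

    // Expr = Atom ('+' Atom)*
    // The loop replaces the left-recursive call and builds a left-associative tree.
    fn expr(&mut self) -> Option<Expr> {
        let mut lhs = self.atom()?;
        loop {
            self.skip_ws();
            if self.chars.next_if_eq(&'+').is_none() {
                return Some(lhs);
            }
            let rhs = self.atom()?;
            lhs = Expr::Add(Box::new(lhs), Box::new(rhs));
        }
    }

    // Atom = '(' Expr ')' | Num
    // Recursion here is safe: '(' is consumed before `expr` is called again.
    fn atom(&mut self) -> Option<Expr> {
        self.skip_ws();
        if self.chars.next_if_eq(&'(').is_some() {
            let inner = self.expr()?;
            self.skip_ws();
            self.chars.next_if_eq(&')')?;
            return Some(inner);
        }
        let mut digits = String::new();
        while let Some(d) = self.chars.next_if(|c| c.is_ascii_digit()) {
            digits.push(d);
        }
        digits.parse::<u64>().ok().map(Expr::Num)
    }
}

// Parses a whole string, returning None on any syntax error or trailing input.
fn parse(src: &str) -> Option<Expr> {
    let mut parser = Parser::new(src);
    let expr = parser.expr()?;
    parser.skip_ws();
    if parser.chars.peek().is_some() {
        None
    } else {
        Some(expr)
    }
}
```

The key property is that every recursive call happens only after at least one character has been consumed, so the parser always makes progress.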

[+] ufo|3 years ago|reply
I wonder what error recovery strategies it implements. The README doesn't go into details.
[+] avgcorrection|3 years ago|reply
There’s also Pomsky, which is an alternative language to regex.