Chumsky, a Rust parser-combinator library with error recovery

jitl|3 years ago

I haven't written a parser with Chumsky myself, but I've played with a small one a bit, if you want to see example syntax. The error reporting for this project is implemented with `ariadne`, which is also really slick.

Parser: https://github.com/ekzhang/percival/blob/main/crates/perciva...

Error reporting: https://github.com/ekzhang/percival/blob/main/crates/perciva...

Datalog playground: https://percival.ink/

To see an error report, delete some punctuation from one of the Datalog code blocks then press shift-return.

brundolf|3 years ago

I'm probably going to convert my compiler project to Rust just so I can use Chumsky and Ariadne. The error feedback and recovery are so much better than anything I could have written by hand.

cercatrova|3 years ago

The creator of Chumsky is also, interestingly enough, quite a big proponent of getting generic associated types (GATs) stabilized in Rust [0]. They have several comments explaining how GATs were very helpful in modeling Chumsky's parsers and combinators.

[0] https://github.com/rust-lang/rust/pull/96709
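For anyone unfamiliar with GATs: a generic associated type is an associated type that takes its own generic parameters (`type Output<T>` below), roughly what C++ gets from a templated typedef inside a class. The sketch shows one way a parser library *could* use one: the same parser code either builds outputs or only validates input, depending on a mode type. The `Mode`, `Emit` and `Check` names are hypothetical and this is not Chumsky's actual code; it assumes Rust 1.65 or later, where GATs were stabilized.

```rust
// Hypothetical mode trait: `Output<T>` is an associated type that is itself
// generic over T, which is what a GAT allows (stable since Rust 1.65).
trait Mode {
    type Output<T>;
    /// Produce an output, possibly without ever running `f`.
    fn emit<T, F: FnOnce() -> T>(f: F) -> Self::Output<T>;
    /// Transform an output produced in this mode.
    fn map<T, U, F: FnOnce(T) -> U>(out: Self::Output<T>, f: F) -> Self::Output<U>;
}

/// Mode that actually builds output values.
struct Emit;
/// Mode that only checks validity and builds nothing.
struct Check;

impl Mode for Emit {
    type Output<T> = T;
    fn emit<T, F: FnOnce() -> T>(f: F) -> Self::Output<T> {
        f()
    }
    fn map<T, U, F: FnOnce(T) -> U>(out: Self::Output<T>, f: F) -> Self::Output<U> {
        f(out)
    }
}

impl Mode for Check {
    type Output<T> = ();
    // Closures are never called, so no output is ever constructed.
    fn emit<T, F: FnOnce() -> T>(_f: F) -> Self::Output<T> {}
    fn map<T, U, F: FnOnce(T) -> U>(_out: Self::Output<T>, _f: F) -> Self::Output<U> {}
}

/// A single-digit parser, written once and usable in either mode.
fn digit<M: Mode>(input: &str) -> Option<(M::Output<u32>, &str)> {
    let mut chars = input.chars();
    let d = chars.next()?.to_digit(10)?;
    Some((M::emit(move || d), chars.as_str()))
}

/// Parse a digit and double it; in `Check` mode the doubling never runs.
fn doubled_digit<M: Mode>(input: &str) -> Option<(M::Output<u32>, &str)> {
    let (out, rest) = digit::<M>(input)?;
    Some((M::map::<u32, u32, _>(out, |d| d * 2), rest))
}
```

For example, `doubled_digit::<Emit>("7x")` yields `Some((14, "x"))`, while `doubled_digit::<Check>("7x")` yields `Some(((), "x"))`, having done the same validation without building a value.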

dataangel|3 years ago

It's weird to me how much effort is going into that thread to find examples of crates that need it, when people wanted templated typedefs they could put inside classes in C++ for YEARS before C++11. The use cases for this stuff have been around for 15+ years!

fanf2|3 years ago

My main question is how this compares to nom, which has long been a solid choice for parser combinators in Rust. But there's no mention of it in the readme?

mullr|3 years ago

Error recovery in nom is left as a very obtuse exercise to the reader. Custom error reporting is difficult at best. That stuff is supposed to be better in chumsky; I don’t know if it actually is.

However, my own parser is currently written in nom, and my plan is to port it over to tree-sitter. Its error recovery is completely automatic, and a fair sight better than anything I have time to do by hand.

IshKebab|3 years ago

I've used Nom. It isn't really that suited to parsing languages with things like precedence. It also doesn't have any error recovery, and error messages are very basic. It's ok if you are designing your own language because you can design the language to be easy to parse with Nom but I'm not sure I'd recommend it for parsing an existing language.
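For context on what handling precedence involves, one common hand-written approach is precedence climbing: each binary operator gets a binding power, and the parser only keeps consuming operators that bind at least as tightly as the current level. This is a minimal std-only sketch that evaluates as it parses; it is not based on nom or Chumsky, and the single-digit tokens and the operator table are assumptions for illustration.

```rust
/// Binding power of a binary operator: higher binds tighter.
fn binding_power(op: char) -> Option<u8> {
    match op {
        '+' | '-' => Some(1),
        '*' | '/' => Some(2),
        _ => None,
    }
}

struct Parser {
    tokens: Vec<char>,
    pos: usize,
}

impl Parser {
    fn new(src: &str) -> Self {
        // Toy tokenizer: every non-whitespace char is one token.
        let tokens = src.chars().filter(|c| !c.is_whitespace()).collect();
        Parser { tokens, pos: 0 }
    }

    fn peek(&self) -> Option<char> {
        self.tokens.get(self.pos).copied()
    }

    /// Atom := a single decimal digit.
    fn atom(&mut self) -> Option<i64> {
        let d = self.peek()?.to_digit(10)?;
        self.pos += 1;
        Some(d as i64)
    }

    /// Parse an expression containing only operators with power >= `min_bp`.
    fn expr(&mut self, min_bp: u8) -> Option<i64> {
        let mut lhs = self.atom()?;
        while let Some(op) = self.peek() {
            let bp = match binding_power(op) {
                Some(bp) if bp >= min_bp => bp,
                _ => break,
            };
            self.pos += 1;
            // Recursing with `bp + 1` makes equal-precedence operators
            // left-associative: 8 - 3 - 2 == (8 - 3) - 2.
            let rhs = self.expr(bp + 1)?;
            lhs = match op {
                '+' => lhs + rhs,
                '-' => lhs - rhs,
                '*' => lhs * rhs,
                '/' => lhs / rhs,
                _ => unreachable!(),
            };
        }
        Some(lhs)
    }
}

/// Evaluate a whole input, rejecting trailing tokens.
fn eval(src: &str) -> Option<i64> {
    let mut p = Parser::new(src);
    let value = p.expr(1)?;
    if p.pos == p.tokens.len() { Some(value) } else { None }
}
```

With this table, `eval("1 + 2 * 3")` gives `Some(7)` and `eval("8 - 3 - 2")` gives `Some(3)`.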

I've also used Tree Sitter. It has error recovery and a powerful grammar system, but the downsides are that the grammar system is quite confusing compared to parser combinators, it's written in C, which makes cross-compilation a pain, and it doesn't actually do the whole job. You get a stringly typed tree of nodes that you have to do a second parse over. Quite tedious. It's acceptable if you don't need a full AST, though, e.g. if you're just searching for specific nodes.
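To make the "second parse" concrete: with a generic tree whose node kinds are strings, you end up writing a converter that matches on those strings and re-checks each node's shape by hand. The `Node` struct and the kind names (`"number"`, `"binary_expression"`) below are a hypothetical, simplified stand-in, not tree-sitter's actual Rust bindings.

```rust
/// Hypothetical generic syntax-tree node: its kind is just a string.
#[derive(Debug)]
struct Node {
    kind: String,
    text: String,
    children: Vec<Node>,
}

/// The typed AST we actually want.
#[derive(Debug, PartialEq)]
enum Expr {
    Num(i64),
    Add(Box<Expr>, Box<Expr>),
}

/// The "second parse": dispatch on kind strings and validate shapes manually.
fn to_ast(node: &Node) -> Result<Expr, String> {
    match node.kind.as_str() {
        "number" => node
            .text
            .parse::<i64>()
            .map(Expr::Num)
            .map_err(|e| format!("bad number {:?}: {}", node.text, e)),
        "binary_expression" => match node.children.as_slice() {
            [lhs, op, rhs] if op.kind == "+" => Ok(Expr::Add(
                Box::new(to_ast(lhs)?),
                Box::new(to_ast(rhs)?),
            )),
            _ => Err(format!("unexpected shape for {}", node.kind)),
        },
        other => Err(format!("unexpected node kind {:?}", other)),
    }
}

/// Helper to build childless nodes.
fn leaf(kind: &str, text: &str) -> Node {
    Node { kind: kind.to_string(), text: text.to_string(), children: vec![] }
}
```

Every new node kind means another string arm and another hand-written shape check, and a typo in a kind string only shows up at runtime.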

I haven't tried Chumsky yet but I definitely will. Looks very promising.

charleskinbote|3 years ago

I was going to ask the same thing. I've used nom for a library of mine but wasn't totally satisfied with it, so I think I'll give this a try.

mpalmer|3 years ago

It's mentioned in the performance section, "another crate with similar design".

https://github.com/zesterer/chumsky#performance

voxl|3 years ago

Actually trying to write a parser with this was something else for me; the kinds of types I was looking at seemed impenetrable. It looks very nice, but the usability, at least for me, made me wash my hands of it and just roll my own parser.

I've used nom successfully in the past, even back when it was macro hell. Part of that might have been the larger number of available combinators, though, which made getting really into the weeds less likely.

brundolf|3 years ago

Yeah, the types themselves are fairly impenetrable, but if you follow the tutorial it's not too hard to learn how to actually use it. I just did the tutorial and it made a lot of sense.

In fairness, the same can be said of Rust's iterator types: they drive autocomplete and they surface errors when you do something wrong, but they're not really directly readable. This sort of thing is the reason `impl Trait` types exist.
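A minimal illustration of the iterator comparison: the concrete type of this chain contains two closure types, which cannot be named in source code at all, so `impl Iterator` lets the signature state only what callers need, without boxing. The function name is made up for the example.

```rust
/// The real return type is roughly
/// `Map<Filter<std::ops::Range<u32>, {closure}>, {closure}>`,
/// which is unreadable and, because of the closures, unnameable.
fn even_squares(limit: u32) -> impl Iterator<Item = u32> {
    (0..limit).filter(|n| n % 2 == 0).map(|n| n * n)
}
```

Collecting `even_squares(10)` gives `[0, 4, 16, 36, 64]`. The same trick applies to any combinator library whose combinators return deeply nested generic types.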

aljazmerzen|3 years ago

I can vouch for ariadne, the error display library that is the sister project of Chumsky.

We integrated it into the PRQL compiler and the errors are beautiful!

https://github.com/prql/prql/pull/275
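For readers who haven't seen this style of report, the core idea behind span-based diagnostics is mapping a byte span back to a line and column, then drawing markers under the offending source. This is a hand-rolled, std-only sketch of that idea; it does not use ariadne's API, and libraries like ariadne go much further (multiple labels, colors, multi-line spans). The `render` function is hypothetical and assumes the span sits on one line and every character is one column wide.

```rust
use std::ops::Range;

/// Render a minimal diagnostic: message, line:column, the source line,
/// and carets under `span` (byte offsets, assumed to lie within one line).
fn render(src: &str, span: Range<usize>, msg: &str) -> String {
    // Byte offset where the span's line begins and ends.
    let line_start = src[..span.start].rfind('\n').map_or(0, |i| i + 1);
    let line_end = src[span.start..].find('\n').map_or(src.len(), |i| span.start + i);
    // 1-based line and column (column counted in chars, not bytes).
    let line_no = src[..span.start].matches('\n').count() + 1;
    let col = src[line_start..span.start].chars().count() + 1;
    let width = src[span.start..span.end].chars().count().max(1);
    let line = &src[line_start..line_end];
    format!(
        "error: {msg}\n --> {line_no}:{col}\n  | {line}\n  | {pad}{carets}",
        pad = " ".repeat(col - 1),
        carets = "^".repeat(width),
    )
}
```

For instance, `render("let x = 1 +;", 11..12, "expected expression")` reports position `1:12` and puts a single `^` under the `;`.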

de_keyboard|3 years ago

I'm interested in how this library handles recursion, e.g.

   Expr = '(' Expr ')'
        | Expr '+' Expr

It's very easy to get stuck in infinite loops when handling recursion in parser-combinator libraries.

Does this library improve on that?

zesterer|3 years ago

Chumsky still has trouble with left recursion, like many PEG parsers, but it's fairly easy to rewrite such grammars without left recursion as demonstrated in the tutorial: https://github.com/zesterer/chumsky/blob/master/tutorial.md
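To make the rewrite concrete: a naive recursive-descent parser for `Expr '+' Expr` calls itself before consuming any input, which is where the infinite loop comes from. The standard fix replaces the left recursion with repetition, `Expr = Atom ('+' Atom)*`, and folds the results left. This is a std-only, hand-written sketch rather than Chumsky code (the linked tutorial does the equivalent with Chumsky's own combinators); the `number` alternative is an assumption, since the grammar in the question has no base case, and whitespace is not handled.

```rust
// Left-recursive form (loops forever in naive recursive descent):
//   Expr = Expr '+' Expr | '(' Expr ')'
// Rewritten without left recursion (assumed `number` base case):
//   Expr = Atom ('+' Atom)*
//   Atom = '(' Expr ')' | number

#[derive(Debug, PartialEq)]
enum Expr {
    Num(u64),
    Add(Box<Expr>, Box<Expr>),
}

struct Parser<'a> {
    src: &'a [u8],
    pos: usize,
}

impl Parser<'_> {
    /// Consume byte `b` if it is next; report whether it was.
    fn eat(&mut self, b: u8) -> bool {
        if self.src.get(self.pos) == Some(&b) {
            self.pos += 1;
            true
        } else {
            false
        }
    }

    /// Expr = Atom ('+' Atom)*, folded left: 1+2+3 => (1+2)+3.
    fn expr(&mut self) -> Option<Expr> {
        let mut lhs = self.atom()?;
        while self.eat(b'+') {
            let rhs = self.atom()?;
            lhs = Expr::Add(Box::new(lhs), Box::new(rhs));
        }
        Some(lhs)
    }

    /// Atom = '(' Expr ')' | number. Every path consumes input before
    /// recursing, so the recursion always makes progress.
    fn atom(&mut self) -> Option<Expr> {
        if self.eat(b'(') {
            let e = self.expr()?;
            return if self.eat(b')') { Some(e) } else { None };
        }
        let start = self.pos;
        while self.src.get(self.pos).map_or(false, |b| b.is_ascii_digit()) {
            self.pos += 1;
        }
        let digits = std::str::from_utf8(&self.src[start..self.pos]).ok()?;
        digits.parse::<u64>().ok().map(Expr::Num)
    }
}

/// Parse a complete input, rejecting trailing bytes.
fn parse(src: &str) -> Option<Expr> {
    let mut p = Parser { src: src.as_bytes(), pos: 0 };
    let e = p.expr()?;
    (p.pos == p.src.len()).then_some(e)
}
```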

ufo|3 years ago

I wonder what error recovery strategies it implements. The README doesn't go into details.
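As a generic illustration only (this is not a claim about which strategies Chumsky actually implements), one classic technique is panic-mode recovery: when a statement fails to parse, record the error, skip tokens up to a synchronization token such as `;`, and resume, so a single pass reports several errors instead of stopping at the first. The toy `name = number;` language, the lexer, and all function names below are hypothetical.

```rust
/// Tokens of a toy language of `name = number;` statements.
#[derive(Debug, PartialEq)]
enum Tok {
    Ident(String),
    Num(i64),
    Sym(char),
}

/// Toy lexer: alphanumeric runs become numbers or identifiers;
/// any other non-whitespace char becomes a one-char symbol.
fn lex(src: &str) -> Vec<Tok> {
    let chars: Vec<char> = src.chars().collect();
    let mut toks = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        if chars[i].is_whitespace() {
            i += 1;
        } else if chars[i].is_ascii_alphanumeric() {
            let start = i;
            while i < chars.len() && chars[i].is_ascii_alphanumeric() {
                i += 1;
            }
            let word: String = chars[start..i].iter().collect();
            toks.push(match word.parse::<i64>() {
                Ok(n) => Tok::Num(n),
                Err(_) => Tok::Ident(word),
            });
        } else {
            toks.push(Tok::Sym(chars[i]));
            i += 1;
        }
    }
    toks
}

/// Parse one `name = number ;` statement starting at `*pos`.
fn parse_stmt(toks: &[Tok], pos: &mut usize) -> Result<(String, i64), String> {
    let name = match toks.get(*pos) {
        Some(Tok::Ident(n)) => n.clone(),
        other => return Err(format!("token {}: expected name, found {:?}", *pos, other)),
    };
    *pos += 1;
    if toks.get(*pos) != Some(&Tok::Sym('=')) {
        return Err(format!("token {}: expected '='", *pos));
    }
    *pos += 1;
    let value = match toks.get(*pos) {
        Some(Tok::Num(v)) => *v,
        other => return Err(format!("token {}: expected number, found {:?}", *pos, other)),
    };
    *pos += 1;
    if toks.get(*pos) != Some(&Tok::Sym(';')) {
        return Err(format!("token {}: expected ';'", *pos));
    }
    *pos += 1;
    Ok((name, value))
}

/// Parse all statements, recovering from errors instead of stopping.
fn parse_program(toks: &[Tok]) -> (Vec<(String, i64)>, Vec<String>) {
    let (mut stmts, mut errors) = (Vec::new(), Vec::new());
    let mut pos = 0;
    while pos < toks.len() {
        match parse_stmt(toks, &mut pos) {
            Ok(stmt) => stmts.push(stmt),
            Err(msg) => {
                errors.push(msg);
                // Panic mode: discard tokens up to and including the next ';'.
                // Always advances at least one token, so this cannot loop forever.
                while pos < toks.len() {
                    pos += 1;
                    if toks[pos - 1] == Tok::Sym(';') {
                        break;
                    }
                }
            }
        }
    }
    (stmts, errors)
}
```

On `a = 1; b = ; c = 3; = 4; d = 5;` this keeps the three valid statements and reports two errors. Real implementations (and tree-sitter's automatic recovery) are considerably more sophisticated, e.g. choosing sync points per grammar rule.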

avgcorrection|3 years ago

There’s also Pomsky, a language that's an alternative to regex.

23 comments