top | item 40331083

(no title)

sigil | 1 year ago

This idea was used to great effect in Matt Might’s “Parsing with Derivatives” paper [0]! And it featured prominently in the Compilers class he taught at the University of Utah.

[0] https://matt.might.net/papers/might2011derivatives.pdf

discuss

banish-m4|1 year ago

Excellent paper. I recall it goes past simple/regular languages and can be used to parse anything without first building a lexer.

galaxyLogic|1 year ago

Looks very interesting. Does it mean that RegExp Derivatives can be used to parse languages which basic regular expressions cannot?

As an example basic HTML cannot (?) be parsed by RegExp because tag-pairs can contain tag-pairs:

   <div> <div> </div> </div>

eludes RegExp matching, it seems to me, because a typical standard RegExp would only match "<div> <div> </div>" and would not see the 2nd </div>.

Can RegExp Derivatives do it better?

sigil|1 year ago

HTML can be described by a context-free grammar [0], but not by a regular grammar [1]. If a language can be described by a regular grammar, you can parse it with a regular expression -- that's where the "regular" in RegExp comes from!

Derivatives of RegExps don't automatically unlock parsing of context-free grammars, afaik. For that you need recursion. They do however unlock some very elegant parser designs.

[0] https://en.wikipedia.org/wiki/Context-free_grammar

[1] https://en.wikipedia.org/wiki/Regular_grammar

banish-m4|1 year ago

Read up on context sensitive grammars.