This idea was used to great effect in Matt Might’s “Parsing with Derivatives” paper [0]! And it featured prominently in the Compilers class he taught at the University of Utah.
HTML can be described by a context-free grammar [0], but not by a regular grammar [1]. If a language can be described by a regular grammar, you can parse it with a regular expression -- that's where the "regular" in RegExp comes from!
Derivatives of RegExps don't automatically unlock parsing of context-free grammars, afaik. For that you need recursion. They do however unlock some very elegant parser designs.
banish-m4|1 year ago
galaxyLogic|1 year ago
As an example basic HTML cannot (?) be parsed by RegExp because tag-pairs can contain tag-pairs:
eludes RegExp matching, it seems to me, because a typical standard RegExp would only match "<div> <div> </div>" and would not see the 2nd </div>.Can RegExp Derivatives do it better?
sigil|1 year ago
Derivatives of RegExps don't automatically unlock parsing of context-free grammars, afaik. For that you need recursion. They do however unlock some very elegant parser designs.
[0] https://en.wikipedia.org/wiki/Context-free_grammar
[1] https://en.wikipedia.org/wiki/Regular_grammar
banish-m4|1 year ago