xiaq | 1 year ago

Right, I may have forgotten to mention that lexerless parsers are somewhat unusual.

I didn't have much time in the talk to go into the reasons, so here they are:

- You'd need a more complex lexer to handle a shell-like syntax. For example, one common job of a lexer is to discard whitespace, but shell syntax is whitespace-sensitive: "a$x" and "a $x" (the double quotes are not part of the code) mean different things. The first is a single word formed by string concatenation; the second is two separate words.

- If your parser backtracks a lot, a lexer can improve performance: you backtrack over tokens rather than characters, and there are fewer tokens than characters. Elvish's parser doesn't backtrack, though (it does use lookahead fairly liberally).
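To make the "a$x" vs "a $x" point concrete, here is a minimal lexerless sketch in Go. It is not Elvish's actual parser: `Part`, `parseWords` and `isNameChar` are hypothetical names, and it assumes variables are written `$name` with ASCII letters, digits and underscores, ignoring quoting and escapes. Because spaces are what separate words, the parser can't just throw them away.

```go
package main

import "fmt"

// Part is one piece of a compound word: literal text, or a variable
// reference such as $x (for variables, Text holds just the name).
type Part struct {
	IsVar bool
	Text  string
}

// isNameChar reports whether c may appear in a variable name
// (assumption: ASCII letters, digits and underscore only).
func isNameChar(c byte) bool {
	return c == '_' || ('a' <= c && c <= 'z') || ('A' <= c && c <= 'Z') || ('0' <= c && c <= '9')
}

// parseWords splits src into words, each a sequence of adjacent parts.
// A space ends a word, so "a$x" is one word of two parts while "a $x"
// is two words of one part each. Quoting and escapes are not handled.
func parseWords(src string) [][]Part {
	var words [][]Part
	i := 0
	for i < len(src) {
		// Skip the spaces between words.
		for i < len(src) && src[i] == ' ' {
			i++
		}
		if i == len(src) {
			break
		}
		// Collect adjacent parts until a space ends the word.
		var word []Part
		for i < len(src) && src[i] != ' ' {
			j := i + 1
			if src[i] == '$' {
				for j < len(src) && isNameChar(src[j]) {
					j++
				}
				word = append(word, Part{IsVar: true, Text: src[i+1 : j]})
			} else {
				for j < len(src) && src[j] != ' ' && src[j] != '$' {
					j++
				}
				word = append(word, Part{Text: src[i:j]})
			}
			i = j
		}
		words = append(words, word)
	}
	return words
}

func main() {
	for _, src := range []string{"a$x", "a $x"} {
		fmt.Printf("%q: %d word(s) %v\n", src, len(parseWords(src)), parseWords(src))
	}
}
```

For "a$x" this yields one word made of the literal `a` and the variable `x`; for "a $x" it yields two words. A lexer that simply drops whitespace would lose exactly that distinction.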

Having a lexerless parser does mean that you have to deal with whitespace everywhere, which can get a bit annoying. But personally I like the conceptual simplicity, and not having to deal with silly tokens like LBRACE, LPAREN and PIPE.
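Roughly, that trade-off looks like this in Go. Again, this is a hypothetical sketch rather than Elvish's code (the `parser` type and its methods are made up): it works directly on bytes, so `skipSpace` has to be called explicitly at every point where whitespace is allowed, while `|` is recognised by peeking one byte ahead instead of through a PIPE token.

```go
package main

import "fmt"

// parser is a minimal lexerless parser: just the source and an offset,
// with no token stream in between.
type parser struct {
	src string
	pos int
}

// peek returns the next byte without consuming it (0 at end of input).
func (p *parser) peek() byte {
	if p.pos < len(p.src) {
		return p.src[p.pos]
	}
	return 0
}

// skipSpace has to be called explicitly wherever spaces are allowed.
func (p *parser) skipSpace() {
	for p.peek() == ' ' {
		p.pos++
	}
}

// parseWord reads a run of bytes that are neither spaces nor '|'.
func (p *parser) parseWord() string {
	start := p.pos
	for c := p.peek(); c != 0 && c != ' ' && c != '|'; c = p.peek() {
		p.pos++
	}
	return p.src[start:p.pos]
}

// parseForm reads space-separated words until '|' or end of input.
func (p *parser) parseForm() []string {
	var words []string
	p.skipSpace()
	for c := p.peek(); c != 0 && c != '|'; c = p.peek() {
		words = append(words, p.parseWord())
		p.skipSpace()
	}
	return words
}

// parsePipeline reads forms separated by '|'. The pipe is matched as a
// plain byte using one byte of lookahead; there is no PIPE token.
func (p *parser) parsePipeline() [][]string {
	forms := [][]string{p.parseForm()}
	for p.peek() == '|' {
		p.pos++ // consume '|'
		forms = append(forms, p.parseForm())
	}
	return forms
}

func main() {
	p := &parser{src: "echo a b | wc -l"}
	fmt.Println(p.parsePipeline())
}
```

Forgetting a single `skipSpace` call, say after `|`, is the kind of annoyance mentioned above; in exchange there is no separate token vocabulary to keep in sync with the parser.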

I haven't used parser generators enough to comment on their benefits compared to writing a parser by hand. The handwritten one works well so far :)


throwaway2016a | 1 year ago

That example you gave could certainly be handled in Lex/Flex, and I assume in other lexers/tokenizers as well. For instance, you would probably use states (start conditions, in Flex terms) and have "$x" in the initial state produce a different token type than "$x" in the string state.
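A hand-written Go sketch of that idea: the lexer discards whitespace but keeps a state, much like a Flex start condition, so `$x` gets a different token type depending on whether it starts a word or continues one. The in-word state plays roughly the role of the "string state" above; all token and state names here are hypothetical, and variable names are assumed to be ASCII letters, digits and underscores.

```go
package main

import "fmt"

// TokenType is the kind of a token; the same text can map to different
// kinds depending on the lexer state.
type TokenType int

const (
	Literal   TokenType = iota // literal text such as "a"
	VarWord                    // $x starting a new word
	VarConcat                  // $x concatenated onto the current word
)

type Token struct {
	Type TokenType
	Text string
}

// lexState plays the role of a Flex start condition.
type lexState int

const (
	stateInitial lexState = iota // between words
	stateInWord                  // inside a word
)

// isNameChar reports whether c may appear in a variable name (assumption).
func isNameChar(c byte) bool {
	return c == '_' || ('a' <= c && c <= 'z') || ('A' <= c && c <= 'Z') || ('0' <= c && c <= '9')
}

// lex tokenizes src and discards spaces. Word boundaries survive anyway,
// because the state decides whether $x is VarWord or VarConcat.
func lex(src string) []Token {
	var toks []Token
	state := stateInitial
	for i := 0; i < len(src); {
		c := src[i]
		switch {
		case c == ' ':
			// Whitespace produces no token, but it ends the current word.
			state = stateInitial
			i++
		case c == '$':
			j := i + 1
			for j < len(src) && isNameChar(src[j]) {
				j++
			}
			typ := VarWord
			if state == stateInWord {
				typ = VarConcat
			}
			toks = append(toks, Token{typ, src[i:j]})
			state = stateInWord
			i = j
		default:
			j := i + 1
			for j < len(src) && src[j] != ' ' && src[j] != '$' {
				j++
			}
			toks = append(toks, Token{Literal, src[i:j]})
			state = stateInWord
			i = j
		}
	}
	return toks
}

func main() {
	fmt.Println(lex("a$x"), lex("a $x"))
}
```

Here "a$x" lexes as `Literal`, `VarConcat` and "a $x" as `Literal`, `VarWord`, so the parser downstream never needs to see whitespace tokens.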

But I do get your meaning. I've written a lot of tokenizers by hand as well, and sometimes handwritten code is easier to follow. Grammar config files can get convoluted fast.

Again, I didn't mean it as criticism. But your talk title does start with "How to write a programming language and shell in Go", so given the title I think lexers/tokenizers are worth noting.

xiaq | 1 year ago

Yeah, ultimately there's an element of personal taste at play.

The authoritative tone of "how to write ..." is meant in jest, but obviously that risks being misunderstood. A more accurate title would be "how I wrote ...", but that's slightly boring, and I was trying hard to get my talk proposal accepted, you see :)