top | item 947814

Writing a Lexer

16 points| kingkilr | 16 years ago |lazypython.blogspot.com | reply

2 comments

order
[+] fadmmatt|16 years ago|reply
I think the coolest way to write a lexer is to use the derivative of regular expressions:

http://matt.might.net/articles/implementation-of-regular-exp...

It's surprisingly easy to implement, as the prior example demonstrates in Scheme.

This is the technique I recommend to my advanced compilers class.

[+] mahmud|16 years ago|reply
The first lexer and recursive descent parser from Levine's "Lex & Yacc" book has been hammered into my head. I rewrote it from memory so often it's now part of my internal standard library; as late as this morning I was writing it in lisp to process the beanstalkd message queue protocol. I wrote it a few days ago as well, for Redis, and before that by a week for Memcached.

Manual lexer writing is a reusable but non-transferable technique, specially when regexes are slow or over kill; every little use-case has its small set of states, own termination condition, constituent characters and tokens :-)

Basic compiler techniques gave me the freedom to write software that I wanted, not software prescribed to me by a book or work specs (let's face it, writing tools is the most fun kind of programming.) Sometimes I am grateful I know these things and can make whatever I fancy, other times I am spiteful for not being good at it; my work has thus far been marked by exuberant incompetence :-P

[Edit:

Or you can skip lexing and go straight to language design:

http://www.cs.brown.edu/~sk/Publications/Books/ProgLangs/200...

]