ryah | 16 years ago

clearly you're unfamiliar with http.

I certainly am. It's a complete mystery to me. Is it like visual basic?

The only PITA with HTTP is chunked encoding. Whoever thought that gem up should be shot. The rest is fairly trivial. Certainly parsing headers is. This implementation looks pretty silly. Having individual states for each of the characters in "HTTP" etc? WTF?

edit: Instead of just downmodding me, why not explain exactly what part of parsing HTTP headers is non-trivial?

viraptor | 16 years ago

Parser correctness is not trivial. RFC 2616 contains the complete grammar you need, so it's fairly simple to implement. OTOH, if you write a parser on your own, you're likely to miss stuff like section 4.2, which explains that a header value can be folded across multiple lines as long as each continuation line starts with LWS (a space or a tab). The parser from this article will fail on (just looking at the source, I'm 99.5% sure of this):

    abc:
     def

It also doesn't like tabs and will not support comma-separated header values. It's not rocket science to write a "good enough" HTTP parser, but writing a fully compliant one is something completely different. There are also cool parts of the spec that you can read 10 times and come to a different conclusion each time - for example, what "\" CR LF means inside a quoted string, and whether it ends the header value or not. Writing a "correct" parser is a LOT of fun...
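To make the folding case concrete, here is a rough sketch of header unfolding as I read section 4.2. It is not the parser from the article: `parse_headers` is a made-up name, the input is assumed to be an already-decoded string with CRLF line endings, and it deliberately ignores quoted strings and comma-separated lists, which is exactly where the hard parts live.

```python
def parse_headers(raw):
    """Parse a CRLF-separated header block into {lowercased name: [values]}.

    Hypothetical helper for illustration. Continuation lines (starting with
    a space or tab) are joined onto the previous value with a single space.
    """
    headers = {}
    last = None  # name of the header a continuation line would extend
    for line in raw.split("\r\n"):
        if line == "":
            break  # an empty line ends the header block
        if line[0] in " \t":
            # RFC 2616 section 4.2: leading SP or HT continues the previous value
            if last is None:
                raise ValueError("continuation line before any header")
            joined = headers[last][-1] + " " + line.strip(" \t")
            headers[last][-1] = joined.strip()
            continue
        name, sep, value = line.partition(":")
        if not sep or not name or name != name.strip(" \t"):
            raise ValueError("malformed header line: %r" % line)
        last = name.lower()
        # Repeated headers are kept as separate list entries, not merged
        headers.setdefault(last, []).append(value.strip(" \t"))
    return headers
```

With this, the `abc:` / ` def` example above comes out as a single `abc` header with the value `def`, and a tab works the same way as a space.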

Keeping separate states for the characters in "HTTP" probably saves you a couple of cycles, because you match as you go and can reject the message early, pointing at the exact place that didn't match. It's a bit useless for a 4-letter string though.
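The match-as-you-go idea can be sketched like this. It is only an illustration of the technique, not the article's code: `LiteralMatcher` is a hypothetical name, and the "state" is simply the index of the next expected character, which is what one state per letter boils down to.

```python
class LiteralMatcher:
    """Match a fixed string incrementally, one character at a time.

    Hypothetical class for illustration. The state is the index of the next
    expected character, so input can arrive in arbitrary chunks.
    """

    def __init__(self, literal):
        self.literal = literal
        self.state = 0

    def feed(self, chunk):
        """Return True once the literal is fully matched, False if more
        input is needed. Raise ValueError at the first mismatch."""
        for ch in chunk:
            if self.state == len(self.literal):
                break  # already matched; the rest belongs to the next token
            if ch != self.literal[self.state]:
                raise ValueError(
                    "expected %r at position %d, got %r"
                    % (self.literal[self.state], self.state, ch)
                )
            self.state += 1
        return self.state == len(self.literal)
```

Feeding `"HT"` and then `"TP/1.1"` matches across the chunk boundary, while `"HTXP"` is rejected at position 2 without reading any further.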