(no title)
dmsnell | 1 year ago
This, IMO, is a bigger reason to avoid regex and XML parsers for HTML documents. The rules aren’t apparent when thinking linearly about what strings appear after or before each other; they become clearer when thinking of HTML as a shorthand syntax for certain kinds of push and pop operations.
XHTML is easier to parse, but for well-formed documents pushes the complexity of invalid markup into the rendering side. For example, it’s well-formed to include a button inside a button, so XHTML browsers render exactly this, but it makes no sense from a UI perspective and strange things happen when invalid markup is sent in well-formed XML.
No comments yet.