(no title)
arvinjoar | 8 years ago
Another thing is that you often don't need to handle HTML in general, only a small subset of it, and in a lot of cases a regex, even a simple one, handles that subset fine.
The true enemy is parsing something that might change over time, and that's totally unrelated to the regex issue.
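To make the "small subset" point concrete, here is a minimal sketch in Go that pulls only the page title out of a document with one simple regex. The `extractTitle` helper and the sample page are made up for illustration; it assumes the page has at most one `<title>` element and that nothing else on the page is of interest.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// titleRe matches the text between <title> and </title>.
// (?is) makes the match case-insensitive and lets "." span newlines.
var titleRe = regexp.MustCompile(`(?is)<title[^>]*>(.*?)</title>`)

// extractTitle is a hypothetical helper: it returns the trimmed title text
// and whether a title element was found at all.
func extractTitle(html string) (string, bool) {
	m := titleRe.FindStringSubmatch(html)
	if m == nil {
		return "", false
	}
	return strings.TrimSpace(m[1]), true
}

func main() {
	// The rest of the page is deliberately sloppy; the regex never looks at it.
	page := "<html><head><TITLE>\n Hello </TITLE></head><body><p>unclosed</body>"
	if title, ok := extractTitle(page); ok {
		fmt.Println(title)
	}
}
```

This still breaks if the site changes how it marks up the title, which is the "changes over time" problem and would hit a full parser just the same.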
tmaly | 8 years ago
Recently I replaced this with an XML tokenizer I wrote in Go that can deal with invalid or corrupt XML. On top of it I use a state machine to handle the different situations that come up.