jove_|2 years ago
As everyone has pointed out, this does not count. Note that the claim that regex can't parse HTML is specific and proven. What it means is that you can't write an expression that pairs every opening tag with its matching closing tag, because there's no way to handle arbitrarily nested tags within a single regex. It's only possible to write a regex that matches up to a finite nesting limit.
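A minimal Python sketch of the finite-limit point, assuming a single tag name `b`, no attributes, and plain text with no other tags. `element_regex(depth)` is a hypothetical helper made up for this illustration: it builds a pattern that accepts a `<b>` element nested up to `depth` levels, and anything nested one level deeper is rejected.

```python
import re

def inner_regex(depth: int) -> str:
    """Content of a <b> element: plain text plus <b> children nested up to `depth` levels."""
    if depth == 0:
        return r"[^<]*"  # text only, no tags at all
    # Each repetition is either one non-tag character or a whole child element,
    # whose own content may nest one level less.
    return r"(?:[^<]|<b>" + inner_regex(depth - 1) + r"</b>)*"

def element_regex(depth: int) -> str:
    """A single <b>...</b> element whose total nesting depth is at most `depth` (>= 1)."""
    return r"<b>" + inner_regex(depth - 1) + r"</b>"

def matches(depth: int, text: str) -> bool:
    return re.fullmatch(element_regex(depth), text) is not None

two_deep = "<b>x<b>y</b>z</b>"
three_deep = "<b><b><b>y</b></b></b>"

print(matches(2, two_deep))    # True
print(matches(2, three_deep))  # False: one level too deep for this pattern
print(matches(3, three_deep))  # True, but only because we built a bigger pattern
```

Whatever finite `depth` you pick, some input nested one level deeper falls outside the pattern.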
im3w1l|2 years ago
See, normally the whole point of parsing something is to get data out, right? And the way a regex gets data out is through capture groups. But herein lies the issue: a capture group can only capture one piece of information!
Consider a simple regular language: a non-empty sequence of comma-separated positive integers. We would like to get the integers out. An attempt:
The first group captures the first number. The second group is just something we introduced for the purpose of writing the regex; we don't care about its value. The third (inner) group should ideally capture each of the subsequent numbers separately. But it doesn't! If you run that regex on 1,2,3,4,5,6,7,8,9, you will find that group 1 matches 1 and group 3 matches 9. Where did all the other numbers go?!

So really, you have to give the regex some outside help: maybe an outside loop, maybe splitting on a regex rather than parsing with one. Even for this simple language!
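To see the lost captures concretely, here is a Python sketch. The original regex did not survive, so the pattern below is an assumption reconstructed from the description (first number, a repeated `,number` wrapper, the number inside it); `ATTEMPT` and `parse_ints` are illustrative names. Python's `re` keeps only the last repetition of a repeated group, which is where the other numbers went.

```python
import re

text = "1,2,3,4,5,6,7,8,9"

# Assumed shape of the regex described above:
#   group 1 = first number, group 2 = the ",number" wrapper, group 3 = number inside it.
ATTEMPT = re.compile(r"([1-9]\d*)(,([1-9]\d*))*")

m = ATTEMPT.fullmatch(text)
print(m.group(1))  # 1
print(m.group(2))  # ,9  (a repeated group keeps only its last repetition)
print(m.group(3))  # 9

def parse_ints(s: str) -> list[int]:
    """Validate with one regex, then pull the numbers out with an outside loop."""
    if ATTEMPT.fullmatch(s) is None:
        raise ValueError(f"not a comma-separated list of positive integers: {s!r}")
    # re.finditer is the "outside help": it restarts the match once per number.
    return [int(num.group()) for num in re.finditer(r"[1-9]\d*", s)]

print(parse_ints(text))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

The regex still decides whether the input is in the language; the loop does the extraction it cannot.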
And once you are already doing that, the step to giving it a bit more help, perhaps a stack, is quite small.
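A sketch of that "bit more help" in Python: a regex finds one tag at a time and a stack tracks the nesting. This assumes simplified tags like `<b>` and `</b>` with no attributes, comments, or self-closing forms; `TAG` and `tags_balanced` are hypothetical names, not a real HTML parser.

```python
import re

# Tokenizer-level regex: recognises a single tag and knows nothing about nesting.
TAG = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)>")

def tags_balanced(html: str) -> bool:
    """Return True if every opening tag is closed, in the right order."""
    stack: list[str] = []
    for m in TAG.finditer(html):
        closing, name = m.group(1), m.group(2)
        if not closing:
            stack.append(name)  # remember the open tag
        elif not stack or stack.pop() != name:
            return False  # closing tag with no matching open tag
    return not stack  # anything still open is unbalanced

print(tags_balanced("<b><i>x</i></b>"))  # True
print(tags_balanced("<b><i>x</b></i>"))  # False: closes in the wrong order
print(tags_balanced("<b>" * 50 + "</b>" * 50))  # True, with no fixed depth limit
```

Unlike a bounded-depth pattern, the stack grows with the input, so there is no nesting limit baked into the code.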
Tainnor|2 years ago