top | item 43991982

(no title)

thaliaarchi | 9 months ago

I find it interesting that MicroPython's `re` module[0] is implemented with a backtracking regular expression engine from re1.5[1], instead of one of the linear-time engines from the same library. (Russ Cox covers the various engines in the excellent blog series[2] which re1 is a companion to.) I figure the choice was made due to binary size or memory constraints, though they're all quite small.

[0]: https://github.com/micropython/micropython/tree/master/lib/r...

[1]: https://github.com/pfalcon/re1.5/tree/v0.8.2

[2]: https://swtch.com/~rsc/regexp/regexp2.html

discuss

order

matt_trentini|9 months ago

Yes, it was chosen for low size and memory constraints. But it is limited in features (like counted repetitions):

https://docs.micropython.org/en/latest/library/re.html

so alternatives to provide additional features have been discussed... Either extending the existing module or swapping to a more feature-rich library. Possibly even doing so for larger micros that can afford the additional flash/memory, though that makes support more challenging.

thaliaarchi|9 months ago

I was talking about the performance, not the feature set. Russ Cox's re1 and the re1.5 fork have several engines for different implementation strategies. re1 was written for primarily pedagogical reasons, so its minimality comes from that.

The engine chosen by MicroPython is vulnerable to catastrophic backtracking and switching to the Pike VM implementation would fix that. Instead of backtracking in the text when the pattern doesn't match, the Pike VM iterates each char in the text only once, visiting the states valid for that position in lock step. Consequently, it allocates a list of “thread”s, proportional in length to the number of states in the pattern (though usually patterns have relatively few states). Many security issues have resulted from regexp denials of service, so this slight memory tradeoff might be worthwhile.

Since recursiveloop.c has been changed by MicroPython, those changes would need to be ported to pike.c. The fixes are small and none of the extra features exploit the backtracking, so this should be easy.