Why do you loop over the haystack multiple times, though? If you iterate over the long string once and write a fixed loop with the vowel checks in a way that is friendly to autovectorization, it might be faster and more idiomatic C or C++.
Thanks for reminding me, I meant to also benchmark the interchanged version. I updated the file above with the new benchmark results.
The original commenter called `find()` once per vowel, so that's why I benchmarked the regex against the less-idiomatic code.
The interchanged version (loop over haystack outside, over vowels inside) is (mildly) faster than all the regex versions except for short strings & no vowels.
One thing to note is that none of the loop versions generalize easily to non-ASCII strings, while the regex version does so fairly easily.
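For readers following along, here is a rough sketch of the two loop orders being compared. It is not the benchmarked code: it assumes the task is "does the string contain any ASCII vowel", and the names `kVowels`, `has_vowel_find`, and `has_vowel_interchanged` are made up for illustration. Both versions compare raw bytes, which is why they do not carry over to non-ASCII text.

```cpp
#include <string_view>

// Hypothetical vowel set; ASCII only.
constexpr std::string_view kVowels = "aeiouAEIOU";

// One find() call per vowel: up to ten passes over the haystack.
bool has_vowel_find(std::string_view haystack) {
    for (char v : kVowels) {
        if (haystack.find(v) != std::string_view::npos) {
            return true;
        }
    }
    return false;
}

// Interchanged: a single pass over the haystack, with the vowel checks inside.
bool has_vowel_interchanged(std::string_view haystack) {
    for (char c : haystack) {
        for (char v : kVowels) {
            if (c == v) {
                return true;
            }
        }
    }
    return false;
}
```

Both functions return the same answer for any input; only the traversal order differs. The early `return` in the interchanged loop is one plausible reason a compiler might decline to vectorize it, though that depends on the compiler and flags.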
I poked at that loop in Compiler Explorer for a few minutes and I think it's allergic to autovectorization. Let's assume that a properly vectorized regexp from a proper language wins in the future :)
tylerhou|8 months ago
SleepyMyroslav|8 months ago