(no title)
414owen | 2 years ago
https://owen.cafe/posts/the-same-speed-as-c/
And as others have pointed out, you can tweak the input, then vectorize the algo, if you want to go that route.
I considered this a pedagogical exercise and I sincerely hope nobody will start dropping down to assembly without a very good reason to.
sriku|2 years ago
Tempest1981|2 years ago
romnon|2 years ago
rajnathani|2 years ago
One thought: If the code is rewritten using bit arithmetic, then potentially the result could be even faster as there need not be a pointer look-up.
A bit arithmetic solution would have a mask created for the characters ‘p’ and ‘s’, and then the result could be AND-ed, and then with more bit arithmetic this all 1s value can be translated to a 1 if and only if all the bits are 1. Following which, there would be a no conditional check and simply be both an add and a subtract operation but where the value to be added will only be 1 if the mask for ‘p’ matches and 1 to be subtracted if the mask for ‘s’ matches respectively. I’m not fully sure if this would necessarily be faster than the pointer look-up solution, but it would be interested to try this version of the code and see how fast it performs.
Update: The bit arithmetic could also be done with an XOR on the mask, and following which the ‘popcnt’ x86 instruction could be used to figure out if all are 0 bits.
JohnMakin|2 years ago
jdsalaro|2 years ago