top | item 36623958

(no title)

414owen | 2 years ago

Nice! There's a part two in which I rewrote the C. I got a 12x speedup :)

https://owen.cafe/posts/the-same-speed-as-c/

And as others have pointed out, you can tweak the input, then vectorize the algo, if you want to go that route.

I considered this a pedagogical exercise and I sincerely hope nobody will start dropping down to assembly without a very good reason to.

discuss

order

sriku|2 years ago

Wondering how res += (c=='s')-(c=='p') might do. I sure there is some C undefined behaviour relevant there. Curious but too lazy to check it myself!

Tempest1981|2 years ago

While `false` evaluates to 0, not sure `true` always evaluate to 1 in C... maybe compiler dependent. Maybe add `? 1 : 0`

romnon|2 years ago

ive seen people doing += !!(c=='s')-!!(c=='p') for that

rajnathani|2 years ago

Great post!

One thought: If the code is rewritten using bit arithmetic, then potentially the result could be even faster as there need not be a pointer look-up.

A bit arithmetic solution would have a mask created for the characters ‘p’ and ‘s’, and then the result could be AND-ed, and then with more bit arithmetic this all 1s value can be translated to a 1 if and only if all the bits are 1. Following which, there would be a no conditional check and simply be both an add and a subtract operation but where the value to be added will only be 1 if the mask for ‘p’ matches and 1 to be subtracted if the mask for ‘s’ matches respectively. I’m not fully sure if this would necessarily be faster than the pointer look-up solution, but it would be interested to try this version of the code and see how fast it performs.

Update: The bit arithmetic could also be done with an XOR on the mask, and following which the ‘popcnt’ x86 instruction could be used to figure out if all are 0 bits.

JohnMakin|2 years ago

Thank you for your post and reply but I fear with a post + title like this you may just be chumming the waters.

jdsalaro|2 years ago

What do you mean by "chumming the waters" in this context?