414owen | 2 years ago
It achieves 3.88GiB/s
I intentionally didn't go down the route of vectorizing. I wanted to keep the scope of the problem small, and show off the assembly tips and tricks in the post, but maybe there's potential for a future post, where I pad the input string and vectorize the algorithm :)
nwallin|2 years ago
Don't want to pass the string length? That's fine, we can figure that out for ourselves. This code:
Is 27GB/s. With a little bit of blocking: That's ~55GB/s.
Anyway, the point is, you're pretty far from the point where you ought to give up on C and dive into assembly.
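The code from this comment is not reproduced in the thread, so here is only a rough sketch of the kind of plain C being described, not the commenter's actual code. It assumes the task is a net tally over a NUL-terminated string: +1 for each 's', -1 for each 'p'. The thread itself does not restate the task, and the names `tally_sp` and `tally_sp_blocked` are made up for illustration. The first function finds the length with `strlen` and then runs a branch-free counted loop, which is the shape compilers tend to auto-vectorize. The second is one possible reading of "blocking": it accumulates into a narrow per-block counter so byte-wide vector lanes can be used. Whether either loop is actually vectorized, and how fast it runs, depends on the compiler, flags, and CPU.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical task (an assumption, not stated in the thread):
 * +1 for every 's', -1 for every 'p' in a NUL-terminated string. */
int tally_sp(const char *input) {
    size_t len = strlen(input);          /* figure out the length ourselves */
    int res = 0;
    for (size_t i = 0; i < len; i++) {
        /* Comparisons evaluate to 0 or 1, so the body has no branches. */
        res += (input[i] == 's') - (input[i] == 'p');
    }
    return res;
}

/* Same result, processed in blocks of at most 127 bytes so that a
 * signed 8-bit accumulator cannot overflow within one block. */
int tally_sp_blocked(const char *input) {
    size_t len = strlen(input);
    int res = 0;
    size_t i = 0;
    while (i < len) {
        size_t end = (len - i > 127) ? i + 127 : len;
        signed char acc = 0;             /* stays within -127..127 per block */
        for (; i < end; i++) {
            acc += (input[i] == 's') - (input[i] == 'p');
        }
        res += acc;                      /* widen once per block */
    }
    return res;
}
```

Building with optimizations enabled (for example `-O3`, plus a target flag such as `-march=native` on GCC or Clang) and inspecting the generated assembly is the way to check what the compiler did with either loop.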
utopcell|2 years ago
magicalhippo|2 years ago
I've seen plenty of cases where replacing hand-written assembly with C (or similar) led to a substantial performance increase, because the assembly code was written for some old CPU and was not the best way of doing things on current CPUs.
rajnathani|2 years ago
SleepyMyroslav|2 years ago
Thank you. I hope people who post random assembly listings on HN written in some extinct ISA will read your posts.