item 36629313


fuber2018 | 2 years ago

I took the 64-bit SWAR (SIMD Within A Register) route and passed in the string length, since the calling code already has the length right there!

With the original run_switches function, the app took 3.554s (average of 10 runs).

With the SWAR version (string length passed in), the app took 0.117s (average of 10 runs).

That's an overall 27.6x speedup.
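The comment doesn't show the code, so here is a minimal sketch of what a 64-bit SWAR version could look like. It assumes (not stated in this thread) that run_switches returns the number of 's' bytes minus the number of 'p' bytes in the string; the names run_switches_swar, zero_bytes and count_byte are hypothetical. It uses the standard exact "find zero bytes" trick: XOR each word with the target byte broadcast to all 8 lanes, flag the lanes that became zero, then sum the flags with one multiply. This is one common way to do it, not necessarily the poster's implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ONES 0x0101010101010101ULL /* 0x01 in every byte */
#define LOW7 0x7F7F7F7F7F7F7F7FULL /* low 7 bits of every byte */

/* 0x01 in every byte of x that is zero, 0x00 elsewhere (exact, no false positives). */
static uint64_t zero_bytes(uint64_t x) {
    uint64_t t = (x & LOW7) + LOW7;     /* high bit set iff low 7 bits nonzero */
    return (~(t | x | LOW7)) >> 7;      /* 0x80 -> 0x01 only for all-zero bytes */
}

/* Number of bytes in w equal to c. */
static int count_byte(uint64_t w, unsigned char c) {
    uint64_t m = zero_bytes(w ^ (ONES * c)); /* matching bytes become zero */
    return (int)((m * ONES) >> 56);          /* top byte = sum of all bytes (<= 8) */
}

/* Hypothetical SWAR version of run_switches; the caller passes the length. */
int run_switches_swar(const char *s, size_t len) {
    int res = 0;
    size_t i = 0;
    for (; i + 8 <= len; i += 8) {
        uint64_t w;
        memcpy(&w, s + i, 8); /* unaligned-safe 8-byte load */
        res += count_byte(w, 's') - count_byte(w, 'p');
    }
    for (; i < len; i++) /* leftover bytes */
        res += (s[i] == 's') - (s[i] == 'p');
    return res;
}
```

Knowing the length up front is what keeps the loop simple: it reads whole 8-byte words without also having to hunt for the '\0' terminator, and it never reads past the end of the buffer.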


fuber2018 | 2 years ago

If I unroll the main while loop in the SWAR version so that each iteration handles 4x as much data, the runtime drops to 0.0562s (average of 10 runs).

That's an overall 57.5x speedup.
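A sketch of one way to unroll the SWAR loop 4x (32 bytes per iteration), under the same assumption that run_switches returns the count of 's' minus the count of 'p'. The function name run_switches_swar_x4 and its helpers are hypothetical. Besides cutting loop overhead, unrolling lets the per-byte match flags of four words be added together before a single horizontal sum, since each byte lane can hold at most 4 and the total stays at or below 32.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ONES 0x0101010101010101ULL
#define LOW7 0x7F7F7F7F7F7F7F7FULL

/* 0x01 in every byte of x that is zero, 0x00 in every other byte. */
static uint64_t zero_bytes(uint64_t x) {
    uint64_t t = (x & LOW7) + LOW7;
    return (~(t | x | LOW7)) >> 7;
}

/* Sum of all bytes of m; valid while that sum stays below 256. */
static int hsum(uint64_t m) {
    return (int)((m * ONES) >> 56);
}

/* Hypothetical 4x-unrolled SWAR version of run_switches. */
int run_switches_swar_x4(const char *s, size_t len) {
    const uint64_t S = ONES * 's', P = ONES * 'p'; /* broadcast targets */
    int res = 0;
    size_t i = 0;
    for (; i + 32 <= len; i += 32) {
        uint64_t w0, w1, w2, w3;
        memcpy(&w0, s + i, 8);
        memcpy(&w1, s + i + 8, 8);
        memcpy(&w2, s + i + 16, 8);
        memcpy(&w3, s + i + 24, 8);
        /* Each byte lane of ms/mp is at most 4, so no carries between lanes. */
        uint64_t ms = zero_bytes(w0 ^ S) + zero_bytes(w1 ^ S)
                    + zero_bytes(w2 ^ S) + zero_bytes(w3 ^ S);
        uint64_t mp = zero_bytes(w0 ^ P) + zero_bytes(w1 ^ P)
                    + zero_bytes(w2 ^ P) + zero_bytes(w3 ^ P);
        res += hsum(ms) - hsum(mp);
    }
    for (; i + 8 <= len; i += 8) { /* leftover whole words */
        uint64_t w;
        memcpy(&w, s + i, 8);
        res += hsum(zero_bytes(w ^ S)) - hsum(zero_bytes(w ^ P));
    }
    for (; i < len; i++) /* leftover bytes */
        res += (s[i] == 's') - (s[i] == 'p');
    return res;
}
```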

fuber2018 | 2 years ago

If I convert the unrolled 64-bit SWAR function to use 32-bit chunks instead, the average runtime almost doubles, to approx. 0.1s.

Need sleep now.
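For comparison, a sketch of the same byte-matching trick on 32-bit chunks, again assuming run_switches returns the count of 's' minus the count of 'p'. The name run_switches_swar32 and its helpers are hypothetical, and the main loop is left un-unrolled for brevity. Each step now does roughly the same number of operations but covers only 4 bytes instead of 8, which is consistent with the runtime roughly doubling.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ONES32 0x01010101u
#define LOW7_32 0x7F7F7F7Fu

/* 0x01 in every byte of x that is zero, 0x00 elsewhere. */
static uint32_t zero_bytes32(uint32_t x) {
    uint32_t t = (x & LOW7_32) + LOW7_32;
    return (~(t | x | LOW7_32)) >> 7;
}

/* Number of bytes in w equal to c (top byte of the product holds the sum). */
static int count_byte32(uint32_t w, unsigned char c) {
    uint32_t m = zero_bytes32(w ^ (ONES32 * c));
    return (int)((m * ONES32) >> 24);
}

/* Hypothetical 32-bit SWAR version of run_switches: 4 bytes per step. */
int run_switches_swar32(const char *s, size_t len) {
    int res = 0;
    size_t i = 0;
    for (; i + 4 <= len; i += 4) {
        uint32_t w;
        memcpy(&w, s + i, 4);
        res += count_byte32(w, 's') - count_byte32(w, 'p');
    }
    for (; i < len; i++) /* leftover bytes */
        res += (s[i] == 's') - (s[i] == 'p');
    return res;
}
```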