414owen | 2 years ago
On my machine I'm getting 0.244s for `loop-5.x64.s` and 0.422s for your implementation above.
I'm not sure why exactly we're seeing this discrepancy, and for what it's worth your implementation looks faster to me. I guess this is why you need to always benchmark on the hardware you're going to be running the code on...
xoranth | 2 years ago
I would have expected yours to be faster, given that it executes fewer instructions per loop iteration. Though maybe the CPU can issue `adc` on more execution ports than it can a load from memory?