(no title)
isaacimagine | 9 months ago
The speedup was surprising in the sense that the bit-level parallelism fell out naturally: that 64× speedup alone was unexpected and pretty sweet. There's likely still a lot of speed left on the table. I just did the bare minimum to get the C code working: it's single-threaded, there's no vectorization, lots of register spilling, etc. Imagine the speedup you'd get running the circuit on e.g. an FPGA.
But no, it was not surprising in the sense that yeah, multiplying billions of floats is going to be much slower than a handful of parallel bitwise ops. Physics is physics, doesn't matter how good your optimizer is.
jgord|9 months ago
ps. superb writeup and project
isaacimagine|9 months ago