(no title)
sujayakar | 1 year ago
it's interesting that the wins for batching start diminishing at 8. I'm curious then how the subsequent optimizations fare with batch size 8 (rather than 128).
smaller batch sizes are nice since it determines how much request throughput we'd need to saturate this system. at batch size 8, we need 1s / ~30ns * 8 = 266M searches per second to fully utilize this algorithm.
the multithreading results are also interesting -- going from 1 to 6 threads only improves overhead by 4x. curious how this fares on a much higher core count machine.
curiouscoding|1 year ago
I suspect that at higher core counts, we can still saturate the full RAM bandwidth with only 4-5 cores, so that the marginal gains with additional cores will be very small. That's good though, because that gives CPU time to work on the bigger problem to determine the right queries, and to deal with the outputs (as long as that is not too memory bound in itself, although it probably is).
Bulat_Ziganshin|1 year ago