(no title)
axeluser | 7 months ago
2 and 3. The "operations" are both the array's size (actually the nearest power of 2) and lookup operations. I first thought that latency should be equal for both cases, because my benchmarks may be too naive - they test lookup by sequential memory access, i.e., iterating buckets from 0 to n, and iterating from first to last element in each bucket. I'll try to experiment with a more shuffled workload in my free time. Thank you for the question. I've not thought about it deeply. Also, counting mispredictions is a good idea; I'll add this metric to my benchmarks to get a complete picture.
4. I didn't pay much attention to that because the trend was still the same, and even the ratio was similar for different counts of lookups per underlying memory layout. There was just some minor deviation, but I ignored it. Maybe I'll run more granular tests with hardware counters to catch some insights.
5. SIMD, in my opinion, would be an overhead there because for a single lookup in my configuration, I'm using a single 32-bit value, and Cuckoo Filter will touch at max 64 bits (2 lookups). However, I was planning to experiment with batching of lookups; I just haven't reached it yet.
curiouscoding|7 months ago
Yeah; that completely eliminates the cache misses / memory latency you'd have in practice. Of course eliminating that bottleneck is useful if you want to purely optimize CPU speed, but probably not quite as representative of real workloads. Also makes sense then that different array sizes give similar results: streaming tends to be fast anyway, regardless of the array size.
maxlapdev|7 months ago
axeluser|7 months ago