benwills | 9 months ago
This is a minor example, but since you asked...
https://github.com/Cyan4973/xxHash/blob/dev/xxhash.h#L6432
That's an example of the fairly large set of accumulators xxHash keeps as state while it walks its input buffer.
Many modern hash functions keep more state (more accumulators) than earlier designs did. Previous generations of hash functions would often have just one or two accumulators and run through the data. Many modern ones keep several accumulators, sometimes as wide SIMD variables, for better mixing.
And if you're storing more state than fits in your registers, the compiler spills it to the stack, which means it ends up in the data cache.
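To make the one-accumulator vs. many-accumulators point concrete, here's a rough sketch in C. It is a toy, not xxHash's actual algorithm: the function names, the constants, and the mixing steps are all made up for illustration. The only thing it's meant to show is how the amount of live state grows when you go from one lane to four.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Arbitrary odd constants, chosen for illustration only (assumption). */
#define TOY_MUL  0x9E3779B97F4A7C15ULL
#define TOY_SEED 0xBF58476D1CE4E5B9ULL

static inline uint64_t toy_rotl64(uint64_t x, int r) {
    return (x << r) | (x >> (64 - r)); /* callers pass r in 1..63 */
}

static inline uint64_t toy_read64(const unsigned char *p) {
    uint64_t v;
    memcpy(&v, p, sizeof v); /* alignment-safe load, native byte order */
    return v;
}

/* Older style: a single accumulator, so every step waits on the previous one. */
uint64_t toy_hash_1acc(const void *data, size_t len) {
    assert(data != NULL || len == 0);
    const unsigned char *p = data;
    uint64_t acc = TOY_SEED;
    for (; len >= 8; p += 8, len -= 8)
        acc = toy_rotl64(acc ^ toy_read64(p), 31) * TOY_MUL;
    for (; len > 0; p++, len--)          /* leftover bytes */
        acc = (acc ^ *p) * TOY_MUL;
    return acc;
}

/* Newer style: four independent lanes, i.e. four times the live state. */
uint64_t toy_hash_4acc(const void *data, size_t len) {
    assert(data != NULL || len == 0);
    const unsigned char *p = data;
    uint64_t acc[4] = { TOY_SEED, TOY_MUL, ~TOY_SEED, ~TOY_MUL };
    for (; len >= 32; p += 32, len -= 32)
        for (int i = 0; i < 4; i++)      /* lanes do not depend on each other */
            acc[i] = toy_rotl64(acc[i] ^ toy_read64(p + 8 * i), 31) * TOY_MUL;
    /* Fold the lanes into one value, then absorb the leftover bytes. */
    uint64_t h = toy_rotl64(acc[0], 1) + toy_rotl64(acc[1], 7)
               + toy_rotl64(acc[2], 12) + toy_rotl64(acc[3], 18);
    for (; len > 0; p++, len--)
        h = (h ^ *p) * TOY_MUL;
    return h;
}
```

Four 64-bit lanes will normally still sit in registers. Whether a larger accumulator set does depends on the compiler and the target, and anything that doesn't fit gets spilled to the stack.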
Dylan16807 | 9 months ago
And there are 150+ physical registers in the actual chip.
But my argument is more that there isn't really an efficient or inefficient way to use L1. So unless you have an enormous amount of state, the question is moot. And if you have so much state you're spilling to L2, that's not when you worry about good or bad cache use, that's a weird bloat problem.