maypok86 | 2 years ago
1. Yes, S3-FIFO can be implemented using lock-free queues, but the problem is that with this design each write to a full cache triggers a large number of extra atomic operations that are unfriendly to the processor's cache, whereas BP-Wrapper amortizes exactly that load. Updating frequency on hot entries during reads can also hurt performance. This is largely what the last posts in my discussion with Ben are about (not exactly this, but the current problem with otter's read speed has a similar cause). https://github.com/Yiling-J/theine-go/issues/29#issuecomment...
2. But the main problem for me is not even that. Lock-free queues work fine as long as you only need to support Get and Set, but as soon as you want to add features to your cache, implementation complexity starts to grow, and some features are very hard to bolt onto such a structure. Improving the eviction policy is also in doubt, because you have to think not only about how to improve the policy, but also about how to do it without locks, or at least without your improvements slowing the implementation down. BP-Wrapper has none of these problems: it lets you use any eviction policy and improve different parts of the cache independently of each other.
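To make the amortization point concrete, here is a minimal single-goroutine Go sketch of the BP-Wrapper idea as described above: read events are logged into a buffer, and the eviction policy's lock is only taken when the buffer fills, so one lock acquisition is spread over many accesses. All names (`policy`, `bpBuffer`, the buffer size) are hypothetical, not taken from any of the libraries discussed.

```go
package main

import (
	"fmt"
	"sync"
)

// policy stands in for any eviction policy guarded by a single mutex.
type policy struct {
	mu   sync.Mutex
	hits map[string]int
}

// bpBuffer batches read events so the policy lock is taken once per
// batch instead of once per Get (the BP-Wrapper amortization).
type bpBuffer struct {
	buf []string
	cap int
	p   *policy
}

func (b *bpBuffer) record(key string) {
	b.buf = append(b.buf, key)
	if len(b.buf) >= b.cap {
		b.drain()
	}
}

func (b *bpBuffer) drain() {
	b.p.mu.Lock()
	for _, k := range b.buf {
		b.p.hits[k]++ // replay the whole batch under one lock acquisition
	}
	b.p.mu.Unlock()
	b.buf = b.buf[:0]
}

func main() {
	p := &policy{hits: map[string]int{}}
	b := &bpBuffer{cap: 4, p: p}
	for i := 0; i < 8; i++ {
		b.record("k") // 8 reads cost only 2 lock acquisitions
	}
	fmt.Println(p.hits["k"]) // prints 8
}
```

A real implementation would use per-thread (or striped) buffers so concurrent readers do not contend on the buffer itself; this sketch only shows the batching trade-off.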
1a1a11a | 2 years ago
1. Why does it require multiple atomic operations per insert? Our implementation in cachelib used only one CAS operation. I will get back to you with an illustration; maybe you can show me something I missed.
2. I like the BP-Wrapper idea. It is intuitive and can be used with most, if not all, algorithms. Batching does reduce and amortize the locking overhead of LRU-based algorithms. But IIRC all promotions are still serialized: they remain protected by the lock, so only one core can promote objects at a time. Caffeine achieves its amazing scalability by dropping the elements that need to be updated or promoted under high contention, which means it effectively becomes a FIFO. I don't think that is the right way to claim good scalability.
3. "reading with frequency update on hot entries can have a bad effect on performance" — I don't understand this. The frequency of a hot entry should quickly reach 1 (with a one-bit counter) or 3 (with a two-bit counter) and is never updated after that. Why would reads and updates conflict?
4. Yes, implementing get and set this way is trivial; adding support for delete requires a bit of work (tens of lines) but should be manageable. But why would lock-free queues make other features hard to implement? Can you give an example?
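Point 3 above — that hot entries saturate their counter and then stop generating writes — can be sketched in a few lines of Go. This is an illustrative capped counter, not code from S3-FIFO or cachelib; `incFreq` and the cap of 3 (a two-bit counter) are assumptions for the example.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// incFreq bumps a two-bit frequency counter capped at 3 using a single
// CAS. Once an entry is hot (freq == 3), readers perform no atomic
// write at all, so hot keys stop invalidating the cache line.
func incFreq(freq *int32) bool {
	for {
		f := atomic.LoadInt32(freq)
		if f >= 3 {
			return false // saturated: load only, no write
		}
		if atomic.CompareAndSwapInt32(freq, f, f+1) {
			return true
		}
	}
}

func main() {
	var f int32
	writes := 0
	for i := 0; i < 10; i++ {
		if incFreq(&f) {
			writes++
		}
	}
	fmt.Println(f, writes) // prints "3 3": 10 reads, only 3 atomic writes
}
```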
maypok86 | 2 years ago
2. This is a bit non-obvious. Caffeine does not lose a single insert, update, or delete operation; moreover, it tracks their correct execution with fantastic scrupulousness. Losing some reads is inherent in many eviction policies anyway. And Caffeine's lossy buffers have a great property: the number of buffers is adjusted automatically at runtime based on contention, which minimizes losses.
3. I specifically noted that it is not quite the same thing, but it is possible to construct a workload that forces as many xadd operations as possible and as few load operations as possible. In principle you can ignore this point, though, as it is rather artificial.
4. What I mean is that every feature either has to live inside the lock-free model or be integrated into the architecture separately (which only complicates things). And if, for example, someone wants to modify the eviction policy a bit, say by adding an LFU component to it, it is not clear how to do that.
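The lossy-buffer behavior described in point 2 can be sketched as follows: a reader appends its event to a buffer, and when the buffer is full it tries to take the drain lock without blocking; if another drain is already in progress, the event is simply dropped. This is a hypothetical single-goroutine Go sketch (requires Go 1.18+ for `Mutex.TryLock`), not Caffeine's actual implementation, and the type `lossyBuffer` is invented for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// lossyBuffer records read events best-effort: when the buffer is full
// and the drain lock is busy, the event is dropped instead of making
// the reader wait (the trade-off discussed above).
type lossyBuffer struct {
	mu      sync.Mutex // guards draining into the eviction policy
	events  []string
	cap     int
	dropped int
	drained int
}

func (b *lossyBuffer) record(key string) {
	if len(b.events) < b.cap {
		b.events = append(b.events, key)
		return
	}
	if b.mu.TryLock() {
		b.drained += len(b.events) // replay into the policy here
		b.events = b.events[:0]
		b.mu.Unlock()
		b.events = append(b.events, key)
	} else {
		b.dropped++ // contention: the read event is simply lost
	}
}

func main() {
	b := &lossyBuffer{cap: 2}
	b.record("a")
	b.record("b")
	b.mu.Lock()     // simulate a drain in progress elsewhere
	b.record("c")   // buffer full, lock busy: dropped
	b.mu.Unlock()
	b.record("d")   // lock free: drains, then records
	fmt.Println(b.dropped, b.drained) // prints "1 2"
}
```

A production version would use multiple striped buffers (and, as noted above, Caffeine grows the stripe count under contention); the sketch only shows why dropping, rather than blocking, keeps readers fast.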