It would be interesting to see this on Reddit's workload. The entire system was designed around the cache getting a 95%+ hit rate, because basically anything on the front page of the top 1000 subreddits will get the overwhelming majority of traffic, so the cache is mostly filled with that.
In other words, this solves the problem of "one hit wonders" getting out of the cache quickly, but that basically already happened with the reddit workload.
The exception to that was Google, which would scrape old pages, and which is why we shunted them to their own infrastructure and didn't cache their requests. Maybe with this algo, we wouldn't have had to do that.
Wouldn’t one-hit wonders still be an issue? They might get evicted relatively fast anyway, but assuming an LRU, each one will still occupy a cache entry until it has aged through the entire queue and is finally evicted.
Although if that’s your concern you can probably just add a smaller admission cache in front of the main cache, possibly with a promotion memory.
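For what it's worth, here is a minimal sketch of one way to read that suggestion: a small LRU "window" sits in front of the main LRU, and an entry is only promoted to the main cache on its second request. The class name `AdmissionLRU`, the `load` callback, and the sizes are all made up for illustration, and the "promotion memory" part is left out entirely.

```python
from collections import OrderedDict


class AdmissionLRU:
    """Main LRU cache guarded by a small admission window (illustrative only)."""

    def __init__(self, main_size, window_size):
        self.main = OrderedDict()    # entries requested at least twice
        self.window = OrderedDict()  # entries requested once so far
        self.main_size = main_size
        self.window_size = window_size

    def get(self, key, load):
        if key in self.main:
            self.main.move_to_end(key)  # refresh recency in the main cache
            return self.main[key]
        if key in self.window:
            # Second request: promote from the window into the main cache.
            value = self.window.pop(key)
            self.main[key] = value
            if len(self.main) > self.main_size:
                self.main.popitem(last=False)  # evict least recently used
            return value
        # First request: load it, but only let it occupy the small window.
        value = load(key)
        self.window[key] = value
        if len(self.window) > self.window_size:
            self.window.popitem(last=False)
        return value
```

With this arrangement a crawler that touches each key once only churns the window, so the entries in the main cache are left alone. The cost is that every new key needs two requests before it is protected.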
Caffeine is a gem. Does what it claims, no drama, no scope creep, just works. I've used it in anger multiple times, most notably in Apache Cassandra and DataStax Astra, where it handles massive workloads invisibly, just like you'd want.
Shoutout to author Ben Manes if he sees this -- thanks for the great work!
Years ago I encountered a caching system that I misremembered as being a plugin for nginx and thus was never able to track down again.
It had a clever caching algorithm that favored latency over bandwidth. It weighted hit count versus size, so that given limited space, it would rather keep two small records that had more hits than a large record, so that it could serve more records from cache overall.
For some workloads the payload size is relatively proportional to the cost of the request - for the system of record. But latency and request setup costs do tend to shift that a bit.
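A rough sketch of the hits-versus-size weighting described above, assuming the simplest possible score (hits per byte) and evicting the lowest-scoring entry first. This is not the system being remembered here; `SizeAwareCache` and its scoring rule are hypothetical.

```python
class SizeAwareCache:
    """Evicts the entry with the fewest hits per byte (illustrative only)."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.entries = {}  # key -> [value, size_bytes, hits]

    def get(self, key):
        entry = self.entries.get(key)
        if entry is None:
            return None
        entry[2] += 1  # count the hit
        return entry[0]

    def put(self, key, value, size):
        if size > self.capacity:
            return  # can never fit, so do not cache it
        if key in self.entries:
            self.used -= self.entries.pop(key)[1]
        while self.used + size > self.capacity:
            # Lowest hits-per-byte goes first: a large, lukewarm record
            # loses to several small records with more hits between them.
            victim = min(
                self.entries,
                key=lambda k: self.entries[k][2] / self.entries[k][1],
            )
            self.used -= self.entries.pop(victim)[1]
        self.entries[key] = [value, size, 1]  # insertion counts as one hit
        self.used += size
```

A real implementation would also need to age the hit counts, otherwise a record that was popular long ago can never be evicted.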
But the bigger problem with LRU is that some workloads eventually resemble table scans, and the moment the data set no longer fits into cache, performance falls off a very tall cliff. And not just for that query but now for all subsequent ones as it causes cache misses for everyone else by evicting large quantities of recently used records. So you need to count frequency not just recency.
For every caching algorithm you can design an adversarial workload that will perform poorly with the cache. Your choice of caching algorithm/strategy needs to match your predicted workload. As you're alluding to, there's also the question of which resource you are trying to optimize for: minimizing processing time might call for something a little different than optimizing for bandwidth.
You might be interested in this thread [1] where I described an idea for how to incorporate the latency penalty into the eviction decision. A developer even hacked a prototype that showed promise. The problem is that there is not enough variety in the available trace data to be confident that a design isn't overly fit to a particular workload and doesn't generalize. As more data sets become available it will become possible to experiment with ideas and fix unexpected issues until a correct, simple, elegant design emerges.
> However, diving into a new caching approach without a deep understanding of our current system seemed premature
Love love love this - I really enjoy reading articles where people analyze existing high performance systems instead of just going for the new and shiny thing
A thing I worry about a lot is discontinuities in cache behaviour. A simple example: let’s say a client polls a list of entries and downloads each entry from the list one at a time to see if it is different. Obviously this feels like a bit of a silly way for a client to behave. If you have a small LRU cache (e.g. maybe it is partitioned such that partitions are small and all the requests from this client go to the same partition), then there is some threshold size where the client transitions from ~all requests hitting the cache to ~none hitting the cache.
This is a bit different from some behaviours always being bad for cache (eg a search crawler fetches lots of entries once).
Am I wrong to worry about these kinds of ‘phase transitions’? Should the focus just be on optimising hit rate in the average case?
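The polling example above is easy to reproduce with a toy simulation. This is a minimal sketch assuming a plain LRU and a client that walks the same list in the same order on every poll; `lru_hit_rate` and its parameters are made up for illustration.

```python
from collections import OrderedDict


def lru_hit_rate(cache_size, list_length, rounds=20):
    """Hit rate of a plain LRU when a client re-fetches the same list in order."""
    cache = OrderedDict()
    hits = total = 0
    for _ in range(rounds):
        for key in range(list_length):
            total += 1
            if key in cache:
                hits += 1
                cache.move_to_end(key)  # mark as most recently used
            else:
                cache[key] = True
                if len(cache) > cache_size:
                    cache.popitem(last=False)  # evict least recently used
    return hits / total


for n in (99, 100, 101):
    print(n, lru_hit_rate(cache_size=100, list_length=n))
# 99 and 100 entries give 0.95 (only the first pass misses);
# 101 entries give 0.0, because each fetch evicts the entry needed next.
```

One extra entry in the list takes the hit rate from 95% to zero, which is the kind of cliff being described.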
As the article mentions, Caffeine's approach is to monitor the workload and adapt to these phase changes. This stress test [1] demonstrates shifting back and forth between LRU and MRU request patterns, and the cache reconfiguring itself to maximize the hit rate. Unfortunately, most policies are not adaptive or adapt poorly.
Thankfully most workloads follow a relatively consistent pattern, so it is an atypical worry. The algorithm designers usually have a target scenario, like CDN or database, so they generally skip reporting the low-performing workloads. That may work for a research paper, but when providing a library we cannot know what our users' workloads are, nor should we expect engineers to invest in selecting the optimal algorithm. Caffeine's adaptivity removes this burden and broadens its applicability, and other language ecosystems have been slowly adopting similar ideas in their caching libraries.
I had a team that just did not get my explanations that they had created such a scenario. I had to show them the bus-sized “corner case” they had created before they agreed to a more sophisticated cache.
That project was the beginning of the end of my affection for caches. Without very careful discipline that few teams have, once they are added all organic attempts at optimization are greatly complicated. It’s global shared state with all the problems that brings. And if you use it instead of the call stack to pass arguments around (eg passing ID instead of User and making everyone look it up ten times), then your goose really is cooked.
These are exactly the things to worry about in an application that has enough scale for it. My usual approach is to have a wiki page or document to describe these limitations and roughly the order of magnitude where you will encounter them. Then do nothing and let them be until that scale is on the horizon.
There is no point fixing a "this could be slow if we have more than 65535 users" problem if you currently have 100 users.
I usually add a few pointers to the document on how to increase the scaling limit a bit without major rebuilding (e.g. make this cache size 2x larger). Those are useful as a short term solution during the time needed to build the real next version.
Caching itself is introducing a discontinuity, because whether a request does or does not hit the cache will have vastly different performance profiles (and if not, then the cache may be a bit useless).
I think the only way to approach this problem is statistically, but average is a bad metric. I think you’d care about some high percentile instead.
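A small worked example of why the average hides this, assuming hypothetical latencies of 1 ms for a hit and 100 ms for a miss, and a simple nearest-rank style p99. The function name and all numbers are illustrative, not measurements.

```python
def latency_summary(hits, misses, hit_ms=1.0, miss_ms=100.0):
    """Return (mean, p99) latency for a mix of cache hits and misses."""
    samples = sorted([hit_ms] * hits + [miss_ms] * misses)
    mean = sum(samples) / len(samples)
    # Simple convention: the sample sitting 99% of the way through the sorted list.
    p99 = samples[(99 * len(samples)) // 100]
    return mean, p99


print(latency_summary(995, 5))   # 99.5% hit rate: mean 1.495 ms, p99 1 ms
print(latency_summary(980, 20))  # 98.0% hit rate: mean 2.98 ms, p99 100 ms
```

Under these assumptions the mean only doubles when the hit rate drops from 99.5% to 98%, while the p99 jumps from hit latency to miss latency.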
I tried to reimplement Linux’s algorithm in [1], but I cannot be sure about correctness. They set the fixed sizes at construction based on the device’s total memory, so the sizes differ between, say, a phone and a server. This fast trace simulation in the CI [2] may be informative (see DClock). Segmentation is very common, where algorithms differ by how they promote and how/if they adapt the sizes.
Really random question, but what is used to create the images in this blog post? I see this style quite often but have never been able to track down what is used.
jedberg|1 year ago
masklinn|1 year ago
adbachman|1 year ago
guessing post bodies and link previews feels too easy.
comment threads? post listings?
was there a lot of nesting?
it sounds like you're describing a whole post--use message, comments, and all--for presentation to a browser or crawler.
(sorry, saw the handle and have so many questions :D)
NovaX|1 year ago
jbellis|1 year ago
plandis|1 year ago
NovaX|1 year ago
khana|1 year ago
[deleted]
hinkley|1 year ago
YZF|1 year ago
NovaX|1 year ago
[1] https://github.com/ben-manes/caffeine/discussions/1744
thomastay|1 year ago
dan-robertson|1 year ago
> Caching is all about maximizing the hit ratio
NovaX|1 year ago
[1] https://github.com/ben-manes/caffeine/wiki/Efficiency#adapti...
hinkley|1 year ago
t0mas88|1 year ago
ratorx|1 year ago
quotemstr|1 year ago
NovaX|1 year ago
[1] https://github.com/ben-manes/caffeine/blob/master/simulator/...
[2] https://github.com/ben-manes/caffeine/actions/runs/130865965...
nighthawk454|1 year ago
https://archive.is/w8yFG
https://web.archive.org/web/20250202094451/https://adriacabe... (images are cached better here)
dstroot|1 year ago
bean-weevil|1 year ago
Lord_Zero|1 year ago
homebrewer|1 year ago
https://github.com/kovidgoyal/kitty/issues — 0.239% vs 0.137%
https://github.com/kovidgoyal/kitty/issues — 0.729% vs 0.317%
https://github.com/kovidgoyal/kitty/graphs/contributors
jupiterroom|1 year ago
itishappy|1 year ago
https://d2lang.com/
https://www.drawio.com/
For something a bit lower level, try:
https://roughjs.com/
It's what powers the sketch-like look from many of the sites above.
atombender|1 year ago
[1] https://excalidraw.com/
jupiterroom|1 year ago
homarp|1 year ago
synthc|1 year ago
theandrewbailey|1 year ago
[deleted]
DonHopkins|1 year ago
[deleted]
urbandw311er|1 year ago
unification_fan|1 year ago