ckdot2|8 months ago
"I think now caching is probably best understood as a tool for making software simpler" - that's cute. Caching might be beneficial in many cases, but simplifying software is one thing it does not do. There's that famous quote, "There are only two hard things in Computer Science: cache invalidation and naming things." Sure, it's a bit ironic, but there's some truth in it.
bloppe|8 months ago
The author is comparing "off-the-shelf" caching with custom caching. They're coming from the assumption that you must be caching somehow and arguing that the word "caching" should be understood to mean only particular approaches to the general idea of caching. And obviously the whole point of the general idea is to optimize things.
It's a rhetorical mess.
AdieuToLogic|8 months ago
The joke form of this quote goes along the lines of:
"There's two hard problems in computer science: we only have one joke and it's not funny." :-D
dcminter|8 months ago
Apparently⁰ by Philip Scott Bowden¹
⁰ https://martinfowler.com/bliki/TwoHardThings.html
¹ https://x.com/pbowden/status/468855097879830528
AndrewOMartin|8 months ago
> I don't see what's so hard about DNS, it's just cache invalidation and naming things.
SAI_Peregrinus|8 months ago
There are three hard problems in Computer Science:
1) Cache invalidation
2) Naming th3) Concurings
rency
4) Off-by-one errors
heikkilevanto|8 months ago
Likewise, naming things is simple as long as you are alone, or in a small team. But as soon as there are multiple organizations, each with their own traditions, it gets tricky. Just witness the eternal flame wars about camelCase, PascalCase, snake_case, kebab-case, and UPPER_CASE. It is almost as hopeless a culture clash as Emacs vs Vi vs PowerPoint...
(I leave the off-by-one errors as an exercise for the reader)
TeMPOraL|8 months ago
- The language dimension - choice of words that are good enough for the purpose and not confusing. For example, "Manager" is as ambiguous as it gets; it can mean many things, except we've been using it long enough that there's a more specific shape of meaning[0] for that word in code/program architecture contexts - so you would still use it instead of, say, "Coordinator", which would raise all kinds of questions that "Manager" no longer does.
- The epistemological dimension - whether the word you chose correctly names the concept you meant, and whether the concept you meant is actually the right one to describe the thing you're trying to describe. Ultimately, this is the hard thing at the root of philosophy. In practice, it manifests like e.g. choice between digging into some obscure branches of mathematics to correctly name the thing "endofunctor" or something, or calling it "Square" and saying "fuck it, we'll clarify the exceptions in the comments".
--
[0] - I mean "more specific" in the sense it's distinct from the other meanings and somewhat narrow - but still it's fuzzy as heck and you can't describe it fully in words; it's basically tacit knowledge.
bell-cot|8 months ago
All software has to name things, and count. Caching (including invalidation) is best understood as a liability. If you can foist it off on your CPU and OS and DB, good for you. Programming whatever you're actually trying to get done is already hard enough.
yxhuvud|8 months ago
They also tend not to be very hard.
hatthew|8 months ago
If you have a system with "slow storage" and "fast storage", caching is a way to abstract that away to just "storage".
The author is arguing that the latter is the default way we should think about the concept of caching, which is a valid opinion to have.
Traubenfuchs|8 months ago
We use caching a lot; anything that gets cached can only be written by one service each. The writing services emit cache invalidation messages via SNS that cache users must listen to via SQS, to clear/update their cache.
Alternatively we cache stuff with just a TTL, when immediate cache invalidation is not important.
Where's the struggle?
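The pattern described above can be sketched in-process. This is a hypothetical, minimal analogue: `InvalidatingCache` and the plain `queue.Queue` are stand-ins for the real SNS/SQS plumbing, not the actual setup.

```python
import queue
import threading

class InvalidatingCache:
    """Local cache that drops entries when the writer publishes an
    invalidation message (a plain queue stands in for SNS/SQS here)."""

    def __init__(self, invalidations: "queue.Queue[str]"):
        self._store = {}
        self._lock = threading.Lock()
        self._invalidations = invalidations
        threading.Thread(target=self._listen, daemon=True).start()

    def _listen(self):
        while True:
            key = self._invalidations.get()   # blocks, like an SQS poll
            with self._lock:
                self._store.pop(key, None)    # invalidate on message

    def get(self, key, loader):
        with self._lock:
            if key in self._store:
                return self._store[key]       # cache hit
        value = loader(key)                   # miss: hit the origin
        with self._lock:
            self._store[key] = value
        return value
```

The writing service would put the key on the queue right after each write; every consumer holding a copy of this cache then drops its stale entry.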
pton_xd|8 months ago
If there are no real consequences for reading stale data, and your writes are infrequent enough, then indeed you're lucky and have a relatively simple problem.
hmottestad|8 months ago
If it doesn’t guarantee delivery, then I believe you will at some point have a client that reads a cached value thinking it’s still valid because the invalidation message got lost in the network.
Cthulhu_|8 months ago
> anything that gets cached can only be written by one service each
How do you guarantee it's only written by one service each? Sounds like locking across network boundaries, which is not easy.
> The writing services emit cache invalidation messages via SNS that cache users must listen to via SQS
SNS and SQS are both nontrivial services (at least you don't have to build/maintain them, I suppose) that require training to use effectively and to avoid possible footguns.
I think you're underestimating the complexity in your own solution, and you're probably lucky that some of the harder problems have already been solved for you.
motorest|8 months ago
If you don't understand how and why and when eventual consistency is a problem, you will never understand why cache invalidation is hard.
By the sound of your example, you only handle scenarios where naive approaches to cache invalidation serve your needs, and you don't even have to deal with problems caused by spikes to origin servers. That's perfectly fine.
Others do. They understand the meme. You can too if you invest a few minutes reading up on the topic.
graealex|8 months ago
It's used in DNS, which already was an example here. There is no way to be sure clients see an updated value before end of TTL. As a result, you have to use very conservative TTLs. It's very inefficient.
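The DNS situation can be illustrated with a toy TTL cache. This is only a sketch of the mechanism being described, not anything DNS-specific: entries are trusted until their TTL expires, so an update at the origin stays invisible until then.

```python
import time

class TTLCache:
    """Entries are served until their TTL expires; an origin update is
    invisible until then, which is why DNS TTLs must be conservative."""

    def __init__(self, ttl_seconds: float):
        self._ttl = ttl_seconds
        self._store = {}  # key -> (value, expiry time)

    def get(self, key, loader):
        value, expiry = self._store.get(key, (None, 0.0))
        if time.monotonic() < expiry:
            return value          # possibly stale, but within TTL
        value = loader(key)       # expired: re-fetch from origin
        self._store[key] = (value, time.monotonic() + self._ttl)
        return value
```

A long TTL means fewer origin hits but a longer window of staleness; a short TTL means the opposite - that is the trade-off the comment calls inefficient.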
tengbretson|8 months ago
1. Be content with/resilient to the possibility of stale data.
2. Gatekeep all reads and writes (for some subset of the key space) through a single thread.
That's basically it.
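The second option can be sketched as a single worker thread that owns the data outright, so reads and writes can never interleave mid-update. This is a hypothetical minimal version, assuming an in-process dict as the backing store.

```python
import queue
import threading

class SingleWriterStore:
    """All reads and writes are funneled through one worker thread,
    so the cached state can never be observed mid-update."""

    def __init__(self):
        self._requests = queue.Queue()
        threading.Thread(target=self._worker, daemon=True).start()

    def _worker(self):
        data = {}  # owned exclusively by this thread
        while True:
            op, key, value, reply = self._requests.get()
            if op == "put":
                data[key] = value
            reply.put(data.get(key))  # reply with current value

    def _call(self, op, key, value=None):
        reply = queue.Queue()
        self._requests.put((op, key, value, reply))
        return reply.get()

    def get(self, key):
        return self._call("get", key)

    def put(self, key, value):
        return self._call("put", key, value)
```

The cost, of course, is that the single thread becomes the throughput ceiling - which is why this only works "for some subset of the key space".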
EGreg|8 months ago
Both are not that difficult, honestly.
Aren't there a lot of harder things out there?
IshKebab|8 months ago
Think about all those times your program isn't building and `make clean` fixes it.
Valodim|8 months ago
You could group these two things into "getting the data model right" as the single hard thing, perhaps that rings more true to you :)
gpderetta|8 months ago
quuxplusone|8 months ago
It's also possible that these used to be the only two hard problems at the time the aphorism was first recorded, but the underlying state of the world has changed since then and the aphorism, as recorded, is no longer current.
TOGoS|8 months ago
A lot of people haven't caught on, and try to cache things using ambiguous names, hence the struggle to invalidate their caches when the meaning changes.
[1] This can be applied even if you don't know the content yet; you just have to unambiguously name the inputs to the function that produces it. You might not know what all the inputs are, and then you have to start adding stuff like "unknown-unknown-2025-07-03T16", but it'll still basically work.
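A minimal sketch of such unambiguous naming, assuming the cache key is simply a hash over every input that determines the content (the function name and parameter names here are illustrative, not from the comment):

```python
import hashlib
import json

def cache_key(function_name: str, **inputs) -> str:
    """Name a cached result by hashing all the inputs that produce it.
    If any input changes, the key changes, so stale entries become
    unreachable rather than needing explicit invalidation."""
    payload = json.dumps({"fn": function_name, "in": inputs}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()
```

Unknown inputs break this scheme exactly as the footnote says: they have to be folded into the name (e.g. a timestamp bucket) or the key stops being unambiguous.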
whateveracct|8 months ago
and - as the OP suggests - it works best when the cache is a well-defined abstraction with properties and rules about how it works
just because "caching" is mentioned in a meme doesn't mean it can't be true that it can simplify software
BowBun|8 months ago
I have to push back here; I think this is objectively untrue. By definition, a system or piece of code where you add a condition under which something else happens (a cache) that behaves differently from the uncached path increases complexity.
I'm not saying it's wrong to cache things or that they aren't useful, but I think they absolutely are an abstraction and an optimization at the cost of complexity. Good code bases hide complexity from the devs all the time, so it's not a question of whether you can code it away, but rather how difficult is it to troubleshoot the internals of the system.
fastball|8 months ago
The only scenario where it would simplify software is if a bunch of complex (non-cache) things are being done to improve perf, and a cache would be the simpler solution. But in that case the simplifying step is not adding a cache, it is removing complex things that aren't actually required. After that you add a cache to improve performance (which increases complexity but is worth it for this imagined use-case). But maybe you remove the complex perf shenanigans, and realize that perf is still "good enough" even without a cache, keeping your software even simpler.
jameshart|8 months ago
Look at how CPU cache line behaviors radically change the performance of superficially similar algorithms.
Look at how query performance for a database server drops off a cliff the moment the working cache no longer fits in memory.
Hiding complexity can be a simplification, until you exceed the bounds of the simplification and the complexity you hid demands your attention anyway.
moritzwarhier|8 months ago
But getting them right can easily cross the boundary from purely optimizing performance towards simplifying the public API of something. I think this is true.
I'd imagine an involved example where semantics and caching really start to offer a trade-off.
Imagine that somehow querying the actual meteorological data is quite expensive, and consider this badly written pseudocode (equals sign denoting default parameters):
- measureCurrentTemperature()
- retrieveAccurateTemperatureForNanoSecond(momentInTime)
-> cached abstractions which would access cached data:
- getTemperature(moment = now(), tolerance = 1min)
- getCurrentTemperature(tolerance = MIN_TOLERANCE)
I know, reality is much more complicated, and using time (seeing it as quasi-continuous) as a caching parameter is already stretching it quite far.
Just a stupid example that came to my mind.
I've bitten myself in the ass with caching rasterized representations of images more than once, where the inputs were SVG images or limited formats that convert to SVG.