Spiwux|2 years ago
Why, exactly, do we need to put a memory cache such as Redis in front of Postgres? Postgres has its own in-memory cache that it updates on reads and writes, right? What makes Postgres' cache so much worse than a dedicated Redis?
tetha|2 years ago
If you have many inserts and deletes on a table, the table will build up dead tuples and Postgres will eventually be forced to vacuum it. This doesn't block normal operation, but autovacuums on large tables can be resource-intensive - especially on the storage/IO side. At worst, this turns into resource contention, so you either end up with a never-ending autovacuum (because the vacuum can't keep up), or a severe performance impact on every query on the system (and since this is your postgres-as-redis, there is a good chance all of the hot paths rely on the cache and get slowed down significantly).
Both of these result in different kinds of fun - either your applications just stop working because postgres is busy cleaning up, or you end up with some horrible table bloat in the future, which will take hours and hours of application downtime to fix, because your drives are fast, but not that fast.
There are ways to work around this, naturally. You could have an expiration key with an index on it, and do "select * from cache order by expiration_key desc limit 1", and throw pg_partman at it to partition the table based on the expiration key, and drop old values by dropping partitions and such... but at some point you start wondering whether using a system actually meant for this kind of workload would be easier.
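The expiration-key idea above can be sketched in a few lines. This is a toy illustration using sqlite3 as a stand-in for Postgres; the `cache` table, column names, and the bulk `DELETE` (a poor man's version of pg_partman dropping a partition) are all invented for the sketch, not taken from the article.

```python
import sqlite3
import time

# Cache table keyed by expiration, sqlite3 standing in for Postgres.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cache (key TEXT PRIMARY KEY, value TEXT, expires_at REAL)")
conn.execute("CREATE INDEX cache_expires_idx ON cache (expires_at)")

def cache_set(key, value, ttl_seconds):
    conn.execute(
        "INSERT OR REPLACE INTO cache (key, value, expires_at) VALUES (?, ?, ?)",
        (key, value, time.time() + ttl_seconds),
    )

def cache_get(key):
    # Only return entries that have not yet expired.
    row = conn.execute(
        "SELECT value FROM cache WHERE key = ? AND expires_at > ?",
        (key, time.time()),
    ).fetchone()
    return row[0] if row else None

def evict_expired():
    # Poor man's partition drop: bulk-delete everything past its expiry.
    # In Postgres this DELETE is what creates the dead tuples / vacuum
    # pressure described above, which is why partition dropping is nicer.
    conn.execute("DELETE FROM cache WHERE expires_at <= ?", (time.time(),))

cache_set("session:42", "alice", ttl_seconds=60)
cache_set("stale", "old", ttl_seconds=-1)   # already expired
evict_expired()
print(cache_get("session:42"))  # alice
print(cache_get("stale"))       # None
```

Note that in Postgres the `DELETE`-based eviction is exactly what feeds the vacuum problem; dropping whole partitions sidesteps it because a dropped partition never produces dead tuples.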
jasonwatkinspdx|2 years ago
By comparison, an in-memory KV cache is much more streamlined: it basically just needs to move bytes from a hash table to a network socket as fast as possible, with no transactional concerns.
The semantics matter as well. PostgreSQL has to assume all data needs to be retained. Memcached can always just throw something away. Redis persistence is best effort with an explicit loss window. That has enormous practical implications for their internals.
So in practical terms this means they're in different universes performance-wise. If your workload is semantically compatible with a KV cache, adding memcached to your infrastructure will probably result in a net savings overall.
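The "can always just throw something away" semantics can be made concrete with a toy bounded cache. This is purely illustrative; the class name, capacity, and the LRU-ish eviction policy are assumptions for the sketch, not how memcached or Redis are actually implemented.

```python
import collections

class TinyCache:
    """A toy KV cache that is free to evict any entry at any time."""

    def __init__(self, capacity=2):
        self.capacity = capacity
        self.data = collections.OrderedDict()

    def set(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            # Evicting is always allowed: a miss just means the caller
            # refetches from the source of truth.
            self.data.popitem(last=False)

    def get(self, key):
        return self.data.get(key)  # None on a miss is normal

c = TinyCache(capacity=2)
c.set("a", 1)
c.set("b", 2)
c.set("c", 3)
print(c.get("a"))  # None: evicted, which a database could never silently do
print(c.get("c"))  # 3
```

A database cannot take that shortcut: every row it accepts must survive, which is where the WAL, vacuum, and transactional machinery come from.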
vbezhenar|2 years ago
Sometimes it makes sense, when your workload is not going to hit the limits of your available hardware.
But generally you should be prepared to move everything you can out of the database, so the database will not spend CPU on things that could be computed on another computer. And a cache is one of those things. If you can avoid hitting the database by hitting another server instead, that's a great thing to do.
Of course, you should not optimize prematurely. Start simple, hit your database limits, then introduce a cache.
dpedu|2 years ago
All of these questions go away or are greatly simplified with Redis.
watermelon0|2 years ago
Similarly to Postgres, Redis replication is also async, which means that replicas can be out-of-sync for a brief period of time.
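The out-of-sync window is easy to picture with a toy model of async replication. Everything here (the dicts, the pending stream, the function names) is invented for illustration; real replication ships WAL records or a replication log, not tuples in a list.

```python
primary = {}             # stand-in for the primary's data
replica = {}             # stand-in for a read replica
replication_stream = []  # changes acknowledged to the client but not yet applied

def write(key, value):
    primary[key] = value
    # Async replication: the write is acknowledged before the replica sees it.
    replication_stream.append((key, value))

def apply_replication():
    # The replica eventually catches up by applying the pending stream.
    while replication_stream:
        key, value = replication_stream.pop(0)
        replica[key] = value

write("counter", 1)
print(primary.get("counter"))   # 1
print(replica.get("counter"))   # None: replica briefly out of sync
apply_replication()
print(replica.get("counter"))   # 1: caught up
```

So with either system, a read routed to a replica right after a write can observe stale data; the difference is only how long the window tends to be.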
stonemetal12|2 years ago
The article talks about using unlogged tables, which roughly double write speed by forgoing the durability and crash safety of the WAL. It doesn't mention query speed because reads are completely unaffected by the change.
hamandcheese|2 years ago
If you can keep your entire working set in memory, though, then it probably doesn't matter that much.
dvfjsdhgfv|2 years ago
It's not worse; this is just a cheap way to increase performance without having to scale the main instance vertically.
VWWHFSfQ|2 years ago
Maybe you don't want to run the same expensive queries all the time to serve your JSON API?
There's a million reasons you might want to cache expensive queries somewhere upstream from the actual database.