jontonsoup|2 years ago
Has anyone seen max (p100) client latencies of 300 to 400ms but a totally normal p99? We see this across almost all of our Redis clusters on ElastiCache and have no idea why. CPU usage is tiny, and the slowlog shows nothing.
GauntletWizard|2 years ago
I would guess your problem is scheduler-based. The default(ish) Linux scheduler operates in roughly 100ms increments, and the first use of a client takes 3-4 round-trips: the TCP connection opens and the client blocks on connect, then the request is sent and the client blocks on write, then the client attempts to read the reply and blocks on read. If CPU usage is momentarily high, each of these yields to another process and your client isn't scheduled again for up to 100ms.
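A rough sketch of those blocking points, assuming a bare-socket client speaking RESP (the Redis wire protocol); the host, port, and key name here are placeholders, not anything from the thread:

```python
import socket

def encode_resp(*args):
    """Encode a command as a RESP array of bulk strings (the Redis wire format)."""
    out = [f"*{len(args)}\r\n".encode()]
    for a in args:
        b = a.encode() if isinstance(a, str) else a
        out.append(f"${len(b)}\r\n".encode() + b + b"\r\n")
    return b"".join(out)

def first_request(host="localhost", port=6379):
    # Each numbered step is a syscall where the client can block and,
    # under momentary CPU contention, be descheduled until its next timeslice.
    s = socket.create_connection((host, port))  # 1. block on TCP connect
    s.sendall(encode_resp("GET", "somekey"))    # 2. block on write
    reply = s.recv(4096)                        # 3. block on read
    s.close()
    return reply
```

If each of those three blocking points happens to yield to a busier process, the worst-case waits stack up on a single request while the p99 stays untouched.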
nicwolff|2 years ago
Are you evicting or deleting large sets (or lists, or sorted sets)? We use a Django ORM caching library that adds each resultset's cache key to a set of keys to invalidate when that table is updated – at which point it issues `DEL <set key>`, and if that set has grown to hundreds of thousands – or millions! – of keys, the main Redis process will block completely for as long as it takes to loop through and evict them.
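If a single blocking `DEL` on a huge set is the culprit, one mitigation sketch is to drain the set in batches and then `UNLINK` it (Redis >= 4 frees the memory off the main thread). The key name is hypothetical and the Redis calls assume redis-py; only the pure-Python chunking helper is tested here:

```python
def chunked(iterable, size):
    """Yield lists of at most `size` items, for batched SREM calls."""
    batch = []
    for item in iterable:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch

# Hypothetical usage with redis-py (r = redis.Redis(...)):
#
#   cursor = 0
#   while True:
#       cursor, members = r.sscan("invalidation_set", cursor, count=500)
#       for batch in chunked(members, 100):
#           r.srem("invalidation_set", *batch)  # small, bounded blocking ops
#       if cursor == 0:
#           break
#   r.unlink("invalidation_set")  # UNLINK reclaims memory asynchronously
```

The trade-off is more round-trips in exchange for never holding the event loop for the whole deletion at once.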
tayo42|2 years ago
Is the memory full and evicting? Or do you have a large db with lots of keys with TTLs? IIRC Redis does a bunch of maintenance work "in the background", but on the same main thread – so it's not really background work.
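One way to check this theory is the counters in `INFO stats` (`evicted_keys`, `expired_keys`) – if they climb in step with the latency spikes, eviction or expiry work is a likely suspect. A minimal parser sketch for the INFO text format; the sample payload below is made up, and in practice you'd read the real output from `redis-cli INFO stats` or a client library:

```python
def parse_info(raw):
    """Parse Redis INFO output (key:value lines, '#' section headers) into a dict."""
    stats = {}
    for line in raw.splitlines():
        # Skip blank lines and section headers like "# Stats"
        if line and not line.startswith("#") and ":" in line:
            key, _, value = line.partition(":")
            stats[key] = value
    return stats

# Made-up sample of what an INFO stats section looks like:
sample = "# Stats\r\nexpired_keys:1234\r\nevicted_keys:0\r\n"
info = parse_info(sample)
```

Polling this periodically and correlating the deltas with the p100 spikes would confirm or rule out the eviction/expiry explanation.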
jontonsoup|2 years ago
We do expire but we don’t think we have a thundering herd problem with them all happening at the same time.