It’s a bit confusing to me exactly what went wrong. I think that when you have a redis/valkey cluster with multiple nodes and you use the cluster uri, there must be some kind of load balancer or custom routing. When we would attempt to connect to valkey the connection would look good, but when we would submit commands to it they would never execute. We had written our application so that it would operate with no issue (just slower) if the cache goes down. In this case, connections looked good but no work was actually being done. AWS support suggested we restart the nodes but because they were not responding they never shut down … or at least it took a really long time. They were never able to tell us what actually happened. My guess is that valkey command execution got stuck somehow but was still able to create new connections.
No comments yet.