achanda358 | 1 year ago
Why is this a problem? A typical deployment will have multiple replicas, with (hopefully) small replication lag. One of those should be promotable to the new primary within a minute.
porker|1 year ago
What happens within that minute to database writes?
sgarland|1 year ago
achanda358|1 year ago
In my example, I will get a page for large replication lag. But not for an unplanned failover. That will be an alert, but not a page.
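The routing policy described above (page on large replication lag, alert without paging on an unplanned failover) could be sketched roughly as follows. This is only an illustration: the event names, the `route` function, and the 30-second threshold are all hypothetical and do not come from the comment or from any particular monitoring tool.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Severity(Enum):
    PAGE = "page"    # interrupts the on-call engineer
    ALERT = "alert"  # recorded for later review, nobody is woken up


# Hypothetical threshold; the comment does not say what counts as "large" lag.
MAX_REPLICATION_LAG_SECONDS = 30.0


@dataclass
class ClusterEvent:
    kind: str                 # "replication_lag" or "unplanned_failover"
    lag_seconds: float = 0.0  # only meaningful for "replication_lag"


def route(event: ClusterEvent) -> Optional[Severity]:
    """Return how an event should be surfaced, or None if it can be ignored."""
    if event.kind == "replication_lag":
        # Large lag means a promoted replica would be missing recent writes.
        if event.lag_seconds > MAX_REPLICATION_LAG_SECONDS:
            return Severity.PAGE
        return None
    if event.kind == "unplanned_failover":
        # Failover is assumed to be automated, so it is an alert, not a page.
        return Severity.ALERT
    return None
```

The reasoning behind this split is that lag is treated as the precondition that makes failover safe, so lag is the thing worth waking someone up for.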
zsoltkacsandi|1 year ago
You cannot build operational procedures based on “hope”.
High replication lag occurs for many, many reasons; it is not a rare event, and it is not something you can prevent. The same goes for network partitions.
Replication and binary logs can get corrupted, and there can be deadlocks, duplicate-row errors, etc.
The thing is that database administration is a broad and complicated topic; a small mistake or a lack of understanding of how these systems work can easily lead to huge data losses.
anonzzzies|1 year ago
Ah yes, HN. You know there are billions of sites (WP mostly), LoB apps, etc. that run on 1 MySQL/PG/etc. instance, right? Replicas are not typical; they are a tiny minority.
movedx|1 year ago
kgeist|1 year ago