jorgeortiz85 | 10 years ago
This is one of our few remaining unsharded databases (legacy problems...), so we can't easily canary a change on a fraction of serving capacity. However, one clear remediation that's easy to implement is to have our tooling apply the change to a replica first, fail over to it as the new primary, and, if problems are detected, quickly fail back to the healthy former primary.
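A minimal sketch of that replica-first procedure, using a toy in-memory model. `Node`, `Cluster`, and `canaried_change` are hypothetical names for illustration, not part of any real database tooling. The point is that the former primary is never modified, so failing back restores a known-good state.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Node:
    # Hypothetical database node; `config` stands in for whatever the change touches.
    name: str
    config: dict = field(default_factory=dict)


@dataclass
class Cluster:
    # Hypothetical unsharded cluster: one primary plus replicas.
    primary: Node
    replicas: list

    def failover_to(self, node: Node) -> None:
        # Promote `node` to primary and demote the current primary to a replica.
        self.replicas.remove(node)
        self.replicas.append(self.primary)
        self.primary = node


def canaried_change(
    cluster: Cluster,
    change: Callable[[Node], None],
    is_healthy: Callable[[Node], bool],
) -> bool:
    """Apply `change` to a replica, promote it, and fail back if it looks unhealthy."""
    old_primary = cluster.primary
    candidate = cluster.replicas[0]

    change(candidate)                 # 1. modify a replica only; the primary is untouched
    cluster.failover_to(candidate)    # 2. promote the changed replica

    if is_healthy(cluster.primary):   # 3. check the new primary
        return True

    cluster.failover_to(old_primary)  # 4. fail back to the unmodified former primary
    return False
```

On a failed health check the changed node ends up back as a replica, where it can be inspected or rebuilt without affecting serving traffic.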
Lesson learned. We'll be doing a review of all of our database tooling to make sure changes are always canaried or easily reversible.