jolynch | 2 years ago
* We didn't think about how we would retry this operation when something fails or times out (idempotency)
* We didn't put the appropriate checksums in the right place (corruption)
* We didn't handle the load, often due to trying to provide stronger guarantees than the application needs, and went down causing lost operations (performance bottlenecks)
* We deployed bad software to the app or database, causing corruption that can't be repaired because we already purged the relevant commit/redo logs and snapshots (unrecoverable deploys)
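A minimal sketch (not from the comment, names are hypothetical) of two of the failure modes above: retries made safe with an idempotency key, and payloads verified against a checksum before being applied.

```python
import hashlib

# In-memory stand-ins for durable stores.
applied = {}                    # idempotency key -> result of the operation
balances = {"acct-1": 100}

def checksum(payload: bytes) -> str:
    return hashlib.sha256(payload).hexdigest()

def apply_credit(key: str, account: str, amount: int, expected_sum: str) -> int:
    payload = f"{account}:{amount}".encode()
    if checksum(payload) != expected_sum:
        raise ValueError("payload corrupted in transit")    # corruption case
    if key in applied:
        return applied[key]     # retry of an already-applied op: no double credit
    balances[account] += amount
    applied[key] = balances[account]
    return applied[key]

payload = b"acct-1:25"
s = checksum(payload)
apply_credit("op-42", "acct-1", 25, s)
apply_credit("op-42", "acct-1", 25, s)   # retried after a timeout: applied once
```

The caller can retry blindly on timeout; deduplication happens server-side on the key.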
I legitimately don't understand the calls for "SERIALIZABLE is the only valid isolation level" - I have not typically (ever that I can recall) seen at-scale production systems pay that cost for writes _and_ reads. Almost all applications I've seen (including banking/payment software) are fine with eventually consistent reads, as long as the staleness period is understood and reasonably bounded in time. Once you move past a single geographic datacenter, serializable writes become extremely expensive unless you can automatically home users to the appropriate leader datacenter, which most engineering teams can't guarantee.
The key is typically not isolation; it's modeling your application in an idempotent fashion that doesn't require isolation to be correct, and keeping snapshots and those idempotent operation logs around for a good few weeks at minimum. Maybe the Java analogy would be "if you can design it to not need locks, do that".
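A sketch of the snapshot + idempotent-operation-log idea (illustrative only, not the commenter's actual design): state can be rebuilt by replaying the log onto a snapshot, and a duplicated or retried entry is harmless because each entry carries a unique op id.

```python
snapshot = {"acct-1": 100}
oplog = [
    {"op_id": "a1", "acct": "acct-1", "delta": 25},
    {"op_id": "a2", "acct": "acct-1", "delta": -10},
    {"op_id": "a1", "acct": "acct-1", "delta": 25},  # duplicate from a retry
]

def rebuild(snapshot: dict, oplog: list) -> dict:
    state = dict(snapshot)
    seen = set()
    for entry in oplog:
        if entry["op_id"] in seen:      # idempotent: duplicates are no-ops
            continue
        seen.add(entry["op_id"])
        state[entry["acct"]] = state.get(entry["acct"], 0) + entry["delta"]
    return state

rebuild(snapshot, oplog)   # {'acct-1': 115}, despite the duplicated entry
```

Keeping the log for weeks means a bad deploy can be repaired by replaying onto an older snapshot rather than being permanent.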
aaomidi | 2 years ago
It is by no means a silver bullet, and depending on your application it may not be the right choice.
nextaccountic | 2 years ago
If you ever fetch data once and use it locally many times, you are back to handling stale data.
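To illustrate the point (a hypothetical sketch, not from the thread): data fetched once and reused locally is stale the moment it is cached, so a TTL at least makes the staleness bound explicit instead of unbounded.

```python
import time

class CachedValue:
    """Caches one fetched value; staleness is bounded by max_staleness_s."""

    def __init__(self, fetch, max_staleness_s: float):
        self.fetch = fetch
        self.max_staleness_s = max_staleness_s
        self.value = None
        self.fetched_at = float("-inf")

    def get(self):
        if time.monotonic() - self.fetched_at > self.max_staleness_s:
            self.value = self.fetch()           # refresh once past the bound
            self.fetched_at = time.monotonic()
        return self.value                       # may be up to TTL seconds stale

fetches = []
cached = CachedValue(lambda: fetches.append(1) or len(fetches),
                     max_staleness_s=60)
cached.get()
cached.get()   # served from the local copy; no second fetch within the bound
```

Every read between refreshes is effectively an eventually consistent read, which is the commenter's point: caching reintroduces the staleness you thought isolation had removed.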