jolynch | 2 years ago

As a long-time user and developer of databases, I would suggest that isolation failures are not actually the source of most data-related bugs. Most bugs I deal with are due to alternative failure modes like:

* We didn't think about how we would retry this operation when something fails or times out (idempotency)

* We didn't put the appropriate checksums in the right place (corruption)

* We didn't handle the load, often because we tried to provide stronger guarantees than the application needs, and went down, causing lost operations (performance bottlenecks)

* We deployed bad software to the app or database, causing irreparable corruption that can't be fixed because we already purged the relevant commit/redo logs + snapshots.
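To make the checksum bullet concrete, here is a minimal Python sketch. It assumes records are stored as JSON with a CRC32 prefix, and the `encode_record`/`decode_record` helpers are hypothetical. The idea is to compute the checksum where the data is produced and verify it where the data is consumed, so corruption introduced anywhere in between is caught before it spreads.

```python
import json
import zlib


def encode_record(payload: dict) -> bytes:
    """Serialize a record and prefix it with a 4-byte CRC32 of the body."""
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    crc = zlib.crc32(body)  # unsigned 32-bit value in Python 3
    return crc.to_bytes(4, "big") + body


def decode_record(blob: bytes) -> dict:
    """Verify the CRC32 before trusting the body; raise on mismatch."""
    stored = int.from_bytes(blob[:4], "big")
    body = blob[4:]
    if zlib.crc32(body) != stored:
        raise ValueError("checksum mismatch: record is corrupt")
    return json.loads(body)


blob = encode_record({"id": 1, "amount": 100})
print(decode_record(blob))
```

CRC32 only guards against accidental corruption (bad disks, buggy copies, truncated writes); detecting deliberate tampering would need a cryptographic MAC instead.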

I legitimately don't understand the calls for "SERIALIZABLE is the only valid isolation level". I have not typically (ever, that I can recall) seen at-scale production systems pay that cost for writes _and_ reads. Almost all applications I've seen (including banking/payment software) are fine with eventually consistent reads, as long as the staleness period is understood and reasonably bounded in time. Once you move past a single geographic datacenter, serializable writes become extremely expensive unless you can automatically home users to the appropriate leader datacenter, which most engineering teams can't guarantee.
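The "understood and reasonably bounded" staleness idea can be sketched as a read path that serves from a replica only while its replication lag is within a budget, and otherwise falls back to the leader. This is a minimal Python illustration, not any particular database's API: `Replica`, `applied_up_to`, and `max_staleness_s` are hypothetical names, and it assumes the replica knows the leader timestamp up to which it has applied all changes (for example via replicated heartbeats).

```python
import time
from dataclasses import dataclass, field
from typing import Any, Optional, Tuple


@dataclass
class Replica:
    """Hypothetical read replica.

    applied_up_to is the leader timestamp up to which this replica has
    applied every change (assumed to be kept fresh by heartbeats).
    """
    data: dict = field(default_factory=dict)
    applied_up_to: float = 0.0


def read(key: str, leader: dict, replica: Replica,
         max_staleness_s: float, now: Optional[float] = None) -> Tuple[Any, str]:
    """Serve from the replica if its lag is within budget, else from the leader."""
    now = time.time() if now is None else now
    if now - replica.applied_up_to <= max_staleness_s:
        return replica.data.get(key), "replica"
    return leader.get(key), "leader"


leader = {"balance": 120}
replica = Replica(data={"balance": 100}, applied_up_to=1000.0)
print(read("balance", leader, replica, max_staleness_s=5.0, now=1003.0))
print(read("balance", leader, replica, max_staleness_s=5.0, now=1010.0))
```

The replica answer may be up to `max_staleness_s` old, which is exactly the trade described above: the application accepts a known, bounded staleness in exchange for cheap reads, and only pays for a leader round trip when the bound would be violated.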

The key is typically not isolation, it's modeling your application in an idempotent fashion that doesn't require isolation to be correct and keeping snapshots and those idempotent operation logs for a good few weeks at minimum. Maybe the Java analogy would be "if you can design it to not need locks, do that".
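A minimal Python sketch of "model it idempotently and keep the operation log", assuming each request carries a client-supplied operation ID (`op_id`, `Ledger`, and `apply` are hypothetical names). Retries and replays of the same operation become no-ops that return the original result, so a timeout followed by a retry cannot double-apply a payment.

```python
from dataclasses import dataclass, field


@dataclass
class Ledger:
    """Hypothetical account store that applies each operation at most once."""
    balances: dict = field(default_factory=dict)
    # Operation log keyed by client-supplied idempotency key. It must be
    # retained long enough (the comment suggests weeks) to recognize retries
    # and to replay operations after restoring from a snapshot.
    op_log: dict = field(default_factory=dict)

    def apply(self, op_id: str, account: str, delta: int) -> int:
        if op_id in self.op_log:
            # Retry of an operation we already applied: return the original result.
            return self.op_log[op_id]
        new_balance = self.balances.get(account, 0) + delta
        self.balances[account] = new_balance
        self.op_log[op_id] = new_balance
        return new_balance


ledger = Ledger()
ledger.apply("op-1", "alice", 50)
ledger.apply("op-1", "alice", 50)  # client retried after a timeout
print(ledger.balances["alice"])
```

In a real system the balance change and the log entry would need to be persisted atomically (for example in a single write), otherwise a crash between the two reintroduces the double-apply problem.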

aaomidi | 2 years ago

Serializable is easy to reason about, and it also moves the problems of distributed systems into the database, where they can more appropriately be handled, imo.

It is by no means a silver bullet and depending on your application it may not be the right choice.

nextaccountic | 2 years ago

You only benefit from it if you re-fetch data from the database every time you need it, and never cache.

If you ever fetch data once and use it locally many times, you are back to handling stale data.
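One common way to at least detect that a locally cached value has gone stale is to carry the version you read alongside the cached data and make the eventual write conditional on it (optimistic concurrency). This Python sketch assumes a per-row version number; `Store`, `Row`, and `put_if_version` are hypothetical names, not any specific database API.

```python
from dataclasses import dataclass


@dataclass
class Row:
    value: int
    version: int


class Store:
    """Hypothetical key-value store with a per-row version counter."""

    def __init__(self) -> None:
        self.rows: dict = {}

    def get(self, key: str) -> Row:
        row = self.rows[key]
        return Row(row.value, row.version)  # a copy, like a fetched result

    def put_if_version(self, key: str, value: int, expected_version: int) -> bool:
        """Compare-and-set: write only if the row is unchanged since it was read."""
        current = self.rows[key]
        if current.version != expected_version:
            return False
        self.rows[key] = Row(value, current.version + 1)
        return True


store = Store()
store.rows["stock"] = Row(10, 1)
cached = store.get("stock")          # fetched once, reused locally
store.put_if_version("stock", 7, 1)  # another client writes in the meantime
ok = store.put_if_version("stock", cached.value - 1, cached.version)
print(ok)
```

This does not remove staleness; it turns a silent lost update into a rejected write that the application still has to handle, typically by re-fetching and retrying.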