top | item 43514402

(no title)

sewen | 11 months ago

Indeed, the persistence layer is sensitive, and we do take this pretty serious.

All data is persisted via RocksDB. Not only the materialized state of invocations and journals, but even the log itself uses RocksDB as the storage layer for sequence of events. We do that to benefit from the insane testing and hardening that Meta has done (they run millions of instances). We are currently even trying to understand which operations and code paths Meta uses most, to adopt the code to use those, to get the best-tested paths possible.

The more sensitive part would be the consensus log, which only comes into play if you run in distributed deployments. In a way, that puts us into a similar boat as companies like Neon: having reliably single-node storage engine, but having to build the replication and failover around that. But in that is also the value-add over most databases.

We do actually use Jepsen internally for a lot of testing.

(Side note: Jepsen might be one of the most valuable things that this industry has - the value it adds cannot be overstated)

discuss

sewen|11 months ago

I just realized I missed an important part: The primary durability for the bulk of the state comes from S3 (or similar object store). The periodic snapshots give you like an automatic frequent backup mechanism for free, which in itself is a nice property to have.