Show HN: SlateDB – An embedded storage engine built on object storage
26 points | riccomini | 1 year ago | github.com
To mitigate high write API costs (PUTs), SlateDB batches writes. Rather than writing every put() call to object storage, MemTables are flushed periodically to object storage as a sorted string table (SST). The flush interval is configurable.
To mitigate write latency, SlateDB provides an async put method. Clients that prefer strong durability can await on put until the MemTable is flushed to object storage (trading latency for durability). Clients that prefer lower latency can simply ignore the future returned by put.
To mitigate read latency and read API costs (GETs), SlateDB will use standard LSM-tree caching techniques: in-memory block caches, compression, bloom filters, and local SST disk caches.
Reubend|1 year ago
Bottomless storage: yes, but couldn't you theoretically achieve this with plenty of cloud DB services? Amazon Aurora goes up to 128 TB, and once your DB gets to that size, it's likely that you can hire some dedicated engineers to handle more complicated setups.
High durability: yes, but couldn't this be achieved with a "normal" DB that has a read replica using object storage, rather than the entire DB using object storage?
Easy replication: arguably not easier than normal replication, depending on which cloud DB you're considering as an alternative.
dangoodmanUT|1 year ago
There's a class of folks who desperately need this. It's the KV equivalent of turbopuffer.
dangoodmanUT|1 year ago
For example, if you have a durability flush interval, what is the WAL for? L0 is the WAL now.
riccomini|1 year ago
Basically, we wanted to have WAL writes go on the order of milliseconds, but we wanted L0 SSTs to be larger since they actually service reads.
The architecture page has more detail if you haven't found it yet:
https://slatedb.io/docs/architecture
Already__Taken|1 year ago