Show HN: SlateDB – An embedded storage engine built on object storage

26 points | riccomini | 1 year ago | github.com

SlateDB is an embedded storage engine built as a log-structured merge-tree. Unlike traditional LSM-tree storage engines, SlateDB writes data to object storage (S3, GCS, ABS, MinIO, Tigris, and so on). Leveraging object storage allows SlateDB to provide bottomless storage capacity, high durability, and easy replication. The trade-off is that object storage has a higher latency and higher API cost than local disk.

To mitigate high write API costs (PUTs), SlateDB batches writes. Rather than writing every put() call to object storage, MemTables are flushed periodically to object storage as sorted string tables (SSTs). The flush interval is configurable.
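The batching idea can be sketched roughly like this (a toy illustration, not SlateDB's actual API or SST encoding): puts accumulate in a sorted in-memory table, and only when a threshold is reached does the whole batch go out as one object, so N puts cost one PUT.

```rust
use std::collections::BTreeMap;

struct MemTable {
    entries: BTreeMap<Vec<u8>, Vec<u8>>, // kept sorted, so a flush is already key-ordered
    bytes: usize,
    flush_threshold: usize,
}

impl MemTable {
    fn new(flush_threshold: usize) -> Self {
        MemTable { entries: BTreeMap::new(), bytes: 0, flush_threshold }
    }

    // Returns Some(sst_bytes) when the table fills up and should be written
    // to object storage as a single sorted string table (SST); the encoding
    // here is a toy "key=value" format for illustration only.
    fn put(&mut self, key: &[u8], value: &[u8]) -> Option<Vec<u8>> {
        self.bytes += key.len() + value.len();
        self.entries.insert(key.to_vec(), value.to_vec());
        if self.bytes >= self.flush_threshold {
            let mut sst = Vec::new();
            for (k, v) in &self.entries {
                sst.extend_from_slice(k);
                sst.push(b'=');
                sst.extend_from_slice(v);
                sst.push(b'\n');
            }
            self.entries.clear();
            self.bytes = 0;
            return Some(sst); // one PUT to object storage instead of many
        }
        None
    }
}
```

The lower the flush threshold (or interval), the lower the window of unflushed data, but the more PUT calls you pay for.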

To mitigate write latency, SlateDB provides an async put method. Clients that prefer strong durability can await on put until the MemTable is flushed to object storage (trading latency for durability). Clients that prefer lower latency can simply ignore the future returned by put.
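The durability-vs-latency choice can be modeled with a handle the caller may wait on or drop (a hypothetical sketch using threads and channels; SlateDB's real API is async and differs in detail):

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;

struct Db {
    tx: Sender<(Vec<u8>, Sender<()>)>,
}

impl Db {
    fn new() -> Self {
        let (tx, rx): (Sender<(Vec<u8>, Sender<()>)>, Receiver<_>) = channel();
        // Background flusher: stand-in for the thread that batches records
        // and writes them to object storage.
        thread::spawn(move || {
            for (_record, ack) in rx {
                // ... write the batch to object storage here ...
                let _ = ack.send(()); // durable: wake anyone awaiting
            }
        });
        Db { tx }
    }

    // The caller can block on the returned handle for durability,
    // or drop it for fire-and-forget low latency.
    fn put(&self, record: Vec<u8>) -> Receiver<()> {
        let (ack_tx, ack_rx) = channel();
        self.tx.send((record, ack_tx)).unwrap();
        ack_rx
    }
}
```

Usage: `db.put(rec).recv().unwrap()` waits until the flush lands (durability), while `let _ = db.put(rec);` returns immediately (latency).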

To mitigate read latency and read API costs (GETs), SlateDB will use standard LSM-tree caching techniques: in-memory block caches, compression, bloom filters, and local on-disk SST caches.
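Of these, the bloom filter is the one that avoids GETs outright: a small per-SST filter answers "definitely absent" without touching object storage. A toy sketch (sizes and hash count are illustrative, not SlateDB's implementation):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

struct Bloom {
    bits: Vec<bool>,
}

impl Bloom {
    fn new(nbits: usize) -> Self {
        Bloom { bits: vec![false; nbits] }
    }

    fn index(&self, key: &[u8], seed: u64) -> usize {
        let mut h = DefaultHasher::new();
        seed.hash(&mut h);
        key.hash(&mut h);
        (h.finish() as usize) % self.bits.len()
    }

    // Built once per SST at flush time, from the keys it contains.
    fn insert(&mut self, key: &[u8]) {
        for seed in 0..2 {
            let i = self.index(key, seed);
            self.bits[i] = true;
        }
    }

    // false => key is definitely not in this SST: skip the GET entirely.
    // true  => key *may* be present (false positives possible): pay for the GET.
    fn may_contain(&self, key: &[u8]) -> bool {
        (0..2).all(|seed| self.bits[self.index(key, seed)])
    }
}
```

Because object-store reads are both slow and billed per request, pruning SSTs with filters and caching hot blocks locally matters more here than in a disk-backed LSM tree.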

7 comments

Reubend|1 year ago

It's a very very cool idea, but I'm still not clear on the main benefits.

Bottomless storage: yes, but couldn't you theoretically achieve this with plenty of cloud DB services? Amazon Aurora goes up to 128 TB, and once your DB gets to that size, it's likely that you can hire some dedicated engineers to handle more complicated setups.

High durability: yes, but couldn't this be achieved with a "normal" DB that has a read replica using object storage, rather than the entire DB using object storage?

Easy replication: arguably not easier than normal replication, depending on which cloud DB you're considering as an alternative.

roh26it|1 year ago

Also wondering if this would become expensive very fast if it ends up using S3 with a large number of PUT calls

dangoodmanUT|1 year ago

If the benefits aren't obvious to you, then you're not the target user, or you don't understand what kind of person needs this.

There's a class of folks who desperately need this. It's the KV equivalent to turbopuffer.

dangoodmanUT|1 year ago

I've been working on something super similar, but some of the arch decisions here are curious considering the clear tradeoffs made.

For example, if you have a durability flush interval, what is the WAL for? L0 is the WAL now.

riccomini|1 year ago

Great question! We started out with the design you described (WAL as L0), but we found a tension between wanting L0 SSTs to be larger (and fewer) to reduce metadata size, and wanting WAL SSTs to be small and frequent (to reduce async/await latency).

Basically, we wanted to have WAL writes go on the order of milliseconds, but we wanted L0 SSTs to be larger since they actually service reads.

The architecture page has more detail if you haven't found it yet:

https://slatedb.io/docs/architecture

Already__Taken|1 year ago

Seems analogous to putting seaweedfs in front of a cloud S3, then adding a database. We use (unrelated) Zenoh and Loki keeping state on S3, so it would be interesting to have a KV engine.