What's the big deal about embedded key-value databases?

[+] mprovost|3 years ago|reply

I feel like this is missing any mention of the history of KV stores. Unix came with an embedded database (dbm) from the early days (1979) [0] which was rewritten at Berkeley into the more popular bdb in the 80s. [1] Sendmail was one of the more common programs that used it. And then when djb built his replacement for sendmail, qmail, he invented cdb. [2]

[0] https://en.wikipedia.org/wiki/DBM_(computing)

[1] https://en.wikipedia.org/wiki/Berkeley_DB

[2] https://cr.yp.to/cdb.html

[+] Xeoncross|3 years ago|reply

I highly recommend people comfortable with Go check out the building blocks at https://github.com/thomasjungblut/go-sstables

This codebase shows how SSTables, WAL, memtables, recordio, skiplists, segment files, and other storage engine components work in a digestible way. Includes a demo database showing how it all comes together to make a RocksDB / LevelDB competitor (not really).
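As a rough illustration of how those pieces fit together, here is a minimal sketch in Python. It is not taken from go-sstables: the `TinyLSM` class, the JSON file formats, and the `memtable_limit` parameter are all made up for this example, and it skips compaction, tombstones, and skiplists entirely.

```python
import bisect
import json
import os
import tempfile


class TinyLSM:
    """Toy engine (hypothetical): WAL + in-memory memtable + sorted segment files."""

    def __init__(self, directory, memtable_limit=2):
        self.directory = directory
        self.memtable_limit = memtable_limit
        self.memtable = {}  # newest writes, held in memory
        # Flushed, immutable sorted tables; zero-padded names sort oldest first.
        self.segments = sorted(
            os.path.join(directory, name)
            for name in os.listdir(directory)
            if name.startswith("segment-")
        )
        self.wal_path = os.path.join(directory, "wal.log")
        self._replay_wal()

    def _replay_wal(self):
        # After a restart, rebuild the memtable from the write-ahead log.
        if os.path.exists(self.wal_path):
            with open(self.wal_path) as f:
                for line in f:
                    key, value = json.loads(line)
                    self.memtable[key] = value

    def put(self, key, value):
        # Append to the WAL first so the write survives a crash.
        with open(self.wal_path, "a") as f:
            f.write(json.dumps([key, value]) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # Write the memtable out as a sorted, immutable segment (the "SSTable").
        name = f"segment-{len(self.segments):06d}.json"
        path = os.path.join(self.directory, name)
        with open(path, "w") as f:
            json.dump(sorted(self.memtable.items()), f)
        self.segments.append(path)
        self.memtable = {}
        os.remove(self.wal_path)  # flushed data no longer needs the log

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        # Newest segment wins; rows are sorted, so binary search works.
        for path in reversed(self.segments):
            with open(path) as f:
                rows = json.load(f)
            keys = [k for k, _ in rows]
            i = bisect.bisect_left(keys, key)
            if i < len(keys) and keys[i] == key:
                return rows[i][1]
        return None


demo_dir = tempfile.mkdtemp()
db = TinyLSM(demo_dir, memtable_limit=2)
db.put("a", "1")
db.put("b", "2")  # memtable is full: flushed to a sorted segment
db.put("a", "3")  # newer value lives in the memtable and the WAL
reopened = TinyLSM(demo_dir, memtable_limit=2)  # simulates a restart
```

Reads check the memtable first and then the segments from newest to oldest, which is why the newer value for "a" shadows the one already flushed to disk.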

[+] artificial|3 years ago|reply

Very cool! In a similar vein Distributed Services with Go [0] works through SST creating a KV store. I found it helpful for working with BadgerDB [1].

[0] https://pragprog.com/titles/tjgo/distributed-services-with-g...

[1] https://github.com/dgraph-io/badger

[+] tjungblut|3 years ago|reply

Thank you! And thanks for all the stargazers :) Let me know if you have any issues, happy to help and fix things if necessary.

[+] Adiqq|3 years ago|reply

Honestly, I'm still not sure why I would use something like RocksDB instead of, or in addition to, plain PostgreSQL/MongoDB/Redis instances.

I don't work with a lot of data, but typically my decisions are based on basic factors and purpose:

PostgreSQL - SQL, structured data, cannot scale horizontally

MongoDB - NoSQL, unstructured data

Redis - key-value, distributed cache

I get that you can replace the storage engine and theoretically get more performance, but in practice compatibility and standardization are more important, because a lot of products (including third-party ones) will already use PostgreSQL/MongoDB/Redis, so it's a no-brainer to use them for your solution as well.

However, for me to pick RocksDB or some other shiny new database/storage engine, there would have to be more compelling reasons.

[+] jzelinskie|3 years ago|reply

Unless you are building a database, these embedded KV store libraries are less likely to be the best solution for the job. If you are considering them for an app that isn't a database, you should also take a long, hard look at SQLite first.

What's also interesting is the trend of newer distributed "database systems" like Vitess[0] or SpiceDB[1] that forgo embedded KV stores and instead reuse existing SQL databases as their "embedded database". Vitess leverages MySQL and SpiceDB leverages MySQL, PostgreSQL, CockroachDB, or Spanner. Systems built this way get to leverage many high-level features from existing database systems, so they can focus on innovating in even higher-level functionality. In the case of Vitess, it's scaling, distributing, and schema management of MySQL. In the case of SpiceDB, it's building a database specifically optimized for querying access control data in a way that can coordinate with causality across multiple services.

[0]: https://github.com/vitessio/vitess

[1]: https://github.com/authzed/spicedb

[+] Xeoncross|3 years ago|reply

Like S3 or Redis, RocksDB is much more performant when you don't need the query engine and want to have highly compact storage with fast lookups and high write throughput.

Storage engines come in different levels of complexity, depending on the query requirements. Simple K/V stores can run circles around Postgres/MySQL as long as you don't need the extra features.

[+] zarzavat|3 years ago|reply

In your list RocksDB is most like Redis, but even faster because the data doesn't have to leave the process.

Think of it as a high performance sports car like a Ferrari. It's not good at taking the kids to school or buying groceries. But if you need to prioritise performance at the expense of all other considerations then it's exactly what you need.

[+] eis|3 years ago|reply

A few more entries that might be of interest:

  * DynamoDB and the Dynamo KV store
  * LMDB (embedded kv)
  * Dgraph (distributed graph db) and its embedded kv store BadgerDB

[+] lacker|3 years ago|reply

IMO it's just confusing to call both, say, RocksDB and MySQL "databases". They sit at different levels of the stack and it is easier to just think of them as entirely different things, your "SQL database" and your "storage engine". So your stack looks like

Application

|

MySQL

|

RocksDB

|

Filesystem

In general the MySQL layer is doing all the convenient stuff for application developers like supporting different queries and datatypes. The RocksDB layer is optimizing for performance metrics like throughput and reliability and just treats data as sequences of bytes.
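To make the split concrete, here is a minimal sketch of the two layers. Everything in it is hypothetical: `ByteStore` stands in for the storage engine and `Table` for the SQL layer, and neither reflects how MySQL or RocksDB are actually implemented. The point is only that the upper layer knows about tables, columns, and types, while the lower layer sees nothing but bytes.

```python
import json


class ByteStore:
    """Stand-in for the storage engine: opaque byte keys to opaque byte values."""

    def __init__(self):
        self._data = {}

    def put(self, key: bytes, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: bytes):
        return self._data.get(key)

    def scan(self, prefix: bytes):
        # Ordered prefix scan over the raw keys.
        for key in sorted(self._data):
            if key.startswith(prefix):
                yield key, self._data[key]


class Table:
    """Stand-in for the SQL layer: knows about tables, columns, and types."""

    def __init__(self, store, name, columns):
        self.store, self.name, self.columns = store, name, columns

    def _key(self, pk: int) -> bytes:
        # Fixed-width big-endian ids keep byte order equal to numeric order.
        return self.name.encode() + b"/" + pk.to_bytes(8, "big")

    def insert(self, pk, **row):
        unknown = set(row) - set(self.columns)
        if unknown:
            raise ValueError(f"unknown columns: {sorted(unknown)}")
        self.store.put(self._key(pk), json.dumps(row).encode())

    def select(self, pk):
        raw = self.store.get(self._key(pk))
        return None if raw is None else json.loads(raw)

    def select_all(self):
        prefix = self.name.encode() + b"/"
        return [json.loads(value) for _, value in self.store.scan(prefix)]


store = ByteStore()
users = Table(store, "users", ["name", "age"])
users.insert(1, name="ada", age=36)
users.insert(300, name="cy", age=51)
users.insert(2, name="bob", age=27)
```

Here `users.select_all()` returns rows in primary-key order only because the key encoding was chosen so that byte order matches numeric order; the store itself never learns what a "user" is.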

[+] lcnPylGDnU4H9OF|3 years ago|reply

Actually, this helps a lot. I'd never heard of RocksDB and I'm barely familiar with InnoDB and hopefully I am not wrong to compare the two.

[+] tomhallett|3 years ago|reply

100% agreed. TIL that MySQL can use RocksDB under the hood (via the MyRocks storage engine).

Here's another example of a realtime database which uses RocksDB under the hood: https://rockset.com/blog/how-we-use-rocksdb-at-rockset/

[+] jeffbee|3 years ago|reply

I think the use of bare RocksDB is more common than the use of MyRocks.

[+] rajko_rad|3 years ago|reply

Two more examples to check out: Yugabyte also persists with RocksDB: https://www.yugabyte.com/blog/how-we-built-a-high-performanc...

And this is very cool, distributed SQLite with FDB: https://univalence.me/posts/mvsqlite

[+] eatonphil|3 years ago|reply

Thank you, edited to include Yugabyte!

[+] samsquire|3 years ago|reply

With Rockset's converged indexes and an SQL query optimiser you can build an SQL database.

https://rockset.com/blog/converged-indexing-the-secret-sauce...

Rockset's converged indexes + denormalisation means you can have fast querying.

[+] aviramha|3 years ago|reply

Great article! One cool thing about RocksDB is that it's even used in other KV databases such as Redis on Flash: https://redis.com/blog/hood-redis-enterprise-flash-database-...

[+] dboreham|3 years ago|reply

The article misses the point. All data storage and query systems end up architected in layers. Upper layers deal with higher abstractions (objects, rows, whatever). Lower layers deal with simpler functions, closer to the hardware. The upper layers are consumers of the lower layers. This is where "embedded KV stores" like LevelDB, RocksDB, etc come from. They began as the embedded storage layer for some bigger thing. Every product you think of as a database or document store is built like this, including MySQL and PostgreSQL and Oracle. Such a storage layer, shipped as an independent library, is how you (or anyone) builds your own database-ish thing. That's what the article should say.

The list of examples is odd. For instance, MongoRocks is cited for using RocksDB, but stock MongoDB uses WiredTiger, which isn't mentioned.

Disclosure: I played a part in the late-beginning of this space when Netscape funded Sleepycat to develop BerkeleyDB. dbm and ndbm existed beforehand, but BerkeleyDB used in LDAP servers is I think the genesis point for this pattern as it exists today.

[+] eatonphil|3 years ago|reply

Yup, FB's ZippyDB [0] is another example mentioned in the article.

[0] https://engineering.fb.com/2021/08/06/core-data/zippydb/

Edit: I've added Redis Enterprise Flash to the list now. Thanks!

[+] ramoz|3 years ago|reply

We should see a rise in embedded KV popularity in correlation with ML applications. Storing embeddings in something like LevelDB, in a format such as FlatBuffers, offers a high-performance solution for online prediction (i.e., for mapping business values to their embeddings on the fly before sending them off to a model for inference).
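A minimal sketch of that lookup pattern, with two loud substitutions so it runs on the Python standard library alone: `dbm.dumb` stands in for LevelDB, and a length-prefixed float32 array packed with `struct` stands in for FlatBuffers. The key names and vector values are invented for illustration.

```python
import dbm.dumb
import os
import struct
import tempfile


def encode_vector(vec):
    # Length-prefixed little-endian float32 array (a stand-in for FlatBuffers).
    return struct.pack(f"<I{len(vec)}f", len(vec), *vec)


def decode_vector(raw):
    (n,) = struct.unpack_from("<I", raw, 0)
    return list(struct.unpack_from(f"<{n}f", raw, 4))


path = os.path.join(tempfile.mkdtemp(), "embeddings")

# Offline step: write precomputed embeddings keyed by the business value.
with dbm.dumb.open(path, "c") as db:
    db[b"country:DE"] = encode_vector([0.5, -1.25, 2.0])
    db[b"country:FR"] = encode_vector([0.0, 0.75, -3.5])

# Online step: map a business value to its embedding right before inference.
with dbm.dumb.open(path, "r") as db:
    features = decode_vector(db[b"country:DE"])
```

The lookup happens in-process against a local file, so there is no network hop between the prediction service and the embedding table.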

[+] jupp0r|3 years ago|reply

Would that be on mobile devices for offline usage? I'm thinking that for typical backend use cases one would use a dedicated key value store service, right?

[+] porker|3 years ago|reply

I've heard this a lot recently about storing embeddings. As someone who has dabbled in ML I don't understand what it means. Can you point me to a good overview of the topic please?

[+] tristan957|3 years ago|reply

I work on a storage engine at $dayJob. We have created a connector for MongoDB, although for a very ancient version. We are currently working with $cloudProvider to use our storage engine in their cloud DBaaS offerings.

This field is pretty interesting when you're talking about performance vs space amp vs write amp vs read amp.

[+] adammarples|3 years ago|reply

Plug for my python dict wrapper https://github.com/adammarples/rocksdbdict

[+] kefir|3 years ago|reply

Apache Ignite 3 also uses RocksDB as a pluggable storage https://www.gridgain.com/resources/blog/apache-ignite-3-alph...

[+] eatonphil|3 years ago|reply

Thanks! Adding this.

[+] rad_gruchalski|3 years ago|reply

This is a good read. By the way, Kafka Streams is also built on top of RocksDB. Not strictly a database but relevant to a certain extent.

[+] x3n0ph3n3|3 years ago|reply

My team has a use case that involves a precomputed RocksDB database saved on an AWS EFS volume that is mounted on a Lambda with hundreds to thousands of invocations per second. It allows for some extremely fast querying of relatively static data. Another process is responsible for periodically updating the database and writing it back to the EFS volume.

[+] didgetmaster|3 years ago|reply

I am building a general-purpose data management system called Didgets (https://didgets.com/) that extensively uses KV stores that I invented. Since it was primarily designed to be a file system replacement, I used them for attaching contextual meta-data tags to file objects.

My whole container started to look like a sparsely populated relational table where every row/column intersection could have multiple values (e.g. a photo could have a tag for every person in the picture attached). I started experimenting with using the KV stores as columns to form regular relational tables.

It turns out that it was relatively easy and was extremely fast. I started building tables with 50+ million rows and many columns and performing queries against them. Benchmarking the system against other databases revealed that it was very fast (and didn't need separate indexes to accomplish this).

Here is a video showing how it does a bunch of queries 10x faster than the same data stored in a highly indexed table in Postgres: https://www.youtube.com/watch?v=OVICKCkWMZE

[+] LAC-Tech|3 years ago|reply

When I read about event sourcing, my mind immediately went to how that would map to a K/V database. Has anyone done this in production?

Also - no mention of LMDB? RocksDB and LMDB feel like the ones that stand out in that field - LevelDB definitely had a reputation for corrupting data.
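On the event sourcing question, one plausible mapping (a sketch, not a report of anyone's production setup) is to use the stream id plus a sequence number as the key, so that an ordered prefix scan replays a stream in order. `OrderedKV`, `event_key`, and the event shapes below are all hypothetical, and the sketch assumes stream ids never contain "/".

```python
import bisect
import json


class OrderedKV:
    """In-memory stand-in for an ordered embedded KV store."""

    def __init__(self):
        self._keys = []  # kept sorted so prefix scans are ordered
        self._values = {}

    def put(self, key: bytes, value: bytes) -> None:
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def scan(self, prefix: bytes):
        i = bisect.bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            yield self._keys[i], self._values[self._keys[i]]
            i += 1


def event_key(stream: str, seq: int) -> bytes:
    # Big-endian sequence numbers sort in append order within a stream.
    return f"{stream}/".encode() + seq.to_bytes(8, "big")


def append(store, stream, seq, event):
    store.put(event_key(stream, seq), json.dumps(event).encode())


def replay_balance(store, stream):
    # Fold the stream's events, in key order, into the current state.
    balance = 0
    for _, raw in store.scan(f"{stream}/".encode()):
        event = json.loads(raw)
        if event["type"] == "deposited":
            balance += event["amount"]
        elif event["type"] == "withdrew":
            balance -= event["amount"]
    return balance


store = OrderedKV()
append(store, "acct-1", 1, {"type": "deposited", "amount": 100})
append(store, "acct-10", 1, {"type": "deposited", "amount": 999})
append(store, "acct-1", 2, {"type": "withdrew", "amount": 30})
```

The trailing "/" in the prefix is what keeps "acct-1" from also matching "acct-10" during replay.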

[+] legulere|3 years ago|reply

The article explains how you do primary key indexes with key-value stores. But how do you do secondary indexes?
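One common approach, sketched here under assumptions rather than taken from the article, is to write a second key per row that embeds the indexed value followed by the primary key, and then answer lookups with a prefix scan. The plain dict, the key layout, and the helper names are all hypothetical stand-ins for an ordered byte-oriented store.

```python
store = {}  # stand-in for an ordered, byte-oriented KV store


def scan(prefix: bytes):
    # Ordered prefix scan; a real engine would seek instead of sorting.
    for key in sorted(store):
        if key.startswith(prefix):
            yield key


def put_user(user_id: str, email: str, city: str) -> None:
    # Primary record: primary key -> row.
    store[b"user/" + user_id.encode()] = f"{email}|{city}".encode()
    # Secondary index entry: indexed value and primary key both live in the key.
    store[b"idx/city/" + city.encode() + b"/" + user_id.encode()] = b""


def users_in_city(city: str):
    prefix = b"idx/city/" + city.encode() + b"/"
    ids = [key[len(prefix):] for key in scan(prefix)]
    # Follow each index entry back to the primary record.
    return [store[b"user/" + uid].decode() for uid in ids]


put_user("u1", "a@x.org", "Oslo")
put_user("u2", "b@x.org", "Paris")
put_user("u3", "c@x.org", "Oslo")
oslo_users = users_in_city("Oslo")
```

The sketch leaves out the hard part: when a row changes city, the old index key has to be deleted and the new one written together with the row, ideally in one atomic batch where the store offers one.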

[+] morelisp|3 years ago|reply

"Time is a flat circle." - someone at Sleepycat, probably.

[+] NetOpWibby|3 years ago|reply

You should add RethinkDB! I moved to it from MongoDB years ago.

[+] orthecreedence|3 years ago|reply

Are you still using it? How is the pace going on the community-supported version? I stopped using it after the company folded, but I do kind of miss it. Definitely one of the more interesting designs, and light years beyond what MongoDB was at the time.

[+] eatonphil|3 years ago|reply

No I don't think that's relevant. They implement their own btree it seems [0].

They don't use a key-value store library.

I know it's a bit of a fine line. But I'm talking about standalone libraries people embed across different applications/databases. That's what RocksDB/LevelDB/Pebble are.

[0] https://github.com/rethinkdb/rethinkdb/tree/v2.4.x/src/btree

[+] jeffbee|3 years ago|reply

RethinkDB is utterly defunct as a project, has not had a substantive release in years, and in my experience just flat out doesn't work. And let's not even discuss Mongo. Asking yourself to choose between these is like selecting your favorite brand of thumbtack to step on.

[+] eis|3 years ago|reply

TiKV is not an embedded key-value store, it is distributed.

[+] eatonphil|3 years ago|reply

Thanks! Fixed and attributed you at the end.

70 comments