I feel like this is missing any mention of the history of KV stores. Unix came with an embedded database (dbm) from the early days (1979) [0] which was rewritten at Berkeley into the more popular bdb in the 80s. [1] Sendmail was one of the more common programs that used it. And then when djb built his replacement for sendmail, qmail, he invented cdb. [2]
This codebase shows how SSTables, WAL, memtables, recordio, skiplists, segment files, and other storage engine components work in a digestible way. Includes a demo database showing how it all comes together to make a RocksDB / LevelDB competitor (not really).
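For anyone who wants the shape of those pieces before reading a real codebase, here is a minimal sketch of how a WAL, a memtable, and sorted segments fit together. It is not taken from the linked project: `TinyLSM`, `memtable_limit`, and the in-memory lists standing in for on-disk files are all assumptions made for illustration.

```python
import bisect


class TinyLSM:
    """Toy log-structured store: WAL + memtable + sorted immutable segments."""

    def __init__(self, memtable_limit=2):
        self.wal = []        # append-only log; stands in for a file on disk
        self.memtable = {}   # recent writes, held in memory
        self.segments = []   # flushed sorted runs, oldest first (SSTable stand-ins)
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.wal.append((key, value))  # log first so a crash could be replayed
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # Write the memtable out as one sorted, immutable run, then reset.
        items = sorted(self.memtable.items())
        keys = [k for k, _ in items]
        values = [v for _, v in items]
        self.segments.append((keys, values))
        self.memtable = {}
        self.wal = []  # flushed data no longer needs the log

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        # Newer segments shadow older ones, so search from the end.
        for keys, values in reversed(self.segments):
            i = bisect.bisect_left(keys, key)
            if i < len(keys) and keys[i] == key:
                return values[i]
        return None
```

A real engine would also compact segments, handle deletes with tombstones, and use a skiplist rather than a dict for the memtable; none of that is shown here.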
I get that you can replace the storage engine and theoretically get more performance, but in practice compatibility and standardization are more important, because a lot of products (including third-party ones) will already use PostgreSQL/MongoDB/Redis, so it's a no-brainer to use them for your solution as well.
However, for me to pick RocksDB or some other shiny new database/storage engine, there would have to be more compelling reasons.
Unless you are building a database, these embedded KV store libraries are less likely to be the best solution for the job. If you are considering them for an app that isn't a database, you should also take a long, hard look at SQLite first.
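To make the SQLite suggestion concrete, here is a rough sketch of using it as a plain key-value store through Python's standard `sqlite3` module. The table name `kv` and the helper names `open_kv`, `kv_put`, and `kv_get` are made up for this example.

```python
import sqlite3


def open_kv(path=":memory:"):
    """Open (or create) a single-table key-value store."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS kv (key BLOB PRIMARY KEY, value BLOB NOT NULL)"
    )
    return conn


def kv_put(conn, key, value):
    # The connection context manager commits on success and rolls back on error.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)", (key, value)
        )


def kv_get(conn, key):
    row = conn.execute("SELECT value FROM kv WHERE key = ?", (key,)).fetchone()
    return row[0] if row else None


conn = open_kv()
kv_put(conn, b"user:1", b"alice")
name = kv_get(conn, b"user:1")
```

The point of starting here is that you keep transactions and can add columns or indexes later if the access pattern turns out not to be pure key lookups.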
What's also interesting is the trend of newer distributed "database systems" like Vitess[0] or SpiceDB[1] that forgo embedded KV stores and instead reuse existing SQL databases as their "embedded database". Vitess leverages MySQL and SpiceDB leverages MySQL, PostgreSQL, CockroachDB, or Spanner. Systems built this way get to leverage many high-level features from existing database systems so that they can focus on innovating in even higher-level functionality. In the case of Vitess, it's scaling, distributing, and schema management of MySQL. In the case of SpiceDB, it's building a database specifically optimized for querying access control data in a way that can coordinate with causality across multiple services.
Like S3 or Redis, RocksDB is much more performant when you don't need the query engine and want to have highly compact storage with fast lookups and high write throughput.
Storage engines come in different levels of complexity, depending on the query requirements. Simple K/V stores can run circles around Postgres/MySQL as long as you don't need the extra features.
In your list RocksDB is most like Redis, but even faster because the data doesn't have to leave the process.
Think of it as a high performance sports car like a Ferrari. It's not good at taking the kids to school or buying groceries. But if you need to prioritise performance at the expense of all other considerations then it's exactly what you need.
IMO it's just confusing to call both, say, RocksDB and MySQL "databases". They sit at different levels of the stack and it is easier to just think of them as entirely different things, your "SQL database" and your "storage engine". So your stack looks like
Application
|
MySQL
|
RocksDB
|
Filesystem
In general the MySQL layer is doing all the convenient stuff for application developers like supporting different queries and datatypes. The RocksDB layer is optimizing for performance metrics like throughput and reliability and just treats data as sequences of bytes.
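A toy version of that split, to show what "just treats data as sequences of bytes" means in practice. This is a sketch under loud assumptions: the `store` dict stands in for the storage engine, and the key layout and JSON row encoding are invented here, not how MySQL or RocksDB actually encode rows.

```python
import json

# Stand-in for the storage engine: it only ever sees bytes -> bytes.
store = {}


def row_key(table, primary_key):
    # Zero-padding makes byte order match numeric order for range scans.
    return f"{table}/{primary_key:010d}".encode()


def insert_row(table, primary_key, row):
    # The upper layer knows about tables, columns, and types...
    store[row_key(table, primary_key)] = json.dumps(row).encode()


def select_by_pk(table, primary_key):
    # ...and turns a typed lookup into a single byte-key read.
    raw = store.get(row_key(table, primary_key))
    return json.loads(raw) if raw is not None else None


def scan_table(table):
    # A table scan becomes a prefix scan over sorted keys.
    prefix = f"{table}/".encode()
    return [json.loads(v) for k, v in sorted(store.items()) if k.startswith(prefix)]
```

Everything the application developer cares about (column names, datatypes, query shapes) lives above the `store` line; the layer below never needs to know.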
The article misses the point. All data storage and query systems end up architected in layers. Upper layers deal with higher abstractions (objects, rows, whatever). Lower layers deal with simpler functions, closer to the hardware. The upper layers are consumers of the lower layers. This is where "embedded KV stores" like LevelDB, RocksDB, etc. come from. They began as the embedded storage layer for some bigger thing. Every product you think of as a database or document store is built like this, including MySQL and PostgreSQL and Oracle. Such a storage layer, shipped as an independent library, is how you (or anyone) build your own database-ish thing. That's what the article should say.
The list of examples is odd. For instance, MongoRocks is cited for using RocksDB, but actual stock MongoDB uses WiredTiger, which isn't mentioned.
Disclosure: I played a part in the late-beginning of this space when Netscape funded Sleepycat to develop BerkeleyDB. dbm and ndbm existed beforehand, but BerkeleyDB used in LDAP servers is, I think, the genesis point for this pattern as it exists today.
We should see a rise in embedded KV popularity in correlation with ML applications. Storing embeddings in something like LevelDB, in a format such as FlatBuffers, offers a high-performance solution for online prediction (i.e. for mapping business values to their embedding format on the fly to send off to some model for inference).
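As a rough illustration of the idea, the sketch below stores an embedding as packed bytes under a business key and reads it back at prediction time. It swaps in stand-ins so it runs with the standard library only: `dbm.dumb` in place of LevelDB and `struct` in place of FlatBuffers. The key `product:42` and the helper names are hypothetical.

```python
import dbm.dumb
import os
import struct
import tempfile


def pack_embedding(vector):
    # Little-endian float32 values, back to back.
    return struct.pack(f"<{len(vector)}f", *vector)


def unpack_embedding(blob):
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


# Offline step: write embeddings keyed by a business identifier.
path = os.path.join(tempfile.mkdtemp(), "embeddings")
with dbm.dumb.open(path, "c") as db:
    db[b"product:42"] = pack_embedding([0.5, -1.25, 2.0])

# Online step: map the business value to its embedding with one local lookup.
with dbm.dumb.open(path, "r") as db:
    vector = unpack_embedding(db[b"product:42"])
```

The lookup happens inside the serving process, with no network hop to a separate key-value service, which is where the latency benefit would come from.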
Would that be on mobile devices for offline usage? I'm thinking that for typical backend use cases one would use a dedicated key value store service, right?
I've heard this a lot recently about storing embeddings. As someone who has dabbled in ML I don't understand what it means. Can you point me to a good overview of the topic please?
I work on a storage engine at $dayJob. We have created a connector for MongoDB, although for a very ancient version. We are currently working with $cloudProvider to use our storage engine in their cloud DBaaS offerings.
This field is pretty interesting when you're talking about performance vs space amp vs write amp vs read amp.
My team has a use case that involves a precomputed RocksDB database saved on an AWS EFS volume that is mounted on a Lambda with hundreds to thousands of invocations per second. It allows for some extremely fast querying of relatively static data. Another process is responsible for periodically updating the database and writing it back to the EFS volume.
I am building a general-purpose data management system called Didgets (https://didgets.com/) that extensively uses KV stores that I invented. Since it was primarily designed to be a file system replacement, I used them for attaching contextual meta-data tags to file objects.
My whole container started to look like a sparsely populated relational table where every row/column intersection could have multiple values (e.g. a photo could have a tag for every person in the picture attached). I started experimenting with using the KV stores as columns to form regular relational tables.
It turns out that it was relatively easy and was extremely fast. I started building tables with 50+ million rows and many columns and performing queries against them. Benchmarking the system against other databases revealed that it was very fast (and didn't need separate indexes to accomplish this).
Here is a video showing how it does a bunch of queries 10x faster than the same data stored in a highly indexed table in Postgres: https://www.youtube.com/watch?v=OVICKCkWMZE
Are you still using it? How is the pace going on the community-supported version? I stopped using it after the company folded, but I do kind of miss it. Definitely one of the more interesting designs, and light years beyond what MongoDB was at the time.
No, I don't think that's relevant. They implement their own B-tree, it seems [0].
They don't use a key-value store library.
I know it's a bit of a fine line. But I'm talking about standalone libraries people embed across different applications/databases. That's what RocksDB/LevelDB/Pebble are.
RethinkDB is utterly defunct as a project, has not had a substantive release in years, and in my experience just flat out doesn't work. And let's don't even discuss Mongo. Asking yourself to choose between these is like selecting your favorite brand of thumbtack to step on.
mprovost|3 years ago
[0] https://en.wikipedia.org/wiki/DBM_(computing)
[1] https://en.wikipedia.org/wiki/Berkeley_DB
[2] https://cr.yp.to/cdb.html
Xeoncross|3 years ago
artificial|3 years ago
[0] https://pragprog.com/titles/tjgo/distributed-services-with-g...
[1] https://github.com/dgraph-io/badger
tjungblut|3 years ago
Adiqq|3 years ago
I don't work with a lot of data, but I typically base my decisions on a few basic factors and the purpose:
PostgreSQL - SQL, structured data, cannot scale horizontally
MongoDB - NoSQL, unstructured data
Redis - key-value, distributed cache
jzelinskie|3 years ago
[0]: https://github.com/vitessio/vitess
[1]: https://github.com/authzed/spicedb
Xeoncross|3 years ago
zarzavat|3 years ago
eis|3 years ago
lacker|3 years ago
lcnPylGDnU4H9OF|3 years ago
tomhallett|3 years ago
Here's another example of a realtime database which uses RocksDB under the hood: https://rockset.com/blog/how-we-use-rocksdb-at-rockset/
jeffbee|3 years ago
rajko_rad|3 years ago
And this is very cool, distributed SQLite with FDB: https://univalence.me/posts/mvsqlite
eatonphil|3 years ago
samsquire|3 years ago
https://rockset.com/blog/converged-indexing-the-secret-sauce...
Rockset's converged indexes + denormalisation means you can have fast querying.
aviramha|3 years ago
dboreham|3 years ago
eatonphil|3 years ago
[0] https://engineering.fb.com/2021/08/06/core-data/zippydb/
Edit: I've added Redis Enterprise Flash to the list now. Thanks!
ramoz|3 years ago
jupp0r|3 years ago
porker|3 years ago
tristan957|3 years ago
adammarples|3 years ago
kefir|3 years ago
eatonphil|3 years ago
rad_gruchalski|3 years ago
x3n0ph3n3|3 years ago
didgetmaster|3 years ago
LAC-Tech|3 years ago
Also, no mention of LMDB? RocksDB and LMDB feel like the ones that stand out in that field. LevelDB definitely had a reputation for corrupting data.
legulere|3 years ago
morelisp|3 years ago
NetOpWibby|3 years ago
orthecreedence|3 years ago
eatonphil|3 years ago
[0] https://github.com/rethinkdb/rethinkdb/tree/v2.4.x/src/btree
jeffbee|3 years ago
eis|3 years ago
eatonphil|3 years ago