
SanFranManDan | 8 years ago

I agree. However, I stumbled into the world of KV stores (RocksDB, LMDB, LevelDB, etc.) last year, and what is most surprising is that they all stop in the same place. I understand that they should do one thing and do it well, but it is still disappointing that you have to implement things like replication, sharding, and indexing yourself.

There really aren't even that many DBMSs out there that are pure KV (like Redis) to handle it either. They are normally much more complicated (e.g. adding an SQL layer on top).

hyc_symas | 8 years ago

BerkeleyDB has a replication engine. IMO that's too much of a kitchen-sink approach. Judging by how many of those KV stores utterly fail to store data reliably, that's already a hard enough problem to solve. Focusing on the local storage is a clearly delineated realm of responsibility. Distribution obviously belongs to a higher logical layer.

Indexing requires knowledge of a higher level data model. (Again, BerkeleyDB has built in support for secondary indexing, but last time I checked it was a quite braindead and slow implementation. Faster to build your own indices instead, using the other facilities provided.)

With that said, while a KV store has no logical data model to apply to index generation, it can at least provide primitives for you to construct your own indices. BerkeleyDB and LMDB do this.
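As a rough illustration of what those primitives buy you: a secondary index is just a second set of keys you maintain yourself, pointing back at the primary key. This is a sketch, not any engine's actual API; a plain Python dict stands in for the KV store, and the "user:"/"idx:" key naming is an invented convention.

```python
# Sketch: hand-rolled secondary index over a bare KV store.
# A dict stands in for the store; the key prefixes are made up here.

def put_user(store, user_id, email):
    """Write the primary record and keep the secondary index in sync."""
    store["user:" + user_id] = email        # primary: id -> record
    store["idx:email:" + email] = user_id   # secondary: email -> id

def find_by_email(store, email):
    """Look up through the secondary index to get the primary key."""
    return store.get("idx:email:" + email)

store = {}
put_user(store, "42", "alice@example.com")
print(find_by_email(store, "alice@example.com"))  # the primary key "42"
```

The catch, of course, is that every update and delete must maintain both keys, ideally inside one transaction, which is exactly the kind of facility BerkeleyDB and LMDB expose.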

Distribution with transaction support may require help from the storage engine (offering something resembling multi-phase commit). BerkeleyDB provides this already; LMDB will probably provide this in 1.0.
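For what it's worth, the "help from the storage engine" amounts to a prepare step that makes a transaction durable but undecided. A bare-bones sketch of the two-phase commit pattern (all names here are invented for illustration, not BerkeleyDB's or LMDB's API):

```python
# Minimal two-phase commit sketch. Real engines expose a durable
# "prepare" (e.g. making the txn recoverable before the commit decision).

class Participant:
    def __init__(self, name):
        self.name = name
        self.state = "idle"

    def prepare(self, txn):
        # Phase 1: persist enough to commit later, then vote yes/no.
        self.state = "prepared"
        return True

    def commit(self, txn):
        self.state = "committed"

    def abort(self, txn):
        self.state = "aborted"

def two_phase_commit(participants, txn):
    # Phase 1: every participant must vote yes, or the whole txn aborts.
    if all(p.prepare(txn) for p in participants):
        for p in participants:   # Phase 2: decision is now irrevocable.
            p.commit(txn)
        return "committed"
    for p in participants:
        p.abort(txn)
    return "aborted"
```

The storage engine's job is only the prepare/commit/abort hooks; the coordinator logic lives in the distribution layer above.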

An argument could be made that the storage engine should be able to handle replication/distribution even without understanding the higher level/application data model. BerkeleyDB does this with page-level replication. IME this results in gratuitously verbose replication traffic, as every high level operation plus all of its dependent index updates etc. are replicated as low level disk/page offset operations. IMO it makes more sense to leave this to a higher layer because you can just replicate logical operations, and save a huge amount of network overhead.
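A back-of-the-envelope comparison makes the overhead concrete (the numbers below are illustrative assumptions, not measurements): a logical operation is a few dozen bytes, while page-level replication ships every dirty page the operation touched.

```python
import json

# One logical write that also updates one secondary index.
logical_op = json.dumps(["put_user", "42", "alice@example.com"]).encode()

# Page-level replication ships whole dirty pages (assume 4 KiB pages):
# here, the data page plus the index page touched by the same operation.
PAGE_SIZE = 4096
page_level_traffic = 2 * PAGE_SIZE

print(len(logical_op), "bytes logical vs", page_level_traffic, "bytes page-level")
```

Even in this toy case the page-level stream is two orders of magnitude larger, and the gap grows with every additional index the operation touches.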

As for the possible higher layers - antoncohen's response below gives a few examples. There are plenty of higher level DBMSs implemented on top of LMDB, providing replication, sharding, etc.

digikata | 8 years ago

I think we're just getting to the point where it may become more common to separate out the simpler problem of a single-node, non-distributed 'store' with some KV interface below, and then build the more complex distributed algorithms in a layer or two above. Some of the larger monolithic codebases had to start out with their own code all the way up and down the stack, but are now starting to experiment with pluggable backend store interfaces, so you can trade off the strengths and weaknesses of various local stores.

Along the same lines, a few newer codebases for distributed stores seem to be built with those delineations in mind. Another comment brought up TiDB/TiKV, for example. TiKV, IIRC, uses RocksDB as its local store.

hyc_symas | 8 years ago

"Just getting to the point"? OpenLDAP has been architected this way for ~20 years. I think the same could be said for MySQL, as well as SQL Server (built on top of ESENT/JET). Large monolithic data stores are an obvious anti-pattern and reflect a short-sighted design process.

antoncohen | 8 years ago

RocksDB, LMDB, and LevelDB are basically low-level on-disk storage engines, used by databases that provide things like network access, sharding, and replication: OpenLDAP (LMDB), MySQL (MyRocks), Bigtable (LevelDB-like), Riak (LevelDB), etc.

Many are or can be used as key-value stores. MySQL actually has a memcached-compatible KV interface that uses InnoDB for storage. Postgres has hstore. A lot of the distributed databases roughly fall into the category of KV stores: HBase, Riak, Cassandra, DynamoDB, etc.