item 22299625

> Actually, I always believed that the internal key-value store that they use would never scale to represent table workloads.

Care to elaborate here? As someone working at that layer of the system, I can say our RocksDB usage is but a blip in any execution trace (as it should be: since it's a distributed system, network overhead would dominate single-node key-value perf). That aside, plenty of RDBMSs are designed to sit atop internal key-value stores. See MySQL+InnoDB[0], or MySQL+RocksDB[1] as used at Facebook.

[0]: https://en.wikipedia.org/wiki/InnoDB

[1]: https://myrocks.io/

ahachete|6 years ago

Don't get me wrong, both RocksDB and the work done by CCDB are pretty cool.

Yet I still believe that layering a row model as the V of a K-V store introduces, by definition, inefficiencies when accessing column data the way row stores do, as compared to pure row storage. It's not that it can't work, but I believe it can never be as efficient as a more row-oriented storage (say, like Postgres).

irfansharif|6 years ago

I have no idea what you're saying. What's "row-oriented storage" if not storing all the column values of a row in sequence, in contrast to storing all the values of a column across the table in sequence (aka a "column store")? What does the fact that it's exposed behind a KV interface have to do with anything? What's "more" about Postgres' "row-orientedness" compared to MySQL's?

In case you didn't know, a row [idA, valA, valB, valC] is not stored as [idA: [valA, valB, valC]]. It's more like [id/colA: valA, id/colB: valB, id/colC: valC] (modulo caveats around what we call column families[0], where you can make it look more like the first form if you want). My explanation here is pretty bad, but [1][2] go into more detail.

[0]: https://www.cockroachlabs.com/docs/stable/column-families.ht...

[1]: https://www.cockroachlabs.com/blog/sql-in-cockroachdb-mappin...

[2]: https://github.com/cockroachdb/cockroach/blob/master/docs/te...
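To make the mapping above concrete, here's a rough sketch of flattening one row into KV pairs, either one pair per column or one pair per column family. Everything in it is an assumption for illustration: the function name row_to_kv, the "/table/pk/col" string keys, and the comma-joined family values are made up, and the real on-disk encoding is different (see the linked docs).

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, str]


def row_to_kv(
    table: str,
    pk_col: str,
    row: Row,
    families: Optional[Dict[str, List[str]]] = None,
) -> List[Tuple[str, str]]:
    """Flatten one row into (key, value) pairs.

    Hypothetical string encoding, for illustration only.
    """
    pk = row[pk_col]
    if families is None:
        # One KV pair per non-key column: /table/pk/col -> value.
        return [(f"/{table}/{pk}/{c}", v) for c, v in row.items() if c != pk_col]
    pairs = []
    for family, cols in families.items():
        # One KV pair per family: its column values packed into one value.
        packed = ",".join(row[c] for c in cols)
        pairs.append((f"/{table}/{pk}/{family}", packed))
    return pairs


row = {"id": "idA", "colA": "valA", "colB": "valB", "colC": "valC"}

# Per-column layout: [id/colA: valA, id/colB: valB, id/colC: valC]
per_column = row_to_kv("t", "id", row)

# Single-family layout: closer to [idA: [valA, valB, valC]]
one_family = row_to_kv("t", "id", row, {"f0": ["colA", "colB", "colC"]})
```

Either way, every key for the row shares the "/t/idA/" prefix, so in a key-ordered store those pairs sort next to each other.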