YugaByteDB – A Transactional Database with Cassandra, Redis and PostgreSQL APIs

[+] lukeqsee|7 years ago|reply

Does anyone know a comparison between this and CockroachDB? Or have experience running either in production?

They seem to compare against other databases, but not against Cockroach, which seems to be the biggest competitor. I'm looking at implementing a global-scale database cluster with very specific requirements, and YugaByte seems to meet those, but a comparison against CockroachDB seems warranted.

Edit: just noticed they do compare features against CockroachDB: https://docs.yugabyte.com/latest/comparisons/#distributed-sq..., but they don't have an in-depth comparison.

[+] atombender|7 years ago|reply

YugaByte only added distributed transactions in the new 1.1 release (previously, it only supported single-row transactions). The architecture is a variation on 2PC and seems sound, but I think it's fair to say that it's early days. Meanwhile, Cockroach and TiDB both have battle-tested implementations.

YugaByte is pretty quirky. Rather than settle on a native data model, they offer several "personalities" that mimic other products (Cassandra, Redis and PostgreSQL), and you can mix and match them. But they've implemented each API with their own set of weird warts. For example, their CQL implementation has CREATE INDEX, but it does not index existing data [1]. You have to either create the secondary index before inserting data, or force a reindex of everything with a dummy UPDATE statement. Who would ship such a product?
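To make the index complaint concrete, here is a toy in-memory sketch of the difference between an index that is backfilled on creation and one that only sees later writes. The `Table` class and its methods are hypothetical names made up for illustration; this is not YugaByte or Cassandra code.

```python
class Table:
    """Toy table with optional index backfill (illustration only)."""

    def __init__(self):
        self.rows = {}      # primary key -> row dict
        self.indexes = {}   # column -> {value -> set of primary keys}

    def create_index(self, column, backfill):
        index = {}
        if backfill:
            # Scan existing rows so the index covers data already stored.
            for pk, row in self.rows.items():
                if column in row:
                    index.setdefault(row[column], set()).add(pk)
        self.indexes[column] = index

    def write(self, pk, row):
        old = self.rows.get(pk, {})
        for column, index in self.indexes.items():
            # Drop the stale entry, then index the new value.
            if column in old:
                index.get(old[column], set()).discard(pk)
            if column in row:
                index.setdefault(row[column], set()).add(pk)
        self.rows[pk] = row

    def lookup(self, column, value):
        return sorted(self.indexes[column].get(value, set()))


table = Table()
table.write(1, {"city": "Oslo"})
table.create_index("city", backfill=False)
before_rewrite = table.lookup("city", "Oslo")   # existing row is invisible

table.write(1, dict(table.rows[1]))             # the "dummy UPDATE" workaround
after_rewrite = table.lookup("city", "Oslo")    # row is now indexed
```

Without backfill, the row written before `create_index` only shows up in the index after it is rewritten, which is the workaround described above.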

More warts/Cassandraisms: UPDATE is actually an upsert; an update that doesn't match a row will insert it. And SELECT ... WHERE expressions can only use AND expressions (!) [2].
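A minimal sketch of the upsert behavior, using plain dicts as stand-in tables. Both function names are hypothetical and nothing here talks to a real database; it only contrasts the two semantics.

```python
def strict_update(table, key, values):
    """SQL-style UPDATE: only touches a row that already exists."""
    if key not in table:
        return 0  # no row matched, nothing is written
    table[key].update(values)
    return 1


def upsert_update(table, key, values):
    """Cassandra-style UPDATE as described above: a missing row is created."""
    table.setdefault(key, {}).update(values)
    return 1


sql_table = {}
cql_table = {}
strict_update(sql_table, 42, {"name": "alice"})
upsert_update(cql_table, 42, {"name": "alice"})
print(sql_table)  # {}
print(cql_table)  # {42: {'name': 'alice'}}
```

The practical consequence is that a typo in the key silently creates a new row instead of updating nothing.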

Hopefully the forthcoming SQL API should be saner, but it's very limited at the moment. It does not seem to support joins, transactions or indexes, for example. Meanwhile, CockroachDB and TiDB both have rich SQL implementations, including joins and aggregations, with cost-based query optimizers that can take advantage of multiple indexes and table statistics.

It seems more appropriate to compare YugaByte with distributed key/value stores like FoundationDB, Cassandra, Scylla and Redis.

[1] https://docs.yugabyte.com/latest/explore/transactional/secon...

[2] https://docs.yugabyte.com/v1.0/api/cassandra/dml_select/#roo...

[+] manigandham|7 years ago|reply

Cockroach treats the entire installation as one giant cluster, with an optional 'locality' feature for each node to distribute data. Yugabyte has regional clusters, which can replicate from a single master write cluster. CRDB is working on ways to get fast local reads for different regions.

Other than that, Yugabyte also supports more protocols like Redis and Cassandra, and enhancements within them. It's really more of a distributed key/value store like FoundationDB but with protocol layers on top included out of the box.

[+] ddorian43|7 years ago|reply

The Postgres layer is still extremely beta (it has no UPDATE support, etc.).

But it should become good fast, since they use the FDW API (storage engine API in PG 12) and reuse most of the PG code.

[+] rkarthik007|7 years ago|reply

Hi @lukeqsee, we are working on just this, please stay tuned!

[+] lykr0n|7 years ago|reply

When I see databases like this pop up, a part of me wonders why they don't devote effort to developing a plugin for MariaDB or PostgreSQL. Follow in the footsteps of Citus Data.

[+] morgo|7 years ago|reply

I can't speak to PostgreSQL, but I work on the TiDB team having previously worked on the MySQL team.

Ensuring compatibility from a clean slate is hard, but making MySQL distributed requires more than the current storage engine API provides. Thus, you would likely be creating a fork and no longer getting the benefits of having an upstream[1].

The surgery you need to perform on the code would also reduce you from full compatibility (switching from pessimistic to optimistic locking means some statements no longer work). There are some great side benefits to a clean slate too: In TiDB we're able to use modern languages such as Go for the TiDB Server, and Rust for TiKV.
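For readers unfamiliar with the locking distinction, here is a generic sketch of optimistic concurrency control: nothing is locked while a transaction runs, and a conflict only surfaces at commit time, so the caller has to be prepared to retry. `VersionedStore` and `ConflictError` are hypothetical names, and this is a textbook-style illustration, not TiDB's actual transaction protocol.

```python
class ConflictError(Exception):
    """Raised at commit when a key changed after the transaction read it."""


class VersionedStore:
    def __init__(self):
        self._data = {}  # key -> (version, value)

    def read(self, key):
        # Returns (version, value); version 0 means the key does not exist.
        return self._data.get(key, (0, None))

    def commit(self, read_versions, writes):
        # Validation phase: every key read must still be at the version seen.
        for key, seen in read_versions.items():
            if self.read(key)[0] != seen:
                raise ConflictError(key)
        # Write phase: apply the writes and bump each version.
        for key, value in writes.items():
            self._data[key] = (self.read(key)[0] + 1, value)


store = VersionedStore()
store.commit({}, {"balance": 100})

# Two transactions read the same version without blocking each other.
v1, bal1 = store.read("balance")
v2, bal2 = store.read("balance")

store.commit({"balance": v1}, {"balance": bal1 - 30})  # first committer wins
try:
    store.commit({"balance": v2}, {"balance": bal2 - 50})
    second_committed = True
except ConflictError:
    second_committed = False  # the caller must re-read and retry
```

Under pessimistic locking the second transaction would have waited on a row lock instead of failing at commit, which is why statements that rely on blocking lock semantics can behave differently after such a switch.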

([1] Phrasing this question to upstream: I think they'd like to offer such functionality, but introducing this level of backward incompatibility is hard. They are also very concerned about performance regression bugs, which would be very hard to prevent with all the refactoring required.)

[+] isoos|7 years ago|reply

Several months ago, setting up a Citus cluster did not seem straightforward, while at the same time I was able to set up and run a CockroachDB cluster in less than half an hour with Docker. I've checked the YugaByte documentation about setup, and CockroachDB still wins on operational complexity.

I doubt that the same will be done with PostgreSQL or MariaDB anytime soon. People have been begging for a good and easy psql HA cluster setup for ages, and it is just not happening (yeah, they are getting closer and closer), while such DBs fill a need with good results.

[+] rkarthik007|7 years ago|reply

CTO of YugaByte here... this is indeed our plan. We are working with the community on pluggable storage coming out next year, and in the slightly longer term want to make YB a plugin (extension in the case of PG) model. While a reasonable amount of work is needed in order to get there, this is definitely the direction!

[+] unknown|7 years ago|reply

[deleted]

[+] manigandham|7 years ago|reply

Citus is a great product, but it's still fundamentally hamstrung by the foundations of the underlying database.

CockroachDB gives you proper data replication and high availability at a high density. Postgres doesn't, and requires an entirely separate cluster with fragile replication and manual cutover.

[+] ralfn|7 years ago|reply

Those are a lot of claims. Databases that make half those claims turn out to be misrepresenting some of them. So far it's empty promises on a box.

When can we expect independent third-party evaluation of all these claims? The most famous one is Jepsen:

https://jepsen.io/analyses

Because this all sounds way too good to be true. What's the catch? How stable is this? Are the claims tested other than in theory?

[+] manigandham|7 years ago|reply

They made a post about testing with Jepsen: https://blog.yugabyte.com/jepsen-testing-on-yugabyte-db-data...

[+] sunnycpp|7 years ago|reply

The design is very similar to HBase, minus the dependency on ZooKeeper and HDFS. RocksDB has been very smartly modified for use as an optimized storage layer.

[+] unknown|7 years ago|reply

[deleted]

31 comments