top | item 16324941

Distributed Databases Should Work More Like CDNs

212 points | loiselleatwork | 8 years ago | cockroachlabs.com

83 comments


astine|8 years ago

Sooo, being able to distribute data globally is good for performance? Who knew? The thing about distributed systems, including distributed databases, is that they need to navigate the CAP theorem (Consistency, Availability, Partition tolerance: pick two, essentially), and every solution is ultimately a trade-off. This article would be a lot more interesting if it showed how CockroachDB made a better trade-off than the other solutions listed.

Chickenosaurus|8 years ago

I think the PACELC theorem [1] should be preferred to the CAP theorem as it is more precise. It states in case of a partition (P), there is a trade-off between availability (A) and consistency (C). Else (E), there is a trade-off between latency (L) and consistency (C).

[1] https://en.m.wikipedia.org/wiki/PACELC_theorem

marknadal|8 years ago

Also a distributed database engineer, and while I disagree with CockroachDB's CAP Theorem tradeoff decisions, they are definitely well reasoned and principled.

RethinkDB was Master-Slave (strongly consistent) with an amazing developer community and actually survived Aphyr's tests better than most systems.

CockroachDB is also in the Master-Slave (strongly consistent) camp, but more enterprise focused, and therefore will probably not fail. Their Aphyr report passed... but was unfortunately very slow. But hey, being strongly consistent is hard, and trading some performance for correctness is necessary.

Other systems, like Cassandra, ours (https://github.com/amark/gun), Couch/Pouch, etc., are all in the Master-Master camp, and thus AP rather than CP. Our argument is that while P holds, realtime sync with Strong Eventual Consistency is good enough yet keeps all the performance benefits, for everything except banking. Sure, use Rethink/Cockroach for banking (heck, better yet, Postgres!), but for fun-and-games you can even do banking on top of AP systems if you layer a CRDT or a blockchain on top (a blockchain kills performance; CRDTs don't).

So yeah, I agree with you about CAP Theorem and stuff, disagree with Cockroach's particular choice - but they do have some pretty great detailed explainers/documentation on their view, and therefore should be treated seriously and not written off.
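The CRDT claim above can be made concrete with a tiny sketch. This is a grow-only counter (G-Counter), one of the simplest CRDTs; all names and numbers are illustrative, not taken from any of the systems mentioned:

```python
# Hypothetical sketch of a grow-only counter CRDT (G-Counter): replicas
# increment independently and merges are conflict-free, which is what
# "Strong Eventual Consistency" buys you.

class GCounter:
    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}  # per-replica increment totals

    def increment(self, n=1):
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # Element-wise max is commutative, associative, and idempotent,
        # so replicas converge regardless of merge order.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)

a, b = GCounter("a"), GCounter("b")
a.increment(3)  # happens on replica a during a partition
b.increment(2)  # happens on replica b during the same partition
a.merge(b)
b.merge(a)
assert a.value() == b.value() == 5  # no coordination, yet both converge
```

The catch, as the thread notes, is that this only works for data types with a natural merge; it is not a drop-in replacement for a serializable transaction.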

skybrian|8 years ago

From a bit of reading, it looks like they do it with optimistic concurrency, probably at the expense of write throughput. When there is contention (writing the same keys) there may be a lot of retries.

It seems like a good solution for data that doesn't change too quickly?
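A hedged sketch of the optimistic-concurrency pattern being described: a generic compare-and-set loop, not CockroachDB's actual internals. Under contention, most attempts fail the version check and loop, which is where write throughput goes:

```python
# Generic optimistic concurrency sketch: read a version, attempt a
# conditional write, retry on conflict. All names here are illustrative.

class VersionedStore:
    def __init__(self):
        self.data = {}  # key -> (version, value)

    def read(self, key):
        return self.data.get(key, (0, None))

    def compare_and_set(self, key, expected_version, value):
        version, _ = self.read(key)
        if version != expected_version:
            return False  # another writer committed first; caller retries
        self.data[key] = (version + 1, value)
        return True

def update_with_retries(store, key, update, max_retries=10):
    for _ in range(max_retries):
        version, value = store.read(key)
        if store.compare_and_set(key, version, update(value)):
            return True
    return False  # heavy contention on this key: many wasted round trips

store = VersionedStore()
update_with_retries(store, "balance", lambda v: (v or 0) + 10)
```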

philip1209|8 years ago

When I think of distributed databases, I think of Bitcoin - and that's not fast :-)

cagenut|8 years ago

this is true, and why Fastly is kind of like a globally distributed, nearly-cache-coherent key-value store that people use as a CDN.

there's a great talk on how it's done with a custom gossip protocol implementation: https://www.youtube.com/watch?v=HfO_6bKsy_g

loiselleatwork|8 years ago

This is super interesting; thanks for the pointer.

dstroot|8 years ago

CDN: no trade offs. Faster everywhere. More reliable overall

Cockroach DB: trade some performance for geographic redundancy. The trade off may work in your favor, e.g. for read-heavy workloads (or not).

I plugged in CDB in place of Postgres for some testing this week, and was surprised it worked so well.

icebraining|8 years ago

> no trade offs

This is almost never the case, and CDNs are no exception. A CDN like Cloudflare that reuses your domain(s) means it becomes just an extra hop that your dynamic requests have to travel through on their way to and from the main server. A CDN that uses its own domains requires extra DNS queries, extra TCP & SSL connection setup, etc., plus it only starts loading once the browser has started processing the HTML.
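Back-of-the-envelope arithmetic for the setup cost described above. The 30 ms round trip is an assumed number for illustration, and TLS 1.3 cuts the handshake count:

```python
# Rough cost of the *first* request to a CDN on a separate domain,
# counted in round trips. RTT_MS is an assumption, not a measurement.

RTT_MS = 30

dns_lookup    = 1 * RTT_MS  # extra DNS query for the CDN's domain
tcp_handshake = 1 * RTT_MS  # SYN / SYN-ACK / ACK
tls_handshake = 2 * RTT_MS  # TLS 1.2-style; TLS 1.3 needs only 1 RTT

first_request_overhead_ms = dns_lookup + tcp_handshake + tls_handshake
print(first_request_overhead_ms)  # 120 ms before the asset starts loading
```

Subsequent requests reuse the connection, so this mostly hurts time-to-first-byte on cold visits.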

dasil003|8 years ago

At my previous company we built our own CDN because we were streaming video to a long tail of global users for a limited library. The problem with off-the-shelf CDNs was they couldn't keep our content cached for the long tail, but for the same price we could replicate our limited library to bare metal. Traditional caching could then go on top of that.

This was a huge win for QoS on cache misses, which were a significant portion of our traffic. There are tons of tradeoffs to make this happen which is why we couldn't find an off-the-shelf solution to deliver the same results.

zzzcpan|8 years ago

Unless you build your own DNS-routed CDN, it actually reduces reliability. And to make it faster everywhere you trade immediate consistency for eventual consistency.

zzzcpan|8 years ago

> When partitions heal, you might have to make ugly decisions: which version of your customer’s data do you choose to discard? If two partitions received updates, it’s a lose-lose situation.

When partitions heal you simply merge all versions through conflict-free replicated data types. No ugly decisions, no sacrificing either latency or consistency. We call it strong eventual consistency [1] nowadays. And it's exactly like CDNs, except more reliable.

I'm wondering, since CockroachDB keeps lying about and attacking eventual consistency in these PR posts, whether the whole "consistency is worth sacrificing latency" mantra might not work in practice after all. People just don't buy it, they want low latency, they want something like CDNs, something fast and reliable, something that just works. Something that CockroachDB can never deliver.

[1] https://en.wikipedia.org/wiki/Eventual_consistency#Strong_ev...

imtringued|8 years ago

My application is just plain old CRUD.

What if two users want to change, e.g., the telephone number of an existing record during a network partition?

There just is no obvious way to merge a telephone number. One of them is correct, the other is incorrect.

Can CRDTs solve my simple problem?
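One common (if unsatisfying) CRDT answer to this is a last-writer-wins register: the merge is deterministic and convergent, but one of the two concurrent writes still loses. A minimal sketch with made-up timestamps:

```python
# Hedged sketch of a last-writer-wins (LWW) register. Merges never block
# and always converge, but "last" is decided by timestamp, so one of the
# two concurrent phone-number edits is silently discarded.

class LWWRegister:
    def __init__(self):
        self.value = None
        self.timestamp = (0, "")  # (time, replica id) as a tiebreaker

    def set(self, value, ts, replica_id):
        stamp = (ts, replica_id)
        if stamp > self.timestamp:
            self.timestamp = stamp
            self.value = value

    def merge(self, other):
        self.set(other.value, *other.timestamp)

a, b = LWWRegister(), LWWRegister()
a.set("555-0101", 100, "a")  # user 1, during the partition
b.set("555-0202", 101, "b")  # user 2, slightly later clock
a.merge(b)
b.merge(a)
assert a.value == b.value == "555-0202"  # converged; user 1's edit is gone
```

So yes, a CRDT resolves the conflict mechanically, but it cannot know which number was "correct"; only application-level review (or a CP database that rejects one write up front) can.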

riku_iki|8 years ago

> they want something like CDNs, something fast and reliable, something that just works.

There are other DB systems with such properties "they" can use, e.g. Cassandra. But CockroachDB also gives ACID transactions, which may be important for others.

pdeva1|8 years ago

Am I wrong, or does the article not really tell how CDB deals with the latency issue, especially with regard to writes?

If the write has to be consistent and available across multiple regions, it will need to synchronously replicate that write to all the regions, thus incurring the same performance penalty as RDS or any other consistent database.
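One nuance worth noting: a Raft-style system commits once a majority of replicas acknowledge, so a write waits for the nearest quorum, not for every region. A sketch with assumed round-trip times:

```python
# Illustrative quorum-commit latency: the leader commits once a majority
# (itself included) has acked. The RTT numbers below are assumptions.

rtts_from_leader_ms = {"us-east": 0, "us-west": 70, "eu-west": 90}

def quorum_write_latency(rtts):
    acks_needed = len(rtts) // 2 + 1           # majority, counting the leader
    return sorted(rtts.values())[acks_needed - 1]

print(quorum_write_latency(rtts_from_leader_ms))  # 70: eu-west's ack isn't awaited
```

That's still a cross-region round trip on every write, so the penalty is real; it's just bounded by the median replica rather than the farthest one.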

daxfohl|8 years ago

Does GDPR really require all Europeans' data to stay on EU soil without explicit consent?

dantiberian|8 years ago

I don't think so. You need consent to process personal data anywhere in the world, and you can only transfer it outside of the EU if there are appropriate safeguards - https://gdpr-info.eu/art-46-gdpr/

jjevanoorschot|8 years ago

I used CockroachDB for a university project, and while I think it looks very promising, I found the tooling and documentation to be a bit lacking. I wouldn't use it in production yet. However, when CockroachDB matures a bit I can really see it take off.

timkpaine|8 years ago

Imagine a globally replicated and version controlled object store. You could use this for data, assets, source code, anything you like. Would be incredibly useful for both the development side of things, as well as production.

prepend|8 years ago

We could call it the World Wide Web.

ddorian43|8 years ago

Why not store the data only where you live? Worst case, a meteor strikes and you die with your data.

eip|8 years ago

>Imagine a globally replicated and version controlled object store.

S3?

dman|8 years ago

Timdb ?

supergirl|8 years ago

what a dumb article about nothing. hey, a database replicates stuff but so does a CDN! what a brilliant insight. let's make an article with this title but fill it with marketing text.

russell_h|8 years ago

I'm curious, what sort of read latency is achievable with CockroachDB? Does it support some notion of tunable read consistency in order to achieve lower read latency at the expense of consistency?

susegado|8 years ago

Reads from CockroachDB go through a lease holder for the piece of data being read without needing confirmation from any replicas about consistency, so there is no overhead from replication. But read latency can be affected by writes on the data (conflicts), because it is fundamentally a consistent system (serializable). This is not tunable.

appdrag|8 years ago

Very interesting! Have you used it at big scale for production or not yet?

yuz|8 years ago

advertisement

ddorian43|8 years ago

CDN post with no performance talk (besides the keyword).

They never mention lower performance (even on a single node). Add to that AWS VPSes with pseudo-cores and the Spectre patches, and good luck with your TPS reports.

vog|8 years ago

Would you mind elaborating? Your criticism is so condensed that I'm unable to make a lot of sense of it.