Sooo, being able to distribute data globally is good for performance? Who knew? The thing about distributed systems, including distributed databases, is that they need to navigate the CAP theorem (Consistency, Availability, Partition tolerance: pick two, essentially), and every solution is ultimately a trade-off. This article would be a lot more interesting if it showed how CockroachDB makes a better trade-off than the other solutions listed.
I think the PACELC theorem [1] should be preferred to the CAP theorem, as it is more precise. It states that in case of a partition (P), there is a trade-off between availability (A) and consistency (C); else (E), there is a trade-off between latency (L) and consistency (C).
Also a distributed database engineer, and while I disagree with CockroachDB's CAP Theorem tradeoff decisions, they are definitely well reasoned and principled.
RethinkDB was Master-Slave (strongly consistent) with an amazing developer community and actually survived Aphyr's tests better than most systems.
CockroachDB is also in the Master-Slave (strongly consistent) camp, but more enterprise focused, and therefore will probably not fail. Their Aphyr report showed it worked... but was unfortunately very slow. But hey, being strongly consistent is hard, and correctness is a necessary trade-off against performance.
Other systems, like Cassandra, us (https://github.com/amark/gun), Couch/Pouch, etc. are all in the Master-Master camp, and thus AP not CP. Our argument is that while P holds, realtime sync with Strong Eventual Consistency is good enough and keeps all the performance benefits, for everything except banking. Sure, use Rethink/Cockroach for banking (heck, better yet, Postgres!), but for fun-and-games you can do banking on top of AP systems if you layer a CRDT or blockchain on top (although blockchains kill performance; CRDTs don't).
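For what it's worth, the "banking on top of a CRDT" idea can be sketched with a PN-Counter. This is a generic illustration, not gun's actual API; all names here are made up:

```python
# PN-Counter: a CRDT that tracks increments and decrements per replica.
# Merging two replicas is an element-wise max, so replicas that diverged
# during a partition converge to the same balance once they can talk again.

class PNCounter:
    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.incs = {}  # replica_id -> total increments seen from that replica
        self.decs = {}  # replica_id -> total decrements seen from that replica

    def deposit(self, amount):
        self.incs[self.replica_id] = self.incs.get(self.replica_id, 0) + amount

    def withdraw(self, amount):
        self.decs[self.replica_id] = self.decs.get(self.replica_id, 0) + amount

    def balance(self):
        return sum(self.incs.values()) - sum(self.decs.values())

    def merge(self, other):
        # Commutative, associative, idempotent: safe to apply in any order.
        for rid, v in other.incs.items():
            self.incs[rid] = max(self.incs.get(rid, 0), v)
        for rid, v in other.decs.items():
            self.decs[rid] = max(self.decs.get(rid, 0), v)

# Two replicas diverge during a partition...
a, b = PNCounter("a"), PNCounter("b")
a.deposit(100)
b.withdraw(30)
# ...then heal and merge; both converge to the same balance.
a.merge(b)
b.merge(a)
print(a.balance(), b.balance())  # 70 70
```

Note this guarantees convergence, not invariants: nothing stops two partitioned replicas from jointly overdrawing the account, which is the part where CRDTs alone aren't enough for real banking.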
So yeah, I agree with you about CAP Theorem and stuff, disagree with Cockroach's particular choice - but they do have some pretty great detailed explainers/documentation on their view, and therefore should be treated seriously and not written off.
From a bit of reading, it looks like they do it with optimistic concurrency, probably at the expense of write throughput. When there is contention (writing the same keys) there may be a lot of retries.
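That client-side retry pattern looks roughly like the sketch below. This is a generic illustration of optimistic concurrency against a toy in-memory store, not CockroachDB's actual protocol; `Store`, `read`, and `try_commit` are hypothetical names:

```python
import random
import time

class ConflictError(Exception):
    """Raised when the row's version changed between read and commit."""

class Store:
    """Toy versioned key-value store standing in for the database."""
    def __init__(self):
        self.data = {}  # key -> (value, version)

    def read(self, key):
        return self.data.get(key, (0, 0))

    def try_commit(self, key, new_value, expected_version):
        _, current = self.data.get(key, (0, 0))
        if current != expected_version:
            raise ConflictError(key)
        self.data[key] = (new_value, current + 1)

def update_with_retries(store, key, update_fn, max_retries=10):
    # Optimistic concurrency: read a version, compute the new value, and
    # commit only if nobody else wrote in between. Under contention (many
    # writers hitting the same key) this loop may spin many times, which
    # is where the write-throughput cost comes from.
    for attempt in range(max_retries):
        value, version = store.read(key)
        try:
            store.try_commit(key, update_fn(value), expected_version=version)
            return
        except ConflictError:
            # Exponential backoff with jitter reduces repeated collisions.
            time.sleep(random.uniform(0, 0.01 * 2 ** attempt))
    raise RuntimeError(f"gave up on {key} after {max_retries} conflicts")

s = Store()
update_with_retries(s, "k", lambda v: v + 5)
update_with_retries(s, "k", lambda v: v + 5)
print(s.read("k"))  # (10, 2)
```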
It seems like a good solution for data that doesn't change too quickly?
This is almost never the case, and CDNs are no exception. A CDN like Cloudflare that reuses your domain(s) becomes just a useless extra point that your dynamic requests have to travel through on the way to and from the main server. A CDN that uses its own domains requires extra DNS queries, extra TCP & SSL connection setup, etc., plus it only starts loading once the browser has started processing the HTML.
At my previous company we built our own CDN because we were streaming video to a long tail of global users for a limited library. The problem with off-the-shelf CDNs was they couldn't keep our content cached for the long tail, but for the same price we could replicate our limited library to bare metal. Traditional caching could then go on top of that.
This was a huge win for QoS on cache misses, which were a significant portion of our traffic. There are tons of tradeoffs to make this happen which is why we couldn't find an off-the-shelf solution to deliver the same results.
Unless you build your own DNS-routed CDN, it actually reduces reliability. And to make it faster everywhere, you trade immediate consistency for eventual consistency.
> When partitions heal, you might have to make ugly decisions: which version of your customer’s data do you choose to discard? If two partitions received updates, it’s a lose-lose situation.
When partitions heal you simply merge all versions through conflict-free replicated data types. No ugly decisions, no sacrificing either latency or consistency. We call it strong eventual consistency [1] nowadays. And it's exactly like CDNs, except more reliable.
I'm wondering, since CockroachDB keeps lying about and attacking eventual consistency in these PR posts, whether the whole "consistency is worth sacrificing latency" mantra actually works in practice after all. People just don't buy it: they want low latency, they want something like CDNs, something fast and reliable, something that just works. Something that CockroachDB can never deliver.
> they want something like CDNs, something fast and reliable, something that just works.
There are other DB systems with such properties "they" can use, e.g. Cassandra. But CockroachDB also gives ACID transactions, which may be important for others.
Am I wrong, or does the article not really tell how CDB deals with the latency issue, especially with regard to writes?
If the write has to be consistent and available across multiple regions, it will need to synchronously replicate that write to all the regions, thus incurring the same performance penalty as RDS or any other consistent database.
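One caveat worth noting: Raft-style systems typically wait only for the fastest majority of replicas, not all of them, so the penalty is the median round trip rather than the worst one. A back-of-the-envelope sketch (the RTT numbers are made up):

```python
# Commit latency for a quorum write: the leader must hear back from a
# majority of the replica group, so latency is governed by the fastest
# (majority - 1) acks from the other replicas, not the slowest replica.

def quorum_commit_latency_ms(rtts_ms):
    # rtts_ms: round-trip times from the leader to each OTHER replica.
    # The leader counts itself toward the majority.
    total_replicas = len(rtts_ms) + 1
    majority = total_replicas // 2 + 1
    acked = sorted(rtts_ms)[: majority - 1]
    return max(acked) if acked else 0.0

# Leader in us-east; replicas in us-west (70ms RTT) and eu-west (140ms RTT).
# A 3-replica group commits once the leader plus one replica have the write.
print(quorum_commit_latency_ms([70.0, 140.0]))  # 70.0
```

So a cross-region write is slower than a local one either way, but not necessarily as slow as synchronously replicating to every region.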
I doubt this particular article was intended to address how CRDB handles writes cross-region, synchronous replication, etc. You'll find articles touching on varying aspects of what you're looking for if you dig through some of the earlier posts on their blog[1] or their FAQ[2].
I don't think so; you need consent to process personal data anywhere in the world, and you can only transfer it outside of the EU if there are appropriate safeguards - https://gdpr-info.eu/art-46-gdpr/
I used CockroachDB for a university project, and while I think it looks very promising, I found the tooling and documentation to be a bit lacking. I wouldn't use it in production yet. However, when CockroachDB matures a bit I can really see it take off.
Imagine a globally replicated and version controlled object store. You could use this for data, assets, source code, anything you like. Would be incredibly useful for both the development side of things, as well as production.
what a dumb article about nothing. hey, a database replicates stuff but so does a CDN! what a brilliant insight. let's make an article with this title but fill it with marketing text.
I'm curious, what sort of read latency is achievable with CockroachDB? Does it support some notion of tunable read consistency in order to achieve lower read latency at the expense of consistency?
Reads from CockroachDB go through a lease holder for the piece of data being read without needing confirmation from any replicas about consistency, so there is no overhead from replication. But read latency can be affected by writes on the data (conflicts), because it is fundamentally a consistent system (serializable). This is not tunable.
Chickenosaurus|8 years ago
[1] https://en.m.wikipedia.org/wiki/PACELC_theorem
cagenut|8 years ago
there's a great talk on how it's done with a custom gossip protocol implementation: https://www.youtube.com/watch?v=HfO_6bKsy_g
dstroot|8 years ago
Cockroach DB: trade some performance for geographic redundancy. The trade-off may work in your favor - e.g. read-heavy workloads (or not).
I plugged in CDB in place of Postgres for some testing this week, and was surprised it worked so well.
[1] https://en.wikipedia.org/wiki/Eventual_consistency#Strong_ev...
imtringued|8 years ago
What if two users want to change e.g. the telephone number of an existing record during a network partition?
There just is no obvious way to merge a telephone number. One of them is correct, the other is incorrect.
Can CRDTs solve my simple problem?
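Not definitively: a CRDT can't know which number is semantically "correct". The standard answer is a last-writer-wins register, which at least makes the merge deterministic. A minimal sketch, using (timestamp, replica_id) as the tiebreaker:

```python
class LWWRegister:
    # Last-writer-wins register: each write carries a (timestamp, replica_id)
    # tag; merge keeps the value with the greatest tag. Deterministic, but it
    # silently discards the "losing" phone number. The CRDT resolves the
    # conflict; it doesn't decide which human was right.

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.value = None
        self.tag = (0, replica_id)

    def set(self, value, timestamp):
        self.value, self.tag = value, (timestamp, self.replica_id)

    def merge(self, other):
        if other.tag > self.tag:
            self.value, self.tag = other.value, other.tag

# Both sides of a partition update the phone number...
a, b = LWWRegister("a"), LWWRegister("b")
a.set("555-0101", timestamp=10)
b.set("555-0202", timestamp=12)
# ...then heal: both replicas converge on the later write.
a.merge(b)
b.merge(a)
print(a.value, b.value)  # 555-0202 555-0202
```

So yes, one update "wins" and the other is dropped. CRDTs guarantee that all replicas converge to the same value, not that the merge picks the value the business wanted.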
irfansharif|8 years ago
[1]: https://www.cockroachlabs.com/blog
[2]: https://www.cockroachlabs.com/docs/stable/frequently-asked-q...
rhencke|8 years ago
https://github.com/ipfs/ipfs
eip|8 years ago
S3?
hexspeaker|8 years ago
https://research.google.com/archive/spanner.html
tyingq|8 years ago
Might be helpful if you already know the couchdb answer for each feature.
anon1253|8 years ago
Or more like federated SPARQL on RDF? https://www.w3.org/TR/sparql11-federated-query/
ddorian43|8 years ago
They never mention the lower performance (even on a single node). Add to that AWS VPSes with pseudo-cores and the Spectre mitigations, and good luck with your TPS reports.