top | item 19007702

andydb | 7 years ago

FaunaDB has to make painful (to applications) tradeoffs between latency and consistency in global scenarios:

1. All read-write transactions pay global latency to a central sequencer. So, yes, FaunaDB is strictly serializable, but at the cost of high latency of read-write transactions.

2. Read transactions have to choose between:

  a. Strict serializability but high latency
  b. Stale reads but low latency

Anomalies absolutely can happen in FaunaDB if applications use option 2b. However, most users will not appreciate the subtlety here, and some will unwittingly go into production with consistency bugs in their application that only manifest under stress conditions (like data centers going down and clogged network connections). Their only other option is 2a, and that is just a no-go for global scenarios. You can't route your regional reads through a central sequencer that might be located on the other side of the world.
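To make the 2b hazard concrete, here's a toy multi-version sketch (hypothetical code, nothing to do with FaunaDB's actual API): the strict read sees a committed withdrawal, while the low-latency stale read happily reports the old balance. That's exactly the kind of bug that only shows up once replication lags.

```python
class MVCCStore:
    """Toy multi-version store: key -> list of (commit_ts, value)."""

    def __init__(self):
        self.versions = {}
        self.now = 0  # logical commit clock

    def write(self, key, value):
        self.now += 1
        self.versions.setdefault(key, []).append((self.now, value))

    def read(self, key, snapshot_ts=None):
        """snapshot_ts=None models option 2a (latest committed state);
        passing a lagging snapshot_ts models option 2b (stale read)."""
        ts = self.now if snapshot_ts is None else snapshot_ts
        visible = [v for t, v in self.versions.get(key, []) if t <= ts]
        return visible[-1] if visible else None

store = MVCCStore()
store.write("balance", 100)
lagging_ts = store.now       # snapshot a far-away replica is still serving
store.write("balance", 0)    # the account is emptied, globally committed

assert store.read("balance") == 0                            # 2a sees it
assert store.read("balance", snapshot_ts=lagging_ts) == 100  # 2b does not
```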

Your argument reduces to: "FaunaDB has no anomalies in global scenarios! That is, as long as you're OK with a global round-trip for every read-write and read-only transaction...". FaunaDB has not solved the consistency vs. latency tradeoff problem, but has simply given the application tools to manage it. A heavy burden still rests on the application.

By contrast, Spanner users get both low latency and strict serializability for partitioned reads and writes (that's and, not or, like FaunaDB). CockroachDB users get low latency and "no stale reads" for partitioned reads and writes. Partitioning your tables/indexes to get both high consistency and low latency is a requirement, but it's not difficult to do this in a way that gives these benefits to the majority of your latency-sensitive queries, if not all of them. After all, this is what virtually every global company does today - they partition data by region, so that each region gets low latency and high consistency. You only pay the global round-trip cost when you want to query data located across multiple regions, which is rare by DBA design.

The main point of the article is that the "no stale reads" isolation level is almost as strong as "strict serializability", and is identical in virtually every real-world application scenario. This means CockroachDB is equivalent to Spanner for all intents and purposes.

evanweaver | 7 years ago

Re. 1, this is a common misunderstanding. The FaunaDB sequencer is not physically centralized: it is logically centralized, but physically partitioned. Transactions find the nearest sequencer and commit to the closest majority set of logs for that sequencer, which usually is equivalent to the nearest majority set of datacenters (specialized topologies can do a little bit better). This gives predictable, moderate latency for all read-write transactions regardless of data layout or transaction complexity.

Re. 2, I understand your perspective, but I disagree that the worse-is-better argument is valid. Google Spanner offers exact-staleness and bounded-staleness snapshot reads, almost identical to FaunaDB. The reason is that the 10ms clock ambiguity window is still too long for many users to wait for serializability. Like FaunaDB, Spanner users must use the consistency levels correctly or anomalies will result, but these anomalies typically only occur in read-modify-write scenarios that are not wrapped in a transaction. Doing that in any database (including in CockroachDB) creates anomalies at any isolation level, because it defeats conflict detection.

But it turns out that waiting out the clock ambiguity window in the public cloud is actually worse than routing to every partition leader all the time. So CockroachDB offers neither snapshot isolation reads, nor serializable follower reads, nor strict serializability for writes. It is not equivalent to Spanner. As you explain, CockroachDB applications have no choice but to avoid creating transactions that read or write data partitioned in other datacenters. Transactions that do are much higher latency than Spanner and FaunaDB both. This mandatory partitioning is a non-relational experience—the data must be laid out in the way it will be queried—and it is harder than understanding an additional consistency level and taking advantage of it when appropriate.

Like you say, FaunaDB has "given the application the tools" to manage global latency.

andydb | 7 years ago

Re. 1, doesn't the majority set of logs for a given logical sequencer need to overlap with the majority set of logs for every other logical sequencer? If not, I don't see how you guarantee strict serializability without clocks. But if so, then all you're saying is that you're "averaging" the latencies to get better predictability. So if the average latency between any two DCs is 100ms, but the max latency between the two furthest DCs is 300ms, then you'll get closer to 100ms everywhere in the world by using this scheme. That's a good thing, but it's still 100ms!

However, since I may be misunderstanding something, a concrete example would help. Say you have 3 DCs in each of 3 regions (9 DCs total): Europe, Asia, and the US, with 10ms of intra-region latency and 100ms of inter-region latency. Say I want to commit a read-write transaction. What commit latency should I expect?
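For reference, here's the back-of-the-envelope model behind my expectation, assuming a flat 9-way replicated log where commit needs a simple majority (a hypothetical model, which may well not match FaunaDB's real topology):

```python
from itertools import combinations

N = 9                   # one log replica per DC in the example above
MAJORITY = N // 2 + 1   # 5 of 9

# Any two majority quorums intersect, which is what would let a
# partitioned sequencer order transactions without relying on clocks.
for q1 in combinations(range(N), MAJORITY):
    for q2 in combinations(range(N), MAJORITY):
        assert set(q1) & set(q2)

# Commit latency from any one DC is then roughly the round trip to its
# 5th-closest replica. With 3 DCs per region (10ms intra-region, 100ms
# inter-region), the 5th-closest replica sits in another region:
rtts = sorted([0, 10, 10] + [100] * 6)  # self, 2 regional peers, 6 remote
print(rtts[MAJORITY - 1])  # -> 100 (ms): inter-region latency dominates
```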

Re. 2, the reason Spanner (and soon CockroachDB) offer bounded-staleness snapshot reads is not due to the clock ambiguity window. You're working with old information from a paper published years ago. At this point, my understanding is that Google has reduced the window from ~7ms down to 1-2ms (maybe even less). Furthermore, these are reads, which don't need to wait in Spanner to begin with. There are at least 2 scenarios where bounded-staleness reads are useful:

  1. When you have heavy read load and want to distribute that load across all of your replicas.
  2. When you want to read from the nearest replica in order to reduce latency, in cases where replicas are geographically distributed.

You also have a misconception about CockroachDB. In practice, CockroachDB almost never waits when reading. It only does so when it tries to read a record at time T1, and notices that another transaction has written to that same record at time T2 (i.e. later in time, within the clock uncertainty window). So unless you're trying to read heavily contended records, you'll rarely see any wait at all.
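Roughly, that check works like this (a simplified sketch of the idea, not CockroachDB's actual implementation, which tracks uncertainty per transaction and per node):

```python
MAX_OFFSET = 0.5  # assumed max clock skew between nodes, in seconds

def mvcc_read(versions, read_ts):
    """Read the latest value visible at read_ts from [(commit_ts, value), ...].

    A read only restarts when it observes a write whose timestamp falls
    inside (read_ts, read_ts + MAX_OFFSET]: the writer's clock may genuinely
    be ahead of ours, so we retry above that write. Otherwise the read
    proceeds with no waiting at all.
    """
    for commit_ts, _ in versions:
        if read_ts < commit_ts <= read_ts + MAX_OFFSET:
            return mvcc_read(versions, commit_ts)  # uncertainty restart
    visible = [v for t, v in versions if t <= read_ts]
    return visible[-1] if visible else None

history = [(1.0, "old"), (10.3, "new")]  # committed versions of one key
assert mvcc_read(history, 5.0) == "old"    # nothing in (5.0, 5.5]: no restart
assert mvcc_read(history, 10.0) == "new"   # 10.3 in (10.0, 10.5]: restart
```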

You're also incorrect about the reasons for CockroachDB's lack of support for follower reads. Here is an RFC describing how bounded-staleness follower reads will work in an upcoming version:

  https://github.com/cockroachdb/cockroach/blob/master/docs/RFCS/20180603_follower_reads.md

The real reason that CockroachDB does not yet support follower reads is that other things took priority in the development schedule. It's always been possible to support them. I expect they'll work with similar utility and latency to Spanner and FaunaDB follower reads.

As for your assertion that "Transactions that do are much higher latency than Spanner and FaunaDB both", it's just ... wrong. Perhaps you're thinking of older versions, before CockroachDB had transaction pipelining:

  https://www.cockroachlabs.com/blog/transaction-pipelining/

Furthermore, the end of the article alludes to an upcoming feature called "parallel commits", in which most distributed transactions will have commit latency = ~1 round of distributed consensus.