top | item 43840652

(no title)

luhn | 10 months ago

It's not mentioned in the headline and not made super clear in the article: this is specific to multi-AZ clusters, which are a relatively new feature of RDS and differ from the multi-AZ instances that most will be familiar with. (Clear as mud.)

Multi-AZ instances are a long-standing feature of RDS where the primary DB is synchronously replicated to a secondary DB in another AZ. On failure of the primary, RDS fails over to the secondary.

Multi-AZ clusters have two secondaries, and transactions are synchronously replicated to at least one of them. This is more robust than multi-AZ instances if a secondary fails or is degraded. It also allows read-only access to the secondaries.

Multi-AZ clusters no doubt have more "magic" under the hood, as it's not a vanilla Postgres feature as far as I'm aware. I imagine this is why it's failing the Jepsen test.

ants_a|10 months ago

It's interesting that this magic would be needed. Vanilla Postgres does support quorum commit, which can do this. You can also set up the equivalent of a multi-AZ cluster with Patroni, and (modulo bugs) it does the necessary coordination to promote primaries in a way that does not lose transactions or make visible a transaction that is not durable.
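To make the quorum commit idea concrete, here is a minimal Python sketch. It builds a value for Postgres's `synchronous_standby_names` setting using the `ANY n (...)` quorum syntax and models the rule that a commit is acknowledged once at least `n` of the listed standbys have confirmed it. The standby names `az_b` and `az_c` and both helper functions are hypothetical and only for illustration. This says nothing about how RDS actually implements multi-AZ clusters.

```python
def synchronous_standby_names(standbys, quorum):
    """Build a quorum-style value for synchronous_standby_names.

    Hypothetical helper: it only formats the string, it does not touch a server.
    """
    if not 1 <= quorum <= len(standbys):
        raise ValueError("quorum must be between 1 and the number of standbys")
    return f"ANY {quorum} ({', '.join(standbys)})"


def commit_is_acknowledged(acks, quorum):
    """Toy model of quorum commit: enough standbys have confirmed the commit."""
    confirmed = sum(1 for ok in acks.values() if ok)
    return confirmed >= quorum


# Two secondaries, commit waits for at least one of them (names are made up).
setting = synchronous_standby_names(["az_b", "az_c"], quorum=1)
print(f"synchronous_standby_names = '{setting}'")

# One standby is lagging, the other has confirmed: the commit can return.
print(commit_is_acknowledged({"az_b": True, "az_c": False}, quorum=1))
# Neither standby has confirmed yet: the commit keeps waiting.
print(commit_is_acknowledged({"az_b": False, "az_c": False}, quorum=1))
```

With a quorum of 1 out of 2, a single degraded secondary does not block commits, which matches the robustness argument made earlier in the thread.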

There is still a Postgres deficiency that makes something similar to this pattern possible. Non-replicated transactions where the client goes away mid-commit become visible immediately. So in the example, if T1 happens on a partitioned leader and disconnects during commit, T2 also happens on a partitioned node, and T3 and T4 happen later on a new leader, you would also see the same result. However, this does not jibe with the statement that fault injection was not done in this test.

Edit: did not notice the post that this pattern can be explained by inconsistent commit order on replica and primary. Kind of embarrassing given I've done a talk proposing how to fix that.
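A toy Python model of that explanation, under the assumption that the primary makes transactions visible in one order while a replica makes them visible in WAL commit-record order. The transaction names and both orders are made up for illustration. A snapshot sees some prefix of its node's visibility order, and two snapshots conflict when neither one's visible set contains the other's.

```python
def visible_sets(order):
    """Every prefix of a visibility order: what some snapshot on that node could see."""
    return [frozenset(order[:i]) for i in range(len(order) + 1)]


def conflicting_snapshots(primary_order, replica_order):
    """Pairs of (primary snapshot, replica snapshot) where neither contains the other."""
    pairs = []
    for p in visible_sets(primary_order):
        for r in visible_sets(replica_order):
            if not (p <= r or r <= p):
                pairs.append((p, r))
    return pairs


# Assumed scenario: T1's commit record is written to WAL first, but T2 becomes
# visible to readers on the primary first.
wal_commit_order = ["T1", "T2"]          # order in which the replica exposes commits
primary_visibility_order = ["T2", "T1"]  # order in which the primary exposes commits

for p, r in conflicting_snapshots(primary_visibility_order, wal_commit_order):
    print("primary sees", sorted(p), "while replica sees", sorted(r))
```

When both nodes expose commits in the same order, `conflicting_snapshots` returns an empty list, so the anomaly in this model comes purely from the order mismatch.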

sontek|10 months ago

Link the talk video

ashu1461|10 months ago

I have one question.

So if the snapshot violation is happening inside Multi-AZ instances, can it happen with a single-region, multiple-read-replica kind of setup as well? But might it be more easily observable in Multi-AZ setups because the lag is high?

luhn|10 months ago

A synchronous replica via WAL shipping is a well-worn feature of Postgres. I’d expect RDS to be using that feature behind the scenes and would be extremely surprised if it had consistency bugs.

Two replicas in a “semi synchronous” configuration, as AWS calls it, is to my knowledge not available in base Postgres. AWS must be using some bespoke replication strategy, which would have different bugs than synchronous replication and is less battle-tested.

But as nobody except AWS knows the implementation details of RDS, this is all idle speculation that doesn’t mean much.

x0x0|10 months ago

it's the 2nd sentence in the article:

> We show that Amazon RDS for PostgreSQL multi-AZ clusters violate Snapshot Isolation

you kind of have to expect people to read

evil-olive|10 months ago

I think it's still an important clarification, because for years you've had a choice in RDS (classic RDS, not Aurora) between "single-AZ" and "multi-AZ" instances, with the general rule of thumb that production workloads should always be multi-AZ.

however, "multi-AZ" has been made ambiguous, because there are now multi-AZ instances and multi-AZ clusters.

...and your multi-AZ "instance", despite being not a multi-AZ "cluster" from AWS's perspective, is still two nodes that are "clustered" together and treated as one logical database from the client connection perspective.

see [0] and scroll down to the "availability and durability" screenshot for an example.

0: https://aws.amazon.com/blogs/aws/amazon-rds-multi-az-db-clus...