top | item 47131464

(no title)

levkk | 6 days ago

So many, where to begin.

1. People don't design schemas to be sharded, although many gravitate towards a common key, e.g. user_id or country_id or tenant_it or customer_id. Once that happens, sharding becomes easier.

2. Postgres provides a lot of guarantees that are tricky to maintain when sharded: atomic changes, referential integrity, check constraints, unique indexes (and constraints), to name a few. Those have to be built separately by a sharding layer (like PgDog) and have trade-offs, usually around performance. It's a lot more expensive to check a globally enforced constraint than a local one (network hops aren't free).

3. Online migrations from unsharded to sharded can be tricky: you have to redistribute terabytes of data while the DB continues to serve writes. You can't lose a single row - Postgres is used as a store of record and this can be a serious issue with business impact.

We're taking increasingly bigger bites at this apple. We started with basic query routing and are now doing query rewrites as well. We didn't handle data movements previously and now have almost fully automatic resharding. It takes time, elbow grease and most importantly, willing and courageous early adopters to whom we owe a huge debt of gratitude.

discuss

written-beyond|5 days ago

That's was my second question, how on earth can you replicate real world Postgres workloads that benefit the most from sharing.

Are there some specific standard Postgres test suites you run PgDog through to ensure it's compliant with Postgres standards?

You've mentioned NoSQL quite a bit, what sort of techniques do shard-able NoSQL database employ which makes sharding inherently easier? Do you attempt to emulate some of those techniques in PGDog?

Lastly how do you solve the problem of Postgres constraints, from what I've understood PgDog runs standard Postgres instances as the shard, if let's say one table in shard 1 has a foreign key to a record in shard 2 how do you prevent Postgres from rejecting that record since it technically doesn't exist on it's current shard?

levkk|5 days ago

> Are there some specific standard Postgres test suites you run PgDog through to ensure it's compliant with Postgres standards?

That's right. We have many levels of testing: unit, integration, and acceptance, where we run the same query against an unsharded Postgres database and PgDog, and compare the result.

> what sort of techniques do shard-able NoSQL database employ which makes sharding inherently easier?

They remove features. For example, most of them don't support joins, so each table can be stored anywhere in the cluster with no data locality restrictions. There are no foreign key constraints either, or even transaction support. The list goes on. Ultimately, NoSQL databases are just K/V stores, with a fancy API. Scaling K/V is a solved problem.

> one table in shard 1 has a foreign key to a record in shard 2 how do you prevent Postgres from rejecting that

We don't, at least not yet. We can and will build a more sophisticated query engine that will validate constraints, but it may not always be completely atomic or performant. Cross-shard queries are expensive, because of the laws of physics. For example, if a query is executed outside of a transaction, validating the constraint could introduce a race condition, while in non-sharded Postgres, all queries run inside implicit transactions.