top | item 47131119

(no title)

Can you elaborate a bit more on the challenges faced in making Postgres shard-able?

I remember that adding sharing to Postgres natively was an uphill battle. There were a few companies who has proprietary solutions for it. What you've been able to achieve is nothing less than a miracle.

discuss

levkk|6 days ago

So many, where to begin.

1. People don't design schemas to be sharded, although many gravitate towards a common key, e.g. user_id or country_id or tenant_it or customer_id. Once that happens, sharding becomes easier.

2. Postgres provides a lot of guarantees that are tricky to maintain when sharded: atomic changes, referential integrity, check constraints, unique indexes (and constraints), to name a few. Those have to be built separately by a sharding layer (like PgDog) and have trade-offs, usually around performance. It's a lot more expensive to check a globally enforced constraint than a local one (network hops aren't free).

3. Online migrations from unsharded to sharded can be tricky: you have to redistribute terabytes of data while the DB continues to serve writes. You can't lose a single row - Postgres is used as a store of record and this can be a serious issue with business impact.

We're taking increasingly bigger bites at this apple. We started with basic query routing and are now doing query rewrites as well. We didn't handle data movements previously and now have almost fully automatic resharding. It takes time, elbow grease and most importantly, willing and courageous early adopters to whom we owe a huge debt of gratitude.

written-beyond|5 days ago

That's was my second question, how on earth can you replicate real world Postgres workloads that benefit the most from sharing.

Are there some specific standard Postgres test suites you run PgDog through to ensure it's compliant with Postgres standards?

You've mentioned NoSQL quite a bit, what sort of techniques do shard-able NoSQL database employ which makes sharding inherently easier? Do you attempt to emulate some of those techniques in PGDog?

Lastly how do you solve the problem of Postgres constraints, from what I've understood PgDog runs standard Postgres instances as the shard, if let's say one table in shard 1 has a foreign key to a record in shard 2 how do you prevent Postgres from rejecting that record since it technically doesn't exist on it's current shard?