written-beyond | 5 days ago

That was my second question: how on earth can you replicate real-world Postgres workloads, the ones that benefit the most from sharding?

Are there some specific standard Postgres test suites you run PgDog through to ensure it's compliant with Postgres standards?

You've mentioned NoSQL quite a bit. What sort of techniques do shard-able NoSQL databases employ that make sharding inherently easier? Do you attempt to emulate some of those techniques in PgDog?

Lastly, how do you solve the problem of Postgres constraints? From what I've understood, PgDog runs standard Postgres instances as the shards. If, say, one table in shard 1 has a foreign key to a record in shard 2, how do you prevent Postgres from rejecting that record, since it technically doesn't exist on its current shard?


levkk | 5 days ago

> Are there some specific standard Postgres test suites you run PgDog through to ensure it's compliant with Postgres standards?

That's right. We have many levels of testing: unit, integration, and acceptance, where we run the same query against an unsharded Postgres database and PgDog, and compare the result.
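The comparison step in such an acceptance test could look something like the sketch below. The `results_match` helper and its behavior are assumptions for illustration, not PgDog's actual test harness; the key detail is that without an `ORDER BY`, row order across shards isn't guaranteed, so results must be compared order-insensitively.

```python
def normalize(rows, ordered=False):
    """Make result sets comparable. Without an ORDER BY, row order
    returned by a sharded cluster is not guaranteed, so sort rows
    before comparing."""
    rows = [tuple(r) for r in rows]
    return rows if ordered else sorted(rows, key=repr)

def results_match(unsharded_rows, pgdog_rows, ordered=False):
    """True if the unsharded Postgres result and the PgDog result
    agree (exactly when the query was ordered, as sets of rows
    otherwise)."""
    return normalize(unsharded_rows, ordered) == normalize(pgdog_rows, ordered)

# Same rows, different order: equal for an unordered query,
# unequal if the query carried an ORDER BY.
a = [(1, "alice"), (2, "bob")]
b = [(2, "bob"), (1, "alice")]
assert results_match(a, b)
assert not results_match(a, b, ordered=True)
```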

> What sort of techniques do shard-able NoSQL databases employ that make sharding inherently easier?

They remove features. For example, most of them don't support joins, so each table can be stored anywhere in the cluster with no data locality restrictions. There are no foreign key constraints either, or even transaction support. The list goes on. Ultimately, NoSQL databases are just K/V stores, with a fancy API. Scaling K/V is a solved problem.
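The "solved problem" here is roughly the following: with no joins, foreign keys, or cross-key transactions, every operation touches exactly one key, so a deterministic hash of the key picks the shard and each shard stays fully independent. A toy sketch (the `ShardedKV` class and `shard_for_key` are illustrative names, not any real system's API):

```python
import hashlib

def shard_for_key(key: str, num_shards: int) -> int:
    """Deterministically map a key to a shard by hashing it.
    Every operation on this key routes to the same shard."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

class ShardedKV:
    """Toy K/V store spread over N independent dicts ("shards").
    Because no operation spans keys, shards never coordinate."""
    def __init__(self, num_shards: int):
        self.shards = [{} for _ in range(num_shards)]

    def put(self, key: str, value):
        self.shards[shard_for_key(key, len(self.shards))][key] = value

    def get(self, key: str):
        return self.shards[shard_for_key(key, len(self.shards))].get(key)

kv = ShardedKV(4)
kv.put("user:1", "alice")
assert kv.get("user:1") == "alice"
```

Adding more shards just widens the modulus; nothing else in the data model has to change, which is exactly the property SQL's cross-table features give up.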

> one table in shard 1 has a foreign key to a record in shard 2 how do you prevent Postgres from rejecting that

We don't, at least not yet. We can and will build a more sophisticated query engine that will validate constraints, but it may not always be completely atomic or performant. Cross-shard queries are expensive, because of the laws of physics. For example, if a query is executed outside of a transaction, validating the constraint could introduce a race condition, while in non-sharded Postgres, all queries run inside implicit transactions.
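The race described above can be illustrated with shards modeled as plain dicts (a deliberately simplified sketch; table and key names are made up). A "check then insert" done as two separate statements, with no cross-shard transaction, leaves a window in which the referenced row can vanish:

```python
# Toy illustration of the cross-shard foreign-key race.
shard1 = {}                    # holds the "orders" table
shard2 = {"user:42": "alice"}  # holds the "users" table

def fk_exists(user_key: str) -> bool:
    # Step 1: a separate query to the other shard validates the FK.
    return user_key in shard2

def insert_order(order_key: str, user_key: str) -> None:
    # Step 2: the insert runs later, as a different statement.
    shard1[order_key] = user_key

ok = fk_exists("user:42")      # validation passes...
del shard2["user:42"]          # ...but the parent row is deleted
if ok:                         # before the insert lands.
    insert_order("order:1", "user:42")

# The order now references a user that no longer exists on any shard,
# the broken state a single-node Postgres transaction would prevent.
assert "order:1" in shard1 and "user:42" not in shard2
```

Closing that window requires coordination across shards (locks, two-phase commit, or similar), which is where the cross-shard cost comes from.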

written-beyond | 5 days ago

Aaah, you've got me excited and thinking about all sorts of ways this could fix the issue. I really appreciate you taking the time to answer my questions; it's all very interesting.

Couldn't PgDog pull in the query planning and execution parts from Postgres, maintain a cache of the indexes available across the different Postgres shards, and then follow through on execution itself? That way, PgDog could technically scale up to as many instances as needed, keeping the Postgres instances themselves as just a persistence backend.

However, I understand that at that point you're basically building an entirely new database, not really a sharding layer on top of Postgres; you'd need to maintain feature parity with Postgres, which can turn into a maintenance pain.

Do you have any insights into how platforms like PlanetScale or CockroachDB are doing some of this stuff?