top | item 47132551

(no title)

levkk | 6 days ago

A couple options come to mind:

1. Replicate shards into one beefy database and use that. Replication is cheaper than individual statements, so this can work for a while. The sink can be Postgres or another database like Clickhouse. At Instacart, we used Snowflake, with an in-house CDC pipeline. It worked well, but Snowflake was only usable for offline analytics, like BI / batch ML, and quite expensive. We'll add support for this eventually; we're getting pretty good at managing logical replication, including DDL changes.

2. Use the shards themselves and build a decent query engine on top. This is the Citus way and we know it's possible. Some queries could be expensive, but that's expected and can be solved with more compute.

In our architecture, shards going down for maintenance is an incident-level event, so we expect those to be up at all times, and failover to a standby if there is an issue. These days, most maintenance tasks can be done online in-place, or with blue/green, which we'll support as well. Zero downtime is the name of the game.

discuss

No comments yet.