garrettf | 4 years ago | on: Lessons learned from sharding Postgres at Notion
Luckily we did not have many join queries prior to sharding. The few cross-database joins were trivial to implement as separate queries in application logic.
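Replacing a cross-database join with separate queries merged in application code might look like the following minimal sketch. All names (tables, helpers, the membership/space schema) are hypothetical illustrations, not Notion's actual code; the query helpers are passed in so the merge logic stands alone:

```python
# Sketch of an application-level "join" across two databases.
# query_unsharded / query_sharded stand in for hypothetical helpers that
# execute SQL against the unsharded and sharded databases respectively.

def fetch_spaces_for_user(user_id, query_unsharded, query_sharded):
    """Replace a cross-database join with two queries merged in app code."""
    # Query 1: membership rows live in the unsharded database.
    memberships = query_unsharded(
        "SELECT space_id FROM space_memberships WHERE user_id = %s",
        (user_id,),
    )
    space_ids = [row["space_id"] for row in memberships]
    if not space_ids:
        return []
    # Query 2: space rows live in a sharded database; the "join" is just
    # an id-list lookup, merged in memory.
    return query_sharded(
        "SELECT id, name FROM spaces WHERE id = ANY(%s)",
        (space_ids,),
    )
```

Because the joins were few and shaped like this (parent lookup, then id-list fetch), the application-side version stays simple.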
Our main concern here was referential consistency: if you need to write to multiple databases at once, what happens if one of your writes fails? This was not a problem in practice for us since (1) our unsharded data is relatively static and (2) there are very few bidirectional pointers between data across databases.
Long term, there are more interesting problems to solve in distributing unsharded data globally. However, given that our unsharded data is less dynamic and consistency is less critical, we have many levers we can pull here.
garrettf | 4 years ago | on: Lessons learned from sharding Postgres at Notion
All in the application layer! All of our server code runs from the same repo, and every Postgres query gets routed through the same module. This means that it was relatively easy to add a required "shard key" argument to all of our existing queries, and then within our Postgres module consult an in-app mapping between shard key range and DB+schema.
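The routing described above can be sketched roughly as below. The range boundaries, shard counts, and naming are purely illustrative (not Notion's actual numbers or module), and the connection-pool handling is stubbed:

```python
import bisect

# Hypothetical in-app mapping from shard-key range to (database, schema).
# Upper bounds (exclusive) of each logical shard's key range, ascending.
SHARD_UPPER_BOUNDS = [250, 500, 750, 1000]
SCHEMAS_PER_DB = 2  # illustrative: two schemas (logical shards) per database

def route(shard_key: int) -> tuple:
    """Return the (database, schema) pair that owns this shard key."""
    shard = bisect.bisect_right(SHARD_UPPER_BOUNDS, shard_key)
    if shard >= len(SHARD_UPPER_BOUNDS):
        raise ValueError(f"shard key {shard_key} out of range")
    db = shard // SCHEMAS_PER_DB
    return (f"pg-{db:02d}", f"shard_{shard:03d}")

def query(shard_key: int, sql: str, params, pools):
    """Every query goes through one module with a required shard_key arg."""
    db, schema = route(shard_key)
    # pools[db] would be a connection pool for that physical database;
    # the schema would be applied via search_path or table qualification.
    return pools[db].execute(f"SET search_path TO {schema}; {sql}", params)
```

The point is the funnel: because all queries already flowed through one module, adding the required shard-key argument and the range lookup was a localized change.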
Plumbing that shard key argument through the application was more difficult, but luckily possible due to the hierarchical nature[0] of our data model.
> Did you consider using several databases on the same Postgres host
If I recall correctly, you cannot use a single client connection to connect to multiple databases on the same host, so this could have ballooned our connection counts across the application. This is not something we explored too deeply, though; I'd love to hear about the potential benefits of splitting tables this way.
garrettf | 4 years ago | on: Lessons learned from sharding Postgres at Notion
At the end of the day, this is something we could have explored in more depth. Ultimately, though, we were comfortable with the tradeoff between the risk of migrating all users at once and the consequences of depending on the monolith for longer, largely thanks to the effort we put into validating our migration strategy.
[0] https://blog.sentry.io/2015/07/23/transaction-id-wraparound-...
garrettf | 4 years ago | on: Lessons learned from sharding Postgres at Notion
I’m happy to answer questions about the project here, feel free to reply below.
We were on a tight timeline due to impending TXID wraparound. Switching database technologies would have required us to rewrite our queries, reexamine all of our indexes, and then validate that the queries were both correct and performant on the new database. Even if we'd had the time to de-risk those concerns, we'd have been moving our most critical data from a system whose scaling thresholds and failure modes we understand to one we're less familiar with. We also already knew the performance characteristics of Postgres well, and could leverage a decent amount of monitoring and tooling we'd built atop it.
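For context on the timeline pressure: Postgres transaction IDs are 32 bits, and a database must freeze old rows before any table's oldest unfrozen XID ages past roughly 2^31 transactions, or Postgres stops accepting writes. A minimal sketch of the kind of headroom check you might monitor (thresholds illustrative, not Notion's tooling):

```python
# Postgres XIDs are 32-bit; wraparound protection kicks in as the oldest
# unfrozen XID approaches an age of ~2^31 (~2.1 billion transactions).
WRAPAROUND_LIMIT = 2 ** 31

def wraparound_headroom(datfrozenxid_age: int) -> float:
    """Fraction of the XID horizon already consumed for a database.

    In production the age would come from:
        SELECT datname, age(datfrozenxid) FROM pg_database;
    """
    return datfrozenxid_age / WRAPAROUND_LIMIT
```

When vacuum can no longer keep that fraction from climbing on a very large, very hot table, sharding (shrinking each table) is one of the few structural fixes.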
There is nothing inherent about non-relational DBs that makes them unsuitable for our workload. If we were designing our data storage architecture from scratch, we'd consider databases that are optimized for our heaviest queries (vs. the flexibility afforded by Postgres today). A large number of those are simple key-value lookups, and a plain key-value store like DynamoDB is great for that. We're considering these alternatives going forward, especially as we optimize specific user workloads.
Re: Cloud Spanner: we didn't consider a cross-cloud migration due to the same time constraints. It still sounds like a wonderful product; we just weren't ready at the time.