top | item 37618573

berkle4455 | 2 years ago

I think with any database the solution is simply backups no? backups that preferably aren’t tied to the hosted solution at all.

panyam|2 years ago

Actually not quite. Backups (assuming you are doing single node) still need you to decide on your SLOs - RTO (how long a restore takes) and RPO (how much data loss you can suffer). On the snazzy instant end you have streaming backup and recovery, and on the other extreme you back up once every N hours/days and a restore takes however long it takes (so you have customer outages you need to negotiate).
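To make the tradeoff concrete, here is a back-of-the-envelope sketch (all numbers are illustrative assumptions, not measurements of any real system):

```python
# Back-of-the-envelope RPO/RTO estimate for a single-node database
# with periodic full backups. Numbers below are illustrative only.

def worst_case_rpo_seconds(backup_interval_s: float) -> float:
    """With periodic backups, you can lose everything written since
    the last backup, so worst-case RPO equals the backup interval."""
    return backup_interval_s

def restore_time_seconds(backup_size_gb: float, restore_gb_per_s: float) -> float:
    """RTO is dominated by how fast you can pull and replay the backup."""
    return backup_size_gb / restore_gb_per_s

# Assumed workload: backup every 6 hours, 200 GB database,
# 0.1 GB/s effective restore throughput.
rpo = worst_case_rpo_seconds(6 * 3600)   # 21600 s = up to 6 h of lost writes
rto = restore_time_seconds(200, 0.1)     # 2000 s ≈ 33 min of outage
print(rpo, rto)
```

Streaming backups push the first number toward zero; the second is why incremental/snapshot-based restores matter for large databases.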

Now let us involve multiple nodes (both replication and partitioning into shards). As shards go up and down, keeping data in sync is a hard consistency problem and needs man-years of operational excellence and bug fixing.

So when people think databases - they think of the cool stuff: the database engine that does relational algebra and handles SQL queries. That is (IMO) only 1% of a practical, performant, reliable database offering.

api|2 years ago

Maybe if you are gigantic, but there is a long tail of people with <1TB database needs who don’t really need shards and can be well served by a failover cluster with a master and one or two replicas that can become masters.

These days you don’t really need shards until you hit many terabytes or even more depending on your read and especially write load. NVMe storage is really fast and lots of RAM for caching has become cheap.
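The "do you need shards yet?" question can be sketched as a simple capacity check. The thresholds below are rough rules of thumb assumed for illustration, not hard limits of any particular system:

```python
# Rough "do I need shards yet?" check. The default thresholds are
# illustrative assumptions for a well-provisioned NVMe node with lots
# of RAM, not measured limits.

def needs_sharding(data_tb: float, writes_per_s: float,
                   max_node_tb: float = 10.0,
                   max_node_writes_per_s: float = 50_000) -> bool:
    """A single primary with replicas can often carry this much;
    beyond it, partitioning into shards starts to pay off."""
    return data_tb > max_node_tb or writes_per_s > max_node_writes_per_s

print(needs_sharding(0.8, 2_000))   # the long-tail <1TB case: False
print(needs_sharding(25.0, 5_000))  # many terabytes: True
```

The point stands either way: the tuning knobs are data volume and write load, and the thresholds keep moving up as hardware gets cheaper.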

tetha|2 years ago

Also, what about the customer who deleted an important thing 6 weeks ago and absolutely needs it recovered? BTW, it's just one tenant in that DB; the others shouldn't be recovered, naturally.

crabbone|2 years ago

Backups? Do you want to share your idea of how you'd do backups? Especially for a distributed database?

Here are some of the questions you'll have to answer and some options you will have to consider before you go there:

Let's start with the heavy stuff: consistency groups. I.e. groups of bulk storage underlying your entire infrastructure that ensure your application and database(s) all recover to a shared state after a crash. To better explain this concept, consider an application that works with two databases: say, a document database that stores documents uploaded by users, which are later parsed by the application and transformed into records in a relational database. Each database on its own provides the best consistency guarantees... but they can still fail independently and subsequently recover to different states, where, for example, the document database ends up ahead of the relational one (which has lost some data). Sharded databases face similar problems.
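A toy model of the two-database scenario above (all names and values invented for illustration): each store snapshots at its own moment, so restoring both leaves a record whose source document is gone.

```python
# Toy model: a document store and a relational store of parsed records
# that snapshot independently. After a crash, each restores its own
# last snapshot, and a reference dangles.

docs, records = {}, {}

def upload(doc_id, body):
    docs[doc_id] = body                 # write to the document store
    records[doc_id] = f"parsed:{body}"  # write the parsed record

upload("a", "hello")
doc_snapshot = dict(docs)               # document store snapshots now...
upload("b", "world")
record_snapshot = dict(records)         # ...relational store snapshots later

# Crash; each store recovers to its own (different) point in time.
docs, records = doc_snapshot, record_snapshot

dangling = [k for k in records if k not in docs]
print(dangling)  # ['b']: a record whose source document was lost
```

A consistency group would force both snapshots to be taken at the same logical instant, so either both stores know about "b" or neither does.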

How far geographically are you going to send your backups? You see, the closer they are to the working server, the higher the chance you'll lose them together. But here's the problem: the further away the backups are, the harder it is to keep them up to date with the database, and, consequently, the more data you stand to lose.

Well, backups inherently lose data (everything written between the last backup and the crash). So, if you don't want to lose data at all, you probably want replication rather than backups. And you probably want online replication (but then the distance between the replicas matters even more than it does for backups).

Also, backups are huge. If you want to ship them outside of the facilities of the storage vendor... that's going to be expensive.

Another point to consider: databases provide consistency guarantees, but does your database provide the consistency guarantees you want? Is every relation encoded using foreign keys, or does the application have some knowledge of how to interpret pieces of data and stitch them together into relationships unknown to your database? Are you sure that every operation that requires atomicity is implemented in the database rather than in the application (which doesn't enforce atomicity)? What if a backup (recovery point) lands at the precise moment your application was doing something meant to be atomic, but which its author didn't know how to express in SQL (because in their fear of the technology they chose to use Hibernate or SQLAlchemy, etc.)? If so, that backup is spoiled...
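The last question can be shown with a toy example (invented for illustration): a transfer done as two separate writes that the application, not the database, treats as atomic. A recovery point taken between the two writes restores a state that violates the invariant.

```python
# Toy illustration of a snapshot landing mid-way through an operation
# the application considers atomic: a transfer done as two writes with
# no database transaction around them. Values are invented.

accounts = {"alice": 100, "bob": 0}

def transfer(amount):
    accounts["alice"] -= amount
    snapshot = dict(accounts)   # recovery point taken between the writes
    accounts["bob"] += amount
    return snapshot

snap = transfer(40)
print(sum(accounts.values()))   # 100: the live state is consistent
print(sum(snap.values()))       # 60: restoring here loses money
```

Wrapped in a real database transaction, no committed recovery point could ever observe the intermediate state; that is exactly the guarantee the application-level version throws away.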

gbartolini|2 years ago

I actually do not understand the point here. And maybe you are not very familiar with the concept of transactions. Backups can only account for committed transactions.

However, we are talking about Postgres here, not a generic database. PostgreSQL natively provides continuous backup and streaming replication, including synchronous (controlled at the transaction level), cascading, and logical replication. With Postgres, even in Kubernetes with CloudNativePG, you can easily implement architectures with RPO=0 (yes, zero data loss) and low RTO in the same Kubernetes cluster (normally a region), and RPO <= 5 minutes with low RTO across regions. Out of the box, with CloudNativePG, through replica clusters.

We are also now launching native declarative support for the Kubernetes VolumeSnapshot API in CloudNativePG, with the option of incremental/differential backup and recovery to reduce RTO for very large database recoveries (like... dozens of seconds to restore a 500GB database).

So maybe it is time to reconsider some assumptions.