Updating a 50 terabyte PostgreSQL database (2018)

168 points | Maksadbek | 4 years ago | adyen.com

60 comments

[+] lelandbatey|4 years ago|reply
This article mentions that a key piece of software they use in all this is Slony[0], software which allows replicating one primary/master to many secondaries/slaves, even when those replicas run different versions. That's pretty cool, but I'd like to instead draw your attention (dear reader) to the phenomenal writing in Slony's description of itself. It describes, in plain but technical language, what Slony is, then immediately jumps into addressing "why should you use this?" and does so very directly. It reiterates the problem it solves, mentions other potential solutions by name to contextualize the current solution space, then explains in a nice bulleted list exactly the situations and niches that Slony is tailored for.

Oh my heavens, I wish every piece of technology described itself this way!

[0] - Slony -- https://www.slony.info/

[+] leifg|4 years ago|reply
One of the things I always wonder about with giant relational databases is: how much of the "typical relational stuff" are they actually using?

Do they have constraints on rows? Are they using views or do they just denormalize and duplicate? Do they use joins at all? Are they even doing more than 1 thing in a transaction?

[+] mulmen|4 years ago|reply
My only experience with databases of that size is for data analysis so yeah, constraints are relaxed. But even at that point ideas like normalization are critical to extracting performance out of large datasets.

Normalization is a performance optimization. Denormalization is a development shortcut. Neither is right or wrong but I would be surprised if a 50TB OLTP database wasn’t already highly normalized.

If it isn’t then my next guess is that it could be made smaller or more performant if it was.

We used to be proud of server uptime, back when we gave them names. Today if you have a server up for 900 days you’re going to be accused of malpractice.

Similar for data. We used to be proud of being able to keep “big” data online, but I’m no longer impressed. You’re just spending money. Did you actually solve the business case as efficiently as possible given available resources? Do you need 50TB in the first place?

[+] blacklight|4 years ago|reply
As a former Adyen employee, I can confirm that they use the relational features a lot. It's a ledger with millions (probably billions by now) of financial records, and it's heavily optimized for flexible querying. You have raw tables of transactions that are linked to many other tables (accounts, exchange rates, risk, transaction accept/capture/settle records, batches, etc.), and you can ideally query the state of any transaction by just joining those tables on the db.

What they also have is plenty of aggregate tables materialized by consumers and compactors, which are used a lot for reporting and auditing - for instance, daily transacted amount broken down by account and currency, or monthly revenue reports broken down by marketplace and currency. Additionally, being a financial institution they are also compelled to hold these records for several years, so there aren't many options for reducing the size of the db by aggregating/moving old records.
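The schema described above can be sketched in miniature. Every table and column name here is invented, and SQLite stands in for PostgreSQL; the point is only the shape: raw transactions joined to reference tables, plus a small aggregate table materialized for reporting so that reports don't scan the raw ledger.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE transactions (
        id INTEGER PRIMARY KEY,
        account_id INTEGER REFERENCES accounts(id),
        currency TEXT,
        amount_cents INTEGER,
        day TEXT
    );
    -- Aggregate table a consumer/compactor would keep up to date.
    CREATE TABLE daily_totals (
        account_id INTEGER, currency TEXT, day TEXT, total_cents INTEGER,
        PRIMARY KEY (account_id, currency, day)
    );
""")
conn.execute("INSERT INTO accounts VALUES (1, 'acme')")
conn.executemany("INSERT INTO transactions VALUES (?,?,?,?,?)",
                 [(1, 1, 'EUR', 500, '2018-01-01'),
                  (2, 1, 'EUR', 250, '2018-01-01'),
                  (3, 1, 'USD', 100, '2018-01-02')])

# "Materialize" the daily aggregate from the raw ledger once.
conn.execute("""
    INSERT INTO daily_totals
    SELECT account_id, currency, day, SUM(amount_cents)
    FROM transactions GROUP BY account_id, currency, day
""")

# Reporting then joins against the small aggregate, not the raw table.
totals = list(conn.execute("""
    SELECT a.name, t.currency, t.day, t.total_cents
    FROM daily_totals t JOIN accounts a ON a.id = t.account_id
    ORDER BY t.day, t.currency"""))
print(totals)
```

At Adyen's scale the aggregates are presumably kept current by streaming consumers rather than a one-shot INSERT…SELECT, but the read path looks the same.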
[+] idiocrat|4 years ago|reply
One question to ask is whether they truly need all 50 TB of data in one database instance at once.

Some old inactive data can be moved into partitions, then detached and moved to a different DB instance, as an archiving step.

It reads like their processed payments (5000 tx/second) are very suitable for daily, monthly, yearly archiving.

For analytics, a summary table can be kept in place of the detailed (archived) transactions.

For statutory reporting, the slow archives can be accessed on demand.
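The archiving step above can be sketched generically as "summarize old rows, copy them to an archive, delete them from the live table", all in one transaction. SQLite is used here purely for illustration; in PostgreSQL 10+ the cheaper route is to declare the table `PARTITION BY RANGE` on the date and use `ALTER TABLE ... DETACH PARTITION`, which detaches a whole child table without copying rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE payments (id INTEGER PRIMARY KEY, day TEXT, amount INTEGER);
    CREATE TABLE payments_archive (id INTEGER PRIMARY KEY, day TEXT, amount INTEGER);
    CREATE TABLE payments_summary (day TEXT PRIMARY KEY, total INTEGER);
""")
conn.executemany("INSERT INTO payments VALUES (?,?,?)",
                 [(1, '2015-01-01', 10), (2, '2015-01-02', 20),
                  (3, '2018-06-01', 30)])

cutoff = '2016-01-01'
with conn:  # one transaction: summarize, copy, delete
    conn.execute("""INSERT INTO payments_summary
                    SELECT day, SUM(amount) FROM payments
                    WHERE day < ? GROUP BY day""", (cutoff,))
    conn.execute("INSERT INTO payments_archive SELECT * FROM payments WHERE day < ?",
                 (cutoff,))
    conn.execute("DELETE FROM payments WHERE day < ?", (cutoff,))

live = conn.execute("SELECT COUNT(*) FROM payments").fetchone()[0]
archived = conn.execute("SELECT COUNT(*) FROM payments_archive").fetchone()[0]
print(live, archived)  # 1 2
```

The summary table then serves analytics in place of the detailed rows, while the archive table (or detached partition) can be dumped to a separate, slower instance for on-demand statutory access.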

[+] simonw|4 years ago|reply
I didn't understand how they avoid downtime during the upgrade - they start with "Stop traffic to the database cluster" and then perform a bunch of steps that each take 5-10 minutes or longer before starting traffic again - so do they just accept the downtime and schedule it in advance, or did I miss a detail?
[+] gbrown_|4 years ago|reply
Whilst others have mentioned that their application is architected to cope with this, I wanted to echo the sentiment. This post feels very much like a “draw the rest of the owl” meme.
[+] rattray|4 years ago|reply
Yeah, they architect their application to accept DB downtime – but I'm sure their services are still degraded to some degree or another during this, and they aren't clear how much total DB downtime they need for this (and how that time scales across various axes).

Overall my takeaway is basically "if you want to upgrade a large Postgres DB, you'll need something like an hour of planned downtime and a lot of careful work", which… doesn't make me excited to upgrade big Postgres DBs in production.

[+] vonnieda|4 years ago|reply
> One other detail to note is that we built our software architecture in such a way that we can stop traffic to our PostgreSQL databases, queue the transactions, and run a PostgreSQL update without affecting payments acceptance.
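The quoted pattern can be sketched as follows. All names here are invented, and an in-memory deque stands in for what would in reality have to be a durable queue: while the database is stopped for the upgrade, acceptance keeps working by queueing transactions, and the queue is drained in order once the database is back.

```python
from collections import deque

class PaymentPipeline:
    """Toy queue-and-replay sketch, not Adyen's actual architecture."""

    def __init__(self):
        self.db_up = True
        self.db = []            # stand-in for the PostgreSQL ledger
        self.queue = deque()    # stand-in for a durable message queue

    def accept(self, tx):
        # Payment acceptance never blocks on the database being up.
        if self.db_up:
            self.db.append(tx)
        else:
            self.queue.append(tx)

    def stop_db(self):
        self.db_up = False

    def start_db(self):
        self.db_up = True
        while self.queue:       # replay queued transactions in order
            self.db.append(self.queue.popleft())

p = PaymentPipeline()
p.accept("tx1")
p.stop_db()          # upgrade window begins
p.accept("tx2")      # still accepted, just queued
p.accept("tx3")
p.start_db()         # upgrade done; queue drained into the db
print(p.db)          # ['tx1', 'tx2', 'tx3']
```

The hard parts this sketch elides are exactly the ones the article glosses over: making the queue durable, bounding its growth during the window, and handling reads that arrive while the ledger is behind.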
[+] paozac|4 years ago|reply
50TB is not so big these days. I read that in 2008 (!) Yahoo had a 2+ PB PG database. What is the largest you know of, 14 years later?
[+] GordonS|4 years ago|reply
50TB is big. Bigger is possible I'm sure, but I'd guess 99.something% of all PG databases are less than 50TB.

If someone here commented they had a 2PB database, I guarantee someone else here would be like "pfft, that's not big"...

[+] ok_dad|4 years ago|reply
Around ~2005 I took a tour of the [a well known government organization] and they were bragging about several-PB-sized databases at the time. Interestingly, there was a TON of server racks there in a bomb-proof building with tons of security, and they were all IBM servers (a supercomputer maybe?), if I remember correctly. Also, there was one small server rack that was painted differently from the rest (it looked like something made in-house), and we asked what it was, and the tour guide (a PhD computer scientist) said that technically it doesn't exist and he couldn't talk about it even though it was super cool. Now that I know what they were doing around that time (and probably still today) I am kinda scared at the implications of that tour guide's statement and what that one tiny rack was for. I'm glad I never went to work in their organization, since that tour was meant to recruit some of us a few years down the road.
[+] Gigachad|4 years ago|reply
How are people dealing with databases this large? At work we have a MySQL DB with a table that has 130M records in it, and a count(*) on the table takes 100 seconds. Anything but a simple lookup by id is almost unworkable. I assumed this was normal because it's too big. But am I missing something here? Are SQL databases capable of actually working fast at 50TB?
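For what it's worth, a plain count(*) has to scan every visible row (or a whole index) in both InnoDB and PostgreSQL, so 100 seconds at 130M rows is expected rather than a sign of misconfiguration. Common workarounds are reading the planner's estimate (`pg_class.reltuples` in PostgreSQL) when an approximate answer is enough, or maintaining an exact counter with triggers. A toy sketch of the trigger approach, with SQLite used only for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT);
    CREATE TABLE row_counts (tbl TEXT PRIMARY KEY, n INTEGER);
    INSERT INTO row_counts VALUES ('events', 0);
    -- Triggers keep the counter exact on every insert/delete.
    CREATE TRIGGER events_ins AFTER INSERT ON events
        BEGIN UPDATE row_counts SET n = n + 1 WHERE tbl = 'events'; END;
    CREATE TRIGGER events_del AFTER DELETE ON events
        BEGIN UPDATE row_counts SET n = n - 1 WHERE tbl = 'events'; END;
""")
conn.executemany("INSERT INTO events (payload) VALUES (?)",
                 [("a",), ("b",), ("c",)])
conn.execute("DELETE FROM events WHERE payload = 'b'")

# O(1) lookup instead of a full table scan:
n = conn.execute("SELECT n FROM row_counts WHERE tbl = 'events'").fetchone()[0]
print(n)  # 2
```

The trade-off is write contention on the counter row under heavy concurrent inserts, which is why the approximate-estimate route is often preferred at real scale.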
[+] hsbauauvhabzb|4 years ago|reply
Security and incident response systems ingesting log files from other systems can get big; add in "must store for $x years" compliance requirements and you might hit some big numbers.
[+] jfbaro|4 years ago|reply
Thanks for sharing this. Really interesting. But a basic question: why not upgrade to PG 13 instead? I am curious about the reasons for staying on an older version of PostgreSQL.
[+] jfbaro|4 years ago|reply
I see it now that it's a 2018 article. Makes sense!
[+] hawk_|4 years ago|reply
Interesting. Does PG 13 let them do this without downtime?
[+] barrkel|4 years ago|reply
Upgrading, not updating.
[+] wruza|4 years ago|reply
While others are saying a 50T DB is relatively normal, personally I can't imagine myself dealing with that sort of data being always online. If I were them, I'd ask myself: could we archive this data into split chunks? Is this archive even required to be online? What is the size of the subset that is actually current? And so on. Of course they have answers to all that, and the answers are generally "no", but my imagination just stops here.

Some time ago, when I studied the bitcoin ledger structure, I was confused about how it sums up all transactions to get a balance so quickly, the way we usually do in accounting with the help of some periodic caching, which is another point of failure and maintenance. Bitcoin is a large enough database that it can't do that easily. A few docs later I realized that it doesn't do this:

  mining -> A 1
  A -> B 0.2
  (sum(A) == 0.8)
And instead it does this:

  mining -> A 1
  A -> (B 0.2; A 0.8)
  (last(A) == 0.8)
No sums required; all balances are checked by essentially "look up the latest appearance in the db", where the lookup also involves some Merkle tree optimization, which I didn't understand well enough to remember.
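The two styles can be put side by side in plain Python (no Bitcoin specifics; integer amounts stand in for coin values, and this only illustrates the difference): deriving a balance by folding over the whole history, versus recording the resulting balance alongside each entry so the latest appearance answers the query directly.

```python
transfers = [("mining", "A", 100), ("A", "B", 20)]

# Style 1: derive the balance by summing the entire history (a fold).
def balance_by_sum(account, transfers):
    total = 0
    for src, dst, amount in transfers:
        if dst == account:
            total += amount
        if src == account and src != "mining":
            total -= amount
    return total

# Style 2: append-only ledger where each entry carries the resulting
# balance, so the latest entry for an account is its current balance.
ledger = []    # (account, resulting_balance)
balances = {}  # scratch state used while appending
for src, dst, amount in transfers:
    if src != "mining":
        balances[src] = balances.get(src, 0) - amount
        ledger.append((src, balances[src]))
    balances[dst] = balances.get(dst, 0) + amount
    ledger.append((dst, balances[dst]))

def balance_by_last(account, ledger):
    # "look up the latest appearance in the db"
    for acct, bal in reversed(ledger):
        if acct == account:
            return bal
    return 0

print(balance_by_sum("A", transfers))   # 100 in, 20 out -> 80
print(balance_by_last("A", ledger))     # latest entry for A -> 80
```

Style 2 trades a little extra storage per entry for O(1)-ish reads, at the cost of making historical corrections harder: rewriting one old entry invalidates every later resulting balance for that account.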
[+] samhw|4 years ago|reply
> I was confused about how it does sum up all transactions to get the balance so quickly, like we in accounting usually do

For what it's worth, in a system design context, this is called event sourcing (in more general terms it's just a fold, though the idea of event sourcing doesn't preclude caching). I worked at a bank for a few years and this was how we calculated balances too.

> And instead it does this:

I don't quite follow you here. It does something distinct from either a fold or caching the current total? (Do you mean that each transaction encodes the resultant balance?)