luhn | 9 months ago
I did a double take at this. At the outset of the article, the fact that they're using a distributed database and the mention of a "mid 6 figure" DB bill made me assume they had some obscenely large database, far beyond what a single node could handle. They don't detail the Postgres setup that replaced it, so I assume it's a pretty standard single primary, and a 100 million row table is well within its abilities—I have a 150 million row table happily plugging along on a 2vCPU+16GB instance. Apples and oranges, perhaps, but people shouldn't underestimate what a single modern server can do.
hliyan | 9 months ago
Secondly, we did most of these things using SQL, Bash scripts, cron jobs, and some I/O logic built directly into the application code. They were robust enough to handle some extremely mission-critical systems (a failure could bring down a US primary market, and if it was bad enough, you'd hear about it on the news).
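A minimal sketch of the kind of setup described, assuming a nightly cron entry driving psql (the table, column, and retention window are hypothetical):

```sql
-- Hypothetical nightly cleanup, the sort of job a crontab entry like
--   0 3 * * * psql -d prod -f cleanup.sql
-- would run. Deletes in bounded batches to keep lock time and WAL
-- churn small on a busy table.
DELETE FROM audit_log
WHERE id IN (
    SELECT id
    FROM audit_log
    WHERE created_at < now() - interval '90 days'
    LIMIT 10000
);
```

Looping that batch until zero rows are affected is the usual pattern; a single unbounded DELETE on a huge table tends to hold locks and bloat WAL far longer.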
hylaride | 9 months ago
For tables with a lot of updates, Postgres used to fall over with data fragmentation, but that's mostly been moot since SSDs became standard.
It's also easier than ever to stream data to separate "big data" DBs for those separate use cases.
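For Postgres specifically, logical replication is one common way to feed those downstream stores; a sketch, with made-up publication, subscription, and table names:

```sql
-- On the OLTP primary: publish changes from the hot tables.
CREATE PUBLICATION analytics_feed FOR TABLE orders, events;

-- On the consumer side (another Postgres instance, or a CDC connector
-- such as Debezium reading a logical replication slot):
CREATE SUBSCRIPTION analytics_sub
    CONNECTION 'host=primary dbname=prod user=replicator'
    PUBLICATION analytics_feed;
```

This streams row changes continuously, so the "big data" side stays current without batch exports hammering the primary.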
electroly | 9 months ago
I had a 200 billion row table that was still manageable operationally but that, IMO, I had allowed to grow out of control. The enterprise storage cost a fortune. I should have nipped that in the bud by 20 billion at the latest.
paulddraper | 9 months ago
It all depends, though; sometimes 1B is passé.
But 100m is a good point to consider what comes next.
thehappyfellow | 9 months ago
At $WORK, we write ~100M rows per day and keep years of history, all in a single database. Sure, the box is big, but I have beautiful transactional workloads and no distributed systems to worry about!
wvh | 9 months ago
I'm just saying, simple is nice and fast when it works, until it doesn't. I'm not saying to make everything complex, just to remember life is a survivor's game.
icedchai | 9 months ago
The largest table was over 100 million rows. Some migrations were painful, however. At that time, some of them would lock the whole table and we'd need to run them overnight. Fortunately, this was for an internal app so we could do that.
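Newer Postgres takes much of the sting out of this; for example, indexes can be built without blocking writes (table and index names hypothetical):

```sql
-- Builds the index without taking a lock that blocks writes, unlike
-- plain CREATE INDEX. Note it cannot run inside a transaction block.
CREATE INDEX CONCURRENTLY idx_orders_customer_id
    ON orders (customer_id);

-- Since Postgres 11, adding a column with a constant default is a
-- metadata-only change and does not rewrite the table.
ALTER TABLE orders
    ADD COLUMN archived boolean NOT NULL DEFAULT false;
```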
thomasfromcdnjs | 9 months ago
That sounds insane for a crud app with one million users.
What am I missing?
TheNewsIsHere | 9 months ago
There was something like 120 million rows in the database. It ran on a single VM. It really needed the indexes, but once those were built it just sang.
This was easily 10+ years ago.
williamdclt | 9 months ago
Obv it depends on your query patterns
perrygeo | 9 months ago
Your table on a small VPS (which I concur is totally reasonable, am running something similar myself): Let's say your VPS costs $40/mo x 12 = $480/yr. Divide into 150 million. You get 312,500 rows per dollar.
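Spelling the arithmetic out as a quick sanity check (not from the comment itself):

```sql
-- $40/mo * 12 = $480/yr; 150,000,000 rows / $480 = 312,500 rows/$.
SELECT 150000000 / (40 * 12) AS rows_per_dollar;  -- 312500
```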
I'd wager your server was faster under normal load, too. But is it webscale? /s
There's waste, then there's "3 orders of magnitude" waste. The pain is self-inflicted. Unless you have actual requirements that warrant a complex distributed database, you should "just use postgres".
And just to calibrate everyone's expectations, I've seen a standard prod setup using open source postgres on AWS EC2s (1 primary, 2 replicas, 1 haproxy+pgbouncer box to load balance queries) that cost ~ $700k annually. This system was capable of handling 1.2 million rows inserted per second, while simultaneously serving thousands of read queries/s from hundreds of internal apps across the enterprise. The cost effectiveness in their case came out to ~ 20k rows per dollar, lower than your VPS since the replicas and connection pooling eat into the budget. But still: 2 orders of magnitude more cost effective than the hosted distributed hotness.