top | item 11662536

PostgreSQL Scalability: Towards Millions TPS

521 points| lneves | 10 years ago |akorotkov.github.io | reply

210 comments

order
[+] _w3e8|10 years ago|reply
Postgres has been my DB of choice for nearly a decade. The only times I wind up working with another db are because:

(1 it is a better technical fit for a very specific problem

(2 there is already a legacy db in place

I have been voted down at a couple of startups that wanted to run a "MEAN" stack, invariably all of those startups moved from MongoDB or shutdown.

The only time I will advocate for anything other than Postgres is when Wordpress is involved. If the data model is simple enough then MySQL is more than up for the task, and it avoids an additional database dependency.

Thankfully all the ORM's that are worth using support MySQL and Postgres, so using both is very doable.

### Useful Postgres (or SQL in general) tools/libraries :

Bookshelf ORM http://bookshelfjs.org/

PostgREST automated REST API https://github.com/begriffs/postgrest

Sqitch git style migration management http://sqitch.org/

[+] jasonmp85|10 years ago|reply
Andres is a great coworker to have at Citus Data, though I first ran into him on the mailing lists shortly after starting at Citus myself. I was tasked with figuring out "why do certain read-only workloads fail miserably under high concurrency?"

I had never touched PostgreSQL before, nor any Linux performance tools, but I noticed that replacing certain buffer eviction locks with atomic implementations could drastically help this particular case. I emailed the list about it and Andres was someone who chimed in with helpful advice. I wrote up what I'd discovered in my deep dive here: http://tiny.cc/postgres-concurrency

Turns out Andres was already working on a "better atomics" patch to provide easier methods of using atomic operations within PostgreSQL's code base (my patch was a quick hack probably only valid on x86, if that). It's been useful in removing several performance bottlenecks and—two years in—it looks like it's still paying off.

[+] snhkicker|10 years ago|reply
I don't know much but to me it seems PostgreSQL is probably one of the most open and supporting communities maybe this is reason alot of new faces are looking at it including me.
[+] graffitici|10 years ago|reply
Are there any best practices for using PostgreSQL for storing time series data? Would it be comparable in performance to some of the NoSQL solutions (like Cassandra) for reasonable loads?
[+] bnchrch|10 years ago|reply
PostgreSQL has continued to be one the best open source en-devours so far. An amazingly smart and welcoming community that turns out arguabley the best in class relational database. Kudos and keep innovating team PG.
[+] devy|10 years ago|reply

     In partnership with IBM we researched PostgreSQL scalability on modern Power8 servers. 
That statement and the linked Russian blog white paper[1] makes it seem like a Power8 specific and Power8 is a "a massively multithreaded chip"[2]. I wonder how far off it would be to x86-64?

[1] https://habrahabr.ru/company/postgrespro/blog/270827/

[2] https://en.wikipedia.org/wiki/POWER8

[+] dimfeld|10 years ago|reply
The article says a few lines down, "The optimization #1 appears to give huge benefit on big Intel servers as well, while optimization #2 is Power-specific. After long rounds of optimization, cleaning and testing #1 was finally committed by Andres Freund."

Also, the chart indicates that the benchmarks shown were run on Xeon chips.

[+] jrcii|10 years ago|reply
Slightly tangential but I'm genuinely curious, does any have a theory as to why nearly every RDBMS post on Hacker News is about Postgres and almost never MySQL or MariaDB? Considering the relative obscurity of the former it seems somewhat inexplicable.
[+] esaym|10 years ago|reply
There's been a large migration off of mysql to postgres simply because Oracle got the rights to mysql when they purchased Sun. That is what made me consider, ditching mysql.

The final straw came when I stood up a mysql 5.6 instance to use as a data warehouse for about 5TB of data (15 billion rows). To my horror after spending a few weeks on this project I discovered that mysql only supported a small subset of the SQL language. Most of the needed sql features for data warehouse queries were simply not there (no WITH clause, no ROW_NUMBER(), only one sequence per table, no schema support, and many others that SQL report ninjas love to use)

With that I had to go back to management and convince them to put more time into the project so we could try again with postgresql. There was a large learning curve to go to postgres. It was just different in many areas, but at the same time, reminded me alot of IBM DB2 (both were a product of the 1980's if you look at the history). Postgres fit into our plans nicely, it actually had a full blown intelligent query parser and optimizer unlike mysql. From what I learned, mysql's purpose was to stop people from using text files as storage points in the early 2000's. But once you graduate from that, it's onward to postgres from there.

[+] combatentropy|10 years ago|reply
> why nearly every RDBMS post on Hacker News is about Postgres and almost never MySQL

You'll see the same phenomenon on Slashdot.

MySQL is popular among a subset of programmers: web developers. In corporations, Microsoft SQL and Oracle are more popular. Further, MySQL is popular among a subset of web developers: those who use PHP. Among web developers who use Python, Postgres seems more popular.

My suspicion is that MySQL's popularity is tied to those web-hosting plans (1 GB of storage! 1 TB of bandwidth! 10 databases! $10/month!). They were almost always MySQL databases. Web hosts that offered Postgres databases were few and far between. This article corroborates my theory: http://rodner.blogspot.com/2008/01/what-makes-mysql-so-popul.... MySQL's company (which at the time was "MySQL" not Oracle) pushed it in at a critical time in the development of the web.

I have written applications that use: Postgres, MySQL, SQLite, Microsoft SQL, and Oracle. My favorite by far is Postgres. The strangest thing to me is not that MySQL is more popular than Postgres, but that anyone uses Microsoft or Oracle at all. Not only do they cost a lot, but from a purely technical standpoint they are worse.

[+] fbernier|10 years ago|reply
I guess it depends on your own personal background, but PostgreSQL is far from 'obscure' and much more feature complete than MySQL nowadays.
[+] cwbrandsma|10 years ago|reply
MySQL is considered more "old school" and not great at being standards compliant when installed with default settings. See https://twitter.com/robconery/status/189086889486192640

Also, MySQL is run by Oracle...and Oracle a horrible company run by horrible people that behave horribly, and should never ever be trusted in any form. Any developer relying or Oracle code should not be trusted ever again, and Oracle consultants are the vilest form of life on earth.

Did I mention I don't like Oracle?

[+] matthewmacleod|10 years ago|reply
Postgres is widely regarded to be a good example of an open-source project – technically excellent, with a good codebase and community. MySQL has a history of being less technically sound – although it's certainly improved, I imagine it's left a bit of a legacy. In addition, the purchase of MySQL by Oracle probably makes many developers more skeptical of using it.

IME there isn't really a compelling case to use MySQL over Postgres, unless there's a specific environmental constraint.

[+] dorfsmay|10 years ago|reply
Your perception of PostgreSQL being obscure in comparison to MySQL is interesting... Most projects I've worked on use pg. My perception is that projects using MySQL do so because of history (project started in php and that was the default stack, it was faster at the time (before transactions), etc...).
[+] MBCook|10 years ago|reply
Besides the large popularity of Postgres, it seems to me (totally anecdotally) that they are more open about development than MySQL is. Or perhaps better at getting new developments noticed through PR than MySQL.

When a new version of MySQL comes out it's big news, but it seems like lots of the little day-to-day as it gets developed updates in Postgres get posted.

[+] wolf550e|10 years ago|reply
MySQL is the PHP of relational databases. You can do most things with it, but you don't choose it when you have the option to choose something better.
[+] ddorian43|10 years ago|reply
Because.... people that use postgresql are ~usually better developers, they care more + post more + upvote more.
[+] morgo|10 years ago|reply
lots of things happening in the MySQL world:

- a couple of weeks ago we released a document store (new protocol + connectors) based on protobufs

- last year 5.7 was released. Many new features, I have a list here: http://www.thecompletelistoffeatures.com

[+] skybrian|10 years ago|reply
Relative obscurity depends on the community. While there are a lot of MySQL installations, I'm under the impression there are more Postgres fans than MySQL fans on Hacker News these days.
[+] technion|10 years ago|reply
MySQL gets more than its share of popularity and discussion elsewhere. It's still the only supported Wordpress backend, which means it's the defacto web service backend for a large portion of the Internet. It's hard for more advanced users to find anything interesting and new about it.
[+] twic|10 years ago|reply
Are the MySQL or MariaDB communities publishing interesting stuff like this? That's a genuine question - i'm no longer a MySQL user, so i don't pay any attention to what's going on myself.
[+] Alex3917|10 years ago|reply
It's because Python tends to be the most common programming language for startups due to being easy to hire for, having good library support for a wide range of use cases, being good for web development, etc. And MySQL is focused on catering to enterprise customers who don't use Python. The last time I heard, they had something like one person working part time on Python support, so the drivers weren't nearly as reliable as the Postgres tooling.
[+] tormeh|10 years ago|reply
How does PostgreSQL compare to VoltDB?

I'm trying to get a handle on the different databases, and VoltDB sounds exciting, but everyone's talking about PostgreSQL. Then there's Mnesia which I hear is, as all things Erlang, excellent, though it's kinda tied to Erlang.

I know it's hard to say what's best, but what would you say is the best DB for a completely new multilingual project that needs throughput but prioritizes low latency, for example?

Also, VoltDB is licensed under AGPL. Does this mean that it can't be used in commercial projects? Or is it OK as long as the other components are on different servers or similar?

[+] matthewmacleod|10 years ago|reply
You almost certainly want Postgres, unless you have a compelling, specific use case to use a database optimised for a specific workload, or for some reason Postgres isn't usable in your specific environment. It's a reliable, well-designed general-purpose RDBMS which will scale up pretty well to cope with fairly large workloads; performance and latency will not be problems with a sensible schema.

VoltDB, as an example, is very different: it seems to be designed for simple OLTP workloads. It's an in-memory database, which offers opportunities for impressive performance, but if you have a large amount of data, you'll need a large amount of memory. And horizontal scaling is cool, but cross-partition operations will incur significant overhead

(IANAL, but the AGPL requires network users of software be able to download the source. Since a presumably proprietary application is the client in this case, this isn't likely to be an issue.)

[+] Alex3917|10 years ago|reply
> How does PostgreSQL compare to VoltDB?

If you don't know the difference, you probably want Postgres.

VoltDB is a specialty database for things like high frequency trading. It wouldn't make sense to use for, say, a consumer app or web startup.

[+] rdtsc|10 years ago|reply
WhatsApp I hear uses Mnesia. If you use Erlang and can fit everything in memory that does look pretty nice. It integrates right into the language.
[+] ddorian43|10 years ago|reply
Mongodb is also in agpl. But the drivers probably aren't. So you can use it and be fine. And if you make changes to VoltDb you have to share them.

11.5 How much data can be stored in Mnesia?

Dets uses 32 bit integers for file offsets, so the largest possible mnesia table (for now) is 4Gb.

[+] billrobertson42|10 years ago|reply
It's easier to think of Mnesia as more of a persistent distributed hash map rather than a full fledged database.
[+] Ono-Sendai|10 years ago|reply
Sounds like the padding stuff is a false sharing issue. They might want to look into putting 128 bytes of padding between the data structures as well: http://www.forwardscattering.org/post/29
[+] anarazel|10 years ago|reply
That's what the discussed padding patch does (except to only padding to 64bytes on most platforms). There's a downside though - on low concurrency the padding reduces the cache hit ratio sufficiently enough to cause a slowdown.

Given we're in code freeze anyway, I've not spent a lot of though on that yet; but I suspect that rearchitecting things so the lines are dirtied fewer times, is the better fix; with less potential for regressions at lower client counts.

[+] jlgaddis|10 years ago|reply
Man, how I wish WordPress had originally chosen to use PostgreSQL instead of MySQL back in the day.
[+] chc|10 years ago|reply
AFAIK WordPress was based on a stack that worked well on shared hosts of its day. They didn't really choose a lot of that stuff so much as have it chosen for them by the hosting community.
[+] jcoffland|10 years ago|reply
Why? WordPress would not be any better or easier to use unless you already had PostgreSQL installed. Frankly for a use case as simple as WordPress you don't need to over optimize your database.
[+] hinkley|10 years ago|reply
Always nice to hear about throughput improvements in Postgres.

How do these changes affect more heterogeneous workflows, of mixed reads and writes? A little better? A lot better? A little worse?

[+] anarazel|10 years ago|reply
Very dependent on the workload. The optimization isn't specific to reads or writes, but in many cases your bottleneck when writing will be elsewhere.
[+] hbrid|10 years ago|reply
Postgres is still going through the motions of a transaction for every query you issue it even if nothing else but that transaction is happening on the server. So obviously if you add extra load in the form of writes, you may slow your reads down, but this was not a full benchmark, but instead a comparison of the same workload running against multiple versions of Postgres.
[+] znpy|10 years ago|reply
Dumb question: in this context, "TPS" means... ?

T<what?> Per Second?

[+] mrjsson|10 years ago|reply
Transactions Per Second
[+] ionheart|10 years ago|reply
Big thanks for the community for the hardwork! I wonder if we set the "sync=off" in the test, will it be way higher than the OP results?
[+] gnarbarian|10 years ago|reply
The more efficient we can be at completing TPS reports the better. I must spend upwards of 40% of my time on them.