top | item 577373

Ask HN: When to use MySQL vs PgSQL?

36 points| Bjoern | 17 years ago | reply

Dear HN,

first of all thank you that you are reading this. Out of caution, please let me remind you that I don't want you to take this question and to start a "flame war". Arguments with facts would really be constructive.

I am sincerely interested if my current assumptions are true or where I am wrong.

Background:

My current employer asked me about MySQL and I mentioned those points below which were later countered with this:

http://pastie.org/456941

(Large) Parts of this gave me the classical "WTF" moment. So I thought I ask you about this.

Detailed arguments against MySQL from my side:

A long time ago I saw this nice PDF where there were some measurements between PgSQL and MySQL. http://people.freebsd.org/~kris/scaling/7.0%20and%20beyond.pdf Title: "FreeBSD 7.0 and Beyond" on Page 9 following there is a Case Study of MySQL vs. PgSQL.

This was for me the first reason to really take a closer look at MySQL.

Secondly I know that MySQL uses two different internal SQL engines. A X/Open XA distributed transaction processing (DTP) support; two phase commit as part of this, using Oracle's InnoDB engine

Full-text indexing and searching using MyISAM engine

Should Oracle at some point decide to revoke the usage rights of InnoDB for the open source community then users of MySQL would need to switch to MyISAM which is, frankly said, terribly slow.

http://en.wikipedia.org/wiki/MySQL#Criticism Regarding the licensing renewal which is necessary. Now even Oracle is planning to buy Sun, which would make me think hard why they would renew the license of InnoDB for MySQL anyway? (creating competition - "multi-year" extension of their licensing agreement, why would they continue?)

The next reason was that I have no theoretical, but empirical proof (I didn't enjoy debugging that) that I end up more often with table data corruption if I run MySQL. Ok, I can repair it, but why should that take up more of my time away doing other more productive things? (It is even mentioned in the Criticism point of the wikipedia link above).

Also critical bugs get fixed slowly. E.g. It took them from 2003-2008 to fix this bug. http://bugs.mysql.com/bug.php?id=989

Dear HN, please help me with this question. Again, thank you for your time.

42 comments

[+] mdasen|17 years ago|reply

Use a good abstraction library and start coding. Seriously.

PostgreSQL tends to be what is favored by people here. I used to use it for everything until I tried to set up a replicated cluster with it. PostgreSQL's replication is severely lacking. Slony-I, the most mature of the bunch, is very difficult to set up (with many steps needed for every single table) and a lot slower than MySQL's replication (due to its use of SQL and triggers rather than binary log shipping as well as the fact that its design causes communication costs to grow quadratically).

However, PostgreSQL's query planner is a lot better than MySQL's (especially if you're doing something like subqueries) and I find that complex queries run decently faster. The community process is also a lot more attractive.

To address the Oracle question:

Oracle cannot withdraw InnoDB. It's GPL licensed. Anyone can fork it should Oracle decide they don't want to play ball - just as anyone can fork MySQL. In fact, there are already forks underway including Drizzle. So, that isn't a big deal. The bigger deal would be if Oracle decided to halt future development. I'm guessing they won't since it would simply mean they would loose control to the community that still has rights to the GPL'd code.

In the end, don't worry about this issue. It distracts from what is really important: actually creating something. They both work fine. There are plenty of abstractions that will allow you to create code that will work with either with no modifications. Do that, build your application, and let the pundits from either side debate this issue until they're blue in the face while you're actually creating useful things.

[+] jasonkester|17 years ago|reply

Your job as a developer is to pick a technology and build your application. Building/implementing an entire extra layer onto your project just so you don't have to make a choice is a bad move. The technical term, I believe, is Yak Shaving.

[+] kingkongrevenge|17 years ago|reply

> Use a good abstraction library and start coding. Seriously.

I'm pretty skeptical of this approach. Cross platform development is a pain in the ass, whether we're talking databases or operating systems. I say pick the most powerful database you can and marry it. Use its features to the fullest and ignore portability. I suggest db2 or oracle.

[+] aditya|17 years ago|reply

MySQL

Pros:

* Easier to replicate (and hence to scale for some people)

* Easier to find DBA's for

* More popular (twitter, sixapart, etc. use them)

Cons:

* MyISAM is slow as hell

* InnoDB is better but may have licensing issues

* Oracle owns them now, unclear whether it is for better or for worse

PgSQL:

* Better performance of DB engine

* Excellent query planner and analyzer

* Rock-solid in most aspects, with the core team dedicated to putting out a good product

Cons:

* Lack of a canonical, good, production-ready replication method (current choices: Slony, pgPool, Continuent, Londiste)

* Hard to find good DBA's

* Less popular to some extent

There you go. Maybe it boils down to performance (pgsql) vs replication (mysql), but that's just from what I know...

[+] brianm|17 years ago|reply

Was going to post a response to this, but you nailed it.

Anecdotally, we moved off postgres because we could not hire (full time, contact, or consult) anyone who would be considered an expert at postgres replication, and the cost/benefit of switching off compared to growing such an expert favored switching off.

Postgres is my go-to database where a regular backup, rather than warm/live replica, meets availability needs.

[+] apinstein|17 years ago|reply

I've been using postgres in production situations since 1998, at 4-5 different companies. I've never had a problem with it.

OTOH, the many times I've tried to use mysql I've run into gotchas even in development that were unexpected and obnoxious. Also, I don't really like the dual gpl/commercial license thing.

My faves about Postgres:

- Very mature OSS development community with amazing devs/community support

- Ridiculously stable

- BSD license

- I practically never run into a postgres bug.

- GIS support

- ltree data type (native trees with powerful query syntax)

- Clustered indexes

- Very fast under load

- Simple to manage

- Lots of tuning options

To sum up, I've used it forever and it's never caused me a problem. I am a postgres fanboy.

[+] artificer|17 years ago|reply

Since one can easily find comparisons, benchmarks, gotchas etc ( you can always start here: http://www.wikivs.com/wiki/MySQL_vs_PostgreSQL ) the one thing that made me decide in favor of PostgreSQL is this (quoting a blogger):

MySQL is an open-source PRODUCT (and can be bought as such). Postgres is an open-source PROJECT (and can't be bought, and will be around as long as there is a sizable developer pool).

http://www.jiaozhoujob.com/newsprint-995.html

EDIT: PostgreSQL also seems to take support very seriously. Their latest security advisory has a patch back to version 7.4, which was released in November 2003.

[+] jcoby|17 years ago|reply

PostgreSQL gets a lot of flack for not having replication. It's been my experience that it just doesn't matter in practice. Let me explain.

MySQL's performance sucks. It's great lightly loaded and with little concurrency. Add load and it noses over. (I've read that this is mostly because of the malloc they use. Regardless of the cause, it's a problem.)

I tested MySQL 5 vs PG 8.3 to handle session storage. This is about as simple of a test as possible. PostgreSQL outperformed MySQL by a significant margin across every load scenario, sometimes up to 2x the capabilities. PG's performance flatlined with load. MySQL flatlined and then hit a wall and dropped significantly as load and concurrency rose.

So the first thing someone does when MySQL noses over is to buy faster hardware. But that doesn't really help as much as it should. So they add replication to be able to distribute load across multiple servers. That's all fine and dandy until you realize that for every rep node you add, you are adding overhead for writes. And the headaches of keeping MySQL's brain dead replication in sync. Eventually you spend all of your time waiting for replication to finish or rebuilding a failed sync or chasing down ghosts in the system.

Back to postgres. Its performance scales nearly linearly with load and number of cores. Adding more hardware resources actually improves performance accordingly. So you end up being able to stay on one machine longer.

If you're truly going to have more load than one server can handle, break the data set up into smaller pieces (sharding).

Postgres is an awesome database. It behaves consistently and correctly. It has a lot of polish when you get into it. psql is a pleasure to use. It supports more of the SQL standard than MySQL. It's query analyzing tools are great. The only real pain point is that you have to do a dump/load to upgrade point releases (8.1 -> 8.2).

With all of that said, MySQL is great for smaller sites. It's available everywhere and every framework and language supports it. There are a million people who can administer it. Postgres is a lot harder to find competent DBAs.

PostgreSQL has been improving significantly throughout the 8.x series. Autovacuum has made life easier for everyone. Full text search is now built in. It now supports recursive queries and a bunch more.

[+] silentbicycle|17 years ago|reply

While this is somewhat more subjective, documentation and tutorials I've read for PostgreSQL consistently seem better informed about relational databases than MySQL's. I keep seeing cases where somebody is having performance issues with MySQL, and it turns out they neglected to do something basic like indexing their tables. This is probably partially due to its reputation of being good for the web, and thus getting a used by a lot of people who kinda-sorta know PHP but know nothing about the relational paradigm, but MySQL also has a history of ignoring fundamental relational concepts (http://www.reddit.com/r/programming/comments/764fp/mysql_vs_...), so it's not a coincidence.

Also, it irritates me when people write off relational databases because of a bad experience naively using MySQL, but that's neither here nor there.

You should probably program for a neutral wrapper and do actual testing to determine which will work better for you, though. I like PostgreSQL, and have had very good results with it, but I've never come close to needing the sort of dramatic scalability that people seem to worry so much about.

[+] olefoo|17 years ago|reply

> I've never come close to needing the sort of dramatic scalability that people seem to worry so much about.

s/worry/fantasize/

[+] cperciva|17 years ago|reply

If you asked me a year ago, I would have said "use whichever you're most familiar with". Right now, given the unclear situation with MySQL development (will the real MySQL please stand up?) I'd say to go with postgresql.

[+] astrodust|17 years ago|reply

MySQL has the advantage of being better supported by the developer community, but otherwise of being a technically inferior product to PgSQL.

If anything, the trouble with PgSQL is it's too feature rich to be easily understood by the typical developer. It's more DBA-grade than MySQL.

If you're building a real database driven application, though, having an engine like PgSQL is a real asset since it gives you a lot of capability within the database itself that MySQL does not easily provide.

That being said, MySQL is a fantastic engine for some applications that use it as a fancy file system. For example, message boards, blogs and chat rooms are often driven by MySQL because it's nearly foolproof to install and is mostly self-maintaining.

MyISAM is actually very fast database engine, but that comes at the price of forsaking transactions, foreign keys, and overall data integrity. For some applications this doesn't really matter and "good enough" will do.

While you can outgrow MySQL's feature set, it will take a long time to exceed the capabilities of Postgres. Most likely by the time you do, you can afford to hire a DBA so it won't be your problem.

[+] amalcon|17 years ago|reply

There used to be a tremendous gap between these two databases. MySQL had an incredible dearth of features (subselects, triggers, TRANSACTIONS, and others were missing). PostgreSQL had the features, but it was relatively unpopular, so the language bindings were generally immature and not well documented. Then there was Interbase/Firebird, which was even less popular but was supposed to have an even better feature set (I never used it).

These days, both have come a long way. MySQL has most of the necessary features, and PostgreSQL has perfectly good bindings in most popular languages. It doesn't matter so much which you use any more. The reputations remain, however, that MySQL is lacking critical features and that PostgreSQL is black magic.

The legal issues around PostgreSQL are probably a little more stable at the moment, given the Oracle question (it now owns the trademarks). If Oracle starts causing any serious problems for MySQL, you can bet "YourSQL" will appear very quickly (sort of like how Iceweasel did). I wouldn't worry too much about it. Pick one, and start coding.

[+] sanswork|17 years ago|reply

I use MySql for all my projects these days. In the past I've used both Sybase(when working at one of the big investment banks) and Informix(when working at one of the worlds leading customer tracking systems) and postgres for a number of small person projects.

My current project is a logging/reporting system which has between 50-75 million new rows per day amongst 2-3 tables in MySql with very few problems on a single Amazon EC2 instance(I've had to switch to partitioned tables for reporting recently though). My last employer did a tracking system with over 1 billion impressions a day on MySql with no problems(Multiple servers) Including near realtime reporting(delayed by about an hour).

Thats my plus side for MySql. I really have no negatives for Postgres but I haven't used it much so I can't honestly report from either side. Others I am sure will be able to add more from that side of the argument.

[+] bryanalves|17 years ago|reply

My opinion/sense is that Postgres is better in every way, except replication. Which ironically makes it far worse than mysql since you can't easily use it for huge datasets.

[+] donw|17 years ago|reply

Replication does nothing to enhance searches over large datasets, and in the case of master-slave replication, you don't even have a second server to query.

Not to mention, that MySQL replication is known to be horribly fragile, and silently corrupts data -- which is why you need separate tools (like Maatkit) to know when it's time to rebuild the slave.

[+] olefoo|17 years ago|reply

There are several replication solutions for PostgreSQL but they all require a level of sophistication and configuration from the administrator. One of the biggest issues I have seen in the field with setting up replication for postgres is that people design databases without primary keys and with duplicate rows. A DBA who understands the relational model wouldn't do that in the first place; and none of the common solutions for replicating a postgres database will let you copy a database with broken referential integrity.

It's a feature.

[+] emmett|17 years ago|reply

With the 3.0 release of londiste, that's no longer true as far as I can tell. Postgresql now has a great replication solution.

[+] modoc|17 years ago|reply

What are the issues you've seen with Slony-I and/or PGCluster?

[+] bradgessler|17 years ago|reply

If you're a Rails Developer; MySQL does NOT support transactional migrations, which means if something screws up during a deployment your database might be left in an inconsistent state.

If a migration screws up and you're running Postgres, the schema (and data) will just roll-back.

[+] unknown|17 years ago|reply

[deleted]

[+] madkangas|17 years ago|reply

This question is 11 hrs old, and nobody has mentioned Postgres' need for periodic "VACUUM" operations? Seriously?

If you are running a system with significant data churn (think: a Reuters news feed expiring at N days), Postgres is at a massive disadvantage if you're concerned with system throughput. Postgres 8.4's rewritten Free Space Map looks promising, but 8.4 was just announced in beta on 4/15/2009. In 8.3 or earlier, you can expect significant latency if "VACUUM" is going to touch any significant percentage of your table size.

Given sufficient data churn with a Postgres system in a processing pipeline, we can be talking about an hour or more for the "VACUUM" operation per evening. Yes, I speak from experience. The MySQL (5.0.45 InnoDB) systems downstream from these Postgres DBs have zero measurable latency due to space-recovery operations.

[+] modoc|17 years ago|reply

Another vote for Postgres. MyISAM is a joke for serious use, and InnoDB isn't better/faster than postgres in my experience.

Postgres has felt much more mature and more focused on delivering things you'd expect in a real database (FKs, etc..) for a longer time.

[+] jasonkester|17 years ago|reply

I've never used PostgreSQL, but have spent my fair share of time programming against MySQL. It never fails to amaze me how far behind the curve MySQL is, that it still has table corrupting bugs happening frequently in the wild, and how unconcerned its developer pool is about this state.

So from that background, if I were forced to choose between the two for a new project, I'd go with Postgre. I know nothing about it at all, except that it seems to have a good reputation, and it's not MySQL.

Disclaimer: I spend the bulk of my time in SQL Server, and I have trouble understanding why anybody would use anything else out of choice.

[+] Zarathu|17 years ago|reply

These days, it doesn't quite matter. Especially if you're using something like Rails, which can switch between the two very easily. Just modify your config if you later feel you want out.

[+] alanthonyc|17 years ago|reply

I'm glad you brought this up because I'm struggling with the same question. Since I'm developing in Django, I am currently abstracting out the db layer while I do the programming. However, I will eventually want to make a selection for when I launch the app and have been trying to decide between these two.

I'm currently leaning towards PostGres but am interested in hearing the pros and cons of both. The recent purchase of Sun by Oracle makes things a little more interesting.

[+] lacker|17 years ago|reply

For Django work I prefer postgres just because more Django projects use postgres, so when you have issues it's more likely that someone else on a message board has faced exactly the same thing. My projects haven't come near the limit of a single server, though.