Ask HN: When to use MySQL vs PgSQL?
first of all thank you that you are reading this. Out of caution, please let me remind you that I don't want you to take this question and to start a "flame war". Arguments with facts would really be constructive.
I am sincerely interested if my current assumptions are true or where I am wrong.
Background:
My current employer asked me about MySQL and I mentioned those points below which were later countered with this:
http://pastie.org/456941
(Large) Parts of this gave me the classical "WTF" moment. So I thought I ask you about this.
Detailed arguments against MySQL from my side:
A long time ago I saw this nice PDF where there were some measurements between PgSQL and MySQL. http://people.freebsd.org/~kris/scaling/7.0%20and%20beyond.pdf Title: "FreeBSD 7.0 and Beyond" on Page 9 following there is a Case Study of MySQL vs. PgSQL.
This was for me the first reason to really take a closer look at MySQL.
Secondly I know that MySQL uses two different internal SQL engines. A X/Open XA distributed transaction processing (DTP) support; two phase commit as part of this, using Oracle's InnoDB engine
Full-text indexing and searching using MyISAM engine
Should Oracle at some point decide to revoke the usage rights of InnoDB for the open source community then users of MySQL would need to switch to MyISAM which is, frankly said, terribly slow.
http://en.wikipedia.org/wiki/MySQL#Criticism Regarding the licensing renewal which is necessary. Now even Oracle is planning to buy Sun, which would make me think hard why they would renew the license of InnoDB for MySQL anyway? (creating competition - "multi-year" extension of their licensing agreement, why would they continue?)
The next reason was that I have no theoretical, but empirical proof (I didn't enjoy debugging that) that I end up more often with table data corruption if I run MySQL. Ok, I can repair it, but why should that take up more of my time away doing other more productive things? (It is even mentioned in the Criticism point of the wikipedia link above).
Also critical bugs get fixed slowly. E.g. It took them from 2003-2008 to fix this bug. http://bugs.mysql.com/bug.php?id=989
Dear HN, please help me with this question. Again, thank you for your time.
[+] [-] mdasen|17 years ago|reply
PostgreSQL tends to be what is favored by people here. I used to use it for everything until I tried to set up a replicated cluster with it. PostgreSQL's replication is severely lacking. Slony-I, the most mature of the bunch, is very difficult to set up (with many steps needed for every single table) and a lot slower than MySQL's replication (due to its use of SQL and triggers rather than binary log shipping as well as the fact that its design causes communication costs to grow quadratically).
However, PostgreSQL's query planner is a lot better than MySQL's (especially if you're doing something like subqueries) and I find that complex queries run decently faster. The community process is also a lot more attractive.
To address the Oracle question:
Oracle cannot withdraw InnoDB. It's GPL licensed. Anyone can fork it should Oracle decide they don't want to play ball - just as anyone can fork MySQL. In fact, there are already forks underway including Drizzle. So, that isn't a big deal. The bigger deal would be if Oracle decided to halt future development. I'm guessing they won't since it would simply mean they would loose control to the community that still has rights to the GPL'd code.
In the end, don't worry about this issue. It distracts from what is really important: actually creating something. They both work fine. There are plenty of abstractions that will allow you to create code that will work with either with no modifications. Do that, build your application, and let the pundits from either side debate this issue until they're blue in the face while you're actually creating useful things.
[+] [-] jasonkester|17 years ago|reply
[+] [-] kingkongrevenge|17 years ago|reply
I'm pretty skeptical of this approach. Cross platform development is a pain in the ass, whether we're talking databases or operating systems. I say pick the most powerful database you can and marry it. Use its features to the fullest and ignore portability. I suggest db2 or oracle.
[+] [-] aditya|17 years ago|reply
Pros:
* Easier to replicate (and hence to scale for some people)
* Easier to find DBA's for
* More popular (twitter, sixapart, etc. use them)
Cons:
* MyISAM is slow as hell
* InnoDB is better but may have licensing issues
* Oracle owns them now, unclear whether it is for better or for worse
PgSQL:
* Better performance of DB engine
* Excellent query planner and analyzer
* Rock-solid in most aspects, with the core team dedicated to putting out a good product
Cons:
* Lack of a canonical, good, production-ready replication method (current choices: Slony, pgPool, Continuent, Londiste)
* Hard to find good DBA's
* Less popular to some extent
There you go. Maybe it boils down to performance (pgsql) vs replication (mysql), but that's just from what I know...
[+] [-] brianm|17 years ago|reply
Anecdotally, we moved off postgres because we could not hire (full time, contact, or consult) anyone who would be considered an expert at postgres replication, and the cost/benefit of switching off compared to growing such an expert favored switching off.
Postgres is my go-to database where a regular backup, rather than warm/live replica, meets availability needs.
[+] [-] apinstein|17 years ago|reply
OTOH, the many times I've tried to use mysql I've run into gotchas even in development that were unexpected and obnoxious. Also, I don't really like the dual gpl/commercial license thing.
My faves about Postgres:
- Very mature OSS development community with amazing devs/community support
- Ridiculously stable
- BSD license
- I practically never run into a postgres bug.
- GIS support
- ltree data type (native trees with powerful query syntax)
- Clustered indexes
- Very fast under load
- Simple to manage
- Lots of tuning options
To sum up, I've used it forever and it's never caused me a problem. I am a postgres fanboy.
[+] [-] artificer|17 years ago|reply
MySQL is an open-source PRODUCT (and can be bought as such). Postgres is an open-source PROJECT (and can't be bought, and will be around as long as there is a sizable developer pool).
http://www.jiaozhoujob.com/newsprint-995.html
EDIT: PostgreSQL also seems to take support very seriously. Their latest security advisory has a patch back to version 7.4, which was released in November 2003.
[+] [-] jcoby|17 years ago|reply
MySQL's performance sucks. It's great lightly loaded and with little concurrency. Add load and it noses over. (I've read that this is mostly because of the malloc they use. Regardless of the cause, it's a problem.)
I tested MySQL 5 vs PG 8.3 to handle session storage. This is about as simple of a test as possible. PostgreSQL outperformed MySQL by a significant margin across every load scenario, sometimes up to 2x the capabilities. PG's performance flatlined with load. MySQL flatlined and then hit a wall and dropped significantly as load and concurrency rose.
So the first thing someone does when MySQL noses over is to buy faster hardware. But that doesn't really help as much as it should. So they add replication to be able to distribute load across multiple servers. That's all fine and dandy until you realize that for every rep node you add, you are adding overhead for writes. And the headaches of keeping MySQL's brain dead replication in sync. Eventually you spend all of your time waiting for replication to finish or rebuilding a failed sync or chasing down ghosts in the system.
Back to postgres. Its performance scales nearly linearly with load and number of cores. Adding more hardware resources actually improves performance accordingly. So you end up being able to stay on one machine longer.
If you're truly going to have more load than one server can handle, break the data set up into smaller pieces (sharding).
Postgres is an awesome database. It behaves consistently and correctly. It has a lot of polish when you get into it. psql is a pleasure to use. It supports more of the SQL standard than MySQL. It's query analyzing tools are great. The only real pain point is that you have to do a dump/load to upgrade point releases (8.1 -> 8.2).
With all of that said, MySQL is great for smaller sites. It's available everywhere and every framework and language supports it. There are a million people who can administer it. Postgres is a lot harder to find competent DBAs.
PostgreSQL has been improving significantly throughout the 8.x series. Autovacuum has made life easier for everyone. Full text search is now built in. It now supports recursive queries and a bunch more.
[+] [-] silentbicycle|17 years ago|reply
Also, it irritates me when people write off relational databases because of a bad experience naively using MySQL, but that's neither here nor there.
You should probably program for a neutral wrapper and do actual testing to determine which will work better for you, though. I like PostgreSQL, and have had very good results with it, but I've never come close to needing the sort of dramatic scalability that people seem to worry so much about.
[+] [-] olefoo|17 years ago|reply
s/worry/fantasize/
[+] [-] cperciva|17 years ago|reply
[+] [-] astrodust|17 years ago|reply
If anything, the trouble with PgSQL is it's too feature rich to be easily understood by the typical developer. It's more DBA-grade than MySQL.
If you're building a real database driven application, though, having an engine like PgSQL is a real asset since it gives you a lot of capability within the database itself that MySQL does not easily provide.
That being said, MySQL is a fantastic engine for some applications that use it as a fancy file system. For example, message boards, blogs and chat rooms are often driven by MySQL because it's nearly foolproof to install and is mostly self-maintaining.
MyISAM is actually very fast database engine, but that comes at the price of forsaking transactions, foreign keys, and overall data integrity. For some applications this doesn't really matter and "good enough" will do.
While you can outgrow MySQL's feature set, it will take a long time to exceed the capabilities of Postgres. Most likely by the time you do, you can afford to hire a DBA so it won't be your problem.
[+] [-] amalcon|17 years ago|reply
These days, both have come a long way. MySQL has most of the necessary features, and PostgreSQL has perfectly good bindings in most popular languages. It doesn't matter so much which you use any more. The reputations remain, however, that MySQL is lacking critical features and that PostgreSQL is black magic.
The legal issues around PostgreSQL are probably a little more stable at the moment, given the Oracle question (it now owns the trademarks). If Oracle starts causing any serious problems for MySQL, you can bet "YourSQL" will appear very quickly (sort of like how Iceweasel did). I wouldn't worry too much about it. Pick one, and start coding.
[+] [-] sanswork|17 years ago|reply
My current project is a logging/reporting system which has between 50-75 million new rows per day amongst 2-3 tables in MySql with very few problems on a single Amazon EC2 instance(I've had to switch to partitioned tables for reporting recently though). My last employer did a tracking system with over 1 billion impressions a day on MySql with no problems(Multiple servers) Including near realtime reporting(delayed by about an hour).
Thats my plus side for MySql. I really have no negatives for Postgres but I haven't used it much so I can't honestly report from either side. Others I am sure will be able to add more from that side of the argument.
[+] [-] bryanalves|17 years ago|reply
[+] [-] donw|17 years ago|reply
Not to mention, that MySQL replication is known to be horribly fragile, and silently corrupts data -- which is why you need separate tools (like Maatkit) to know when it's time to rebuild the slave.
[+] [-] olefoo|17 years ago|reply
It's a feature.
[+] [-] emmett|17 years ago|reply
[+] [-] modoc|17 years ago|reply
[+] [-] bradgessler|17 years ago|reply
If a migration screws up and you're running Postgres, the schema (and data) will just roll-back.
[+] [-] unknown|17 years ago|reply
[deleted]
[+] [-] madkangas|17 years ago|reply
If you are running a system with significant data churn (think: a Reuters news feed expiring at N days), Postgres is at a massive disadvantage if you're concerned with system throughput. Postgres 8.4's rewritten Free Space Map looks promising, but 8.4 was just announced in beta on 4/15/2009. In 8.3 or earlier, you can expect significant latency if "VACUUM" is going to touch any significant percentage of your table size.
Given sufficient data churn with a Postgres system in a processing pipeline, we can be talking about an hour or more for the "VACUUM" operation per evening. Yes, I speak from experience. The MySQL (5.0.45 InnoDB) systems downstream from these Postgres DBs have zero measurable latency due to space-recovery operations.
[+] [-] modoc|17 years ago|reply
Postgres has felt much more mature and more focused on delivering things you'd expect in a real database (FKs, etc..) for a longer time.
[+] [-] jasonkester|17 years ago|reply
So from that background, if I were forced to choose between the two for a new project, I'd go with Postgre. I know nothing about it at all, except that it seems to have a good reputation, and it's not MySQL.
Disclaimer: I spend the bulk of my time in SQL Server, and I have trouble understanding why anybody would use anything else out of choice.
[+] [-] Zarathu|17 years ago|reply
[+] [-] alanthonyc|17 years ago|reply
I'm currently leaning towards PostGres but am interested in hearing the pros and cons of both. The recent purchase of Sun by Oracle makes things a little more interesting.
[+] [-] lacker|17 years ago|reply