top | item 5122299

MySQL - Do Not Pass This Way Again

224 points| craigkerstiens | 13 years ago |grimoire.ca | reply

156 comments

order
[+] viraptor|13 years ago|reply
I'm really not sure what to think of that article. On one hand side, I definitely agree with it and I've experienced many issues with MySQL.

On the other, there are so many... strange points, it's hard for me to trust the author about the parts that are new to me. Things I've found weird so far are:

- "my favourite example being a fat-fingered UPDATE query where a mistyped = (as -, off by a single key) caused 90% of the rows in the table to be affected," - if I ever run "rm -rf . /" fat-fingering the space, I'm going to blame myself only - not fileutils or bash - this has nothing to do with the database

- (about backups) "Unless you meticulously lock tables or make the database read-only for the duration," - this is not trivial, but logging onto slave and doing "FLUSH TABLES WITH READ LOCK, sync, snapshot, UNLOCK TABLES" is not rocket science either. And it's well documented on their "backup methods" page.

- "It's unrealistic to expect every single user to run SHOW CREATE TABLE before every single query, or to memorize the types of every column in your schema, though." - ... yeah... we shouldn't ask them to remember the syntax either - just keep guessing until you get everything right ;)

- "Foreign keys are ignored if you spell them certain, common, ways" - another case of "I want to use the wrong syntax, but still get the right answer"

I really wish he limited himself to hard facts - the main idea of the article wouldn't suffer at all. There are enough things to hate in MySQL without going into the subjective and "inconvenient, but still ok" parts.

[+] rosser|13 years ago|reply
"Foreign keys are ignored if you spell them certain, common, ways" - another case of "I want to use the wrong syntax, but still get the right answer"

Both forms of declaring foreign keys are SQL-standard compliant syntax. That MySQL silently ignores the "inline" version is the "wrong" here, not the syntax.

From MySQL's own docs:

"Furthermore, InnoDB does not recognize or support “inline REFERENCES specifications” (as defined in the SQL standard) where the references are defined as part of the column specification. InnoDB accepts REFERENCES clauses only when specified as part of a separate FOREIGN KEY specification. For other storage engines, MySQL Server parses and ignores foreign key specifications." [1]

[1] http://dev.mysql.com/doc/refman/5.5/en/create-table.html

[+] CaptainZapp|13 years ago|reply
"I'm going to blame myself only - not fileutils or bash - this has nothing to do with the database"

I disagree.

  UPDATE MyTab SET column = 'value' WHERE otherColumn = 1
is a valid SQL statement, while

   UPDATE MyTab SET column = 'value' WHERE otherColumn - 1
is not and should throw a syntax error and not perform a mass update due to some bullshit auto conversion.

What would rile me most is that even if you're a very careful person and do a

BEGIN TRAN

before going for the update (so you can ROLLBACK if in doubt), you may not notice that something went wrong and COMMIT anyway.

A database should always be fail safe; period.

[+] matwood|13 years ago|reply
"I want to use the wrong syntax, but still get the right answer"

The point is that wrong syntax should error instead of letting the user think it worked. Problems like this means the user has no idea if the FK was setup without testing it. IMHO, there is nothing okay with silently failing.

[+] jacques_chester|13 years ago|reply
As the nice man said:

> MySQL is riddled with these sorts of surprises, and apologists lean very heavily on the "that's documented" excuse for its bad behaviour.

For a database, defaulting to safe behaviour should be the default, not something you need to look up.

[+] epo|13 years ago|reply
The first thing you found wierd relies on selective quotation, you omitted "because of implicit string to integer conversion". The second is an apologist "but its well documented" defence. Your third point is changing the subject, he was criticising implicit type conversion, you again igore that. The fourth point is not about wanting to use the wrong syntax it is about MySQL acepting valid syntax and ignoring it.

So your whole post is nothing more than simple fanboy apologism.

[+] dangrossman|13 years ago|reply
I appreciate this well-argued piece of persuasive writing for not choosing MySQL, but the premise is surprising -- I don't recall seeing, on Hacker News or elsewhere, any writeups from companies that chose MySQL then ran into significant problems they had to architect around, nor writeups from companies that chose MySQL then had to rip everything out to switch to something in the same family of solutions (Postgres, Oracle, etc).
[+] zzzeek|13 years ago|reply
this is my favorite MySQL "decision", that the GROUP BY keyword by default (that is, unless you turn it off with the late-added magic flag ONLY_FULL_GROUP_BY) will gladly select an essentially "random" (well, the first row based on INSERT order, which in SQL is as good as random) row for you:

    mysql: create table data (token_a varchar(10), token_b varchar(10));
	Query OK, 0 rows affected (0.05 sec)

    mysql: insert into data (token_a, token_b) values ('A', 'A');
	Query OK, 1 row affected (0.01 sec)

    mysql: insert into data (token_a, token_b) values ('A', 'B');
	Query OK, 1 row affected (0.00 sec)

    mysql: insert into data (token_a, token_b) values ('B', 'B');
	Query OK, 1 row affected (0.00 sec)

    mysql: insert into data (token_a, token_b) values ('B', 'A');
	Query OK, 1 row affected (0.00 sec)

    mysql: select * from data group by token_a;
	+---------+---------+
	| token_a | token_b |
	+---------+---------+
	| A       | A       |
	| B       | B       |
	+---------+---------+
	2 rows in set (0.00 sec)
Note here the value we get for "token_b" is based on whether or not "A" or "B" were inserted first. The second "token_b" for each "token_a" (as well as any number of other rows that might follow it for that "token_a") is just discarded.

The scary thing is that I semi-regularly come across applications in Very Important Industries that have large amounts of SQL that rely upon this behavior of "picking any old row" for you, rather than selecting a MAX() or MIN() of some column and then joining to a subquery of the GROUP BY + aggregate....because joining to a subquery in MySQL also performs like crap.

[+] tveita|13 years ago|reply
I have encountered situations where this is convenient, and I have yet to see it cause any bugs or problems.

In the places I have seen it used, the 'arbitrary' column typically has the same value for the entire group, e.g. for efficiently selecting distinct texts based on their hash values.

PostgreSQL has a similar feature using SELECT DISTINCT ON: http://www.postgresql.org/docs/9.2/static/queries-select-lis...

[+] taligent|13 years ago|reply
You say ONLY_FULL_GROUP_BY was late added. But there are posts talking about this setting in 2002.
[+] sadmysqluser|13 years ago|reply
Ronald Bradford's http://www.slideshare.net/ronaldbradford/my-sql-idiosyncrasi... is worth reviewing for anyone who runs MySQL.

I especially like how he explains SQL_MODE bit by bit and ends up recommending

        SQL_MODE =
           ALLOW_INVALID_DATES, ANSI_QUOTES, ERROR_FOR_DIVISION_ZERO,
           HIGH_NOT_PRECEDENCE,IGNORE_SPACE,NO_AUTO_CREATE_USER,
           NO_AUTO_VALUE_ON_ZERO, NO_BACKSLASH_ESCAPES, NO_DIR_IN_CREATE,
           NO_ENGINE_SUBSTITUTION, NO_FIELD_OPTIONS,NO_KEY_OPTIONS,NO_TABLE_OPTIONS, 
           NO_UNSIGNED_SUBTRACTION,NO_ZERO_DATE, NO_ZERO_IN_DATE,
           ONLY_FULL_GROUP_BY, PAD_CHAR_TO_FULL_LENGTH (5.1.20), PIPES_AS_CONCAT,
           REAL_AS_FLOAT, STRICT_ALL_TABLES, STRICT_TRANS_TABLES
I also recommend "MySQL 5.1 vs. MySQL 5.5: Floats, Doubles, and Scientific Notation" http://blog.mozilla.org/it/2013/01/17/mysql-5-1-vs-mysql-5-5... for anyone who working with non-integer numerics.
[+] danso|13 years ago|reply
Ugh, maybe my Internet is slow, but I can't seem to get the page to load. Maybe there should be an unofficial rule that if you're going to write a worthwhile article about database and, I assume since I can't yet read it, performance, you should enable caching on your blog

Edit: here's the raw text version stored on github https://raw.github.com/ojacobson/grimoiredotca/master/wiki/m...

[+] 16s|13 years ago|reply
MySQL is the visual basic of SQL databases. Anyone can set one up and use it.

The problem is that many non-technical people use MySQL and then think they know all about DBs. Ask them what ACID is, or about foreign key constraints. You'll get blank stares. If you know what those things are and value them, you probably don't use MySQL.

[+] AaronBBrown|13 years ago|reply
Can you provide some citations to this claim? I know a lot of people who understand ACID, Foreign Keys, and have experience with other DBs (like PostgreSQL) and still often choose MySQL based on its merits.
[+] troels|13 years ago|reply
I've been using MySql for the most of my career, and I did learn about relational algebra, normal forms and whatnot back in school, thank you. It's my experience that most programmers I have worked with knows about this stuff too.

That aside, MySql does have transactions and it does have foreign key constraints. Innodb has been around for a long time, you know.

[+] MichaelGG|13 years ago|reply
As I've said before, MySQL's data handling is very much like VB's "On Error Resume Next".
[+] taligent|13 years ago|reply
I really don't know where to begin with such a stupid comment.

How about you start with listing which of these companies needs educating: http://www.mysql.com/customers/

Facebook ? Twitter ? Amazon ? Flickr ?

[+] Happymrdave|13 years ago|reply
I worked at one data-heavy startup where things were on MySQL and even with a lot of consulting by Percona, eventually it just couldn't keep up with our needs, and the project was ported to PostgreSQL. I've worked on other projects that were less data intensive and MySQL worked fine though.

If given a choice, I'll take PostgreSQL any day, but I do understand that people are hesitant to change database whey they don't need to. If you are encountering trouble though, by all means, move on.

[+] ybrs|13 years ago|reply
i read until the backup process and gave up reading through, for cold backups of online databases, you should use percona's extrabackup, its open source and free and works perfectly. http://www.percona.com/doc/percona-xtrabackup/ if you are using mysql a simple googling "hot backup mysql" will lead to that, i guess author didn't even bother to search.

besides, you need to use replication and connect a couple of slaves if you care about being online and backup from one of the slaves - which is a common practice for all databases not for mysql. if you are trying to dump from the master without slaves good luck with any database.

[+] zorlem|13 years ago|reply
XtraBackup has its limitations wrt to locking when you've got a mix between InnoDB and MyISAM tables.

The problem with taking a backup from a MySQL slave is that the data is not guaranteed to be identical to the master, thanks to the subtle problems and peculiarities of the MySQL master-slave replication (some of them are described in the OP). For precisely this reason I install an automated job that periodically checksums the tables and sends the results to the DBA role.

[+] mscarborough|13 years ago|reply
This person doesn't even offer a solution?

How is it that people who have blogs that take 30 seconds to load continue to give performance advice that gets upvoted?

The funny thing is that this blog's performance is based on some cookie. If I reload in Chrome? 2 sec. If I reload in "Incognito Chrome", it's again really slow.

So seriously, just stop with these authoritative blog posts when you don't even know what you're talking about.

[+] jacques_chester|13 years ago|reply
It's not performance advice, though.

It's a list of known problems with MySQL, most of which violate the "Principle of Least Surprise".

Some of the problems have performance implications. The query planner stuff, for example.

> The funny thing is that this blog's performance is based on some cookie. If I reload in Chrome? 2 sec ... just stop ... when you don't even know what you're talking about.

It may interest you to learn that web browsers have local caches. Incognito Mode does not have such a cache, and so must refetch pages from scratch on each view.

[+] fdr|13 years ago|reply
When criticizing a piece of software as popular as MySQL, offering a solution is an exercise in futility: it only leads people to think that the author is wrong because the alternative posited is wrong for them. See the endless MySQL vs. PostgreSQL discussions (which are not entirely without merit: the two system do have overlap in targeted users).

Instead, this author his simply focused on numerous detailed aspects of MySQL that he/she seems to think would be seen by most people as particularly insane, without need for comparison. I think there's a place for that.

[+] mdellabitta|13 years ago|reply
This is an ad hominem argument. His statements about MySQL should be judged on their own merits, regardless of the load times of his blog.
[+] lucian1900|13 years ago|reply
Given such a detailed article, the solution is obviously another DB. Postgres lacks most of the MySQL flaws described.
[+] saosebastiao|13 years ago|reply
It took less than a second for it to load on my browser. And why should he offer a solution? The point of his rant isn't that you should use a particular solution, but rather that you should avoid MySQL at all costs.
[+] elchief|13 years ago|reply
the solution is Postgres
[+] matt2000|13 years ago|reply
I'd be very interested to hear from people using other databases on whether their DB of choice is much better. I've been using MySQL for a while, have been burned by a few things, but figured it was mainly my fault. If indeed there are better options I'd love to hear the details.

(Just to be clear, I know about other databases, just aren't sure if any are that much better in real world use).

[+] sadmysqluser|13 years ago|reply
I've never recommended MySQL for anything where the data is important, but I've had to help several businesses deal with existing installations. Here's a few teachable moments:

One customer who ran MySQL on windows experienced data loss after a historic MySQL crash left them with a corrupted MyISAM table, which caused mysqldump to SILENTLY exit before the dump was complete. Their dumps only had the tables up to the corrupted one. They had been saving those dumps for months, but never testing them. When the server eventually gave up the ghost they found that none of their dumps were good. Lesson: 1. ALWAYS test your dumps. 2. NEVER trust MyISAM.

Another customer, smart enough to use InnoDB but not smart enough to change the MySQL defaults let one database get too large and found they couldn't shrink any of their databases because unless you use innodb_file_per_table InnoDB won't ever relinquish space even if you delete all the rows in your tables even after "optimize table". Lesson: 1. ALWAYS use innodb_file_per_table if you plan to keep the database for a long time. 2. Periodically run mysqlcheck optimize or optimize table to actually reclaim deleted space.

I've also helped customers rewrite their applications to avoid the "Total number of locks exceeds lock table size" problem. What is that? Well, even if you use database level locks the number of rows a transaction can alter is still limited by the server's memory buffer pool size. Lesson: 1. Know your working set. Make sure your application checks the server's buffer pool size via SHOW VARIABLES and is smart enough to avoid large transaction updates. 2. Use a large buffer pool size whenever you can.

Coming from a PostgreSQL background, these came as total surprises to me. I've never been completely happy with PostgreSQL's transaction id wraparound issues but I have yet to experience anything like the problems I've observed with MySQL.

[+] saurik|13 years ago|reply
Many of the complaints I hear people make about the entire concept of an "RDBMS" (often then to motivate why the NoSQL solution they decided to start using is better) are actually MySQL-specific issues that do not affect PostgreSQL (or Oracle, or usually SQL Server; I only mention PostgreSQL, as you wanted a concrete experience); one key example is "if you want to change your schema, it requires locking the entire system and rewriting the table" <- no, as the schema is just metadata; you should be able to do these things under first-class transactions, and PostgreSQL supports this just fine.
[+] contingencies|13 years ago|reply
Used MySQL for 10+ years. I found it great for most purposes. I don't feel I left MySQL because of MySQL's failings, but that of all monolithic RDBMS.
[+] dendory|13 years ago|reply
I started 7 years ago with SQLite and still have all my sites and webapps running on that. Works wonderfully for me. Meanwhile I must have seen close to a dozen data storage systems become popular then be replaced by the next big thing, from MySQL to NoSQL and everything in between.
[+] TazeTSchnitzel|13 years ago|reply
I love SQLite. It's simple, it doesn't make a fuss, and it does what you'd expect. And better yet, since it's a single file application database instead of a database running on a server, support for it is built in to Python, and you using it is as simple as an import statement and .connect() - no server to configure.
[+] brazzy|13 years ago|reply
MySQL became popular rather more than7 years ago, and still is.
[+] exabrial|13 years ago|reply
>>It's good enough. No it ain't. There are plenty of other equally-capable data storage systems that don't come with MySQL's huge raft of edge cases and quirks.

Actually, it is good enough. Good enough to powere billions of websites. Good enough not to pay for Oracle, DB2, or trying to cram some half-finished nosql mess in where a relational database works better.

MySQL isn't an end all, but please, don't pretend that NoSQL holds all the answers.

[+] pippy|13 years ago|reply
I've been hearing bad things about MySQL, so I've been avoiding it as of late.

So far my experience has been subpar.

PostgreSQL is pedantic with data insertion, almost to a fault. This costs me development time. (Also I have no idea what my users will do, and I'd rather have faulty data inserted than none at all. If it's for a client asking about a product, this could cost money). Yet purists claim this is a great feature. It's also about twice as slow as MySQL (admittedly this is likely due to the maturity of the environment I'm working in).

I personally like PostgreSQL. However I see it more as a guilty nerd pleasure rather than a development time effective solution.

MSSQL is very nice, and my experience has been the best. Microsoft's tools are top quality. You'll find yourself very productive; creating advanced SQL views, mirroring, and snapshots. However MSSQL reeks of vender lockin, I had to virtualise the MSSQL tools, and getting the drivers to work on Linux took almsot two days of googling. Despite ease of use, the vendor lockin doesn't make MSSQL worth it.

[+] sergiosgc|13 years ago|reply
I don't know where the "twice as slow" number for postgresql comes from. My the experience is different. Wildly different. For simple "keystore" loads, both are at the same performance class. For anything more complicated, typical relational database scenarios, mysql folds and is one or two orders of magnitude slower.

The query planner's inability to push restrictions down the tree into subqueries alone prevents many typical real world queries.

The way I see it, mysql is trapped between NoSQL and proper relational databases

[+] jeffdavis|13 years ago|reply
"PostgreSQL is pedantic with data insertion, almost to a fault."

The PostgreSQL philosophy isn't about being "pedantic", but it is very different from MySQL. I assume that you have much more experience with MySQL; maybe you are trying too hard to use postgres in the mysql way rather than the postgres way?

Personally, I don't think it's a good idea to migrate usually for this reason. The entire project development always has lots of built-in assumptions about how the DB will be used, so the new system is almost never quite the right fit. It can only work in trivial cases, or when you have the right expectations.

[+] sadmysqluser|13 years ago|reply
If you would rather have faulty data than none when something's wrong with your application, then MySQL is your DB of choice.
[+] wereHamster|13 years ago|reply
If you must use MySQL, at least switch to MariaDB. Those Oracle folks can not be trusted anymore, not even with a toaster.
[+] apapli|13 years ago|reply
I originally was using MySQL to learn with rails because it was so simple to set up on my Mac.

The only reason I migrated (quite early I may add) is that at the time Heroku pretty much mandated I move to postgres.

I'm glad I made the move, but I'd say awareness of the alternatives is the limiting factor. The brand awareness MySQL has is pretty big compared with many others. I wonder how much impact Heroku's decision to support postgres has helped those similar to myself drop mysql.

[+] nnnnni|13 years ago|reply
"I'm going to rant against MySQL, but I'm not going to suggest a better alternative."
[+] jmix|13 years ago|reply
In what entitled universe do you live in where a guy who carefully and patiently points out problems is also obligated to solve every single one of them?

Also, do you really need someone to spell out the alternatives to MySQL? There are too many to list.

[+] twerquie|13 years ago|reply
"Not MySQL" is good advice. Look into alternatives, there are many good ones.
[+] jacques_chester|13 years ago|reply
This is a great article.

But I really wish it had sources for each of the claims. I would be interested to read the relevant documentation, because some of these directly describe problems I've had with running a Wordpress installation.

And I've been blaming Wordpress for it. There's possibly a big mea culpa brewing; but I'd really like to look at the specifics.

[+] paul_f|13 years ago|reply
There is a bit of an elitist attitude in this point of view. Of course MySQL is not perfect. That's a strawman argument, nobody is claiming it is.

For 99+% of all applications that need a simple database, it is more than "good enough".

[+] redegg|13 years ago|reply
This looks similar to the MongoDB FUD from a year or two ago. Nevertheless, I don't like MySQL and prefer PostgreSQL for all my projects.