top | item 20342060

Why we're ending support for MySQL

397 points| bjoko | 6 years ago |about.gitlab.com

218 comments

order
[+] shanemhansen|6 years ago|reply
MySQL is a decent nosql option with mature replication. If you care about your data, expect MySQL to be your foe rather than your friend.

Do you expect your triggers to fire? You are SOL https://bugs.mysql.com/bug.php?id=11472

Do you expect to be able to insert utf8 into utf8 columns? Don't be naive. Read the docs and you'll discover utf8mb4 or as I like to refer to it: mysql_real_utf8.

How about transactions? Surely a real SQL database supports transactions right? Or at the very least if it doesn't it would complain loudly when you use a transaction where not supported right? Once again this behavior is helpfully documented https://dev.mysql.com/doc/refman/8.0/en/commit.html

Do you expect to be able to put a value into MySQL and get that same value back out? You are SOL. But it's documented.

I can honestly say that the only appropriate time to use MySQL is when you can't use postgres and you are willing to move from a RDBMS to something that requires significantly more work to prevent data corruption.

I respect the hell out of MySQL engineers in terms of raw performance and the engineering of Innodb. I'm sure they aren't happy there's a million broken applications relying on defaults they can't change. I'm sure they aren't happy that fundamental limits in how pluggable storage engines work make ACID transactions not possible in the general case.

Postgres, for all of it's issues, feels like a database that has your back.

[+] dano|6 years ago|reply
I moved away from MySQL in 2004 to Postgres and never looked back. Thanks for articulating these things once again for the community.
[+] stingraycharles|6 years ago|reply
This seems overly negative and a lot of FUD. Some of these issues are minor, and/or when aware of it are easy to work around. MySQL is much more than a "nosql with mature replication".
[+] tracker1|6 years ago|reply
Changing the defaults across a major release is something they've had several opportunities to do, time and again they have kept a LOT of weird behaviors.
[+] BorRagnarok|6 years ago|reply
If it would only run flawlessly on Windows, I would gladly run Postgres. It doesn't though, so I chose MySQL for my project. This was two years ago however, things might have changed.

*edit: I recognize a lot of what you mentioned though, and MySQL sure has its draw backs. Still it seems to work fine for a lot of people.

[+] nqzero|6 years ago|reply
i'm looking for a test suite that is used to verify query isolation and correctness at high levels of concurrency. ie do you know of something that you could run against postgres to quantitatively say that it is "a database that has your back", and then run against mysql and see that it fails ?
[+] brighter2morrow|6 years ago|reply
O mainly use MySQL because of phpMyAdmin. Is there anything comparable for Postgres?
[+] msla|6 years ago|reply
Good to know they still support MariaDB, then.
[+] deftnerd|6 years ago|reply
I mean, I'm not too upset that they're focusing on one DB engine, but their reasons are a bit facetious.

> There are lots of great use cases for MySQL, our specific needs just didn't seem to be a good fit. Our use of MySQL had a number of limitations. In most cases, it wasn't as simple as adding support to MySQL, but that by bending MySQL we typically broke PostgreSQL. To name a few limitations:

> We can't support nested groups with MySQL in a performant way

All they had to do was implement a nested set pattern for their groups [1]

> We have to use hacks to increase limits on columns and this can lead to MySQL refusing to store data

A hack? Their DB creation schema specified a TEXT column when it should have been a LONGTEXT column. Using LONGTEXT is not a hack, it's a choice when your data is more than 65535 characters, and they made the wrong choice out of ignorance.

> MySQL can't add TEXT type column without length specified

That's just incorrect. What they MEANT to say is that they had a column to store filenames that was a varchar(255) column and people were running out of space with long directory paths and filenames. They could have moved to a TEXT column, but didn't because they thought it couldn't be indexed without specifying a length... But they were wrong, you CAN index a TEXT column without specifying the TEXT column length, you just have to specify the length of the substring you want to index.

Alternatively, since this is filepaths and filenames, they could have used a nested set pattern again and gotten 255 characters for each component of the path and a lot more feature options for their search system!

> MySQL doesn't support partial indexes

This is true, but is it really a show stopper?

> As a side note – we don't use MySQL ourselves

I think this is the real reason. They just didn't have the necessary talent to implement the features correctly. Wrong schema specifications and not knowing to implement nested set patterns is a sign that they don't have a knowledgeable DBA on staff.

[1] https://en.wikipedia.org/wiki/Nested_set_model

[+] koolba|6 years ago|reply
>> MySQL doesn't support partial indexes

> This is true, but is it really a show stopper?

I can’t comment on their data model but in my own experience partial indexes are an invaluable feature. In particular partial unique indexes as it both expands the guarantees of your data model that can be enforced by the database while taking up (what could be) considerably less space.

>> As a side note – we don't use MySQL ourselves

> I think this is the real reason. They just didn't have the necessary talent to implement the features correctly. Wrong schema specifications and not knowing to implement nested set patterns is a sign that they don't have a knowledgeable DBA on staff.

This must be the big one.

Given enough experts and a drive to make it work, technology finds a way.

Given no experts and not enough users to complain or care about breaking compatibility, technology finds the door.

[+] ken|6 years ago|reply
> All they had to do was implement a nested set pattern for their groups

I've been told a million times that SQL is declarative, meaning you say what you want, not how to get it.

I've also been told a million times that I need to write my queries in a specific way, or use a specific data structure, or add hints for the query optimizer. Otherwise it'll pick the wrong "how" for your "what".

Is there any practical difference between a declarative language with a Sufficiently Dumb Compiler, and an imperative language with weak features and an awkward syntax? However I'm forced to write it, we all know I'm really trying to do is get it to LOOP first over these items here, and not those there.

What's the point of being declarative if common tasks require us to bend over backwards to design our schema/queries/indices/hints in exactly the right way, in order for it to be performant on two popular SQL databases (when that's even possible)?

Even if the vaguely-English-like syntax that's completely unlike any other computer language weren't problematic for those reasons, it seems that the lack of abstraction is a complete buzzkill. These two databases require different implementations for "nested groups", but there's no (remotely portable) way to define a CREATED NESTEDGROUP to allow for a similar interface.

Of all the languages I have to use, SQL are the ones I hate most. From decades of writing SQL, I can say GitLab got one thing absolutely right: it's easiest to just treat it as a family of incompatible proprietary languages. It's easier for me to target JS and the JVM from the same program than two different SQL databases from the same queries.

[+] dahart|6 years ago|reply
If the real reason is they don’t want it, who cares what the other reasons are?

In your critique of their argument, I don’t see a single reason why they should use MySQL. So why should they continue to use MySQL? Why should they solve nested groups and long text columns? What does that buy them?

> All they had to do was implement a nested set pattern for their groups [1]

I admit I have a sensitivity to knee-jerk “can’t you just” comments, but reading the first paragraph of your link, this doesn’t sound like an automatic win. Is it possible that they tried this and found out that it actually couldn’t be supported in a performant way? (Edit: turns out they did exactly that https://news.ycombinator.com/item?id=20344575) Either way, it sounds tricky... so back to the real question... why bother?

“Updating requires renumbering and is therefore expensive. Refinements that use rational numbers instead of integers can avoid renumbering, and so are faster to update, although much more complicated.”

[+] necovek|6 years ago|reply
>> MySQL doesn't support partial indexes

> This is true, but is it really a show stopper?

They are invaluable for performance as well (somebody mentioned unique partial indexes as constraints). They generally let you avoid partitioning your data (eg. partioned tables) and getting the same performance benefits by simply having conditional indexes.

They are sometimes the difference between a sequential scan of a 150M row table and an index lookup in <10ms (especially if the index is in the memory cache).

Once you try them and get the benefit, you can't go back to not having them.

[+] jrochkind1|6 years ago|reply
Eh, maybe all of those things CAN be done in MySQL, but they are done differently. I think it's totally legit that it takes a lot more development hours to support two DBs than one.

And they said this clearly, they're not hiding that this is the motivation.

Creating the abstraction architectures to support more than one "thing" tends to more than double development effort to support two "things". (Although then lets you add additional more-than-two "things" with less incremental cost. But there's a lot of cost to more than one thing in the first place).

[+] YorickPeterse|6 years ago|reply

    > All they had to do was implement a nested set pattern for their groups
The nested set pattern was considered at the time we added support for nested groups, or improved it with CTEs (not sure which one of the two it was). The biggest drawback of nested sets is that adding sub groups can now become expensive. The storage needs are also far from ideal.

Using PostgreSQL CTEs allowed us to work around all of this, at the cost of not supporting MySQL. This seems like a fairly reasonable trade-off, but I might be biased as I implemented it [1].

    > A hack? Their DB creation schema specified a TEXT column when it should
    > have been a LONGTEXT column. Using LONGTEXT is not a hack, it's a choice
    > when your data is more than 65535 characters, and they made the wrong
    > choice out of ignorance.
It's not ignorance, it's MySQL coming up with bizarre limits for the "TEXT" type. In MySQL, the limit for TEXT is 64 KB. In PostgreSQL, IIRC it is 1 GB. Looking back there may have been better decisions, but it's always easy to judge in hindsight.

More importantly, moving away from MySQL allows us to stop worrying about this at all.

    > Alternatively, since this is filepaths and filenames, they could have used
    > a nested set pattern again and gotten 255 characters for each component of
    > the path and a lot more feature options for their search system!
At the cost of requiring more storage space, and writes taking (potentially) much longer. You may want to mention that, instead of acting as if nested sets are a silver bullet.

    > This is true, but is it really a show stopper?
Yes.

    > Wrong schema specifications and not knowing to implement nested set
    > patterns is a sign that they don't have a knowledgeable DBA on staff.
You may want to do some more research before going down the path of suggesting GitLab employees lack knowledge. For the last two years or so we've had various engineers with excellent database knowledge working on GitLab, myself included (though I don't consider myself a PostgreSQL expert).

Some of the weirder decisions were made before the right people were hired, and often these decisions are difficult to improve upon. Sometimes removing support for something is a much more efficient way of spending your time. Removing MySQL support in GitLab is one such case.

[1]: https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/10885

[+] noneeeed|6 years ago|reply
All their more general points in the post are perfectly valid arguments for focusing on a single database that I've heard from other projects, but naming specific technical issues like that actually makes it seam more nit-picky and less valid.

I'm not sure many people would have found anything to argue with them about if they had simply said "Tagetting multiple DBs requires more resources to develop and test, limits our ability to make the most of what Postgres offers, and increases the cost of support. The proportion of users using MySQL is decreasing, so we have decided to focus on Postgres to give the maximum benefit to the largest group. We understand this may cause issues for some users so here is a comprehensive migration guide/toolset".

[+] rhacker|6 years ago|reply
I remember like 10 years ago it was all the rage to support 45 different databases. I'm so glad that time has passed. It wasn't feasible. Yes you can orm the crap out of things so you never see a line of SQL but you end up with index tuning on multiple database platforms. I noticed the trend thankfully just sort of walked itself away about the time non-traditional databases entered the picture and well it was pretty much impossible to support Mongo and Mysql (reasonably). I'm assuming more things are going back to Mysql/Postgres/Sybase/DB2 again, but I hope more companies and OSS projects stick to just one this time?

The bottom line argument is this: When you make the decision to support multiple databases think of the 10-20 years of changes your product will exist through (and possibly a lot longer) and realize how much extra work that is going to be even if you use ORM. Then if you come to the decision Gitlab is - are you going to want to write the porting software for your users or just let them be shit-out-of-luck?

[+] mutt2016|6 years ago|reply
Unpopular opinion, and anecdotal data point: I stopped using Solaris and MySQL when oracle took ownership.

Linux and postgres is my new standard.

[+] petepete|6 years ago|reply
I'm not sure that's an unpopular opinion in these parts.
[+] yjftsjthsd-h|6 years ago|reply
... where is that an unpopular opinion? Really crusty enterprise?
[+] mmphosis|6 years ago|reply
Sadly, Oracle Java could also be in that blacklist. I've tried going through the hoops of downloading Oracle's "community" versions of MySQL in the past -- Java 6 for Android development IIRC. Today, I appear to have openjdk version "1.8.0_212", is this a fork?
[+] Can_Not|6 years ago|reply
That's not an unpopular opinion, that's a best practice.
[+] lifeisstillgood|6 years ago|reply
They just decided to use boring technology (http://boringtechnology.club/). Given they have limited resource spreading it across two databases seems a miss, and taking that time and attention and putting it back into git-ty stuff looks like a win.
[+] mattbessey|6 years ago|reply
Does this really count as using boring technology? they're choosing the younger of the two DBs and to use some of the latest features e.g lateral joins
[+] sytse|6 years ago|reply
I love that website! I suggested we link it from the GitLab boring solutions sub-value.
[+] nickcw|6 years ago|reply
> MySQL can't add TEXT type column without length specified

Mysql does have a TEXT type with a max limit of 65536 chars (you can have MEDIUMTEXT or LONGTEXT if you want more).

You don't supply a length when you use TEXT

https://dev.mysql.com/doc/refman/8.0/en/blob.html

[+] pizza234|6 years ago|reply
Their statement is somewhat ambiguous. They may have tried to express that "you can't add a column of the TEXT class without specifying the exact type".

So they could be "right", but expressed the concept in an imprecise form.

In a way, I think this shows their lack of expertise with MySQL - I reckon that somebody with a long experience would have phrased the concept more precisely.

[+] aeyes|6 years ago|reply
Why is such a major change implemented in a minor version?
[+] silverwind|6 years ago|reply
Probably because marketing drives their versioning (e.g. major versions are reserved to shiny new features), not semver.
[+] sytse|6 years ago|reply
It was announced in the major version of 12.0

I think we wanted to make it easy to upgrade to 12 before making the switch.

[+] ShakataGaNai|6 years ago|reply
Good for them. Both MySQL and PostgreSQL are Good^. But supporting both doesn't work and it totally makes sense to focus on one. Personally, I don't generally use PGSQL but my GitLab instance does. Why? It's easy to run a dedicated PGSQL instance for GitLab using Docker. Or... Using AWS RDS. Or... Any number of ways.

(^ Good is relative. People have their opinions on which is subjectively better but at the end of the day, lots of big sites and applications use both options to great success.)

[+] xenator|6 years ago|reply
I think here can be a good place to share my story why I never use MySQL in my projects. Once I was working for the company where I was an architect for some new project. And we have few gigabytes of data from a few sources. But with plans to gather much more in the near future. I came up with some database structure for the first draft of the early stage of development. It was nether big nether complicated for my scale comparing to the other projects I was working before. Since I'm python expert I choose SQLAlchemy for ORM layer and use its features to create the initial database schema. Postgres was a natural choice for me and almost fit to my requirements. I was worried about performance in the future but decided that we always can optimize when we have something. As I said it was early stage and we were on the research stage.

Everything was smooth development-wise but the database was pretty slow since we most use desktop-grade computers for prototyping. 3 weeks before the first internal release owners ask me to change the database to MySQL. I was against this step, we don't have time or resources for the experiments. But point was that company have MySQL experts that have more than 10 years of the optimization experience. I simply have no choice. They force me to do this against my will.

Since I wasn't sure that in production we will use Postgres I don't allow to use any special database dialect specific capabilities for the developers. And switching to another backend from Postgres to the MySQL was literally equal to a connection string change.

MySQL wasn't ready to handle 40-60 tables and queries with 5-10 simple joins. On our data, every request just hang server (they quickly provided a server with a lot of memory and storage for this). A few days later we found that actually MySQL is working but the same simple queries run for 4 or more hours. So-called "experts with 10 years of optimization" spent a week trying to fix indexes and other things but it never happened.

After release, I left the company, because of the toxic atmosphere, but it is another story. But since that, I lost all my faith in MySQL. Maybe for others, it is an option but never for me.

[+] avitzurel|6 years ago|reply
IMHO, this comes down to one thing. They lack the talent (and/or the will/resources to recruit it) to operate MySQL.

I can TOTALLY get the reasoning behind supporting only a single engine but the reasoning they write there are either wrong or misguiding.

[+] avitzurel|6 years ago|reply
I realize my comment hit a sensitive spot but it's really not meant to be negative.

It is 100% legit to give that up, it is 100% legit to go with the clear winner in your mind (it is the same in my mind too), but the technical reasoning is weak IMHO.

if you wanted to support both, you can, but you clearly don't (and that's ok too).

[+] just_myles|6 years ago|reply
I have used postgres for so long that a lot of the baked in functions and features that were super convenient for me.

I recall wanting to change a data type in my query and having to actually use the cast function. Also it was mentioned but when I was using it CTEs were not implemented yet. However, they weren't with postgres until 9.x as I recall either.

Regarding performance, I mean, that is as much design as it is rdbms choice. Also i'm no DBA.

[+] js4ever|6 years ago|reply
Just look how slow gitlab is on a 8cpu with 32gb or ram instance ... It's crazy ! I'm not sure about which part is because of Rails or the way gitlab is implemented. Issue is not about MySQL but more about the lack of knowledge / talent at gitlab
[+] msla|6 years ago|reply
OK, but do they support MariaDB?