Ask HN: How do you organize and manage database migrations?
81 points | anonfunction | 6 years ago
We've been using migrations and doing it all manually but it's become a bottleneck and a little bit of a nightmare with multiple consumers of the database needing to make schema changes.
Another concern is multiple environments, from local development to staging and production. We're using docker-compose for local development which runs the entire "full" schema and then I'm manually applying the migration files to staging and production before we deploy.
I've looked at some projects like flywaydb[1] and liquibase[2], but neither is completely free and both seem proprietary. Does anyone know of another open source system that could help manage database schema versioning and migrations?
Thanks so much HN, this is something that I have been struggling with.
1: https://flywaydb.org/
2: https://www.liquibase.org/
t0mbstone | 6 years ago
I have literally spun up a barebones Rails app just to manage my schema and migrations for a completely separate Python/PostgreSQL project before.
https://guides.rubyonrails.org/v5.2/active_record_migrations...
mkolodny | 6 years ago
[0] https://docs.djangoproject.com/en/2.2/topics/migrations/
diminish | 6 years ago
The source code of large open source projects such as GitLab is also a treasure trove for discovering migration practices.
If you have millions of users and high-traffic tables, you may want to check their guides on avoiding downtime, data migrations, and batch processing: https://docs.gitlab.com/ee/development/migration_style_guide...
If you want to run DDL operations concurrently, you may benefit from https://gitlab.com/gitlab-org/gitlab/blob/master/lib%2Fgitla...
tbrock | 6 years ago
Why do you think no other toolchain has come close to this for other languages or frameworks? It's a very common problem that needs a solution no matter what you're building or what you're building it on.
NicoJuicy | 6 years ago
- Database-First
- Code-First
sumosudo | 6 years ago
https://pypi.org/project/yoyo-migrations/
evanelias | 6 years ago
Instead of writing migrations, developers simply add / remove / modify these CREATE statements, going through the same pull request and code review process as for code. The tool knows how to diff the desired state (expressed in the repo) vs the actual state of any database environment (prod / stage / dev, etc) to generate the appropriate DDL.
It's a bit of a paradigm shift relative to traditional mysql and postgres migration tools, but it's an approach that has been used successfully by Facebook internally for nearly a decade. It's also a common approach in the SQL Server world. I've written a blog post [2] describing some of the advantages of the declarative approach.
Skeema currently only supports MySQL and MariaDB, but some declarative tools for Postgres include sqldef [3] and migra [4].
[1] https://www.skeema.io
[2] https://www.skeema.io/blog/2019/01/18/declarative/
[3] https://github.com/k0kubun/sqldef/
[4] https://github.com/djrobstep/migra
lstamour | 6 years ago
Interesting ideas though. Microsoft does something similar with a DACPAC where they record the schema and diff, but they also record a list of rename actions etc. and keep a table to track those “migrations” in addition to the diff process. https://docs.microsoft.com/en-us/sql/relational-databases/da... and/or https://docs.microsoft.com/en-us/sql/ssdt/extract-publish-an... — for rename and other SSDT details, see https://docs.google.com/presentation/d/1DvC2gzCucjHFbGiBLa0R... (it’s a “RefactorLog” if searching)
Having seen all that, the Rails ActiveRecord approach strikes a decent balance between specifying migrations as imperative actions and keeping a declarative schema checked in. A comparison between the two approaches is at https://blog.raph.ws/2019/01/migrations-vs-dacpacs/ but I find DACPAC useful yet over-complicated for developers, and not well enough supported on non-Windows hosts yet. (It basically requires VS for Windows or SSDT right now...)
Also, it’s likely a migration system needs some kind of code review checks and automation around which actions you’d allow in your migrations if you don’t have that already...
The next question after “how do you migrate?” is probably “how do you back up, and how long would it take to restore after a bad migration and/or bad rollback?” These days your answer is probably “no time at all” for smaller apps on SSDs, “it's all outsourced to the cloud” for the new distributed cloud data stores, or, for the smart DIY ones, “we only deploy one server at a time and can either afford the downtime or have copies in multiple AZs, perhaps with eventual consistency...”
ggregoire | 6 years ago
* We have migrations written in SQL files named like 45_add_column_x_to_view_y.sql
* We track the migrations in git and we do merge requests and reviews
* We include a comment in each migration containing the original DDL of the object (1. it makes it easier for the reviewer to see what's changing, 2. it gives us a way to rollback, though it never happened so far)
* We track the applied migrations in a table migrations looking like "2019-10-30T10:49:00" | "45_add_column_x_to_view_y.sql"
* We have a CLI in the same repo that allows us to apply the migrations. It's 50 LOC of Python that basically 1. gets the DB host/user/pwd from an .env, 2. checks the last applied migration in the migrations table, 3. finds the new migrations to apply, 4. applies them in id order, e.g. 45 before 46, 5. updates the migrations table
* We have 3 DBs (dev / staging / prod) in AWS RDS and I have an .env for each one of them
* We have a CI pipeline on the repo to automatically apply the merged migrations in staging and check if nothing goes wrong
* When staging or dev gets trashed, I delete it and recreate it from a snapshot of prod (it takes 5 min) https://stackoverflow.com/a/49878477/652669
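A minimal sketch of a runner like the one described above, using the stdlib sqlite3 module in place of Postgres (the file-naming convention and the migrations table come from the comment; everything else is an assumption):

```python
import re
from datetime import datetime, timezone
from pathlib import Path

def apply_migrations(conn, migrations_dir):
    """Apply pending .sql migrations in id order and record each one.

    Hypothetical reconstruction of the ~50 LOC CLI described above,
    using a stdlib sqlite3 connection instead of Postgres + an .env file.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS migrations (applied_at TEXT, filename TEXT)"
    )
    applied = {row[0] for row in conn.execute("SELECT filename FROM migrations")}
    # Files are named like 45_add_column_x_to_view_y.sql; apply in id order.
    pending = sorted(
        (p for p in Path(migrations_dir).glob("*.sql") if p.name not in applied),
        key=lambda p: int(re.match(r"(\d+)_", p.name).group(1)),
    )
    for p in pending:
        conn.executescript(p.read_text())  # apply the migration
        conn.execute(
            "INSERT INTO migrations VALUES (?, ?)",
            (datetime.now(timezone.utc).isoformat(), p.name),
        )
    conn.commit()
    return [p.name for p in pending]
```

Running it a second time is a no-op, since already-applied filenames are skipped.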
notatoad | 6 years ago
nilram | 6 years ago
Sequence is to run pre-release migrations, deploy the release, then run post-release migrations. Sometimes days apart, in case things need to be stabilized. Testing environment runs only pre-release migrations, until some time before the release, then runs post.
We also have a handful of keywords to run operations in ways that don't lock large, frequently-used tables, or do other environment- or application-specific operations in standard ways.
diminoten | 6 years ago
neurostimulant | 6 years ago
BjoernKW | 6 years ago
There are pro and enterprise licenses in case you have additional requirements. These licenses, as I understand it, are proprietary and hence non-free, but the community edition absolutely is free software.
I’ve been using the community edition in multiple projects for several years now and it works great, particularly in the context of Spring Boot applications.
It might not be quite as polished and well-integrated as ActiveRecord is for Ruby on Rails, but keep in mind Flyway is supposed to be framework-independent, so it can't be too tightly integrated.
That said, its integration with Spring Boot is really good. Just add it as a dependency plus some configuration, and migrations will be taken care of automatically during application startup.
sourcesmith | 6 years ago
Repeatable migrations are great especially when combined with upserts in Postgres, if you keep config in your database as well.
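A hedged sketch of that combination: a repeatable migration that keeps config rows in the database via an upsert, so it can be re-run any number of times with the same result. Shown with SQLite's ON CONFLICT syntax rather than Postgres, and the key/value names are made up:

```python
import sqlite3

# Config lives in the database; the upsert makes the migration repeatable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE config (key TEXT PRIMARY KEY, value TEXT)")

UPSERT = """
INSERT INTO config (key, value) VALUES (?, ?)
ON CONFLICT (key) DO UPDATE SET value = excluded.value
"""
conn.execute(UPSERT, ("feature_x_enabled", "true"))
conn.execute(UPSERT, ("feature_x_enabled", "false"))  # rerun just overwrites

value, = conn.execute(
    "SELECT value FROM config WHERE key = ?", ("feature_x_enabled",)
).fetchone()
```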
rodelrod | 6 years ago
mfrye0 | 6 years ago
https://github.com/amacneil/dbmate
cptn_brittish | 6 years ago
The process of SQL migration management is pretty simple, and as other comments note, it's not hard to write something yourself that works well. I just prefer to use something a little more battle-tested in the deployment process.
Also, it's a Go app, so you don't need to pull down a stupid number of dependencies for it to work: just grab the binary from its GitHub releases and mark it as executable.
photonios | 6 years ago
We're heavy Postgres users and take full advantage of some of its features which are not very well supported by Django. Quite a number of our migrations involve `migrations.RunSQL` operations. That's fine. We've also developed a package [1] that extends Django with a lot of Postgres specific features. For example, recently we added support for partitioned tables. We can now set those up as normal Django models.
In order to reduce the number of f'ups, we also built a Github bot that inspects PR's for potentially disastrous migrations. For example, the Github bot will yell at you if you try to rename a field or blindly drop a table. The Github bot also makes sure your migration doesn't conflict with another migration that was just merged.
[1] https://github.com/SectorLabs/django-postgres-extra
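The bot itself isn't shown, but the kind of check it could run can be sketched like this (the patterns and warning messages are hypothetical, not the real bot's rules):

```python
import re

# Hypothetical lint rules: flag obviously destructive statements in a
# migration's SQL before a human reviews the PR.
DANGEROUS = [
    (re.compile(r"\bDROP\s+TABLE\b", re.I), "drops a table"),
    (re.compile(r"\bDROP\s+COLUMN\b", re.I), "drops a column"),
    (re.compile(r"\bRENAME\s+(COLUMN|TO)\b", re.I), "renames a column/table"),
]

def lint_migration(sql: str) -> list[str]:
    """Return a warning for each potentially disastrous statement found."""
    return [msg for pattern, msg in DANGEROUS if pattern.search(sql)]
```

A safe `ADD COLUMN` migration passes cleanly, while a rename or drop gets called out for extra review.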
Foober223 | 6 years ago
In my local dev environment, I have a DB backup representing the current state of production. This is the starting point SQL scripts must be applied to. A deployment script restores this DB, then applies the SQL scripts against it. The entire deployment process takes about 2 seconds even with many gigs of data in the DB backup.
Sometimes I'll include an extra "dummy data" script applied at the end.
My local environment, Test, and Production are deployed in almost exactly the same way. A few extra actions are done for Production, of course. And Production is not restored, as that would lose data. Local dev is optimized for a speedy restore/deploy.
Developers always want to treat the DB as if it's code. But it's not. It's a blob of valuable state that must be protected from corruption. You apply patches to that blob of state. The order patches occur matters. There's no "re-build" like in pure code.
arshbot | 6 years ago
murkt | 6 years ago
So when I need to do some complex migration on a big database, I usually open a postgres shell, open a transaction, and develop the migration like code, in a REPL. With these automatic tools, after doing that I have to go read their documentation and port the SQL to their syntax.
If I have junior members on the team who don't yet know SQL at a good level, the easier tasks are automated by the tool, while for the more complex tasks they completely lack the skills to develop a migration and have a much steeper wall to climb.
Another thing is that many tools have downgrade migrations. Downgrade migrations are a lie! How could I revert a migration that drops a NOT NULL column with some data? In case I really need to revert a migration I will write another forward migration. Which I did exactly zero times in more than ten years. So, writing downgrade migrations is a waste of time.
Another minor point is that sometimes migrations are running for a loooooong time. Not a second or two - it can be many hours. So I don't want this migration to start automatically during my deploy process. But I don't want to have it as a separate script, so on local dev installations these migrations just take the same `make migrate` route. It's much more convenient to take a part of a commented SQL and run it separately, than to take a part of a migration script written in Python and run it separately.
I'm using the Nomad tool https://pypi.org/project/nomad/ which is written in Python. I think any tool that supports plain SQL migrations, has dependencies between migrations, and doesn't require writing downgrades would be acceptable.
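The "dependencies between migrations" idea can be sketched with the stdlib graphlib: each migration names its predecessors, and the runner derives a valid application order. The migration names here are invented for illustration:

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Each migration lists the migrations it depends on; the runner applies
# them in an order where every dependency comes first.
deps = {
    "add_users_table": [],
    "add_orders_table": ["add_users_table"],
    "backfill_orders": ["add_orders_table"],
    "add_index_on_orders": ["add_orders_table"],
}
order = list(TopologicalSorter(deps).static_order())
```

Unlike a purely linear numbering scheme, this lets independent migrations land in any order without renumbering.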
sandGorgon | 6 years ago
The actual application was in Go or nodejs.
I daresay that ActiveRecord and Alembic outclass Liquibase in a lot of things.
nZac | 6 years ago
Some technical benefits to Alembic:
- It will give you the structure for upgrades and downgrades.
- Has a clean interface for DDL operations
- Supports every (?) database that SA does
- You can use it in "Offline" mode if you don't want to have Python and all the dependencies on the server or have to hand the migration off to someone else that has access.
- The branch feature is really nifty if you are in advanced situations.
Some non-technical benefits with Alembic
- It is open source
- zzzeek, the author, is pretty active on here and has built both SQLAlchemy and Alembic so there is a lot of cohesion in styles.
- The issue tracker is active and responsive
- The code is stable (something you want in a migration tool) and is unlikely to go anywhere.
Highly recommend.
Edit: Formatting
psankar | 6 years ago
basetensucks | 6 years ago
perlgeek | 6 years ago
We use CI/CD for all components, including for the schema. The deployment playbook for the schema basically does a `flyway migrate` after installing the schema / migration files.
We have (again, historical reasons) a home-built schema diff tool for our subset of mysql that we use. For Postgres I'd look into apgdiff. We use the diff tool to generate initial versions of the migrations, potentially modify them by hand, and then `git add` them in the schema project.
If you don't like flyway, you could check out https://sqitch.org/ which is fully Open Source.
pramodliv1 | 6 years ago
[0] https://www.se-radio.net/2012/06/episode-186-martin-fowler-a...
Also check out the links from the show notes:
http://www.databaserefactoring.com/
http://www.martinfowler.com/articles/evodb.html
http://www.amazon.com/exec/obidos/ASIN/0321293533/agiledba-2...? creative=327641&camp=14573&link_code=as1
http://www.awprofessional.com/bookstore/product.asp?isbn=032...
NicoJuicy | 6 years ago
---
The logic for migrations is surprisingly simple:
- A new migration will normally create a file with an Up or Down method, named "timestamp" + "description of the purpose of the migration" (e.g. 20191031_080603532_add_tags_to_contacts.cs)
- Every migration, when run against the database, will create a new entry in e.g. the _migrations table. Some tools also store the PreviousContext, which is a hash of the current schema before the migration.
If the application has a live check on startup, it will hash the current DB schema, compare it to the entry stored in the database, and check whether that matches the latest migration. The latest is the newest "migration file with Up or Down"; all newer ones are executed in timestamp order until the latest one has been run.
A tool normally also offers a "TargetMigration" for development purposes or for reverting a database to a previous state.
Every Up or Down migration is run inside a transaction. If it fails, nothing is stored.
There is normally also a way to seed the database after the migrations run, though this could also be done in the Up or Down method.
PS. All other methods until now were bad.
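The schema-hash ("PreviousContext") idea can be sketched like this, using SQLite for self-containment; a real tool would hash a normalized dump of the target database's information schema:

```python
import hashlib
import sqlite3

def schema_hash(conn: sqlite3.Connection) -> str:
    """Hash of the current schema, comparable to the hash stored alongside
    the last applied migration in the _migrations table.

    Sketch: sorts the DDL so the hash is stable regardless of creation order.
    """
    ddl = sorted(
        row[0]
        for row in conn.execute(
            "SELECT sql FROM sqlite_master WHERE sql IS NOT NULL"
        )
    )
    return hashlib.sha256("\n".join(ddl).encode()).hexdigest()
```

If the stored hash and the live hash disagree, the schema was changed outside the migration system, and the startup check can refuse to run.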
neurostimulant | 6 years ago
I like how Wordpress does simple database-first migration: just call the `dbDelta()` function with your new table schema in SQL format (CREATE TABLE ...) and it'll figure out a way to update your table to fit the new schema. I wonder if there is any similar DB migration tool in Python that works like this; it seems perfect for small projects that don't want to use an ORM.
https://codex.wordpress.org/Creating_Tables_with_Plugins#Cre...
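A minimal dbDelta-style diff in Python might look like this (a sketch only: it handles added columns and nothing else, mirroring dbDelta's refusal to drop anything, and the names are invented):

```python
def db_delta(conn, table, desired_columns):
    """Add any missing columns so the live table matches the desired schema.

    desired_columns maps column name -> SQL type. Existing columns are
    left alone; nothing is ever dropped.
    """
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    actual = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    stmts = [
        f"ALTER TABLE {table} ADD COLUMN {name} {coltype}"
        for name, coltype in desired_columns.items()
        if name not in actual
    ]
    for stmt in stmts:
        conn.execute(stmt)
    return stmts
```

Calling it repeatedly with the same desired schema is harmless: once the table matches, it generates no statements.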
ericHosick | 6 years ago
1) Use a migration tool (flyway, dbmigate, etc.)
2) We try to keep scripts idempotent to minimize migration-script explosion. It also helps to keep changes to similar assets located within the same file: it's easier to see how a given asset changes throughout the lifetime of your app.
3) We have a dedicated repository for each database/schema. A CI/CD process triggers on a push to a given branch (dev, qa, stage, prod, etc.). The CI/CD process runs the migration script (in this case AWS Code Pipeline). Having the schema in its own repository decouples our databases from the service(s) that use them.
4) We try our best to separate schema changes from feature changes: a) push database changes first, maintaining backwards compatibility, b) then push feature changes, c) then remove the backwards compatibility. This minimizes the need for rollback scripts.
Local development is same as yours: docker-compose.
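An idempotent migration script in the spirit of point 2: every statement is safe to re-run, so applying the same file twice causes no errors (SQLite shown for self-containment; the table and index names are invented):

```python
import sqlite3

# Every statement guards against already-existing objects, so the whole
# script can be re-applied at any time without failing.
IDEMPOTENT_MIGRATION = """
CREATE TABLE IF NOT EXISTS accounts (id INTEGER PRIMARY KEY, email TEXT);
CREATE INDEX IF NOT EXISTS idx_accounts_email ON accounts (email);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(IDEMPOTENT_MIGRATION)
conn.executescript(IDEMPOTENT_MIGRATION)  # second run is a no-op, no error
```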
maktouch | 6 years ago
For BigQuery, we wrote this nifty tool https://github.com/SplitmediaLabsLimited/supermigration
bkq | 6 years ago
Right now it only supports SQLite, MySQL, and PostgreSQL.
[1] - https://github.com/andrewpillar/mgrt
[2] - https://news.ycombinator.com/item?id=19517001
peterwwillis | 6 years ago
I don't mean to be "that guy", but unless your whole business's "thing" is no-proprietary-software, please try to pay for commercial licenses of good software. That money pays for bug fixes, security fixes, general maintenance, feature development, testing, support, and a host of other crap you probably don't want to have to do by yourself. If they provide a free tier, you're also supporting the free users.
Flyway is kind of the gold standard, I recommend it.
keshab | 6 years ago
Alembic is extra convenient if you're already invested in sqlalchemy.
I've tried yoyo-migrations[1] once, a while ago.
With yoyo, like flyway, migrations are written as SQL queries. However, they're expressed as SQL steps in Python files. Might be worth a look if you're using Python.
1. https://pypi.org/project/yoyo-migrations/