item 42388973

Pgroll – Zero-downtime, reversible, schema changes for PostgreSQL (new website)

262 points | todsacerdoti | 1 year ago | pgroll.com

52 comments

[+] ltbarcly3|1 year ago|reply
This seems nice at first glance, but let's think about it for a minute:

Before: You had a .sql file and if you messed up you had to revert manually. Maybe you would pre-write the revert script, maybe your site is down if you mess up. It's super easy to understand what is happening though.

Now: you use pgroll. An absolute heaping ton of magic is happening behind the scenes. Every table is replaced by a view and tables have tons of 'secret' columns holding old data. Every operation that made sense before (ALTER TABLE foo ADD COLUMN bar INTEGER NOT NULL) turns into some ugly json mess (look at the docs, it's horrible) that then drops into the magic of pgroll to get turned into who knows what operations in the db, with the PROMISE that it can be reversed safely. Since all software has bugs, pgroll has bugs. When pgroll leaves you in a broken state, which it will sooner or later, you are just FUCKED. You will have to reverse engineer all its complicated magic and try to get it back on the rails live.
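
(For reference, that one-line ALTER in pgroll's JSON migration format looks roughly like the following; field names are as I recall them from pgroll's docs, so treat this as a sketch, not gospel. The "up" expression backfills existing rows, which is required here since the column is NOT NULL:)

```json
{
  "name": "02_add_bar",
  "operations": [
    {
      "add_column": {
        "table": "foo",
        "up": "0",
        "column": {
          "name": "bar",
          "type": "integer",
          "nullable": false,
          "default": "0"
        }
      }
    }
  ]
}
```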

You're trading something you understand and is simple, but maybe not super convenient, for something that is magically convenient and which will eventually put a bullet in your head. It's a horrific product, don't use it.

[+] erulabs|1 year ago|reply
What is good for the goose is not necessarily good for the gander.

Obviously, for startups you're 100% right. Just announce a brief downtime and/or do migrations after hours. Keep it simple; no one will care if their requests time out for 30 seconds once a week.

If your company has hundreds of developers making changes across every timezone and downtime (or developers being blocked waiting for scheduled merge windows) costs real money or creates real problems other than optics, something like this or Vitess (MySQL) is definitely worth it.

Engineering should not be a "one-size-fits-all" type of job, and while I do love Postgres, my main gripe with the community is that the "keep it simple, stupid" mentality persists well beyond its sell-by date in many cases.

[+] tudorg|1 year ago|reply
I think it is fair criticism that this adds complexity. However, I do have a couple of counter-arguments:

In order to avoid downtime and locking, you generally need multiple steps (e.g. some variation of: add another column, backfill the data, remove the previous column). You can codify this in long guidebooks on how to do schema changes (for example, this one from GitLab [1]). You also need to orchestrate your app deployments in between those steps, and you often need some backwards-compatibility code.
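
The multi-step dance above, sketched for a column rename in plain SQL (illustrative only; real guides batch the backfill and add a trigger to keep both columns in sync during the rollout):

```sql
-- Step 1: expand: add the new column alongside the old one
ALTER TABLE users ADD COLUMN full_name text;

-- Step 2: backfill existing rows (batched in production to limit lock time)
UPDATE users SET full_name = name WHERE full_name IS NULL;

-- Step 3: deploy application code that writes both columns and reads the new one

-- Step 4: contract, once nothing reads the old column anymore
ALTER TABLE users DROP COLUMN name;
```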

This is all fine, but: 1. it slows you down and 2. it's manual and error-prone. With pgroll, the process is always the same (start the pgroll migration, deploy code, complete/roll back the migration), so the team can exercise it often.
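
That loop, using pgroll's CLI (command shapes from pgroll's README as I remember them; verify against your version):

```shell
pgroll start 02_add_bar.json   # new schema version goes live as views; data backfilled
# ...deploy app code pinned to the new schema version...
pgroll complete                # rollout succeeded: drop the old views and columns
# or, if the rollout failed:
pgroll rollback                # drop the new version; the old schema was never touched
```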

Second, while any software has bugs, it's worth noting that the main reason rolling back is quick and safe with pgroll is that it only has to drop views and any hidden columns. While the physical schema is changed, it is changed in a backwards-compatible way until we complete the migration, so you can always skip the views if you have to bypass whatever pgroll is doing.

[1]: https://docs.gitlab.com/ee/development/migration_style_guide...

[+] rixed|1 year ago|reply
Pgroll shines if you are doing slow rollouts.

Recently, while on the market for a tool to manage SQL migration patches without needing slow rollouts, I reviewed many such tools, and the one that impressed me most was sqitch: https://github.com/sqitchers/sqitch

So if you are interested in this field and Pgroll is not quite what you are looking for, I recommend having a look at sqitch.

[+] rswail|1 year ago|reply
We use sqitch in production, and the main issue is that it effectively has no knowledge of what is happening beyond running psql files for you. So it's hard to keep track of what the actual resulting schema should look like after each change has been applied, in order to work out whether it's correct.

Getting people to write deploy/revert scripts is relatively easy. Asking them to write verify scripts that do more than check that a column has been added/removed is hard.

There's the "physical" issues of modifying a schema, and sqitch is great for that. But handling the "logical" issues of schema migration is more than an automated tool for running scripts.

This tool (pgroll) allows you to actually test the modified schema for logical validity without impacting the ongoing operations, which to me, seems like a win.

[+] lucideer|1 year ago|reply
> if Pgroll is not quite what you are looking for

If you don't need slow rollouts, what would you say the downsides of using Pgroll over Sqitch would be?

(I've used neither, but I got the impression from the op that slow rollouts was a feature, not a requirement)

[+] mdaniel|1 year ago|reply
My experience with Sqitch was "all the fun of git rebase with all the fun of manual rollback and verification code" :-( I would never wish that upon my enemies

I'm open to the possibility that we just had legacy antipatterns dragged into the project, since it was shoehorned into the team by a similarly strongly opinionated advocate.

[+] hendiatris|1 year ago|reply
Sqitch is an incredibly underappreciated tool. It doesn't have a business pushing it like Flyway and Liquibase, so it isn't as widely known, but I vastly prefer it to comparable migration tools.
[+] Arctic_fly|1 year ago|reply
We've looked into this tool a couple of times but haven't adopted it yet. The main issue is compatibility with Prisma, which we use for defining our db's schema and automating migrations. A slick integration between the two would go a long way toward convincing us to adopt it. In the meantime, we'll go with convenience and (slightly, almost immeasurably) lower reliability. We haven't had downtime due to a migration in months!
[+] tudorg|1 year ago|reply
Yes, this is the biggest issue right now and we’re working on a solution.

What we want to do is add the ability to generate the pgroll migrations based on the prisma generated migration files. Depending on the operation, you might need to add more info.

This will work fairly generally, not only for Prisma.

[+] notpushkin|1 year ago|reply
Same, but with Alembic! I'm wondering how hard it would be to open pgroll up to third-party migration systems.
[+] rtuin|1 year ago|reply
This seems like a great tool! I like how it "simply" works by putting a view on top of the actual table. This concept is widely used in the data engineering world (in dbt).

There must be an easier way to write migrations for pgroll though. I mean, JSON, really?
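
The view trick is easy to picture in plain SQL (schema and view names here are illustrative, not pgroll's exact output):

```sql
-- Each schema version is exposed as a Postgres schema full of views.
-- New clients read and write through the new version's view:
CREATE SCHEMA "public_02_add_bar";
CREATE VIEW "public_02_add_bar"."foo" AS
  SELECT id, name, bar FROM public.foo;

-- Old clients keep using the previous version's view, which simply
-- omits the new column, so both app versions coexist during rollout:
-- CREATE VIEW "public_01_initial"."foo" AS SELECT id, name FROM public.foo;
```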

[+] colemannerd|1 year ago|reply
Are migrations still specified in json?
[+] tudorg|1 year ago|reply
Yes, but we're working on a converter from DDL sql to pgroll json.

The reason for JSON is that pgroll migrations are "higher level". For example, let's say you are adding a new unique column that should derive its data from an existing column (e.g. splitting `name` into `first_name` and `last_name`). The pgroll migration contains not only the info that new columns are added, but also how to backfill the data.
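
A sketch of what that split migration might look like (shape per pgroll's docs as I remember them, so treat the field names as illustrative; the `up` expressions carry the backfill logic):

```json
{
  "name": "03_split_name",
  "operations": [
    {
      "add_column": {
        "table": "users",
        "up": "split_part(name, ' ', 1)",
        "column": { "name": "first_name", "type": "text", "nullable": false }
      }
    },
    {
      "add_column": {
        "table": "users",
        "up": "split_part(name, ' ', 2)",
        "column": { "name": "last_name", "type": "text", "nullable": false }
      }
    }
  ]
}
```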

The sql2pgroll converter creates the higher-level migration files, but leaves placeholders for the "up"/"down" data migrations.

sql2pgroll is tracked in this issue: https://github.com/xataio/pgroll/issues/504

[+] muratsu|1 year ago|reply
Unrelated to the OP product, but I'm curious how people are solving this issue on a smaller scale with nextjs + prisma/drizzle. Do you just run the builtin migrate script with npm run?
[+] maxf|1 year ago|reply
We use a CI job that runs drizzle migrate whenever the schema files change. Drizzle doesn't have rollbacks, though. Replacing this process with pgroll as a sort of drop-in replacement would be nice. Then orchestrate everything with a couple of CI jobs, and done!
[+] polskibus|1 year ago|reply
Does anyone know of a similar tool for MS SQL?
[+] radicalbyte|1 year ago|reply
Not public. I built something very similar a decade ago, but it was part of a low-code system, so it had access to far more information than most databases have (the full history of schema changes, with a little metadata about each change).
[+] Dinux|1 year ago|reply
Is this related to supabase? There seems to be a link?
[+] tudorg|1 year ago|reply
Actually no, this is a project by Xata. What link do you mean?