This seems nice at first glance, but let's think about it for a minute:
Before: you had a .sql file, and if you messed up you had to revert manually. Maybe you would pre-write the revert script; maybe your site is down if you mess up. It's super easy to understand what is happening, though.
Now: you use pgroll. An absolute heaping ton of magic is happening behind the scenes. Every table is replaced by a view, and tables have tons of 'secret' columns holding old data. Every operation that made sense before (`ALTER TABLE foo ADD COLUMN bar INTEGER NOT NULL`) turns into some ugly JSON mess (look at the docs, it's horrible) that then drops into the magic of pgroll to get turned into who-knows-what operations in the db, with the PROMISE that it can be reversed safely. Since all software has bugs, pgroll has bugs. When pgroll leaves you in a broken state, which it will sooner or later, you are just FUCKED. You will have to reverse-engineer all its complicated magic and try to get it back on the rails, live.
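For concreteness, here is roughly what that `ALTER TABLE` becomes as a pgroll migration. This is a sketch based on pgroll's documented JSON format; the migration name is made up, exact field names may differ between versions, and the `up` backfill expression (required because the new column is `NOT NULL`) is a placeholder:

```json
{
  "name": "01_add_bar",
  "operations": [
    {
      "add_column": {
        "table": "foo",
        "up": "0",
        "column": {
          "name": "bar",
          "type": "integer",
          "nullable": false
        }
      }
    }
  ]
}
```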
You're trading something that you understand and that is simple, but maybe not super convenient, for something that is magically convenient and will eventually put a bullet in your head. It's a horrific product; don't use it.
What is good for the goose is not necessarily good for the gander.
Obviously for startups, you're 100% right. Just announce a brief downtime and/or do migrations after hours. Keep it simple; no one will care if their requests time out for 30 seconds once a week.
If your company has hundreds of developers making changes across every timezone and downtime (or developers being blocked waiting for scheduled merge windows) costs real money or creates real problems other than optics, something like this or Vitess (MySQL) is definitely worth it.
Engineering should not be a "one-size-fits-all" type of job, and while I do love Postgres, my main gripe with the community is that the "keep it simple, stupid" mentality persists well beyond its sell-by date in many cases.
I think it is fair criticism that this adds complexity. However, I do have a couple of counter-arguments:
First, in order to avoid downtime and locking, you generally need multiple steps (e.g. some variation of: add another column, backfill the data, remove the previous column). You can codify this in long guidebooks on how to do schema changes (for example, this one from GitLab [1]). You also need to orchestrate your app deployments in between those steps, and you often need some backwards-compatibility code.
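The multi-step dance looks something like this in raw SQL (table and column names are illustrative, and a real backfill would be batched rather than done in one statement):

```sql
-- Step 1 (expand): add the new column; nullable, so no table rewrite or long lock.
ALTER TABLE users ADD COLUMN email_normalized text;

-- Step 2 (backfill): populate existing rows; in production, batch this by key range.
UPDATE users SET email_normalized = lower(email) WHERE email_normalized IS NULL;

-- (in between: deploy app code that writes both columns, then code that reads the new one)

-- Step 3 (contract): drop the old column once nothing references it.
ALTER TABLE users DROP COLUMN email;
```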
This is all fine, but: 1. it slows you down, and 2. it's manual and error-prone. With pgroll, the process is always the same (start the pgroll migration, deploy code, complete/roll back the migration), so the team can exercise it often.
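Assuming the CLI commands as documented, that fixed sequence looks roughly like this (migration filename and connection string are illustrative):

```shell
pgroll start 01_normalize_email.json --postgres-url "$DATABASE_URL"  # expand: versioned views + backfill
# ... deploy application code against the new schema version ...
pgroll complete --postgres-url "$DATABASE_URL"                       # contract: finalize the change
# or, if the deploy goes wrong:
# pgroll rollback --postgres-url "$DATABASE_URL"
```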
Second, while any software has bugs, it's worth noting that the main reason rolling back is quick and safe with pgroll is that it only has to drop views and any hidden columns. While the physical schema is changed, it is changed in a backwards-compatible way until you complete the migration, so you can always skip the views if you have to bypass whatever pgroll is doing.
Recently, while on the market for a tool to manage SQL migration patches with no need for slow rollouts, I reviewed many such tools, and the one that impressed me was sqitch: https://github.com/sqitchers/sqitch
So if you are interested in this field and pgroll is not quite what you are looking for, I recommend you have a look at sqitch.
Having used sqitch in production, the main issue is that it effectively has no knowledge of what is happening beyond running psql files for you. So it's hard to keep track of what the actual resulting schema should look like after each change has been applied, in order to work out whether it's correct.
Getting people to write deploy/revert scripts is relatively easy. Asking them to write verify scripts that do more than check that a column has been added/removed is hard.
There are the "physical" issues of modifying a schema, and sqitch is great for those. But handling the "logical" issues of schema migration takes more than an automated tool for running scripts.
This tool (pgroll) allows you to actually test the modified schema for logical validity without impacting the ongoing operations, which to me, seems like a win.
My experience with Sqitch was "all the fun of git rebase with all the fun of manual rollback and verification code" :-( I would never wish that upon my enemies
I'm open to the possibility that we may have just had legacy antipatterns dragged into the project, since it was shoehorned into the team by a similarly strongly opinionated advocate.
Sqitch is an incredibly underappreciated tool. It doesn't have a business pushing it like Flyway and Liquibase, so it isn't as widely known, but I vastly prefer it to comparable migration tools.
We've looked into this tool a couple of times but haven't adopted it yet. The main issue is compatibility with Prisma, which we use for defining our db's schema and automating migrations. If there were a slick integration between the two, it would go a long way toward convincing us to adopt it. In the meantime, we'll go with convenience and (slightly, almost immeasurably) lower reliability. We haven't had downtime due to a migration in months!
Yes, this is the biggest issue right now and we’re working on a solution.
What we want to do is add the ability to generate the pgroll migrations from the Prisma-generated migration files. Depending on the operation, you might need to add more info. This will work fairly generally, not only for Prisma.
This seems like a great tool! I like how it "simply" works by putting a view on top of the actual table. This concept is widely used in the data engineering world (in dbt).
There must be an easier way to write migrations for pgroll though. I mean, JSON, really?
Yes, but we're working on a converter from DDL SQL to pgroll JSON.
The reason for JSON is that pgroll migrations are "higher level". For example, let's say that you are adding new unique columns that should infer their data from an existing column (e.g. splitting `name` into `first_name` and `last_name`). The pgroll migration contains not only the info that new columns are added, but also how to backfill the data.
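As a sketch of that example (the table name, `up` expressions, and operation schema are illustrative; check pgroll's docs for the exact format), the backfill logic rides along with the schema change, which a plain `ALTER TABLE` statement has no way to express:

```json
{
  "name": "02_split_name",
  "operations": [
    {
      "add_column": {
        "table": "users",
        "up": "split_part(name, ' ', 1)",
        "column": { "name": "first_name", "type": "text" }
      }
    },
    {
      "add_column": {
        "table": "users",
        "up": "split_part(name, ' ', 2)",
        "column": { "name": "last_name", "type": "text" }
      }
    }
  ]
}
```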
The sql2pgroll converter creates the higher-level migration files, but leaves placeholders for the "up" / "down" data migrations. The issue where sql2pgroll is tracked is this one: https://github.com/xataio/pgroll/issues/504
Unrelated to the OP's product, but I'm curious how people are solving this issue at smaller scale with Next.js + Prisma/Drizzle. Do you just run the built-in migrate script with npm run?
We use a CI job that runs drizzle migrate whenever the schema files change. Drizzle doesn’t have rollbacks though. Replacing this process with pgroll as a sort of drop-in replacement would be nice. Then orchestrate everything with a couple of CI jobs, and done!
Not public. I built something very similar a decade ago, but that was part of a low-code system, so it had access to far more information than most databases (the full history of schema changes, with a little bit of metadata about each change).
[1]: https://docs.gitlab.com/ee/development/migration_style_guide...
If you don't need slow rollouts, what would you say the downsides of using Pgroll over Sqitch would be?
(I've used neither, but I got the impression from the OP that slow rollouts were a feature, not a requirement.)