

clarkbw | 1 year ago

In my tests, running the ALTER took anywhere from ~20 seconds to ~1 minute for the changes.

> Current CI/CD practices often make it very easy for software developers to commit and roll out database migrations to a production environment, only to find themselves in the middle of a production incident minutes later. While a staging deployment might help, it's not guaranteed to share the same characteristics as production (either due to the level of load or monetary constraints).

(neon.tech employee here)

This is where branching databases with production data helps quite a bit. Your CI/CD environment and even staging can experience the schema changes. When you build from a seed database you can often miss this kind of issue because it lacks the characteristics of your production environment.

But in the next paragraph the author rightly calls out that even staging isn't enough:

> The problem is, therefore (and I will repeat myself), the scale of the amount of data being modified, overall congestion of the system, I/O capacity, and the target table's importance in the application design.

Your staging, even when branched from production, won't have the same load patterns as your production database. That load, and the locks that come with it, will make the rollout behave differently.

This has me wondering whether you can match production patterns in staging by setting staging up to mirror production's query traffic. Mirroring like what's available from pg_cat could put your staging under similar pressure.

This also made me think about how we're not capturing the timing of these schema changes. Unless a developer happens to look and see that their schema change took 56 seconds to complete in CI, they won't know that it might have larger knock-on effects in production.
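To make the timing idea concrete, here is a minimal sketch of a CI step that times each migration statement and flags the slow ones. Everything in it is an assumption for illustration: the `timed_migration` helper, the 5 second budget, and the in-memory SQLite database standing in for a branch of production. A real setup would point the connection at the branched Postgres database instead.

```python
import sqlite3
import time

# Hypothetical budget: anything slower than this in CI deserves a second look.
SLOW_MIGRATION_SECONDS = 5.0


def timed_migration(conn, statements, budget=SLOW_MIGRATION_SECONDS):
    """Run each statement and return (statement, seconds, over_budget) tuples."""
    results = []
    for sql in statements:
        start = time.perf_counter()
        conn.execute(sql)
        elapsed = time.perf_counter() - start
        results.append((sql, elapsed, elapsed > budget))
    return results


if __name__ == "__main__":
    # In-memory SQLite is only a stand-in so the sketch runs anywhere.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    report = timed_migration(conn, ["ALTER TABLE users ADD COLUMN email TEXT"])
    for sql, seconds, slow in report:
        flag = "SLOW" if slow else "ok"
        print(f"[{flag}] {seconds:.3f}s  {sql}")
```

Surfacing that report in the CI log (or failing the build when a statement is over budget) would put the number in front of the developer instead of relying on them to go looking for it.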


radimm|1 year ago

Author here - this is my primary goal: exposing the complexity a developer might not even think about. I can't even count the number of times a seemingly inconspicuous change caused an incident.

"Works on my DB" is the new "works on my machine" (and don't trademark it, please :)))

clarkbw|1 year ago

Agreed! A common ORM pitfall is the column rename, which often gets implemented not as a rename but as a DROP followed by an ADD. That leaves you with a new, empty column and the old data gone, which is a surprising result :-D
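One way to catch this before it ships is a rough lint over the generated migration. The sketch below is a heuristic under stated assumptions, not something any particular ORM provides: `suspected_renames` is a hypothetical helper, and the regexes only match statements that spell out the `COLUMN` keyword.

```python
import re

# Only matches the explicit "DROP COLUMN" / "ADD COLUMN" forms (a known limitation).
DROP_RE = re.compile(r"ALTER\s+TABLE\s+(\w+)\s+DROP\s+COLUMN\s+(\w+)", re.IGNORECASE)
ADD_RE = re.compile(r"ALTER\s+TABLE\s+(\w+)\s+ADD\s+COLUMN\s+(\w+)", re.IGNORECASE)


def suspected_renames(statements):
    """Return (table, dropped_column, added_column) for tables that both
    drop and add a column in the same migration."""
    dropped = {}
    for sql in statements:
        match = DROP_RE.search(sql)
        if match:
            dropped.setdefault(match.group(1).lower(), []).append(match.group(2))

    hits = []
    for sql in statements:
        match = ADD_RE.search(sql)
        if match:
            table = match.group(1).lower()
            for old_column in dropped.get(table, []):
                hits.append((table, old_column, match.group(2)))
    return hits


if __name__ == "__main__":
    migration = [
        "ALTER TABLE users DROP COLUMN fullname",
        "ALTER TABLE users ADD COLUMN full_name TEXT",
    ]
    for table, old, new in suspected_renames(migration):
        print(f"{table}: {old} dropped and {new} added; did you mean RENAME COLUMN?")
```

It will produce false positives when a drop and an unrelated add land in the same migration, so it is better suited to a warning than a hard failure.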