top | item 42391048

(no title)

wfleming | 1 year ago

> Migrations in Django > > The Django approach has noteworthy differences and a slightly different workflow

My explanation of Django's approach to migrations would involve a lot more expletives. It is by far my least favorite thing about the framework.

- Fields are not-null by default, the opposite of SQL itself

- Field declarations use argument names that sound like the SQL equivalents (default, on delete cascade/restrict), but they're fully enforced in python and don't make it into the SQL schema at all by default. I get that not every DB supports these features, and there are sometimes good reasons for doing something like a delete cascade in process (if you need to implement custom on-delete logic or something), but the DSL shouldn't read like SQL if it's not going to change the SQL schema. The default value thing combined with not-null by default is particularly easy to get bitten by adding a new field with default if you deploy migrations separately from deploying code (which you must do to avoid downtime): if you add a new field with a default value believing it's safe, you will probably get crashes in the interval between applying the migration & the new code deploying because you've got not-null column without a default value in the schema. They did finally add db_default recently, thankfully, but it took years!

- Django migrations cannot be understood in isolation, the sql a generated migration containing something like an AlterField operation will run depends on what earlier migrations say. You have to check with the sqlmigrate command and/or actually read earlier migrations to be sure you understand what the migration will do. Compared to Rails, where each migration can be read and understood in isolation (though you may still need to understand how a Rails migration DSL will translate to actual SQL of course). This also has a performance impact making Django migrations slower because Django has an in-memory model of what the schema should be at each migration point, so running a migration is not just "load the file and run the commands", it's "determine the schema that should exist after running this migration, diff that with the schema we think exists before this point, magically generate SQL for that diff, then run that".

- The makemigrations command to auto-generate pending migrations is very aggressive and will detect things as "changed" that don't impact the schema. If you changed some help text on a field, or a localization string, or the aforementioned only-in-python default value, makemigrations will see that as requiring a migration that does nothing. Leads to lots of cruft.

- Related to both of the above points, AlterField's auto-generated SQL can be dangerous and bad. Particularly, I've seen cases where very minor changes to a ForeignKey (like changing from nullable to not-nullable, or even not-schema-impacting changes like above) would, by default, have dropped the foreign key constraint & index, and then re-created them. Completely unnecessary and potentially dangerous since it could be locking a large table. I'm not positive, but in some cases I think these have been generated purely because of a django upgrade leading to it deciding the names of the indexes/constraints need to be changed for some reason.

- AlterField will also tend to stomp all over any tweaks you manually made to the schema to work around all these issues. If you manually wrote a SQL statement in an earlier migration to add a default value to a column, and then that column's definition gets tweaked months or years later the generated AlterField is gonna remove your default value. At a technical level this isn't surprising when you understand how Django is modeling the schema internally & generating the SQL changes, but it's definitely a bad user experience downstream of a lot of these design decisions.

Generally the field declaration/migrations system in Django feels to me designed to lead people down a garden path towards bad and dangerous behavior. If I had my druthers I'd enforce a policy in our Django app of "never run makemigrations, all migrations must be manually written SQL".

discuss

order

infamia|1 year ago

> The default value thing combined with not-null by default

It is interesting... I strongly feel the opposite after debugging far too many SQL queries where someone checked for an empty string but forgot to check for NULL. NULL by default is a SQL footgun IMO.

wfleming|1 year ago

I prefer null as a clear “no value” marker to handle rather than special-casing handling of an empty string vs other strings, and if “empty string” isn’t considered valid that should be validation logic and that should never get stored in the database. But it’s certainly a matter of opinion and either can work as long as you’re consistent.

The point about Django’s choices about default and not-null though is that it can easily lead to crashes while you’re adding fields. If you add a string field with default=“” and don’t specify null=False, the generated schema will be a non-null field without a default value in SQL, but Django will backfill “” into all existing rows. To avoid downtime, you need to deploy migrations & apply them, then deploy the models.py/other code changes. But if anyone tries to write a new row before the new code finishes deploying after applying the migration, it will crash.

neillyons|1 year ago

Yep, Django's on_delete behaviour being implemented in Python instead of SQL has caught me out a few times over the years.

I often do `on_delete=models.DO_NOTHING, db_constraint=False` and add the on delete cascade constraint manually to the migration file. https://stackoverflow.com/a/78668897/288424

Same goes for `default. Prior to `db_default` I'd often edit the migration file and set the default in SQL.

I much prefer Elixir's Ecto now.

Izkata|1 year ago

> - The makemigrations command to auto-generate pending migrations is very aggressive and will detect things as "changed" that don't impact the schema. If you changed some help text on a field, or a localization string, or the aforementioned only-in-python default value, makemigrations will see that as requiring a migration that does nothing. Leads to lots of cruft.

    # Attributes that don't affect a column definition.
    # These attributes are ignored when altering the field.
    non_db_attrs = (
        "blank",
        "choices",
        "db_column",
        "editable",
        "error_messages",
        "help_text",
        "limit_choices_to",
        # Database-level options are not supported, see #21961.
        "on_delete",
        "related_name",
        "related_query_name",
        "validators",
        "verbose_name",
    )
Don't know when this was added, I just quickly checked django 4.2

wfleming|1 year ago

Our Django app was still 1.x when I joined, I’ve gotten us up to 3.2 and hope to be on 4.0 in January. It looks like this was added in 4.1, so this’ll be a nice QOL improvement to look forward to, thanks.

alephxyz|1 year ago

> Generally the field declaration/migrations system in Django feels to me designed to lead people down a garden path towards bad and dangerous behavior. If I had my druthers I'd enforce a policy in our Django app of "never run makemigrations, all migrations must be manually written SQL".

`makemigrations --dry-run` is helpful to see if your manual migrations are complete but besides that I agree

wfleming|1 year ago

I believe `makemigrations` builds up its conception of "what the schema is" from the `CreateModel`, `AddField`, `AlterField`, etc. ops in the migrations files. But it doesn't incorporate `RunSQL` ops into building that model of the schema. If my migrations were just a bunch of `RunSQL` ops, I think `makemigrations --dry-run` would basically just see everything from models.py as always needing to be added.

This behavior is why `SeparateDatabaseAndState` is a necessary hack in Django: sometimes you need to do an `AlterField` where the SQL Django would generate is really bad, so you need to write your own `RunSQL` to do the right thing, but you also need Django to see the `AlterField` as applied or you'll have problems with future migrations.

I suppose I could modify my preference to "run makemigrations and then wrap every single op in SeparateDatabaseAndState", but that does not sound fun :).