top | item 21487891

(no title)

kenshaw | 6 years ago

Foreign keys should definitely be used in most schema designs. While I can agree that they are problematic when doing a schema migration, a schema migration would only happen "very rarely" whereas inserts/deletes happen regularly. I believe this comment is more of a reflection of the specific database (MySQL) which has relatively poor FK performance and adjunct issues when compared to other SQL databases.

Also, from a process perspective of migrating a schema, the ideal mechanism would be to incrementally copy from the existing schema to the new schema (meaning there is no breakage in the interim). Granted (and somewhat obviously) for databases as large as what GitHub is likely using, this is not possible -- and so a "full migration" at once is their only option. However, part of that "full migration", to me, would be to _remove_ the original foreign keys from the schema, rename/delete/alter tables/fields/indices, and then after all of that is complete, as a final step, re-add referential integrity (ie, foreign keys).

I'd specifically like to point out the following:

1) This comment gives advice about all SQL databases based on the conclusion / operation of only one database (MySQL), whereas the same base facts do not apply to others (eg, PostgreSQL, Microsoft SQL Server, ...), and should not be used to evaluate whether using FKs makes sense for other databases or not.

2) Schema migrations should be done rarely, if ever. A better approach would have been to start with a schema design that was less prone to _needing_ change. Regardless, even if schema migrations are done with _regularity_, that frequency would still be orders of magnitude less regular than, eg, code changes or CRUD operations. As such, throwing away FKs simply to accommodate either a poor or non-thoroughly thought-out schema is tossing the baby with the bathwater, especially since the purpose of FKs is to reduce the ability of other tiers in the application from losing or otherwise corrupting the structured data storage.

3) FKs are useful when doing CRUD operations. They are not useful during schema changes. As such, the proper process in this case would be to temporarily remove FKs, perform the schema migration, and then re-institute the FKs. This may not always be possible, due to, eg, active business logic components or some such that can't allow the database to be taken offline to perform a migration (see point #2). In that instance, there are a number of other mitigations that can be put in place in the logic tiers, such as being able to recognize different schema revisions, and deploying that _prior_ to doing a schema change. As such, one would first migrate an active "smart" logic tier that can recognize different schema revisions, and allow for incremental migration of schemas without enacting a cluster-wide lock/disabling of reads/writes.

discuss

cryptonector|6 years ago

2b) And keep your schema changes as incremental and backwards-compatible as possible. Add new tables, add columns to tables, but don't drop or rename anything. If you need a flag day, in PG terms just setup a new schema, use triggers to sync the two, and migrate your apps to use the new schema -- when they're all migrated, drop the sync triggers and the old schema.