item 38725327

butlerm | 2 years ago

If you do not particularly care about performance, or you have a great deal of headroom, then database enforcement of referential integrity is great. Alternatively, you could just write test cases to check for it and not pay the severe performance penalty.
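Such an after-the-fact test might look like the following sketch, using an in-memory SQLite database and a hypothetical customers/orders schema (the table names are illustrative, not from the thread):

```python
import sqlite3

# Illustrative schema; foreign keys are deliberately NOT declared,
# to mimic a database that does not enforce them.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers (id) VALUES (1);
    INSERT INTO orders (id, customer_id) VALUES (10, 1), (11, 2);  -- 11 is orphaned
""")

# The "test case": scan for child rows whose parent is missing.
orphans = db.execute("""
    SELECT o.id FROM orders o
    LEFT JOIN customers c ON c.id = o.customer_id
    WHERE c.id IS NULL
""").fetchall()

print(orphans)  # [(11,)]
```

The catch, as the replies below note, is that a scan like this only finds corruption after it has happened; it does not prevent it.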

The other major downside of database enforcement of referential integrity is the common need to drop and re-create foreign keys during database schema upgrades and data conversions.

setr | 2 years ago

You’re still going to pay the cost of maintaining referential integrity — you’re just doing it on the app side. You can do it faster by being not-correct — e.g. you don’t need a lock if you ignore race conditions — but it’s not like the database is arbitrarily slow at doing one of its basic, fundamental jobs.

Of course, you can just skip the validation altogether and cross your fingers and hope you’re correct, but it’s the same reasoning as removing array bounds checking from your app code; you’ve eked out some more performance and it’s great until it’s catastrophically not so great.

Your reasoning should really be inverted: be correct first, keep as much validation as you can afford, and rip it out only where performance actually matters. With OLTP workloads, your data’s correctness is generally worth far more than the additional hardware you might have to throw at it.

I’m also not sure why dropping and re-creating foreign keys is a big deal for migrations, other than the time spent.

butlerm | 2 years ago

It is quite common for modern databases to have multiversion concurrency control, so that writers do not block readers. If yours does not, then your transactions should be awfully short, you should be prepared to wait, or you should implement dirty reads (which are quite common in any case).
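SQLite’s WAL mode gives a small, runnable illustration of writers not blocking readers (a minimal sketch, not how a server database like PostgreSQL implements full MVCC; the table name is made up):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "mvcc.db")
writer = sqlite3.connect(path, isolation_level=None)  # autocommit; explicit BEGIN below
writer.execute("PRAGMA journal_mode=WAL")             # WAL mode: readers are never blocked
writer.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
writer.execute("INSERT INTO accounts VALUES (1, 100)")

reader = sqlite3.connect(path, isolation_level=None)

writer.execute("BEGIN IMMEDIATE")                     # open a write transaction
writer.execute("UPDATE accounts SET balance = 0 WHERE id = 1")

# The reader is not blocked and sees the last committed value, not the
# in-flight update; a dirty read would have returned 0 here.
seen = reader.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0]
print(seen)  # 100

writer.execute("COMMIT")
```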

dasil003 | 2 years ago

"just write test cases to check for [referential integrity]" is doing some awfully heavy lifting in this comment.

Assuming a standard n-tier application architecture, how do you guarantee the test prevents race conditions?

jrumbut | 2 years ago

You either end up reinventing foreign keys, your support volume will scale faster than your data, or user experience will suffer.

There may be situations where foreign keys become too much overhead, but it's worth fighting to keep them as long as possible. Data integrity only becomes more important at scale. Every orphaned record is a support ticket, lost sale, etc.

butlerm | 2 years ago

Typically you look for orphan rows - the sort of thing ON DELETE CASCADE was invented to prevent. Another thing to check for is records that need to exist but should have their references cleared when something else is deleted, e.g. ON DELETE SET NULL. The third thing is the case ON DELETE RESTRICT guards against.
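Those three cases map directly onto foreign-key clauses. A minimal sketch in SQLite (the users/sessions/audit_log/invoices tables are hypothetical; note SQLite needs foreign keys switched on per connection):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
db.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    -- 1. children that should vanish with the parent
    CREATE TABLE sessions (
        id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES users(id) ON DELETE CASCADE
    );
    -- 2. rows that must survive but drop the reference
    CREATE TABLE audit_log (
        id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES users(id) ON DELETE SET NULL
    );
    -- 3. rows whose existence should block the delete entirely
    CREATE TABLE invoices (
        id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES users(id) ON DELETE RESTRICT
    );
    INSERT INTO users VALUES (1), (2);
    INSERT INTO sessions VALUES (1, 1);
    INSERT INTO audit_log VALUES (1, 1);
    INSERT INTO invoices VALUES (1, 2);
""")

db.execute("DELETE FROM users WHERE id = 1")
print(db.execute("SELECT COUNT(*) FROM sessions").fetchone()[0])  # 0 (cascaded)
print(db.execute("SELECT user_id FROM audit_log").fetchone()[0])  # None (set null)

try:
    db.execute("DELETE FROM users WHERE id = 2")  # blocked by the invoice
except sqlite3.IntegrityError as e:
    print("delete restricted:", e)
```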

You can check for the first two of those things after the fact, and they are relatively benign. In many cases it will make no difference to application queries, especially with the judicious use of outer joins, which should be used for all optional relationships anyway.
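The outer-join point can be sketched briefly: with a LEFT OUTER JOIN, a missing or orphaned parent degrades to NULL instead of silently dropping the child row from results (the employees/managers tables here are illustrative):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE managers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);
    INSERT INTO managers VALUES (1, 'ada');
    INSERT INTO employees VALUES (10, 'bob', 1), (11, 'cyd', 99);  -- 99 dangles
""")

# An inner join would lose 'cyd' entirely; the outer join keeps the row.
rows = db.execute("""
    SELECT e.name, m.name FROM employees e
    LEFT JOIN managers m ON m.id = e.manager_id
    ORDER BY e.id
""").fetchall()
print(rows)  # [('bob', 'ada'), ('cyd', None)]
```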

If you need ON DELETE RESTRICT, application code should probably check anyway, because otherwise you get unexpected delete failures with no application-level visibility into what went wrong. That can be tested for, and pretty much has to be, before code that deletes rows subject to delete restrictions is released into production.

As far as race conditions go, they should be eliminated through the use of database transactions. Another alternative is never to delete rows that are referred to elsewhere, and just set a deleted flag or something. That is mildly annoying to check for, however. Clearing an active flag is simpler, because you usually want rows like that to stay around anyway, just not be used in new transactions.
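The active-flag alternative might be sketched like this (the products/order_items schema and helper names are made up for illustration): the row never goes away, so nothing can dangle, and new transactions simply refuse inactive rows.

```python
import sqlite3

db = sqlite3.connect(":memory:", isolation_level=None)  # autocommit; explicit BEGIN below
db.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, active INTEGER DEFAULT 1);
    CREATE TABLE order_items (id INTEGER PRIMARY KEY, product_id INTEGER);
    INSERT INTO products (id) VALUES (1);
    INSERT INTO order_items VALUES (100, 1);
""")

def retire_product(product_id):
    # One transaction, so concurrent writers see the flag flip atomically.
    db.execute("BEGIN IMMEDIATE")
    db.execute("UPDATE products SET active = 0 WHERE id = ?", (product_id,))
    db.execute("COMMIT")

def add_order_item(item_id, product_id):
    # New transactions only accept active products.
    db.execute("BEGIN IMMEDIATE")
    row = db.execute("SELECT active FROM products WHERE id = ?",
                     (product_id,)).fetchone()
    if not row or not row[0]:
        db.execute("ROLLBACK")
        raise ValueError("product is not active")
    db.execute("INSERT INTO order_items VALUES (?, ?)", (item_id, product_id))
    db.execute("COMMIT")

retire_product(1)
# Existing item 100 still joins cleanly; new sales are refused.
try:
    add_order_item(101, 1)
except ValueError as e:
    print(e)  # product is not active
```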

hermanradtke | 2 years ago

> you could just write test cases to check for it

This effectively means you are building an embedded database in your application and using the networked database for storage. There are a few reasons to do this and a million reasons not to.