(no title)
mollems | 2 years ago
Here’s what jumped out at me: “The new account was created in our database with a null value in the URI field.”
Almost every time I see a database-related postmortem — and I have seen a lot of them — NULL is lurking somewhere in the vicinity of the crime scene. Even if NULL sometimes turns out not to be the killer, it should always be brought in for questioning.
My advice is: never rely on NULL as a sentinel value, and if possible, don’t allow it into the database at all. Whatever benefits you think you might gain, they will inevitably be offset by a hard-to-find bug, quite possibly years later, where some innocuous-seeming statement expects either NULL or NOT NULL and the results are unexpected (often due to drift in the semantics of the data model).
Although this was a race condition, if the local accounts and the remote accounts were affirmatively distinguished by type, the order of operations may not have mattered (and the account merge code could have been narrowly scoped).
chipbarm|2 years ago
Null is a perfectly valid value for data, and should be treated as such. A default value (e.g. -1 for a Boolean or an empty for a string) can make your system appear to work where NULL would introduce a runtime error, but that doesn't mean your system is performing as expected, it just makes it quieter.
I know it's tempting to brush NULL under the rug, but nothing is just as valid a state for data as something, and systems should be written generally to accommodate this.
colejohnson66|2 years ago
[a]: C# is fixing this with "nullable reference types", but as long as it's still opt-in, it's not perfect (backwards compatibility and everything). I can still forcibly pass a NULL to a function (defined to not take a null value) with the null-forgiving operator: `null!`. This means library code still needs `ArgumentNullException.ThrowIfNull(arg)` guards everywhere, just in case the caller is stupid. One could argue this is the caller shooting themselves in the foot like `Option.unwrap_unchecked` in Rust, but "good practice" in C# (depending on who you ask) tends to dictate guard checks.
[b]: Which is kind of stupid, IMO. Why should `my_column BOOL` be able to be null in the first place? Nullable pointers I can understand, but implicitly nullable everything is a horrible idea.
kaoD|2 years ago
IMO the problem (at least in this case) is not NULL in the DB, but NULL at the application level.
If NULL is some sort of Maybe monad and you're forced to deal with it, well, you're forced to deal with it, think about it, etc.
Empty string, whatever NULL string is in your language of choice, or some sort of sigil value you invent... not much of a difference.
int_19h|2 years ago
An empty string is better as a sentinel value because at least this doesn't have the weird "unknown value" semantics that NULL does. But if you really want the same level of explicitness and safety as an option type, the theoretically proper way to do this in relational model is to put the strings themselves in a separate table in a 1:N (where N is 0 or 1) relationship with the primary table.
semiquaver|2 years ago
In this case, a `user_uris` table with non-nullable columns and a unique constraint on `user_id` is the first option that comes to mind.
tedunangst|2 years ago
mananaysiempre|2 years ago
Yes! NULL is relational for “don’t know”, and SQL is (mostly, with varying degrees of success) designed to treat is as such. That’s why NULL=anything is NULL and not e.g. false (and IMO it’s a bit of a misfeature that queries that branch on a NULL don’t crash, although it’s still better than the IEEE 754 NaN=anything outright evaluating to false). If the value is funky but you do know it, then store a funky value, not NULL.
Tempest1981|2 years ago
unknown|2 years ago
[deleted]
btown|2 years ago
IMO automated merging/deduplication of "similar" records is one of those incredibly hard problems, with edge cases and race conditions galore, that should have a human in the loop whenever possible, and should pass data (especially data consumed asynchronously) as explicitly as possible, with numerous checks to ensure that facts haven't shifted on the ground.
In many cases, it requires the implementors to start by thinking about all the concerns and interactivity requirements that e.g. a Git-style merge conflict would have, and try to make simplifying assumptions based on the problem domain from that starting position.
Looking at the Mastodon source [0], and seeing that there's not even an explicit list of to-merge-from IDs passed from the initiator of the merge request to the asynchronous executor of the merge logic, it seems like it was only a matter of time before something like this happened.
This is not a criticism of Mastodon, by the way! I've personally written, and been bitten by, merge logic with far worse race conditions, and it's frankly incredible that a feature like this even exists for what is effectively [1] a volunteer project! But it is a cautionary tale nonetheless.
[0] https://github.com/mastodon/mastodon/blob/main/app/workers/a... (note: AGPL)
[1] https://opencollective.com/mastodon
wouldbecouldbe|2 years ago
I agree rest of it can be hard, and would be nervous. But this should have been obvious
msla|2 years ago
More deeply, NULL is inevitable because reality is messy and your database can't decline to deal with it just because it's messy. You want to model titles, with prenomials and postnomials, and then generate full salutations using that data? Well, some people don't have postnomials, at the very least, so even if you never store NULLs you're going to get them as a result of the JOIN you use to make the salutation.
You can remove the specific NULL value, but you can't remove the fact "Not Applicable"/"Unknown" is very often a valid "value" for things in reality, and a database has to deal with that.
wouldbecouldbe|2 years ago