SQL NULLs aren't weird, they're just based on Kleene's TRUE-FALSE-UNKNOWN logic!
If you want you can read NULL as UNKNOWN and suddenly a whole bunch of operations involving them become a lot more intuitive:
1. TRUE OR UNKNOWN = TRUE, because you know you have at least one TRUE already.
2. TRUE AND UNKNOWN = UNKNOWN, because you don't know whether you have two TRUEs or not. It's just out there.
3. UNKNOWN XOR UNKNOWN = UNKNOWN, because it could darn near be anything: TRUE XOR TRUE, TRUE XOR FALSE, FALSE XOR FALSE, FALSE XOR TRUE... Internalizing this is where SQL's use of NULL / UNKNOWN really becomes intuitive.
4. (TRUE AND FALSE) XOR (TRUE OR UNKNOWN) = (FALSE) XOR (TRUE) per #1 = TRUE. See, it's consistent, you just need to keep in mind that if you have a lot of known UNKNOWNs they're quite parasitic and your final outcome is likely to be, itself, an UNKNOWN. Just like in real life!
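If you want to check these rules against a real engine, SQLite happens to model TRUE/FALSE/UNKNOWN as 1/0/NULL, so the identities above can be evaluated directly (a quick sketch via Python's sqlite3; any engine would do):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# SQLite represents TRUE/FALSE/UNKNOWN as 1/0/NULL (None in Python).
row = con.execute(
    "SELECT (1 OR NULL), (1 AND NULL), (0 OR NULL), (0 AND NULL)"
).fetchone()
print(row)  # (1, None, None, 0): OR needs just one TRUE, AND propagates UNKNOWN
```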
That doesn't address anything in the second half of the post though, starting with this pull quote:
> The fact that NULLs are distinct for UNIQUE columns but are indistinct for SELECT DISTINCT and UNION continues to be puzzling. It seems that NULLs should be either distinct everywhere or nowhere. And the SQL standards documents suggest that NULLs should be distinct everywhere. Yet as of this writing, no SQL engine tested treats NULLs as distinct in a SELECT DISTINCT statement or in a UNION.
There is a pattern starting to emerge here on Hacker News of highly voted posts by people who present themselves as experts and thought leaders while shamelessly putting their lack of understanding on display. It's frightening.
This is the correct way of thinking about things. Null is one of the hardest things for traditional software engineers in my experience as a guy who came up as a data admin.
It's unfortunate to namesquat on 'boolean' if your elements have three possible values. Just call it 'ternary logic' and let individuals decide which system to use.
Even simpler, note that the first two examples are essentially just two of the most basic boolean algebra identities. Replace UNKNOWN with a variable and change the notation to the more conventional boolean algebra notation and you've got:
1. x ∨ 1 = 1 (domination law for disjunction)
2. x ∧ 1 = x (identity law for conjunction)
This is the mental model that I use and teach: a NULL is a virus that will likely make everything it touches NULL... unless cordoned off by a COALESCE or IFNULL or something else that explicitly speaks about its NULL-sensitive behavior in its documentation.
So if you see a bug where a WHERE or FILTER clause is unexpectedly failing, and you're in a time crunch, rather than pounding your head against the wall, start sprinkling COALESCE statements around (while being thoughtful about the domain meaning of what you're doing) and it's more likely than not to solve your problem!
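As a minimal sketch of that debugging pattern (table and column names invented for illustration), note how a NULL row silently falls out of a naive WHERE and comes back once COALESCE cordons it off:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tasks (name TEXT, status TEXT)")
con.executemany("INSERT INTO tasks VALUES (?, ?)",
                [("a", "open"), ("b", "archived"), ("c", None)])

# 'c' has a NULL status, so `status <> 'archived'` is UNKNOWN for it
# and the row silently disappears from the result:
naive = con.execute(
    "SELECT name FROM tasks WHERE status <> 'archived'").fetchall()

# COALESCE cordons off the NULL and 'c' comes back:
fixed = con.execute(
    "SELECT name FROM tasks WHERE COALESCE(status, '') <> 'archived'").fetchall()

print(naive)  # [('a',)]
print(fixed)  # [('a',), ('c',)]
```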
Except that NULL is not the same as UNKNOWN! NULL is a data value (like integers) that can appear in data expressions (like NULL + 1) and comparisons (like NULL = 1) whereas UNKNOWN is a truth value that can appear in boolean/logical expressions constructed from logical connectives like AND, OR, NOT.
A data expression always evaluates to a data value, and usually whenever any part of the expression is NULL, the entire expression evaluates to NULL.
A comparison evaluates to a truth value, and usually when a comparison involves a NULL it returns UNKNOWN. This leads to weird behaviors where both `SELECT 3 WHERE NULL = NULL;` and `SELECT 3 WHERE NULL <> NULL;` return nothing (because the query engine does not output a row if the predicate returns UNKNOWN on it).
What you listed above only comes into play for boolean/logical connectives like AND, OR, NOT, and in that case we follow 3-valued logic.
And there's more annoying corner cases when you deal with DISTINCT. The situation is so hopeless that SQLite has a whole table documenting divergent behaviors of NULL in different systems: https://www.sqlite.org/nulls.html
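Both behaviors are easy to reproduce, e.g. with SQLite from Python:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Neither predicate is TRUE, so neither query emits a row...
assert con.execute("SELECT 3 WHERE NULL = NULL").fetchall() == []
assert con.execute("SELECT 3 WHERE NULL <> NULL").fetchall() == []
# ...because the comparison evaluates to UNKNOWN (NULL), not FALSE:
assert con.execute("SELECT NULL = NULL").fetchone() == (None,)
print("ok")
```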
> select null = null; returns NULL, because each NULL is basically a placeholder representing any “unknown value”. Two unknown values are not necessarily the same value; we can’t say that they are equal, because we don’t know the value of either of them.
Agreed with all of this, it would probably have been better if they were named `unknown` instead of reusing the `null` keyword.
Note also that since PostgreSQL 15, you can use `NULLS NOT DISTINCT` when creating a unique index [0]. I'm less familiar with other databases.
0: https://www.postgresql.org/docs/15/sql-createtable.html
> ... and this is even less obvious if you’re used to using ORMs.
Which is why I continue to be such an ORM skeptic. I agree that they're convenient. But I do worry that we've now got an entire generation of engineers who regularly interact with relational databases, but have largely been spared the effort of learning how they actually work.
As another commenter pointed out, if you've learned basic relational algebra then the way SQL nulls behave seems obvious and logically consistent. The logic is the same as the logic behind the comparison rules for NaN in IEEE floats. It's the behavior of C-style nulls that is, always and forever, a billion-dollar mistake.
Stop thinking of ORMs as trying to hide the details of SQL and you'll stop hating them. Instead think of them as a way to compose relational queries dynamically, with the full power of your primary language, instead of inside of database stored procedures in a language totally devoid of any support for basic software engineering best practices. They shouldn't be hiding SQL from your primary language, they should be exposing the relational model to it! SQL is not the only possible implementation of the relational model, and it's not even a particularly good one. Even SQL's founders don't think it implements EF Codd's relational model very faithfully. Good ORMs act as a domain-specific language for the relational model embedded inside the parent language.
My experience with ORMs is that most of the time you end up needing to write some queries by hand, in raw SQL. Usually these are the most complex, that you can't express in your ORM's DSL. My point being, I don't think using an ORM really shields you from having to learn how it works behind the scenes.
If you can avoid learning SQL or the underlying DBMS then great, I have no problem with people ignoring things that are ignore-able. Problem is you can't. You will need to learn the DBMS, and the ORM will end up just getting in the way of what you really want to do.
I was actually going to comment on how breathtakingly ignorant the blog post author was for making this statement. What on earth ORM would make a UNIQUE constraint against multiple columns without an explicit instruction, much less where some of the columns are NULLable and then leave the user entirely unaware that anything like that was happening? ORMs do nothing like this and the developer would always have built this constraint explicitly and therefore is working at the DDL layer. ORMs have nothing to do with whether or not someone knows SQL just like cars have nothing to do with whether or not someone knows how to walk.
Oh man. I had a talk with a DBA about how Oracle could not deal with an address with no street name - literally a tiny village with 10 houses on one nameless town square. It was unsearchable in parts of the app because street='' was interpreted as street IS NULL. The DBA kept claiming Oracle was right and the town should adapt their naming to our software.
This attitude was so prevalent at the time, I sometimes wonder if the rise of noSQL was simply people sick of dealing with Oracle DBAs
It's weirder. If you insert an empty string into a VARCHAR field in Oracle, it returns NULL back to you when you query that same field. At the very least, I'd expect a software system to behave in a deterministic way, i.e. either throw an error because you're not doing something right (whatever Oracle deems right in this case), or give you back what you gave it - especially for database software, whose entire reason for existence is to persist data without side effects.
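For contrast, a quick check that an engine without Oracle's quirk round-trips the empty string (SQLite here; the table is a made-up example):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE addr (street TEXT)")
con.execute("INSERT INTO addr VALUES ('')")  # the nameless town square

# The empty string survives as an empty string, not as NULL:
assert con.execute(
    "SELECT COUNT(*) FROM addr WHERE street = ''").fetchone() == (1,)
assert con.execute(
    "SELECT COUNT(*) FROM addr WHERE street IS NULL").fetchone() == (0,)
print("ok")
```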
Exactly this. SQL is based on the relational algebra and that's well-defined, NULL along with other features of SQL work in an entirely regular and predictable way. The only time it's weird is when a developer decides that it should work the way Javascript (or whatever) NULLs work because that's the last time they saw the same word used in a programming language, in which case it's the assumption that's weird.
When the null concept was introduced to me in the seventies, the only thing I could say was that it would be causing a lot of unnecessary confusion in the future.
If you have missing values in your datarecord then that datarecord belongs in an exception-queue.
And now some 45 years later people are still discussing it like we did then..
> If you have missing values in your datarecord then that datarecord belongs in an exception-queue.
This depends on the context, no? I doubt there exists someone with a contact list on their phone which has every single field for every single contact populated.
There needs to be some way to codify that a field in a datarecord is unpopulated. Using the "zero value" for the type of the field (e.g., the empty string) is reasonable, but is this necessarily better than NULL? I reckon an argument can be made that this approach is just as likely to lead to bugs.
I'm not necessarily in favor of NULL, for what it's worth, but I can't think of an adequate replacement which doesn't reduce to "NULL in sheep's clothing".
Missing values are not always an exception. There's a reason modern languages almost universally include an Option type which may contain a Null and allow you to pass it around as a first class value. Good data representations allow you to express missing values.
SQL NULLs in some ways behave similarly to floating-point NaNs. Of course NaNs are also weird in their own way, but it is a bit comforting that it's not so completely, singularly weird.
NaN is cool because it's almost like a type that constrains uncertainty. What do we know about this entity? not much! but it's definitely not a number. Calling it anything else would have been a mistake.
Null is more confusing because it means different things in different languages. Sometimes it's a more constrained uncertainty, eg this definitely doesn't exist. But in sql it's a less constrained uncertainty, like "undefined" in math. The value of this thing couldn't make sense in this context, but we can't make assertions about its existence.
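The NaN parallel from the parent comment is easy to put side by side with SQL's NULL (Python):

```python
import sqlite3

nan = float("nan")
# IEEE 754: NaN compares unequal to everything, including itself.
assert (nan == nan) is False
assert (nan != nan) is True

# SQL NULL goes a step further: the comparison is UNKNOWN, not FALSE.
con = sqlite3.connect(":memory:")
assert con.execute("SELECT NULL = NULL").fetchone() == (None,)
print("ok")
```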
In an object-oriented context, "null" is useful to indicate that some object doesn't have a value for that property.
What's interesting is, do we mean that in our data that attribute has no value? Or do we mean the real-world object represented by the data does not have that attribute?
Does null mean
a) We don't know the value of this attribute for this object, or
b) We do know that there is no value for this attribute in the real-world object represented by our data.
In JavaScript because there is both null and undefined it is easy to assume that undefined means we don't know the value and null means we do know it has no value.
EXAMPLE:
The attribute 'spouse'. Some people have a spouse some don't. So what does it mean if the value of the field 'spouse' is null? That we know there is no spouse, or that we don't know who the spouse is if any.
In practical terms we can say null means "We don't know" which includes the case that there is no spouse.
> In JavaScript because there is both null and undefined it is easy to assume that undefined means we don't know the value and null means we do know it has no value.
Javascript objects have two kinds of undefined that are both represented by the same value. You have to use another method to see which one it is, and I've seen the `'foo' in obj` check used for the same thing as `null` in your example. This is something you have to account for, because the key is still there if you try to remove a value by just setting it to undefined; the right way to remove the key is to `delete` it.
But in a _relational_ database lack of spouse would not be modeled with a nullable column "spouse" but rather an absence of a spouse row/relation. Which is very real-world-like.
I remember from my databases course at university that NULL means that the database doesn't contain that data, and empty string means that it is known to be empty.
Even if we set three-value logic aside for a moment, this behavior of NULL still makes sense intuitively.
The value of NULL in a particular table cell is simply a way to indicate 'no value'. If you want the values in a column to be unique, cases where there are no values shouldn't be considered.
This plays out similarly in practice. For example, you may want to allow users to optionally reserve a username, and if they do, those usernames should be unique. It's hard to imagine a use case where by wanting a field to be both optional (nullable) and unique, you mean that the field should be optional for a single record (!) and required for all the rest. Of course, you mean that IF there is a value, THEN it should be unique.
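That optional-but-unique username case plays out exactly that way in practice; a sketch in SQLite (schema invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT UNIQUE)")
con.execute("INSERT INTO users (username) VALUES ('alice')")
con.execute("INSERT INTO users (username) VALUES (NULL)")
con.execute("INSERT INTO users (username) VALUES (NULL)")  # fine: NULLs are distinct

try:
    con.execute("INSERT INTO users (username) VALUES ('alice')")
except sqlite3.IntegrityError as e:
    print(e)  # UNIQUE constraint failed: users.username
```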
I'm glad SQL Server finally got this, but I wish the syntax were nicer. IS NOT DISTINCT FROM is a multi-word infix operator that gets tough to read. I've been using Snowflake SQL recently and I like that they just made it a function called EQUAL_NULL.
I actually like how NULLs behave in SQL. They mean "I don't know". In modern programming languages we all care about null safety, but no matter how you model your data, you will always run into situations where you don't know everything. So I believe NOT NULL is not very practical. NULLs in SQL handle these cases very well: when the input is unknown, your output is unknown.
Agreed. If SQL didn't have NULL, we'd have other special values meaning "I don't know" or "no data" all over the place.
Too many newbies hear that NULL is bad, so they declare all columns as NOT NULL and end up inserting ad hoc values like 0, -1, '', or {} when they inevitably come across cases where they don't have data. Which is even worse than NULL.
NULLs are weird because they are basically two different types under the same name. The 3-value logic type is useful for representing "missing" foreign keys, but 2-value logic type is arguably more useful when searching/sorting/aggregating.
I think we would have been better-off by treating FKs (and maybe outer JOINs) as a special case, and using 2-value logic everywhere else.
I don't see why this is weird. Unique means no duplicates. Nulls can't be duplicates of each other because they're not equal to each other.
If you don't like null semantics, you're free to use sentinel values. You can make all the sentinel values the same, or you can make them all different. Either way, you or someone who has to use your system will be back here tomorrow complaining about how weird it is.
This has always made queries unpredictable in many scenarios and it should be a feature to turn nulls off entirely and swap them out with Option<T> instead.
This reminds me back in the day when I was writing a DSL for a project.
Since the data we were getting was sourced from an RDBMS, I wanted NULL to be a first class concept in the DSL, with similar traits.
Early on, I simply made any expression that involved a NULL result in NULL. Naively this was all well and good, but it failed spectacularly in condition statements.
Instead of `A = NULL` evaluating to FALSE, I had `A = NULL` evaluating to NULL. And, as you can imagine, a single NULL in an expression would just pollute the entire thing, and since NULL was treated as FALSE in conditionals, any NULL in an expression made the entire thing, eventually, FALSE.
Naturally I went back and made the comparison operators always return booleans. But it was a fun little side effect at the time.
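For anyone building something similar, the Kleene connectives themselves are only a few lines; a hypothetical Python sketch (using None for NULL/UNKNOWN; this is not the DSL described above):

```python
# Hypothetical helpers modeling Kleene's 3VL with Python's None as UNKNOWN.
def kleene_and(a, b):
    if a is False or b is False:
        return False      # one known FALSE settles AND
    if a is None or b is None:
        return None       # otherwise UNKNOWN is contagious
    return True

def kleene_or(a, b):
    if a is True or b is True:
        return True       # one known TRUE settles OR
    if a is None or b is None:
        return None
    return False

assert kleene_and(True, None) is None
assert kleene_or(True, None) is True
assert kleene_and(False, None) is False
print("ok")
```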
I have deep but vague thoughts around the concept. My first intuition is that we have put too many things under NULL and None and such.
Partially, we use ... and I'll be very broad here ... "variables" as boxes we look in for answers. Answers to questions, answers to "Hey I put something in there for the time being to reference later." If I went into programming terms rather than just SQL, sometimes we get meta-answers.
You haven't made the box yet (declared the variable).
You haven't decided how the box is structured (picked a type or a length or something).
Okay, you did those but the box is virgin (nothing has been placed in the box yet).
That kind of thing. An empty set for "yes, you asked but nothing meets those criteria."
Isn't null supposed to be the bottom value in the universe of all the values that your program can recognize? Why do people need to complicate it? And yeah, under that definition `null == null`, but `null_pointer != null`, because the null pointer is the bottom of all possible pointer values, and null by itself is not a pointer. The same goes for (0, null), (false, null) and ("", null): null should only be equal to itself.
And lastly, undefined != null, because undefined relates to structures, indicating that a field was not defined when the structure was created.
NULL is the absence of a value. If you try and treat it as a value, you're going to have a bad time. So an attempted UNIQUE(email_address, deleted_at) constraint is fundamentally flawed. If you treated NULL as a value that could be unique, you're going to break foreign keys.
But let's continue the logic of deleted_at being NULL indicating an active account, which seems to the intent here. You end up doing things like:
SELECT /* ... */
FROM accounts
WHERE email_address = '...'
AND deleted_at IS NULL
Depending on your database, that may or may not index well. More problematic, you may end up with privacy leaks if someone forgets the last conditional.
If anything, you want to reverse this so someone has to go out of their way to explicitly select deleted accounts. There are multiple strategies for this eg using an active_accounts view or table.
Lastly, there are lots of potential reasons for an account to be disabled or otherwise not visible/accessible. Takedowns, court orders, site safety, hacked accounts and so on.
Overloading deleted_at to have a semantic meaning for an active account is just fundamentally bad design.
> Overloading deleted_at to have a semantic meaning for an active account is just fundamentally bad design.
Then don't do that. It's kind of a leap to say soft deletes are categorically bad because someone might confuse "deleted" with "inactive". My users table does the super-advanced thing of having both columns. The ORM also doesn't forget to add the not-null criterion. There's also zero databases in active use where it poses a problem to indexing.
Soft deletes suck in their own way, but none of the alternatives are perfect either.
You put the "is not null" on the index itself and then simply don't use it for the much rarer queries that are on deleted accounts. Or just use a view for active accounts.
Overloading timestamps to carry a boolean on null is awesome as long as you decide that's what you're doing and use one of the several standard techniques to dodge the easily avoided potential downside.
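The partial-index trick looks like this (SQLite syntax; names invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE accounts (email TEXT, deleted_at TEXT)")
# Uniqueness is enforced only among live accounts; deleted rows may repeat.
con.execute(
    "CREATE UNIQUE INDEX live_email ON accounts (email) "
    "WHERE deleted_at IS NULL")

con.execute("INSERT INTO accounts VALUES ('a@x.com', NULL)")
con.execute("INSERT INTO accounts VALUES ('a@x.com', '2024-01-01')")  # ok: deleted
try:
    con.execute("INSERT INTO accounts VALUES ('a@x.com', NULL)")
except sqlite3.IntegrityError:
    print("second live copy rejected")
```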
This isn't a valid security concern, more than any other incorrect sql query would be anyway. A dev can always write a bad Q, you need another way to address that it's not more likely here because of the null.
SQL NULL is also coerced to different boolean values depending on context e.g. in a WHERE clause NULL is coerced to false, whereas in a CHECK constraint NULL is coerced to true
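Both coercions can be observed directly, e.g. in SQLite:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (x INTEGER CHECK (x > 0))")

con.execute("INSERT INTO t VALUES (NULL)")    # CHECK: UNKNOWN is let through
try:
    con.execute("INSERT INTO t VALUES (-1)")  # CHECK: FALSE is rejected
except sqlite3.IntegrityError:
    pass

# WHERE: the very same UNKNOWN filters the NULL row out.
assert con.execute("SELECT COUNT(*) FROM t WHERE x > 0").fetchone() == (0,)
print("ok")
```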
And SQL null shares the feature with many other languages that any type can be a NULL (although a column can be set NOT NULL). Much like Java, it is no end of grief that a type that claims to be, say, a “NUMBER” is actually “NUMBER | NULLType”.
null != null is pretty bizarre at first, until you understand that the reason they did it was to try to make sense of nullable indexed columns. Not sure why we couldn't have our cake and eat it too, but instead we got IS NOT NULL, which is not the same as != NULL.
I think (blasphemous hot take ahead) that the standards of implementation of relational models are wrong.
NULLs still have their (rare) place, but the foremost issue with query results is that they are tabular rather than hierarchical. The main culprits being (1) outer joins that represent or induce nonsensical operations and (2) lack of non-null “zero” values for types like date. Of course hierarchies can make querying more complex, but mostly in cases where the relational logic goes crazy itself and you had to go tabular anyway.
If you think of it, distinct, group by and windowing feel like workarounds in tabular mode but would be natural to hierarchies, because everything is naturally distinct and grouped-by by design and windows are basically subtables in these rows.
Bonus points: you could fetch “SELECT FROM a, b_rows LEFT JOIN b AS b_rows …” in a single query without duplicating `a`s and nullifying `b`s when N <> 1. And when you aggregate through a column in `b`, there’s no headache about what to do with join-produced NULLs (unless `b` columns are nullable by your design, in which case it’s on you). And when it all arrives at a client, it’s already well-shaped for UI, processing, etc. No more:
last_a_id = undefined
for (row of rows) {
if (row.id != last_a_id) {
…
last_a_id = row.id
}
…
}
I’m pretty sure you recognize this programming idiom immediately.
Before you criticize, I’m not talking about hierarchical/OO tables. Only about ways of getting and handling query results. You still can reshape a relation like you want. The difference is that a database engine doesn’t have to put it all onto a (N x M x …) table and instead creates sort of a subset of relations which is efficient in space and natural to walk through. It already does that when walking through indexes, selects are naturally hierarchical. All it has to do is to track relations it went through rather than just dumping rows from a set of cursors that it knows the start and end points of, but loses this knowledge by writing into a plain table.
Ah yes, someone discovering the existence of three value logic in SQL and expecting 2VL behavior. Classic. We've all been there, right?
Personally I wish more languages were like Python or Ruby and had chosen None or Nil over Null, which alleviates the confusion a bit, as those names better indicate that it's NOT an "unknown" (1 unknown value != 1 other unknown intuitively makes sense). In Ruby or Python it's more obvious that None and Nil are "nothing" types, and therefore equivalence makes sense (nil == nil and None == None are both true).
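The difference shows up in one line each (Python vs. SQL via sqlite3):

```python
import sqlite3

# Python's None is a concrete "nothing" value, equal to itself.
assert (None == None) is True  # the comparison itself is the point here

# SQL's NULL is an unknown: comparing two of them yields UNKNOWN,
# and only the dedicated IS operator treats them as the same.
con = sqlite3.connect(":memory:")
assert con.execute("SELECT NULL = NULL").fetchone() == (None,)
assert con.execute("SELECT NULL IS NULL").fetchone() == (1,)
print("ok")
```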
The NULLs-in-unique-constraints quirk actually works differently in Oracle databases, which is infuriating to say the least. Apparently this comes from some ambiguity in the SQL standard - does anyone know more about this?
Pxtl|1 year ago
Kleene's TRUE-FALSE-UNKNOWN logic is weird.
SQL nulls effectively violate the reflexive property of equality, because X=X does not result in a value of TRUE.
And in many contexts in SQL, NULL is treated as equivalent to false, such as within a WHERE clause.
So that means that X=X is effectively FALSE in SQL*.
That is a clown language.
*(yes, I have the same opinion about IEEE NaN, but that wouldn't come up so often if Javascript didn't love NaN for some stupid reason.)
zigzag312|1 year ago
Being based on someone's logic is not sufficient. Most weird things are based on some (weird) logic.
magicalhippo|1 year ago
That said, as someone self-taught in SQL, I agree NULL was not a good choice.
Replacing NULL with UNKNOWN and the third boolean value as INDETERMINATE for example would have been better.
[1]: https://stackoverflow.com/a/79270181
[2]: https://learn.microsoft.com/en-us/sql/t-sql/language-element...
[3]: https://www.postgresql.org/docs/current/functions-comparison...
NoMoreNicksLeft|1 year ago
Why would anyone want to use another database?
dataflow|1 year ago
Could you explain how this makes sense then?
If NULL is just "unknown", then shouldn't this be a type error? Moreover, could you explain why the heck this ought to be empty?
As a human this looks insane to me, "relational algebra" be damned. You find a row, then you look it up again, and it's not there? What the hell?
whstl|1 year ago
I will die on the hill that regular C-like nulls are the actual thing that's weird.
The real billion dollar mistake [1] was the damage it made on the minds of developers.
[1] https://en.wikipedia.org/wiki/Tony_Hoare
kopirgan|1 year ago
The princess's eyes are as blue as the stone of Galveston.
Have you seen the princess's eyes?
No!
Have you seen the blue stone of Galveston?
No!
So you're comparing something you've never seen with something else you've never seen!
That's NULL comparison
Recursing|1 year ago
(Although in rare cases that is even weirder: https://stackoverflow.com/a/58998043 )
kijin|1 year ago
Too many newbies hear that NULL is bad, so they declare all columns as NOT NULL and end up inserting ad hoc values like 0, -1, '', or {} when they inevitably come across cases where they don't have data. Which is even worse than NULL.
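One concrete way sentinels are worse: aggregates skip NULLs but happily swallow sentinel values. A sketch in Python's sqlite3 with made-up sensor readings, where -1 stands in for "no data":

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE m1 (temp REAL)")  # missing reading stored as NULL
con.execute("CREATE TABLE m2 (temp REAL)")  # missing reading stored as -1

con.executemany("INSERT INTO m1 VALUES (?)", [(10.0,), (20.0,), (None,)])
con.executemany("INSERT INTO m2 VALUES (?)", [(10.0,), (20.0,), (-1.0,)])

# AVG ignores NULLs, so the NULL table gives the honest answer...
honest = con.execute("SELECT AVG(temp) FROM m1").fetchone()[0]
# ...while the sentinel silently skews the result.
skewed = con.execute("SELECT AVG(temp) FROM m2").fetchone()[0]
print(honest, skewed)  # 15.0 9.666...
```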
branko_d|1 year ago
I think we would have been better-off by treating FKs (and maybe outer JOINs) as a special case, and using 2-value logic everywhere else.
kurtbuilds|1 year ago
`1 is not distinct from NULL` => false
`NULL is not distinct from NULL` => true
`0 is not distinct from 1` => false
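These check out in practice. SQLite spells `IS NOT DISTINCT FROM` as plain `IS` (Postgres supports the long form), so the same truth table can be verified from Python:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# In SQLite, `IS` is the NULL-safe equality, i.e. IS NOT DISTINCT FROM.
row = con.execute("SELECT 1 IS NULL, NULL IS NULL, 0 IS 1").fetchone()
print(row)  # (0, 1, 0) - false, true, false
```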
getnormality|1 year ago
If you don't like null semantics, you're free to use sentinel values. You can make all the sentinel values the same, or you can make them all different. Either way, you or someone who has to use your system will be back here tomorrow complaining about how weird it is.
otteromkram|1 year ago
https://modern-sql.com/concept/null
(Note: I am not affiliated with that blog/website in any way, shape, or form.)
tzury|1 year ago
https://blog.rustprooflabs.com/2022/07/postgres-15-unique-im...
Practically speaking, I go with NOT NULL and always set a default value.
whartung|1 year ago
Since the data we were getting was sourced from an RDBMS, I wanted NULL to be a first class concept in the DSL, with similar traits.
Early on, I simply made any expression that involved a NULL result in NULL. Naively this was all well and good, but it failed spectacularly in condition statements.
Instead of `A = NULL` evaluating to false, I had `A = NULL` evaluating to NULL. And, as you can imagine, a single NULL in the expression would just pollute the entire thing, and since NULL was treated as FALSE for conditionals, any NULL in an expression made the entire thing, eventually, FALSE.
Naturally I went back and made the comparison operators always return booleans. But it was a fun little side effect at the time.
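For reference, SQL itself splits the difference: NULL poisons arithmetic and plain comparisons, but AND/OR follow the Kleene rules from the top of the thread. For example, via Python's sqlite3:

```python
import sqlite3

con = sqlite3.connect(":memory:")
row = con.execute(
    "SELECT 1 + NULL,"    # arithmetic: NULL poisons the expression
    "       NULL AND 0,"  # Kleene: FALSE AND UNKNOWN = FALSE
    "       NULL OR 1,"   # Kleene: TRUE OR UNKNOWN = TRUE
    "       NULL AND 1"   # Kleene: TRUE AND UNKNOWN = UNKNOWN
).fetchone()
print(row)  # (None, 0, 1, None)
```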
at_a_remove|1 year ago
Partially, we use ... and I'll be very broad here ... "variables" as boxes we look in for answers. Answers to questions, answers to "Hey I put something in there for the time being to reference later." If I went into programming terms rather than just SQL, sometimes we get meta-answers.
You haven't made the box yet (declared the variable).
You haven't decided how the box is structured (picked a type or a length or something).
Okay, you did those but the box is virgin (nothing has been placed in the box yet).
That kind of thing. An empty set for "yes, you asked but nothing meets those criteria."
criloz2|1 year ago
And lastly undefined != null, because undefined is related with structures indicating that a field was not defined when the structure was created
jmyeet|1 year ago
But let's continue the logic of deleted_at being NULL indicating an active account, which seems to the intent here. You end up doing things like:
Depending on your database, that may or may not index well. More problematic, you may end up with privacy leaks if someone forgets the last conditional.
If anything, you want to reverse this so someone has to go out of their way to explicitly select deleted accounts. There are multiple strategies for this, e.g. using an active_accounts view or table.
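A sketch of the failure mode (made-up accounts schema, Python's sqlite3): every query needs the `deleted_at IS NULL` guard, and omitting it silently returns soft-deleted rows:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY, email TEXT, deleted_at TEXT)")
con.executemany("INSERT INTO accounts (email, deleted_at) VALUES (?, ?)",
                [("live@example.com", None),
                 ("gone@example.com", "2024-01-01")])

# Every query must remember the guard...
safe = con.execute(
    "SELECT email FROM accounts WHERE deleted_at IS NULL").fetchall()
# ...and forgetting it silently leaks soft-deleted rows.
leaky = con.execute("SELECT email FROM accounts").fetchall()
print(safe)   # [('live@example.com',)]
print(leaky)  # [('live@example.com',), ('gone@example.com',)]
```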
Lastly, there are lots of potential reasons for an account to be disabled or otherwise not visible/accessible. Takedowns, court orders, site safety, hacked accounts and so on.
Overloading deleted_at to have a semantic meaning for an active account is just fundamentally bad design.
chuckadams|1 year ago
Then don't do that. It's kind of a leap to say soft deletes are categorically bad because someone might confuse "deleted" with "inactive". My users table does the super-advanced thing of having both columns. The ORM also doesn't forget to add the not-null criterion. There are also zero databases in active use where it poses a problem for indexing.
Soft deletes suck in their own way, but none of the alternatives are perfect either.
giraffe_lady|1 year ago
Overloading timestamps to carry a boolean on null is awesome as long as you decide that's what you're doing and use one of the several standard techniques to dodge the easily avoided potential downside.
This isn't a valid security concern, any more than any other incorrect SQL query would be. A dev can always write a bad query; you need another way to address that. It's not more likely here because of the null.
qwertydog|1 year ago
https://dbfiddle.uk/C5JqMP8O
jakubmazanec|1 year ago
[1] https://www.edgedb.com/blog/we-can-do-better-than-sql
boxed|1 year ago
Imo, SQL should add "NOTHING", add "UNKNOWN" as a synonym for "NULL", and deprecate "NULL".
xd|1 year ago
edit: an empty string, false, 0 are all values.
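Easy to confirm in, e.g., SQLite (via Python's sqlite3): all three are values, so none of them is NULL:

```python
import sqlite3

# An empty string, false (0 in SQLite), and 0 are all actual values,
# so none of them is NULL; only NULL is NULL.
con = sqlite3.connect(":memory:")
row = con.execute("SELECT '' IS NULL, 0 IS NULL, NULL IS NULL").fetchone()
print(row)  # (0, 0, 1)
```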
wruza|1 year ago
NULLs still have their (rare) place, but the foremost issue with query results is that they are tabular rather than hierarchical. The main culprits being (1) outer joins that represent or induce nonsensical operations and (2) lack of non-null “zero” values for types like date. Of course hierarchies can make querying more complex, but mostly in cases where the relational logic goes crazy itself and you had to go tabular anyway.
If you think of it, distinct, group by and windowing feel like workarounds in tabular mode but would be natural to hierarchies, because everything is naturally distinct and grouped-by by design and windows are basically subtables in these rows.
Bonus points: you could fetch “SELECT FROM a, b_rows LEFT JOIN b AS b_rows …” in a single query without duplicating `a`s and nullifying `b`s when N <> 1. And when you aggregate through a column in `b`, there’s no headache about what to do with join-produced NULLs (unless `b` columns are nullable by your design, in which case it’s on you). And when it all arrives at a client, it’s already well-shaped for UI, processing, etc. No more:
I’m pretty sure you recognize this programming idiom immediately.
Before you criticize: I’m not talking about hierarchical/OO tables, only about ways of getting and handling query results. You still can reshape a relation however you want. The difference is that a database engine doesn’t have to put it all onto an (N x M x …) table and instead creates a sort of subset of relations which is efficient in space and natural to walk through. It already does that when walking through indexes; selects are naturally hierarchical. All it has to do is track the relations it went through, rather than just dumping rows from a set of cursors that it knows the start and end points of but loses this knowledge by writing into a plain table.
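The idiom being alluded to is presumably the usual client-side regrouping of a flat LEFT JOIN result back into a hierarchy; a rough Python sketch with made-up rows:

```python
from itertools import groupby
from operator import itemgetter

# Rows as a LEFT JOIN returns them: parent columns duplicated per child,
# and a join-produced NULL child column when a parent has no children.
rows = [
    ("a1", "b1"),
    ("a1", "b2"),
    ("a2", None),  # a2 has no matching b rows
]

# The familiar client-side idiom: walk the flat rows, rebuild the hierarchy,
# and throw away the NULLs the join manufactured.
result = {a: [b for _, b in grp if b is not None]
          for a, grp in groupby(rows, key=itemgetter(0))}
print(result)  # {'a1': ['b1', 'b2'], 'a2': []}
```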
drzaiusx11|1 year ago
Personally, I wish more languages were like Python or Ruby and had chosen None or Nil over Null; it alleviates the confusion a bit, as those names better indicate that it's NOT an "unknown" (one unknown value != another unknown value, which intuitively makes sense). In Ruby or Python it's more obvious that Nil and None are "nothing" types, and therefore equivalence makes sense (nil == nil and None == None are both true).
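The contrast is easy to see side by side (Python plus its built-in sqlite3):

```python
import sqlite3

# Python's None is a "nothing": two nothings are the same thing.
py_equal = (None == None)  # comparing literals on purpose
print(py_equal)  # True

# SQL's NULL is an "unknown": comparing two unknowns is itself unknown,
# which sqlite3 hands back to Python as None.
con = sqlite3.connect(":memory:")
sql_equal = con.execute("SELECT NULL = NULL").fetchone()[0]
print(sql_equal)  # None
```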