top | item 36825668

Nulls Nullified (2005)

28 points| declanhaigh | 2 years ago |dbazine.com

44 comments

order

cubefox|2 years ago

The author talks a lot about how NULL in SQL is supposedly unnecessary, and claims that it is totally possible to handle unknown SQL values without NULL. Yet he successfully avoids explaining how his solution is supposed to work. At the end he links to a (paid?) paper where his solution is apparently explained, though the link has since ceased to work. Great.

dspillett|2 years ago

There are many papers describing how NULL as an explicit “value” (you still at least need something to present empty results from left or right joins) can be made unnecessary – just look back to the development of relational databases and the theory that are based upon. No need to pay for this author's paper when it probably just restates things that are found in many open access papers/books/other.

Representing unknown/non-existent without allowing an explicit NULL value is can be modelled by the property being another entity, you have a child table that lists the values of the property so where you would store a NULL otherwise there simply exists no row in that table. In SQL this means extra joins and in most (all?) SQL implementations this results in (sometimes significantly) lower performance. The big argument there is whether this points to a problem in the theory (if arguing that NULL should not be “stored”), in SQL as an implementation of relational database theory, or in the implementations of SQL…

[I'm aware my terminology is all over the place, as most people's is: when talking about relational theory rather than SQL as a speicific implementation of it I should use tuple not row, relation not table, etc., but getting that right only serves to confuse people (a great many of those with a more self-taught and/or field trained programming background background than one that involves any computer science study) who know only SQL and have thus far not needed to be aware of the theory or history].

quickthrower2|2 years ago

NULL makes little sense from first principles.

Either get rid of it, or allow nulls (and logically any other union) by supporting union types.

It is quite arbitrary to allow a type that makes sense (string, int) etc. and then also allow nulls so you allow INT | NULL union but no other unions. So you have this multi-purpose "other value" whose meaning is inferred by the application.

I guess they were added as a pragmatic "I dunno" field for CRUD systems without needing to go to all the effort to support unions. For example not everyone has a middle name, but you don't want to go all 5th Normal Form on it.

rstuart4133|2 years ago

I don't know about not making sense from first principles, but if a principle concern is bug free code then NULL is a disaster. I've seen so bugs caused by people forgetting a column could be NULL and using '<', '<=' I avoid them like the plague now. It's not just the unfamiliar behaviour, it's also that everyone forgets to write tests for it.

A simple fix would be to add a dialect option that makes any operation on a potentially null value a syntax error. This is possible because in SQL it's almost always possible to determine if a value X could potentially be NULL or not. If it could be NULL, something like (X == y) or (X + y) should generate a syntax error. Instead you should have to write (X is not null and X == y) or ISNULL(X + y, 0).

dtech|2 years ago

NULL/missing is so common that it makes sense to provide special language support, even if you don't have full unions. See e.g. Kotlin and Typescript.

What doesn't make sense is to treat a nullable type the same as non-nullable type causing errors everywhere (Java), or make it so special that it basically has completely separate logic and operators attached (SQL)

sacado2|2 years ago

In theory NULL (or any such sentinel value, such as the infamous NaN is floatin-point arithmetic) makes sense because there are partially defined functions. Division being the most famous example, but most basic operations on lists share that property too.

The problem is the way it's handled in many programming languages, where it can blend so easily with legitimate value and blow the whole program off at execution time.

It's a practical problem, not a theoretical one.

And it's not the same as union types. The division of two numbers always returns a number, and nothing else, it's just that, for some values, it's not defined.

Mikhail_Edoshin|2 years ago

As far as I understand NULLs were indeed a pragmatic choice, but the chief reason was the need to compute derived table values. E.g. in a join it is normal to get unknown values for a cell. This has to be expressed somehow. How? NULL seems to be a reasonably good generic solution to this. It may be possible to come up with a different generic solution, but such a solution would probably require something like conditional fields in a table and this is a whole new level of relational logic.

reichstein|2 years ago

I'll have to disagree.

We can all agree that the "silently accept null" approach is, by now, a quadrillion dollar mistake.

But having a safe "or null" union type as the _only_ union type in your language, isn't as far fetched as you make it sound.

It's the good old "allow zero, one or an infinite amount of any feature" rule. Sometimes "one" is the right choice.

And the same principle applies to values. A plain value is one value. A list/collection is an arbitrary number of values. And `null` is no value. Some would, rightly, say that `void` is no value, so `null` it's really a value representing no value. And then you will also want to represent zero-or-one value, which is where the -or-null type comes in.

This is a more fundamental union than just "a Foo or a Bar" which is always a value, then we're just arguing over type.

You can always use an `Option` class instead. If you have classes. But if you're going to use `Option.none` to represent the value of an uninitialized variable, you're going to build it into the language anyway, and then you might as well admit to that one union type.

cubefox|2 years ago

While I'm a big proponent of union types in programming languages (against Haskell-like Option wrapping and unwrapping), it is not clear to me that they would work for something like SQL. Would it mean allowing columns with union types?

bazoom42|2 years ago

As soon as you have composite types, you will want to filter and join on the individual components, which would require extending the query language to allow drilling down into nested composite types. Basically you are back to hieracical databases. This is why Codd recommends expressing composite types as relations rather than complex values. You can express the same information, but can use relational algebra rather then special-case operators.

Nulls are a special case though - even if null was not allowed in base tables, you would still need them in outer joins.

petalmind|2 years ago

> For example not everyone has a middle name, but you don't want to go all 5th Normal Form on it.

(actual question) how does 5NF apply here?

I thought that this case could be handled null-free by introducing a separate table

people_middlenames (human_id PK, middlename NOT NULL);

Is it already 5NF? I thought 5NF requires more complex structure of data, like explained in Wikipedia, for example.

Thanks,

SPBS|2 years ago

> NULL makes little sense from first principles.

You need something to represent lack of a value on LEFT JOINs. The author acknowledges as much, the two things he seems to be complaining about are:

1. It's impossible to distinguish between true lack of value from LEFT/RIGHT JOIN and a value that was directly set as NULL.

- Interestingly, this is exactly why JavaScript has both null and undefined, a design decision that many people also complain about and grumble that it should have been unified into one.

2. There's no reason for NULL to not equal NULL, which makes NULL a headache to handle and requires special IS NULL/IS NOT NULL operators to test for NULL-ness.

- Postgres 15 now addresses this with the DDL option UNIQUE NULLS NOT DISTINCT.

Finally, the author claims to have a solution that solves all these problems but sadly locked it behind a paywall on a website which is now defunct.

> Until recently, there was no logically correct, relational solution to missing data. We offer an outline of a possible such a solution in Practical Database Foundations paper #8, “The Final NULL in the Coffin,” which also summarizes the problems with NULLs.

http://www.dbdebunk.citymax.com/page/page/1396241.htm

> This website has been cancelled.

> Click here to go to the 5-minute website builder.

https://web.archive.org/web/20041209115415/http://www.dbdebu...

> ORDERING AND PRICING

> Delivery will be in PDF format. We strongly recommend to upgrade to the latest version of Acrobat reader and to have the following fonts installed: Verdana, Courier, Arial Narrow, Arial Black and Wingdings, so we don't have to embed these fonts and enlarge the files.

classified|2 years ago

It's not a union type, but an option. INT | NULL --> Option<INT>.

contravariant|2 years ago

Why do pointed sets not make sense from first principles?

tragomaskhalos|2 years ago

I assume the author's alternative to null is to punt nullable values into their own table, where the presence of the row tells you that the value is in fact there. Thing is, if your table has three such columns then you're going to need three extra tables. The whole argument is reminiscent of the polemic against surrogate keys, viz that they are a violation of Codd. No-one is arguing that nulls in SQL aren't a mess, but an RDBMS is a tool, not a holy shrine, and most of us prefer to just use bloody things to get work done and ignore the purist bikeshedding.

TRiG_Ireland|2 years ago

The author is right about one thing. If NULL means "unknown", then the result of the comparison A > B where either or both values is NULL should itself be NULL, not FALSE.

cactusfrog|2 years ago

Null means unknown. I think the main issue is that people use it to mean “not applicable to this row”, which is different than null’s initial purpose. I think the problems with null could be solved by adding in another None value to distinguish between these two situations.

al2o3cr|2 years ago

LOL @ "I'm smarter than the authors of the SQL standard, buy my paper to find out why!"