
An employee, whose last name is Null, kills our employee lookup app

538 points| willvarfar | 14 years ago |stackoverflow.com

148 comments

[+] NelsonMinar|14 years ago|reply
This is funny, but it's also a real world example of the kind of encoding nightmare that made SOAP RPC encoding really awkward. Various SOAP toolkits used to serialize a missing value as the empty string, or a literal value like "null" or 0, or all sorts of awfulness. I think the correct thing for the spec is to set xsi:nil="true" as an attribute on the XML tag in question, but IIRC about half the toolkits didn't understand that.

(I speak in the past tense of SOAP because I am an optimist.)
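
A minimal sketch of the xsi:nil approach mentioned above, using Python's standard `xml.etree.ElementTree`. The element name `lastName` and the helper functions are hypothetical and not taken from any particular SOAP toolkit; the point is only that "missing" travels as an attribute, so no text value can be mistaken for it.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"
NIL = f"{{{XSI}}}nil"
ET.register_namespace("xsi", XSI)  # serialize the attribute with the usual xsi: prefix


def encode_last_name(value):
    """Build a <lastName> element; None becomes xsi:nil="true", never text."""
    elem = ET.Element("lastName")  # hypothetical element name
    if value is None:
        elem.set(NIL, "true")
    else:
        elem.text = value
    return elem


def decode_last_name(elem):
    """Only the xsi:nil attribute means 'missing'; any text content is data."""
    if elem.get(NIL) in ("true", "1"):
        return None
    return elem.text or ""


def round_trip(value):
    """Serialize to bytes and parse back, as a wire transfer would."""
    xml_bytes = ET.tostring(encode_last_name(value))
    return decode_last_name(ET.fromstring(xml_bytes))
```

With this encoding, `round_trip("Null")` and `round_trip("null")` come back as ordinary strings, and only `round_trip(None)` comes back as `None`.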

[+] danudey|14 years ago|reply
I worked at a company where we were replacing the user-facing component of our giant, ugly PHP storefront with a Rails version; in doing so, our developers implemented a JSON bridge between the two, allowing the frontend and backend to operate separately, using separate databases (and actually, they were in separate data centres).

As we were testing, we found that some products in our database would cause a JSON decoding error on the Rails side. After a few minutes, we realized the problem. We had a string field for something (product IDs, manufacturer SKU, etc). On the PHP side, the JSON encoder was using PHP's is_numeric() for each field to see if the field was a number (to determine how to encode it). Some of the SKUs, however, happened to be composed entirely of digits, and for those, PHP encoded them into the JSON as integer values. This, of course, broke on the Rails end, because Rails was expecting a string value and got an integer value.

In the end, we had to write a surprising amount of code to work around the brain damage involved, since regardless of what we tried to do PHP wanted, by default, to send things as integers whenever possible. I believe the final fix was to actually patch the JSON encoder library and special-case that field.
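
A rough Python analogue of the behaviour described above, as a sketch only. `guess_type_encode` and `schema_encode` are hypothetical names, and the digit test is a simplification of PHP's `is_numeric()`, which accepts more than plain digits. The leading-zero SKU is my own illustration, not something reported in the story.

```python
import json


def guess_type_encode(record):
    """Mimic the described bug: digit-only strings are emitted as JSON numbers."""
    coerced = {
        key: int(value)
        if isinstance(value, str) and value.isascii() and value.isdigit()
        else value
        for key, value in record.items()
    }
    return json.dumps(coerced)


def schema_encode(record, string_fields):
    """Encode by the declared field type instead of inspecting each value."""
    typed = {
        key: str(value) if key in string_fields else value
        for key, value in record.items()
    }
    return json.dumps(typed)
```

Deciding the type from a schema rather than from the value keeps a SKU such as `"00123"` a string on the wire, whereas guessing from the value turns it into the number `123`.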

[+] scott_s|14 years ago|reply
This is really a fundamental problem: how do you indicate operation failure? This relies on two things: the range (the valid output values) of the operation itself, and the range of the datatype you're mapping the operation's result to.

If the operation and the datatype's range are not equal, then you can indicate failure inside the return value by applying special meaning to invalid values. But if the operation and the datatype's range are equal, then you need another distinct value to indicate failure. The difficulty is in recognizing which situation you're in, and as you point out, this is one where, effectively, the operation and the datatype have the same range.
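
The two situations can be sketched in Python as follows. Both functions are hypothetical illustrations: in the first, the operation only produces non-negative integers, so -1 is free to signal failure; in the second, every string is a legitimate last name, so failure has to be a value outside the string type.

```python
from typing import Optional


def find_index(items, target) -> int:
    """Valid results are >= 0, so -1 can safely mean 'not found'."""
    for i, item in enumerate(items):
        if item == target:
            return i
    return -1


def lookup_last_name(directory: dict, employee_id: int) -> Optional[str]:
    """Any string may be a real name, so 'not found' is None, not a string."""
    return directory.get(employee_id)
```

Here `lookup_last_name({1: "Null"}, 1)` returns the string `"Null"`, while a lookup for an unknown id returns `None`, and the two cannot be confused.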

[+] wpietri|14 years ago|reply
Wow. I love that the S stands for Simple. "You keep using that word. I do not think it means what you think it means."
[+] narrator|14 years ago|reply
They must have got the absolutely brilliant idea of conflating null with the empty string from Oracle.
[+] joelhaasnoot|14 years ago|reply
Ran into this with a REST XML API recently where someone was trying to do some reflection-type serialization of XML. The API had longitude and latitude of all train stations, and some genius decided to call the tags 'lat' and 'long'. 'long' conflicted with the datatype Long and it wasn't fun. Version 2 of the API has fixed this issue luckily.
[+] kodablah|14 years ago|reply
I think the absence of the element/attribute is the best way to define null assuming your XSD is set up properly. Many XML marshalling libraries work well with this approach.

(note, I too have long since abandoned SOAP)

[+] billybob|14 years ago|reply
I have joked that I might change my name to Sample User, develop a piece of land in the country, and name my road Example Avenue, taking address 123. This would make me impervious to datamining, because my results would always be thrown out.

But a last name of 'Null' may be even better. :)

[+] aqme28|14 years ago|reply
On the contrary, you'd probably receive a lot of "test" mail that leaked through.
[+] mkopinsky|14 years ago|reply
If a patient with the last name of "Mouse" ever checks in to the hospital where I work, I have doubts about whether any of his labs will be performed. Standard practice is when creating a test user in production or placing a test order, name him Anything Mouse and people know to simply delete the request from the system.
[+] sequoia|14 years ago|reply
If others create dummy content anything like I do, you'd do better to name yourself asdffsadf asfafs.
[+] cstuder|14 years ago|reply
When a website asks me for my birthday, I usually put 01.01.1970 into it.

Any system administrator looking at that will either be amused or search for the error in his date time parser.

[+] losvedir|14 years ago|reply
> develop a piece of land in the country

Careful about picking a low-populated area like this. I used to live in a town with population of about 2,000 and the post office clerks knew most everyone by name. One time I signed up for a site and just used "123 Blah St." as a placeholder address. Months later, some letter was mailed to that address, but the mail clerk, recognizing my name, just helpfully put it in my PO Box anyway!

[+] ShabbyDoo|14 years ago|reply
I once worked for a medical records software company. We received a bug report that a particular patient's record could not be viewed. Our support engineer remoted into the client's site and asked the secretary for the patient's name. It was Bobby Null. You can imagine what sort of underlying assumption about String serialization led to this issue. [A preemptive aside: We had proper confidentiality agreements in place. No HIPAA rules were violated.]
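
One plausible shape for that kind of assumption, sketched in Python. This is a guess for illustration only: the actual system's code is not shown in the thread, and every function name here is hypothetical.

```python
def naive_encode(value):
    """Hypothetical buggy serializer: a missing value becomes the text 'null'."""
    return "null" if value is None else value


def naive_decode(text):
    """A case-insensitive sentinel check swallows a real surname."""
    return None if text.strip().lower() == "null" else text


def tagged_encode(value):
    """Carry presence separately from the value, so no string is special."""
    return {"present": value is not None, "value": value or ""}


def tagged_decode(payload):
    """Trust only the presence flag, never the text itself."""
    return payload["value"] if payload["present"] else None
```

With the naive pair, the surname "Null" round-trips to a missing value; with the tagged pair it survives unchanged.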
[+] jsprinkles|14 years ago|reply
Doesn't telling us the patient's name violate HIPAA in itself?
[+] cafard|14 years ago|reply
Sort of weirdly classical, like Odysseus identifying himself to the Cyclops as "Noman".
[+] MrJagil|14 years ago|reply
Or the schizophrenic in Hitchcock's "Psycho" called Norman.
[+] ScottBurson|14 years ago|reply
This is probably a direct consequence of the fact that XML (unlike S-expressions, or JSON) fails to be self-describing. See [PDF]: http://homepages.inf.ed.ac.uk/wadler/papers/xml-essence/xml-...
[+] Nitramp|14 years ago|reply
XML is self-describing, it just so happens that XML's data model is not identical (and actually not even close) to SOAP's data model, or the typical programming language's data model.

XML itself only describes a text encoding, XML infoset describes node labeled trees, possibly graphs through xml:id and idref.

Unlike JSON it doesn't have a concept of null, it only has absence of a node. The authors of SOAP just invented a truly terrible way of mapping XML into a programming language's constructs (which are typically edge labeled trees with typed nodes).

XML is actually a decent data format for markup. Using it for other purposes (RPC format, configuration files, ...) usually doesn't end well.

[+] aidos|14 years ago|reply
Ironic considering that the language behind it (ColdFusion) doesn't even have a concept of null (it just uses empty string).
[+] vitomd|14 years ago|reply
Well, my last name has an ñ. So, for example, my credit card shows a weird character like "&" instead. Others just change it to n. My last name crashed an educational site when I registered.
[+] TazeTSchnitzel|14 years ago|reply
Could you not replace it with 'ny' or something similar?
[+] joeyh|14 years ago|reply
Something that worries me about perl to no end is tests like:

if ($lastname) { ... }

This fails when $lastname="0". But I am constantly seeing perl code that does it.

[+] kgtm|14 years ago|reply
Actually, what worries you is not Perl per se, but people that write Perl code and don't know what they want to test for. The code shown interrogates $lastname for values that represent truth in Perl, while it ought to be checking for definedness:

    if(defined $lastname) { ... }

The two are totally different cases. I would also argue that the problem lies elsewhere if you have values for a 'lastname' field in your data set that consist of a single character.
[+] toyg|14 years ago|reply
It's not Perl. I recently had a conversation with a friend who didn't like me using "if myvar == 0:" or "if myvar is 0:" in Python code rather than "if myvar:". Call me paranoid, but I like to be as explicit as possible in my checks, you never know when magic conversion tricks (which are often platform- or implementation-dependent) will end up biting you in the ass.
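
A small Python sketch of the difference between a truthiness test and explicit checks. The function names are hypothetical. Note that Python, unlike Perl and PHP, treats the string "0" as true, and that "is 0" compares identity rather than value (recent CPython versions warn about it), so "== 0" is the safer explicit form.

```python
def has_value_truthy(value):
    """Truthiness test: treats None, "", 0 and empty containers alike."""
    return bool(value)


def has_value_explicit(value):
    """Explicit test: only None counts as 'no value'."""
    return value is not None


def is_zero(value):
    """Value comparison; excludes bools, which compare equal to 0 and 1."""
    return not isinstance(value, bool) and value == 0
```

An empty last name and the number 0 both fail the truthiness test but pass the explicit one, which is the distinction the explicit style is meant to preserve.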
[+] jrgnsd|14 years ago|reply
The same problem exists in PHP, where "", "0" and 0 are all treated as false if you don't check the type. Welcome to the trap of loose typing.
[+] funkeemonk|14 years ago|reply
17 years ago, when I got my second Internet account with my ISP, I filled in these 3 names for my choice of email address on their paper signup form.

root@ , nobody@ and daemon@

They gave me "daemon". I've terminated that account long ago, but last I checked (6 years ago?), I could still retrieve emails and dial in using a modem using that account.

[+] why-el|14 years ago|reply
This is hilarious. Seriously, the question votes were being incremented live. :)
[+] gouranga|14 years ago|reply
This is exactly why you should never mix data and code/markup. When the semantic barrier is broken, all shit breaks loose.

I've always wondered if SICP style scheme would cause these sort of problems.

[+] ars|14 years ago|reply
It's more an issue of in-band or out-of-band signaling.

It's hard to do in-band signaling properly, but often you only have a single data channel and then you have no choice.

[+] lifeisstillgood|14 years ago|reply
I believe that these errors are so common they represent a cognitive bias on the part of programmers. At some point every developer wants to execute a one-line command and have the system "do something". If they cannot get that one line, then they have two options: wrap up more abstraction code until one line executes (the SOAP solution), or think deeply about what they are trying to do and take things away until one line is clear and obvious (the REST solution).