This is funny, but it's also a real-world example of the kind of encoding nightmare that made SOAP RPC encoding really awkward. Various SOAP toolkits used to serialize a missing value as the empty string, or a literal value like "null" or 0, or all sorts of awfulness. I think the correct thing per the spec is to set xsi:nil="true" as an attribute on the XML tag in question, but IIRC about half the toolkits didn't understand that.
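For anyone who hasn't seen the xsi:nil approach, here is a minimal sketch in Python using only the standard library. The element names (`person`, `lastName`) and the helper functions are made up for illustration, and the decoder only checks for the literal "true" (the schema also allows "1"), so treat it as a rough outline rather than what any particular SOAP toolkit does.

```python
import xml.etree.ElementTree as ET

# Standard XML Schema instance namespace, which defines the nil attribute.
XSI = "http://www.w3.org/2001/XMLSchema-instance"
ET.register_namespace("xsi", XSI)


def encode_field(parent, tag, value):
    """Append <tag> to parent; mark None with xsi:nil, not a magic string."""
    elem = ET.SubElement(parent, tag)
    if value is None:
        elem.set(f"{{{XSI}}}nil", "true")
    else:
        elem.text = str(value)
    return elem


def decode_field(elem):
    """Return None only when xsi:nil says so; otherwise return the text."""
    if elem.get(f"{{{XSI}}}nil") == "true":
        return None
    return elem.text or ""


def roundtrip(value):
    """Serialize one hypothetical lastName field and parse it back."""
    root = ET.Element("person")
    encode_field(root, "lastName", value)
    parsed = ET.fromstring(ET.tostring(root))
    return decode_field(parsed.find("lastName"))
```

With this, a missing value, an empty string, and the surname "Null" all come back as three different things, which is the whole point.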
(I speak in the past tense of SOAP because I am an optimist.)
I worked at a company where we were replacing the user-facing component of our giant, ugly PHP storefront with a Rails version; in doing so, our developers implemented a JSON bridge between the two, allowing the frontend and backend to operate separately, using separate databases (and actually, they were in separate data centres).
As we were testing, we found that some products in our database would cause a JSON decoding error on the Rails side. After a few minutes, we realized the problem. We had a string field for something (product IDs, manufacturer SKU, etc). On the PHP side, the JSON encoder was using PHP's is_numeric() for each field to see if the field was a number (to determine how to encode it). Some of the SKUs, however, happened to be composed entirely of digits, and for those, PHP encoded them into the JSON as integer values. This, of course, broke on the Rails end, because Rails was expecting a string value and got an integer value.
In the end, we had to write a surprising amount of code to work around the brain damage involved, since regardless of what we tried to do PHP wanted, by default, to send things as integers whenever possible. I believe the final fix was to actually patch the JSON encoder library and special-case that field.
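To make the failure mode concrete, here is a rough Python imitation of it. This is not the PHP encoder we patched: `guessing_encode` is a made-up stand-in that only checks for all-digit ASCII strings (PHP's is_numeric() accepts more than that), and `typed_encode` simply trusts the types it is given.

```python
import json


def guessing_encode(record):
    """Mimic the bug: any all-digit string is emitted as a JSON number."""
    coerced = {
        key: int(value)
        if isinstance(value, str) and value.isascii() and value.isdigit()
        else value
        for key, value in record.items()
    }
    return json.dumps(coerced)


def typed_encode(record):
    """Keep strings as strings, whatever they happen to look like."""
    return json.dumps(record)


product = {"sku": "00123", "name": "Widget"}
broken = json.loads(guessing_encode(product))   # sku arrives as an int
intact = json.loads(typed_encode(product))      # sku arrives as a str
```

Note that the guessing version does not just change the type on the wire; a SKU with leading zeros cannot be recovered on the other end at all.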
This is really a fundamental problem: how do you indicate operation failure? This relies on two things: the range (the valid output values) of the operation itself, and the range of the datatype you're mapping the operation's result to.
If the operation and the datatype's range are not equal, then you can indicate failure inside the return value by applying special meaning to invalid values. But if the operation and the datatype's range are equal, then you need another distinct value to indicate failure. The difficulty is in recognizing which situation you're in, and as you point out, this is one where, effectively, the operation and the datatype have the same range.
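A small Python sketch of the two situations, with hypothetical names throughout. `str.find` can signal failure in-band because -1 is never a valid index. A table lookup whose values may legitimately be None cannot, so the made-up `lookup` helper below returns a separate found flag instead.

```python
# Case 1: the operation's range is smaller than the datatype's range.
# Valid indices are >= 0, so -1 is free to mean "not found".
position = "hello".find("z")

# Case 2: the ranges coincide. Any value, including None, may be stored,
# so failure needs its own channel. _MISSING is a private sentinel that
# no caller can have put in the table.
_MISSING = object()


def lookup(table, key):
    """Return (found, value) so a stored None is not mistaken for a miss."""
    value = table.get(key, _MISSING)
    if value is _MISSING:
        return (False, None)
    return (True, value)


stored_none = lookup({"lastname": None}, "lastname")
absent = lookup({}, "lastname")
```

Raising an exception would be another way to get the out-of-band channel; the tuple is just the shortest thing to show here.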
Ran into this with a REST XML API recently where someone was trying to do some reflection-type serialization of XML. The API had the longitude and latitude of all train stations, and some genius decided to call the tags 'lat' and 'long'. 'long' conflicted with the datatype Long and it wasn't fun. Luckily, version 2 of the API has fixed this issue.
I think the absence of the element/attribute is the best way to define null assuming your XSD is set up properly. Many XML marshalling libraries work well with this approach.
I have joked that I might change my name to Sample User, develop a piece of land in the country, and name my road Example Avenue, taking address 123. This would make me impervious to datamining, because my results would always be thrown out.
If a patient with the last name of "Mouse" ever checks in to the hospital where I work, I have doubts about whether any of his labs will be performed. Standard practice when creating a test user in production or placing a test order is to name him Anything Mouse, so people know to simply delete the request from the system.
Careful about picking a low-populated area like this. I used to live in a town with population of about 2,000 and the post office clerks knew most everyone by name. One time I signed up for a site and just used "123 Blah St." as a placeholder address. Months later, some letter was mailed to that address, but the mail clerk, recognizing my name, just helpfully put it in my PO Box anyway!
I once worked for a medical records software company. We received a bug report that a particular patient's record could not be viewed. Our support engineer remoted into the client's site and asked the secretary for the patient's name. It was Bobby Null. You can imagine what sort of underlying assumption about String serialization led to this issue. [A preemptive aside: We had proper confidentiality agreements in place. No HIPAA rules were violated.]
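I can't share the actual code, so the following is only a guess at the general shape of the assumption, written in Python with made-up function names. The buggy decoder treats the text "null" (in any case) as a missing value; the JSON round trip shows that a format with a real null has no trouble telling the two apart.

```python
import json


def naive_decode(field):
    """Hypothetical buggy decoder: the string 'null' means no value."""
    return None if field.strip().lower() == "null" else field


def json_roundtrip(value):
    """Encode and decode through JSON, which has a distinct null literal."""
    return json.loads(json.dumps({"last_name": value}))["last_name"]


lost = naive_decode("Null")          # Bobby's surname vanishes
kept = json_roundtrip("Null")        # stays the string "Null"
missing = json_roundtrip(None)       # stays a real null
```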
XML is self-describing; it just so happens that XML's data model is not identical (and actually not even close) to SOAP's data model, or to the typical programming language's data model.
XML itself only describes a text encoding. The XML infoset describes node-labeled trees, or possibly graphs through xml:id and idref.
Unlike JSON it doesn't have a concept of null, it only has absence of a node. The authors of SOAP just invented a truly terrible way of mapping XML into a programming language's constructs (which are typically edge labeled trees with typed nodes).
XML is actually a decent data format for markup. Using it for other purposes (RPC format, configuration files, ...) usually doesn't end well.
Well, my last name has an ñ. So, for example, my credit card has a weird character like "&" in its place. Others just change it to n. My last name crashed an educational site when I registered.
Actually, what worries you is not Perl per se, but people that write Perl code and don't know what they want to test for. The code shown interrogates $lastname for values that represent truth in Perl, while it ought to be checking for definedness:
if(defined $lastname) { ... }
The two are totally different cases. I would also argue that the problem lies elsewhere if you have values for a 'lastname' field in your data set that consist of a single letter.
It's not Perl. I recently had a conversation with a friend who didn't like me using "if myvar == 0:" or "if myvar is 0:" in Python code rather than "if myvar:". Call me paranoid, but I like to be as explicit as possible in my checks; you never know when magic conversion tricks (which are often platform- or implementation-dependent) will end up biting you in the ass.
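For what it's worth, here is how the explicit style looks in Python, as a small sketch with a made-up `describe` helper. Python's truthiness rules differ from Perl's (the string "0" is true in Python), which is one more reason not to lean on them. Also note that comparing to a number with `==` is the safe spelling; `is 0` only appears to work because CPython happens to cache small integers.

```python
def describe(value):
    """Classify a field explicitly instead of relying on truthiness."""
    if value is None:
        return "missing"
    if value == "":
        return "empty"
    return "present"


# Truthiness lumps several distinct cases together:
falsy_examples = [value for value in (None, "", 0, "0", []) if not value]

# The explicit checks keep them apart:
labels = [describe(value) for value in (None, "", "0")]
```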
17 years ago, when I got my second Internet account with my ISP, I filled in these 3 names for my choice of email address on their paper signup form.
root@ , nobody@ and daemon@
They gave me "daemon". I've terminated that account long ago, but last I checked (6 years ago?), I could still retrieve emails and dial in using a modem using that account.
I believe that these errors are so common that they represent a cognitive bias on the part of programmers. At some point every developer wants to execute a one-line command and have the system "do something". If they cannot get that one line, then they have two options: wrap up more abstraction code until one line executes (the SOAP solution), or think deeply about what you are trying to do and take things away until one line is clear and obvious (the REST solution).
[+] [-] NelsonMinar|14 years ago|reply
[+] [-] danudey|14 years ago|reply
[+] [-] scott_s|14 years ago|reply
[+] [-] wpietri|14 years ago|reply
[+] [-] narrator|14 years ago|reply
[+] [-] joelhaasnoot|14 years ago|reply
[+] [-] kodablah|14 years ago|reply
(note, I too have long since abandoned SOAP)
[+] [-] billybob|14 years ago|reply
But a last name of 'Null' may be even better. :)
[+] [-] aqme28|14 years ago|reply
[+] [-] mkopinsky|14 years ago|reply
[+] [-] sequoia|14 years ago|reply
[+] [-] cstuder|14 years ago|reply
Any system administrator looking at that will either be amused or search for the error in his date time parser.
[+] [-] losvedir|14 years ago|reply
[+] [-] ShabbyDoo|14 years ago|reply
[+] [-] jsprinkles|14 years ago|reply
[+] [-] wpietri|14 years ago|reply
http://caterina.net/archive/001011.html
Flickr cofounder Caterina Fake couldn't fly on Northwest Airlines because their system silently deleted her tickets.
[+] [-] cafard|14 years ago|reply
[+] [-] MrJagil|14 years ago|reply
[+] [-] ColinDabritz|14 years ago|reply
http://www.kalzumeus.com/2010/06/17/falsehoods-programmers-b...
* No one has a name that is a reserved system keyword (Null, Nan, Unknown...)
[+] [-] read_wharf|14 years ago|reply
[+] [-] zbowling|14 years ago|reply
[+] [-] ars|14 years ago|reply
[+] [-] alanbyrne|14 years ago|reply
[+] [-] rdtsc|14 years ago|reply
http://xkcd.com/327/
[+] [-] ScottBurson|14 years ago|reply
[+] [-] Nitramp|14 years ago|reply
[+] [-] septerr|14 years ago|reply
I heard a similar story about a student in Birmingham whose license plate was 'null'.
[+] [-] piinbinary|14 years ago|reply
[+] [-] hughw|14 years ago|reply
[+] [-] aidos|14 years ago|reply
[+] [-] vitomd|14 years ago|reply
[+] [-] TazeTSchnitzel|14 years ago|reply
[+] [-] joeyh|14 years ago|reply
if ($lastname) { ... }
This fails when $lastname="0". But I am constantly seeing Perl code that does it.
[+] [-] kgtm|14 years ago|reply
[+] [-] toyg|14 years ago|reply
[+] [-] jrgnsd|14 years ago|reply
[+] [-] gabrtv|14 years ago|reply
[+] [-] funkeemonk|14 years ago|reply
[+] [-] why-el|14 years ago|reply
[+] [-] gouranga|14 years ago|reply
I've always wondered if SICP-style Scheme would cause these sorts of problems.
[+] [-] ars|14 years ago|reply
It's hard to do in-band signaling properly, but oftentimes you only have a single data channel, and then you have no choice.
[+] [-] lifeisstillgood|14 years ago|reply