top | item 27985167

My Opinion on “5” == 5

27 points | jmount | 4 years ago | win-vector.com

63 comments

nathanaldensr | 4 years ago
IMO, any language with a type system should enforce comparisons only between those types (or derived types) unless a developer explicitly overrides an operator--and even then, overriding operators is sometimes considered bad practice due to violating the principle of least astonishment. The types of a comparison operator's operands matter because the type system exists. Allowing things like "5" == 5 suddenly introduces human expectations of logical equality--"they're both '5', therefore they're equal." Programming languages shouldn't codify these human interpretations.

My favorite language, C#, has some unfortunate history. The == operator when dealing with references indicates reference equality (Object.ReferenceEquals), but can be overridden to indicate value equality (Object.Equals) even with reference types. Then, when dealing with value types, == suddenly means value equality. It would've been better to define two separate operators for reference and value equality. This would've helped make usages self-documenting.

int_19h | 4 years ago
You can already kinda sorta do that if you consistently use == assuming value equality, and use ReferenceEquals() - which doesn't need to be qualified, since everything inherits from Object - for reference equality.

You still have == working on all references in this case, but this can be interpreted as reference types having the default value semantics of "every instance is logically different", so referenced values are only equal to themselves.

The annoying part of this approach is that ReferenceEquals() is the one that ends up being used way more often than == in practice for reference types, but it's also much longer to type, and more awkward to use being a function. Python and VB, with their "is" operator for reference equality, handled this one best. Unfortunately, C# wasted "is" as a keyword for something far less common - a type check (and now also pattern matching).
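For comparison, here is a minimal Python sketch of the `is`/`==` split mentioned above: `==` asks for value equality (dispatching to `__eq__`), while `is` asks whether two names refer to the same object. The `Point` class is a hypothetical example type.

```python
from dataclasses import dataclass


@dataclass
class Point:
    # @dataclass generates a field-by-field __eq__, i.e. value equality
    x: int
    y: int


p = Point(1, 2)
q = Point(1, 2)  # a distinct object with the same field values
r = p            # a second name for the very same object

print(p == q)  # value equality: True
print(p is q)  # reference equality: False
print(p is r)  # reference equality: True
```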

zajio1am | 4 years ago
> IMO, any language with a type system should enforce comparisons only between those types (or derived types)

There are languages with a type system that have a universal type (from which all other types are derived); in these languages it makes sense that "5" == 5 is defined and false.

egeozcan | 4 years ago
That historical... erm... glitch... in C# is very unfortunate and burned us many times back when I was working in a .NET shop (people getting creative was usually the culprit). I've even seen some codebases where people decided to implement their own equality operators (lesson learned through experience: don't do this).
MarkSweep | 4 years ago
At least more recent versions of Visual Studio highlight the == differently, so you know it is overridden.
zeroimpl | 4 years ago
Funny how they say SQL gets it right, but then show an example which is the opposite of my experience.

From PostgreSQL

    # select '5' = 5;
    ?column?
    --------
    t
    (1 row)
I believe this is standard SQL. Putting something in quotes just means it is a literal; it doesn't mean it is a string. Of course, if you explicitly cast it to a string, then you get an error. As in:

    select '5'::text = 5;
    ERROR:  operator does not exist: text = integer
    LINE 1: select '5'::text = 5;
                             ^
    HINT:  No operator matches the given name and argument types. You might need to add explicit type casts.
I agree with the premise - SQL gets it right, but perhaps only in PostgreSQL?
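Whether '5' = 5 is true really does depend on the dialect. As a point of comparison, here is a small sketch using SQLite through Python's built-in `sqlite3` module (SQLite is a swap-in here; the comment above only tested PostgreSQL). In SQLite, bare literals carry no type affinity, so no conversion happens and an INTEGER value never equals a TEXT value; an explicit CAST makes the comparison numeric.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Neither literal has affinity: INTEGER and TEXT compare as unequal (0 = false)
implicit = conn.execute("SELECT '5' = 5").fetchone()[0]

# CAST turns the text into an integer, so the values are compared numerically
explicit = conn.execute("SELECT CAST('5' AS INTEGER) = 5").fetchone()[0]

print(implicit, explicit)  # 0 1
conn.close()
```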
kennywinker | 4 years ago
The only acceptable answer is a type error. His example of python being able to handle heterogeneous collections is fine, but a good language will make you be explicit about the types.

    let a: String = "5"
    let b: Any[] = [5, 6, 7]
    let c: Int[] = [5, 6, 7]

    a in b // false - we don't know what's in b, so we check, and a is not in b
    a in c // type error - there's no way a is in c; convert a to an Int and handle the error cases involved in that
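Python cannot reject `a in c` statically, but the distinction can be approximated at runtime. A rough sketch, where `typed_contains` and its `elem_type` parameter are hypothetical names standing in for the declared element type (`Int[]` above):

```python
from typing import Any, Iterable


def typed_contains(needle: Any, haystack: Iterable[Any], elem_type: type = object) -> bool:
    """Membership test that refuses needles that cannot possibly match.

    elem_type plays the role of the declared element type; the default
    `object` corresponds to the Any[] case, where any value may be present.
    """
    if not isinstance(needle, elem_type):
        raise TypeError(
            f"{type(needle).__name__} can never be in a collection of {elem_type.__name__}; "
            "convert it explicitly first"
        )
    return needle in haystack


a = "5"
b = [5, 6, 7]  # treated as Any[]
c = [5, 6, 7]  # treated as Int[]

print(typed_contains(a, b))  # False: a was checked and is not in b
try:
    typed_contains(a, c, int)
except TypeError as err:
    print("type error:", err)

print(typed_contains(int(a), c, int))  # True once a is explicitly converted
```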

simiones | 4 years ago
In dynamic languages, like Python, all variables have a static type of "Any", so "5" == 5 has matching static types. Then the runtime types (or tags as some prefer to call them) are checked in the implementation of == itself. This is perfectly equivalent to what must happen in your example with `a in b`. Python can't express something like Int[], though.
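A small illustration of that runtime check in CPython: `==` first asks the left operand's `__eq__`, then the reflected `__eq__` of the right operand, and only if both return `NotImplemented` does it fall back to an identity comparison. That is why `"5" == 5` is simply `False` rather than an error.

```python
# Each built-in type only knows how to compare against its own kind of value
left = "5".__eq__(5)     # str does not know how to compare with int
right = (5).__eq__("5")  # int does not know how to compare with str

print(left, right)  # NotImplemented NotImplemented
print("5" == 5)     # False: both sides declined, so == falls back to identity
print("5" != 5)     # True
```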
infogulch | 4 years ago
The author is right, the answer should be a type error. If you want to compare them as ints, convert the string to an int first (and deal with the potential failure in the case that the string does not represent an integer). If you want to compare them as strings, convert the int to a string first. There are multiple approaches to transform this invalid comparison into a valid comparison, choose one. Hint: prefer the one that most closely aligns with the abstract meaning of the domain.
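A minimal Python sketch of the two explicit routes; `parse_int` is a hypothetical helper that makes the failure case visible instead of hiding it inside `==`:

```python
from typing import Optional


def parse_int(text: str) -> Optional[int]:
    """Return the integer that text denotes, or None if it is not an integer."""
    try:
        return int(text)
    except ValueError:
        return None


s, n = "5", 5

# Route 1: compare as integers, handling the failure case explicitly
parsed = parse_int(s)
if parsed is None:
    print(f"{s!r} is not an integer")
else:
    print(parsed == n)  # True

# Route 2: compare as strings
print(s == str(n))  # True

# The routes can disagree, which is why the choice should follow the domain
print(parse_int("05") == 5, "05" == str(5))  # True False
```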

A point that other commenters brought up is what to do in the case of user input. Let's assume it's unreasonable to expect the user to select a type explicitly. In this case I'd say there are actually three distinct types: int, string, and unknown-currently-represented-as-a-string. This doesn't solve the problem of needing to select a type conversion; it just makes it more obvious that a type conversion must occur.
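One way to make the third type concrete, sketched in Python with a hypothetical `RawInput` class: it holds the user's text but deliberately refuses `==` against anything, so every use site has to pick a conversion.

```python
class RawInput:
    """User-supplied text whose intended type has not been decided yet."""

    def __init__(self, text: str) -> None:
        self.text = text

    def as_int(self) -> int:
        # Raises ValueError if the text is not an integer; callers must handle it
        return int(self.text)

    def as_str(self) -> str:
        return self.text

    def __eq__(self, other: object) -> bool:
        # Refuse implicit comparison instead of silently returning False
        raise TypeError("compare RawInput via as_int() or as_str(), not ==")


raw = RawInput("5")
print(raw.as_int() == 5)    # True
print(raw.as_str() == "5")  # True
```

With this design, both `raw == 5` and `5 == raw` raise `TypeError` (the second via the reflected `__eq__`), which is the "must convert first" behaviour described above.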

This reminds me of Go's const numbers, which are represented as abstract numbers with arbitrary precision during compilation and only take on a concrete type once they're assigned to a variable that has one. This allows, for example, an expression using Pi to be assigned to both a float32 and a float64, using each type's maximum precision.

nybble41 | 4 years ago
> This reminds me of Go's const numbers which are actually represented as abstract numbers with arbitrary precision during compilation and only get assigned a concrete value once its assigned to a variable that has a concrete type.

Haskell does something similar. Literal integers and decimal fractions are translated into expressions like "fromInteger (12345 :: Integer)" for "12345" or "fromRational (314159 % 100000 :: Ratio Integer)" for "3.14159" (where the % operator, defined in Data.Ratio, constructs a Ratio value from a numerator and denominator). The inputs have arbitrary precision, and the concrete type of the expression is resolved through the normal type unification process.
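Python has no polymorphic literals, but the idea of keeping a literal exact and only committing to a concrete type at the last moment can be imitated with `fractions.Fraction`. This is only an analogy to the Go and Haskell mechanisms described above, not how either language implements them; `from_rational` is a hypothetical stand-in for Haskell's `fromRational`.

```python
from decimal import Decimal
from fractions import Fraction

# The literal 3.14159, kept exactly as 314159 / 100000 (like Haskell's Ratio Integer)
pi_literal = Fraction(314159, 100000)


def from_rational(value: Fraction, target: type):
    """Resolve an exact rational to a concrete numeric type chosen by the caller."""
    if target is Fraction:
        return value
    if target is Decimal:
        # Decimal cannot be built from a Fraction directly, so divide exactly
        return Decimal(value.numerator) / Decimal(value.denominator)
    return target(value)  # e.g. float(value)


print(from_rational(pi_literal, float))     # 3.14159
print(from_rational(pi_literal, Decimal))   # 3.14159
print(from_rational(pi_literal, Fraction))  # 314159/100000
```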

IMHO the static type error, runtime type error, and always-false interpretations all have some merit, though naturally static typing without any implicit conversion has the best chance of detecting mistakes early on. The only one I would reject as obviously invalid is the one that says that "5"==5 is true.
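Python happens to use two of those interpretations side by side, which makes the trade-off easy to see: equality between a `str` and an `int` is always false, while an ordering comparison raises a runtime `TypeError`. A short sketch, with `try_op` as a hypothetical reporting helper:

```python
def try_op(label, op):
    """Run op() and report either its result or the TypeError it raises."""
    try:
        return f"{label} -> {op()!r}"
    except TypeError as err:
        return f"{label} -> TypeError: {err}"


print(try_op('"5" == 5', lambda: "5" == 5))  # always-false interpretation
print(try_op('"5" < 5', lambda: "5" < 5))    # runtime type error interpretation
```

The static interpretation needs an external checker; mypy, for example, reports equality checks between non-overlapping types when run with `--strict-equality`.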

wydfre | 4 years ago
Brendan Eich created the language in 10 days. Think about the billions of dollars and billions of man-hours that have been spent on a language that wasn't designed over years, didn't slowly gain a userbase, and didn't come out of academia, but was made by a guy with a job. One guy. One guy in 10 days decided the fate of the internet. That's crazy.

https://thenewstack.io/brendan-eich-on-creating-javascript-i...

brrrrrm | 4 years ago
You're referring to JavaScript, whereas the post is about R.
tyingq | 4 years ago
To be fair, "5" == 5 is true for Awk, Perl, and Tcl. All of which were pretty popular at the time.

Not discounting other implications of a 10 day design period :)

xscott | 4 years ago
I wonder how different the world would be if he'd been allowed to use Scheme instead.
hprotagonist | 4 years ago
I would accept False.

I would also not complain if a "Warning: why are you comparing strings and integers" was raised.

Operators like +,-, ... should raise TypeError, or similar.
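A rough sketch of the "False plus a warning" behaviour, using Python's standard `warnings` module; `loose_eq` is a hypothetical helper, not a built-in:

```python
import warnings


def loose_eq(a, b) -> bool:
    """Compare a and b, warning when a str is compared with a number."""
    mixed = (isinstance(a, str) and isinstance(b, (int, float))) or (
        isinstance(b, str) and isinstance(a, (int, float))
    )
    if mixed:
        warnings.warn(
            f"comparing {type(a).__name__} with {type(b).__name__}; result is always False",
            stacklevel=2,
        )
    return a == b


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is recorded, not suppressed
    result = loose_eq("5", 5)

print(result, len(caught))  # False 1
```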

withinboredom | 4 years ago
In PHP, this is also true (or false, if you have strict types on).

I’d argue that there are some valid use cases for “5” and 5 to be equal. For example, if you take user input on an admin screen that may be either a user ID or a username, you don’t know whether the type is a string or a number. Another common example is when you don’t yet know the encoding of a packet, so you treat it as a binary string. In these cases, it’s easier to grok, as a human, what is expected.

lexicality | 4 years ago
If you're dealing with user input, you should always be validating the type before working with it.

If you want to compare input to an int, you should be verifying the input is a valid int before comparing it.

Just hoping the language will do the right thing is a great way to inadvertently allow exploits in your service.

crote | 4 years ago
On the other hand, what about "V" == 5? Or one of the literally dozens of Unicode characters representing the number five in different writing systems? Maybe even "five" == 5?

Accepting both a numerical user id and a username string is a very valid scenario. However, I believe you should handle that by explicitly trying to parse it as a number, not hoping that automatic coercion will do the right thing. It's just a footgun waiting to go off.
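A minimal sketch of that explicit approach in Python, where `resolve_user` and the two lookup tables are hypothetical. Even explicit parsing needs care: Python's `int()` accepts any Unicode decimal digits, so `int("٥")` (ARABIC-INDIC DIGIT FIVE) is `5`. The sketch therefore only treats plain ASCII digits as an ID.

```python
USERS_BY_ID = {5: "alice"}     # hypothetical user table keyed by numeric ID
USERS_BY_NAME = {"alice": 5}   # hypothetical username index


def resolve_user(query: str):
    """Return a user ID for either a numeric ID or a username, or None."""
    # Only plain ASCII digits count as an ID; "\u0665" or "V" are treated as usernames
    if query.isascii() and query.isdigit():
        user_id = int(query)
        return user_id if user_id in USERS_BY_ID else None
    return USERS_BY_NAME.get(query)


print(int("\u0665"))           # 5: int() accepts non-ASCII decimal digits
print(resolve_user("5"))       # 5
print(resolve_user("\u0665"))  # None: not ASCII, and no such username
print(resolve_user("alice"))   # 5
```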

Koshkin | 4 years ago
> it’s easier to grok, as a human

People usually understand the difference between "cat" and a cat.

llimos | 4 years ago
PHP also has a nice thing I wish more languages had, which stops you from shooting yourself in the foot because of this: the concatenation operator (.) is different from the addition operator (+).

So "5" + 1 is 6, just as 5 + 1 is 6.
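Why a separate operator matters is easier to see in a language that overloads `+` for both jobs. In Python, `+` means concatenation for strings and addition for numbers, so a mixed pair has no single sensible coercion and the language refuses instead. A short sketch:

```python
print("5" + "1")  # 51: + on two strings concatenates
print(5 + 1)      # 6: + on two ints adds

# With mixed operands, + could mean either, so Python raises instead of guessing
try:
    "5" + 1
except TypeError:
    print("ambiguous: convert explicitly")

print(int("5") + 1)  # 6: explicit addition
print("5" + str(1))  # 51: explicit concatenation
```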

smnrchrds | 4 years ago
So that's why some websites fail to log you in with numerical usernames and/or passwords that start with zero.
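That is one plausible mechanism: once a value such as "007" has been coerced to a number, the leading zeros are gone and cannot be recovered. A tiny Python illustration of the loss (the username is made up):

```python
username = "007"           # hypothetical username typed by a user
as_number = int(username)  # what an implicit numeric coercion would produce

print(as_number)                   # 7
print(str(as_number) == username)  # False: "7" != "007", the zeros are lost
```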
tyingq | 4 years ago
There are several languages where this sort of type coercion exists: JavaScript, Awk, Perl, Tcl, and PHP, as you mentioned.

For the most part, the developers in those spaces seem to know the pitfalls and how to avoid them.

samatman | 4 years ago
I like Lua's solution to this dilemma:

    5 == "5" -- false
    5 + "5"  -- 10
    5 .. "5" -- 55
It's just dynamic enough, none of the conversions are surprising and they're frequently helpful.
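To make those three rules concrete, here is a simplified Python emulation. `lua_eq`, `lua_add` and `lua_concat` are hypothetical names, and the sketch only covers integers and strings; real Lua's conversion rules also cover floats and other numeric formats.

```python
def lua_eq(a, b) -> bool:
    # Lua's == never converts: values of different types are never equal
    return type(a) is type(b) and a == b


def _to_number(value):
    # Numeric strings become numbers; int() raises ValueError for non-numeric text
    return int(value) if isinstance(value, str) else value


def lua_add(a, b):
    # Arithmetic converts numeric strings to numbers before adding
    return _to_number(a) + _to_number(b)


def lua_concat(a, b) -> str:
    # The .. operator converts numbers to strings before concatenating
    return str(a) + str(b)


print(lua_eq(5, "5"))      # False
print(lua_add(5, "5"))     # 10
print(lua_concat(5, "5"))  # 55
```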
staticassertion | 4 years ago
Wow this is actually the worst solution I could imagine.
vehemenz | 4 years ago
"5" == 5 evaluating to true is only a problem if you don't understand how to do a strict comparison.

It is conventional in PHP and JavaScript to use === unless there is some reason not to.
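Python has no `===`, and `==` between `str` and `int` is already false there, but a type-then-value comparison is still useful: `bool` is a subclass of `int`, so `True == 1` holds. A minimal sketch of a strict variant, with `strict_eq` as a hypothetical helper (an analogy to `===`, not a port of its exact semantics):

```python
def strict_eq(a, b) -> bool:
    """Equal only if both operands have exactly the same type and equal values."""
    return type(a) is type(b) and a == b


print(True == 1, strict_eq(True, 1))  # True False
print("5" == 5, strict_eq("5", 5))    # False False
print(5 == 5, strict_eq(5, 5))        # True True
```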

antihero | 4 years ago
I guess one could argue that === should be ==, and the original == shouldn't exist.
Etheryte | 4 years ago
The article you're commenting on isn't about JavaScript or PHP, though.
Kalanos | 4 years ago
For what it's worth, `5` is not a valid JSON key, but `"5"` is.
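That restriction can produce a quiet "5"/5 conversion in practice. Python's `json` module, for example, turns integer dict keys into strings when encoding, so a round trip does not give back the original dict:

```python
import json

original = {5: "five"}
encoded = json.dumps(original)  # the int key 5 is written as the string "5"
decoded = json.loads(encoded)

print(encoded)              # {"5": "five"}
print(decoded == original)  # False: {"5": "five"} != {5: "five"}
print(decoded["5"])         # five
```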
Y_Y | 4 years ago
What about

    "" ==
croddin | 4 years ago
Syntax error. Unexpected end of line.