NickPollard | 9 years ago

You are right - in a tautological way - that type systems only catch type errors. However, in modern languages (including Haskell and Scala, as well as newer, more experimental languages like Idris), those type errors can be extremely powerful.

Many people assume that 'types' are simply primitives like Int and String, and that a type checker just makes sure you don't pass an Int to a function expecting String. However, it is possible to express far more powerful statements about your data using a good type system.

For example, you can express the idea of non-emptiness of a container, as mentioned in the article. Then you know that, say, taking the max element of a non-empty container is guaranteed to give you an element, whereas with a possibly-empty container you might not have any element at all, causing a null, or exception, or at least requiring an Optional type.
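
A minimal Python sketch of the non-emptiness idea, assuming a hypothetical `NonEmptyInts` class (not from the article) and type hints read by a static checker such as mypy. The constructor requires a first element, so `maximum` can promise an `int`, while the version over an ordinary list has to return an `Optional`:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class NonEmptyInts:
    """Hypothetical container that cannot be built without a first element."""
    head: int                      # mandatory first element
    tail: Tuple[int, ...] = ()     # zero or more further elements

    def maximum(self) -> int:
        # Always succeeds: `head` exists by construction.
        return max((self.head,) + self.tail)


def maximum_or_none(items: List[int]) -> Optional[int]:
    # A possibly-empty list forces the caller to handle the "no element" case.
    return max(items) if items else None


best = NonEmptyInts(3, (9, 4)).maximum()   # an int, never None
maybe_best = maximum_or_none([])           # None for the empty list
```

Languages such as Haskell ship a generic version of this (`Data.List.NonEmpty`); the sketch sticks to ints only to keep it short.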

You can express safety properties such as a sanitized string vs. unsanitized. You can have a Sanitized type that can only be created by calling a sanitize function - which carefully escapes/handles any invalid characters - and then functions that might, say, pass a value into an SQL instruction can be typed to only take Sanitized strings. Now the representation in memory of Strings and Sanitized strings is identical, but by using different types and a certain set of allowed functions on those types, you can encode the invariant that a string cannot be inserted into an SQL query until it has been sanitized. Now your type checker can catch SQL insertion vulnerabilities for you. How's that for a type error?
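
As a rough illustration in Python, `typing.NewType` gives a distinct static type with the same runtime representation as `str`. The names `Sanitized`, `sanitize` and `build_query` are hypothetical, the escaping rule is a toy, and the guarantee only holds if a static checker such as mypy is run and nothing else calls `Sanitized(...)` directly:

```python
from typing import NewType

# At runtime a Sanitized value is a plain str; the distinction exists
# only for the static type checker.
Sanitized = NewType("Sanitized", str)


def sanitize(raw: str) -> Sanitized:
    # Toy escaping rule (an assumption for this sketch): double every single quote.
    return Sanitized(raw.replace("'", "''"))


def build_query(name: Sanitized) -> str:
    # Only accepts values that went through sanitize(), as far as the checker can tell.
    return f"SELECT * FROM users WHERE name = '{name}'"


query = build_query(sanitize("O'Brien"))

# build_query("O'Brien") would still run in plain Python, but a static
# checker is expected to reject it because str is not Sanitized.
```

In real code you would bind parameters through the database driver rather than build SQL text; the point here is only that the type, not the memory layout, carries the invariant.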

rbonvall|9 years ago

Yes, this is the point that's often missing in discussions about types. You can (and you have to work to) encode properties as types to get more value out of them. It's not just about avoiding mixing ints and strings.

xapata|9 years ago

You make a good point, so I'll answer in parts.

First, when most people talk about static typing, they're talking about the near-useless version -- just types like Int and String. I think we agree there, so I won't mention it further.

Second, a dynamically typed language like Python has more typing information than some folks first assume. Python's AttributeError is quite similar to a TypeError. In fact, with old-style classes (v2.1 and earlier), many errors that are now TypeErrors were AttributeErrors. Calling len() on an inappropriate object would raise "AttributeError: no __len__". In many cases where folks talk about wanting a static type system, they really just want interfaces.
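
A small sketch of that interface point, run under Python 3 (the `Playlist` class and `error_name` helper are made up for illustration). An object supports `len()` by defining `__len__`; one that doesn't fails at call time with a `TypeError`, while a missing attribute fails with an `AttributeError`:

```python
class Playlist:
    """Hypothetical class that opts into the len() interface."""

    def __init__(self, tracks):
        self.tracks = list(tracks)

    def __len__(self):
        return len(self.tracks)


def error_name(thunk):
    # Run a zero-argument callable and return the name of the exception
    # class it raises, or None if it completes normally.
    try:
        thunk()
    except Exception as exc:
        return type(exc).__name__
    return None


size = len(Playlist(["a", "b"]))               # works: __len__ is defined
len_error = error_name(lambda: len(42))        # int does not define __len__
attr_error = error_name(lambda: (42).tracks)   # int has no such attribute
```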

The Sanitized string example is a good counter-point because the interface needs to be near-identical to a regular string. I'm not certain a more complex memory representation (caused by defining a different class) would cause noticeable inefficiency. We're probably not doing vectorized operations on strings.

This brings me to my third point, that Python 3 has a similar split between two types: bytes and str. The memory representation is slightly different, bytes vs unicode, but the interfaces are nearly identical. Two differences would be decode vs encode and that getting an element from bytes (annoyingly) gives an int. The distinction between the two types is enforced mostly inside builtin functions, implemented in C. This was a big deal, causing backwards incompatibility, many flamewars, and we're still resolving it, though I think it's clear to most people now that Python 3 is the future.
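
For anyone who hasn't run into it, here is a short Python 3 sketch of the differences mentioned above (the variable names are just for illustration):

```python
text = "caf\u00e9"                    # a str of four characters, ending in e-acute
data = text.encode("utf-8")           # str -> bytes
round_trip = data.decode("utf-8")     # bytes -> str

first_char = text[0]   # indexing a str gives a str of length 1
first_byte = data[0]   # indexing bytes gives an int

# Mixing the two types is rejected at runtime rather than silently coerced.
try:
    text + data
    mixed = "allowed"
except TypeError:
    mixed = "TypeError"
```

The lengths differ too: the accented character is one code point in the `str` but two bytes in UTF-8.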

Is it possible that the Python 2/3 split could have been avoided if we had a static type system? Perhaps, if we had multiple dispatch, the function signatures could have remained the same, avoiding backwards incompatibility... I'm just speculating here. My guess is no, getting rigorous about unicode would cause incompatibility regardless of the type system. I'll get back to the main topic now.

> Now your type checker can catch SQL insertion vulnerabilities for you.

This sounds useful, but a good interface solves the problem just as well. I'm a Pythonista (if you haven't noticed), so my example is PEP 249 that specifies a DB API for all database wrapper implementers to follow. It states that it's the wrapper dev's responsibility to implement a sanitizing string interpolation for the cursor's execute method.
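
A sketch of what that looks like with the standard-library `sqlite3` module, which follows PEP 249 and uses the "qmark" parameter style. The table and the hostile input are invented for the example, and the string-formatted query is there only for contrast:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE users (name TEXT)")
cur.execute("INSERT INTO users VALUES (?)", ("alice",))

hostile = "x' OR '1'='1"

# Parameter binding: the driver passes `hostile` as data, not as SQL text.
safe_rows = cur.execute(
    "SELECT name FROM users WHERE name = ?", (hostile,)
).fetchall()

# Naive string formatting, shown only for contrast: the input rewrites the query.
unsafe_rows = cur.execute(
    f"SELECT name FROM users WHERE name = '{hostile}'"
).fetchall()

conn.close()
```

The bound query matches nothing, while the formatted one returns every row, so the interface does the job as long as callers actually use it.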

My conclusion is that designing a good interface is important whether you have dynamic or static typing. Static typing errs on the side of safety, dynamic typing errs on the side of flexibility. Both can mimic the other. Arguing that one is better is like saying linear regression is better/worse than k-nearest-neighbors.

the_af|9 years ago

> First, when most people talk about static typing, they're talking about the near-useless version -- just types like Int and String. I think we agree there, so I won't mention it further.

I don't agree. Who is "most people"? Certainly not PL designers and not most of what I've seen here in HN. More importantly, it's also not what the article under discussion is saying, either.

> Static typing errs on the side of safety, dynamic typing errs on the side of flexibility. Both can mimic the other. Arguing that one is better is like saying linear regression is better/worse than k-nearest-neighbors.

In my experience, this isn't true. Modern statically typed languages have the conveniences of dynamically typed ones, such as REPLs and concise, elegant code, plus the safety of early warnings and the guidance that static types give you while writing your code (if you've ever written code like this, you'll know the feeling of working with building blocks that "fit" with each other). So you can have your cake and eat it, too.

Also in my experience, not having experience with these languages is what leads some people to think their type systems can only state trivial things such as "this is a String". They can do more. They can say things such as "this expression/function doesn't write to disk as a hidden side effect", which is useful!