item 8130293

Look at the humongous type that Hindley-Milner infers for this tiny program

171 points | luu | 11 years ago | spacemanaki.com

56 comments

[+] flebron|11 years ago|reply
Perhaps an easier example is to do this in GHCi:

    let x = id id id id id id id id id id id id id id id id id id id id id id id id
See how long that takes (add ids if your machine can take it). To deduce a type for x, GHCi will use 2^k type variables if you have k ids. This is easily seen by induction:

If k = 1, then the type of x is a -> a. This takes 2 = 2^1 = 2^k variables. If k > 1, it is of the form id f. Let t be the type of f. t has 2^{k - 1} variables, by induction. Thus the outermost id must take something of type t (this is 2^{k - 1} variables) and produce something of that same type t (this is another 2^{k - 1} variables), that is, it's of type t -> t. 2^{k - 1} + 2^{k - 1} = 2^k.
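The recurrence in that induction can be checked mechanically; here's a tiny Python sketch that just models the variable count (it does no real inference, and the function name is made up):

```python
# Model flebron's induction (no real inference, just the count):
# type_vars(k) = number of type variables in the type GHCi builds for
# x = id id ... id with k ids.
def type_vars(k):
    if k == 1:
        return 2                  # id :: a -> a, two variable occurrences
    # the outermost id gets type t -> t, where t is the type of the
    # remaining k-1 ids: two copies of a 2^(k-1)-variable type
    return 2 * type_vars(k - 1)

print([type_vars(k) for k in (1, 2, 3, 10)])   # [2, 4, 8, 1024]
print(type_vars(24))                            # 16777216 == 2**24
```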

Note that this information is purely at compile time. Haskell isn't typed at runtime, this is all just to check types. At runtime, x is a no-op, as expected.

[+] seanmcdirmid|11 years ago|reply
I used to be able to kill Scala compiler performance with just a few identifiers. Well, Scala is trying to deal with nominal subtyping, which is well known to be quite hard (so the type inference algorithm isn't really H-M, and must rely on heuristics).
[+] spacemanaki|11 years ago|reply
That's very interesting! This seems to me like it might be a different kind of edge case though, since the type of x is still t -> t, even if it takes a long time for the type checker to arrive at this. From the little I got out of Mairson's paper, it seems that the proof he uses depends on the exponential size of the type itself, but I could be mistaken... either way I wonder how they're related?
[+] seliopou|11 years ago|reply
The title's a little misleading, as it suggests that there's something strange or even wrong about the type that OCaml or Haskell infers for the program. Of course the type that these systems infer is correct, and the author does demonstrate an understanding of that. But that a small input to a system can produce a large output should come as little surprise to anyone who works with complex systems, especially computer systems.

The underlying mechanism that leads to the computational complexity in HM(X) systems--the process of generalizing types to schemas and instantiating schemas to types in order to support let-polymorphism--would make for an interesting discussion, but the author hasn't hit on that yet. Unfortunately, the only people who really understand that part of the algorithm are the people who have implemented it, and there aren't many of those people around. Strange but true. The algorithm that's taught and implemented in most undergraduate PL courses (PLAI-based courses included) does not support let-polymorphism. If the author's reading: upgrade to ATTAPL[0] and start implementing.

Also, to answer the question "How to compile Turing machines to ML types", have a look at typo[1] for one way to do it in Haskell (by reducing it to lambda calculus (shameless plug)).

[0]: http://www.cis.upenn.edu/~bcpierce/attapl/

[1]: https://github.com/seliopou/typo

[+] throwaway_yy2Di|11 years ago|reply
What about this example?

    f0 x = (x,x)
    f1 x = f0 (f0 x)
    f2 x = f1 (f1 x)
    f3 x = f2 (f2 x)
    f4 x = f3 (f3 x)
This is worse than O(c^n): the size of the last function's type is O(2^(2^n)).

(In spacemanaki's blog example, each new expression doubles the size of the previous one. Here, each new expression squares the size. f4 builds 2^(2^4) = 65,536 copies of 'x', and f5 will crash your compiler if you define it.)
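The squaring recurrence is easy to model; a quick Python sketch (no real inference, just counting copies of 'x' in each type, on the assumption that each fn pairs up two copies of its argument's type):

```python
# Model throwaway_yy2Di's example (no real inference, just the count):
# copies(n) = number of copies of x in the type of fn.
def copies(n):
    if n == 0:
        return 2                  # f0 x = (x, x) doubles
    return copies(n - 1) ** 2     # fn x = f(n-1) (f(n-1) x) squares

print([copies(n) for n in range(5)])   # [2, 4, 16, 256, 65536]
```

Each step squares, so copies(n) == 2^(2^n), doubly exponential in n.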

[+] natte|11 years ago|reply
O((2^2)^n) = O(4^n) = O(c^n); O((2^3)^n) = O(8^n) = O(c^n); ...
[+] munificent|11 years ago|reply
This is a fantastic post. I'm passingly familiar with ML and have heard the term "let polymorphism" a number of times, but this is the first time I've seen it clearly explained.
[+] spacemanaki|11 years ago|reply
Thanks, that's very flattering as I've been a long time fan of your blog!
[+] tomp|11 years ago|reply
> Why can’t we just allow polymorphic lambda-bound variables?

Two reasons. The first is that in general, type inference for higher-rank polymorphism is undecidable. The second is that even if it were decidable, it would be impractical - I think that if we have a program like this one:

  let test(f) = (f(1), f(true))
the assumption that the programmer made an error is statistically much more likely than the assumption that `f` should have a polymorphic type. Another reason is that it's not entirely clear what kind of polymorphic type to infer for `f` - should it be `forall a. a -> a` or e.g. `forall a. a -> int`?

> What does this have to do with different “ranks” of polymorphism?

Higher-rank polymorphism means that polymorphic types can appear within other types; in a system with support for higher-rank polymorphic types, the parameter `f` in the function `test` above could be declared with type `forall a. a -> a`, and the function `test` would then have type `(forall a. a -> a) -> int * bool`. Higher-rank polymorphism is formalized using System F, and there are a few implementations of (incomplete, but decidable) type inference for it - see e.g. Daan Leijen's research page [1], or my experimental implementation [2] of one of his papers. Higher-rank types also have some limited support in OCaml and Haskell.

> How is let-polymorphism implemented? How do you implement it without just copying code around?

The "standard" implementation consists of two operations, generalization and instantiation. At let bindings, types are generalized by replacing all unbound type variables with polymorphic type variables (care must be taken not to generalize type variables that can be bound later, but that's a secondary issue here). Every time a variable with a polymorphic type is used, its type is instantiated, which means that all polymorphic type variables are replaced with fresh unbound type variables.

  let f = fun x -> (x, x)

  print f(1), f("boo")
In the example above, the type inferred for `fun x -> (x, x)` is `_a -> _a * _a`, where `_a` is an unbound (but not polymorphic) type variable. At let binding, this type is transformed into `forall a. a -> a * a` by replacing `_a` with a polymorphic type variable `a` (in most ML languages, the `forall` is implicit). Then, when `f` is used in `f(1)`, its type is instantiated into `_b -> _b * _b` and `_b` is unified with the type of `1`, `int`. When `f` is used in `f("boo")`, its type is instantiated into `_c -> _c * _c`, and `_c` is unified with type of `"boo"`, `string`. Since `_b` and `_c` are different variables, there is no error here (if `f` had a monomorphic type, `_b` and `_c` would in fact be the same, and this example would result in an error "cannot unify type int with type string").
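To make the two operations concrete, here's a minimal Python sketch of generalization and instantiation; the type representation and names are made up for illustration, not taken from any real compiler (and this version ignores the environment, which a real checker must consult before generalizing):

```python
import itertools

# Minimal sketch of generalization and instantiation (representation and
# names are illustrative only). Types are tuples:
#   ('var', name) | ('con', name) | ('fun', t1, t2) | ('pair', t1, t2)
_counter = itertools.count()

def fresh():
    """Make a fresh unbound type variable name."""
    return f"_t{next(_counter)}"

def free_vars(t):
    """All type variables occurring in a type."""
    if t[0] == 'var':
        return {t[1]}
    if t[0] == 'con':
        return set()
    return set().union(*(free_vars(s) for s in t[1:]))

def generalize(t):
    """At a let binding: quantify over the unbound type variables."""
    return (sorted(free_vars(t)), t)      # scheme: forall vars. t

def instantiate(scheme):
    """At each use: replace quantified variables with fresh ones."""
    qvars, t = scheme
    subst = {v: ('var', fresh()) for v in qvars}
    def walk(t):
        if t[0] == 'var':
            return subst.get(t[1], t)
        if t[0] == 'con':
            return t
        return (t[0],) + tuple(walk(s) for s in t[1:])
    return walk(t)

# fun x -> (x, x) gets type _a -> _a * _a before generalization
t = ('fun', ('var', '_a'), ('pair', ('var', '_a'), ('var', '_a')))
scheme = generalize(t)        # forall _a. _a -> _a * _a
use1 = instantiate(scheme)    # for f(1):     _t0 -> _t0 * _t0
use2 = instantiate(scheme)    # for f("boo"): _t1 -> _t1 * _t1
print(use1 != use2)           # two independent copies
```

Since `_t0` and `_t1` are distinct variables, unifying one with `int` and the other with `string` causes no clash.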

> What’s the relationship between let enabling exponential function composition and the exponential time result?

So there is quite a bit of copying happening every time a variable with polymorphic type is used. I've never studied the worst-case scenarios of Hindley-Milner, but I imagine that it has to do with instantiation copying large polymorphic types around.

> Do implementations of Hindley-Milner actually represent types as dags and utilize structural sharing?

Yes; an unbound type variable is typically represented as a reference cell, which gets assigned the type that the variable is unified with. So, if we have

  double : forall a. a -> a * a

  double (1, 1, 1) : (int * int * int) * (int * int * int)
then the type of double is first instantiated, yielding `_b -> _b * _b`, and then the type of the parameter is unified with the type of the argument, `int * int * int`. At this point, the reference cell in the internal representation of `_b` is updated to point to `int * int * int`, which means that both occurrences of `_b` in the result type point to the same representation of `int * int * int`.
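Here's an illustrative Python sketch of that reference-cell representation (the class and function names are my own, not from any actual compiler):

```python
# Illustrative sketch: an unbound type variable as a mutable cell.
# Unifying the variable assigns the cell once; every occurrence of the
# variable then shares the same representation (a dag, not a tree).
class TVar:
    def __init__(self):
        self.ref = None                   # None means "unbound"

def resolve(t):
    """Follow assigned cells to the representative type."""
    while isinstance(t, TVar) and t.ref is not None:
        t = t.ref
    return t

# double instantiated at _b -> _b * _b: both _b's are the SAME cell
b = TVar()
result_type = ('pair', b, b)
b.ref = ('tuple', 'int', 'int', 'int')    # unify _b with int * int * int
# both components of the pair now resolve to one shared structure
print(resolve(result_type[1]) is resolve(result_type[2]))  # True
```

So the exponentially large type can be stored in linear space; it only blows up if you print it as a tree.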

[1] http://research.microsoft.com/en-us/projects/fcp/

[2] https://github.com/tomprimozic/type-systems/tree/master/firs...

[+] spacemanaki|11 years ago|reply
Clearly I should write more blog posts with lots of questions at the end, thanks!
[+] userbinator|11 years ago|reply
> Another reason is that it's not entirely clear what kind of polymorphic type to infer for `f`

How about `forall a. a -> b`? If we make the assumption that f is polymorphic, then nothing can be inferred about the return value, so it should also be polymorphic?

[+] Patient0|11 years ago|reply
I've found the following: http://okmij.org/ftp/Haskell/AlgorithmsH.html#teval

to be very useful for understanding how the HM type inference algorithm works.

They rephrase the problem as "writing an expression interpreter" for your program that recursively evaluates the abstract types of each expression instead of the concrete value.

I found this to be a very intuitive way to understand it.

It also then makes it easier to see why some "type inference" algorithms might guarantee to terminate while others might run forever: the more complicated your type system, the less abstract the "type values" are. They become like actual "values", and the behaviour becomes closer to the actual running of the program, which may not terminate.
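That view is easy to make concrete: here's a toy Python "interpreter over types" for a made-up expression language (representation and names invented for illustration); it has the same recursive shape as an evaluator, but each case returns a type instead of a value:

```python
# Toy "interpreter over types": the evaluator's recursive shape, but
# each case returns an abstract type instead of a concrete value.
def type_of(expr, env=None):
    env = env or {}
    tag = expr[0]
    if tag == 'int':                       # ('int', 42)
        return 'int'
    if tag == 'bool':                      # ('bool', True)
        return 'bool'
    if tag == 'var':                       # ('var', 'x')
        return env[expr[1]]
    if tag == 'pair':                      # ('pair', e1, e2)
        return ('pair', type_of(expr[1], env), type_of(expr[2], env))
    if tag == 'if':                        # ('if', cond, then, else)
        if type_of(expr[1], env) != 'bool':
            raise TypeError("condition must be bool")
        t1, t2 = type_of(expr[2], env), type_of(expr[3], env)
        if t1 != t2:
            raise TypeError("branches must agree")
        return t1
    raise ValueError(f"unknown form: {tag}")

print(type_of(('if', ('bool', True), ('int', 1), ('int', 2))))   # int
print(type_of(('pair', ('int', 1), ('bool', False))))
```

Note the 'if' case checks both branches even though an evaluator would only run one; the richer the types get, the closer this gets to actually running the program.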

[+] bmh100|11 years ago|reply
Fascinating behavior. Since this is an edge case, would it ever occur in human-written code? Could there be a situation where a computer would end up generating such code?
[+] lmm|11 years ago|reply
I've had this kind of thing happen when I try to use dependent types in Scala. E.g. if you want to select an HList sublist out of another HList (using the types), that's quadratic, because just figuring out that an HList contains a particular type takes linear time.

(The idea was to do DI using HLists, to get something like Spring but that worked at compile time. It worked, but as the program got larger the compile times became too long for it to be practical)

[+] rwmj|11 years ago|reply
I've written a lot of OCaml code and I don't recall having the compiler blow up on hand-written code. The OCaml type inference engine works well except on objects where you sometimes need to annotate variables to tell it what object type it is. (But then we rarely use OO in OCaml).

It's a different matter with generated code. For a while I had a program that generated OCaml code (from descriptions of Linux kernel structs). That would result in 100,000+ line generated OCaml programs which frequently caused very long compile times (e.g. 20 minutes and more), with the time being spent in the type inference part of the compiler.

[+] spacemanaki|11 years ago|reply
It's unlikely to occur as the type is exponential in size and thus quite unwieldy to do anything with! I haven't worked with generated ML or Haskell code, but I still think it's unlikely to be an issue in "real" code.
[+] freyrs3|11 years ago|reply
No, the HM algorithm has terrible worst-case complexity, but the worst cases don't correspond to programs humans write, and generally it works perfectly well in practice.
[+] toolslive|11 years ago|reply
Actually, there's a use case for these things: you can simulate varargs, and yes, the compilation time can be made arbitrarily large: https://gist.github.com/toolslive/5957292

  $> time ocamlc vararg.ml
  real    1m9.372s 
  user    1m9.264s
  sys     0m0.044s
Try adding a few a's and see what that gives.
[+] spacemanaki|11 years ago|reply
Thanks for pointing that out! I hadn't made the connection and now I've got another reason to figure out how fold works (have yet to wade through all of http://mlton.org/Fold)
[+] BorisMelnik|11 years ago|reply
Question: anyone know where the pronunciation of "!" as "bang" came from? I always called these exclamation points but see a lot of programmers calling them "bangs." Anyone?
[+] AnimalMuppet|11 years ago|reply
Well, they are exclamation points. But for many punctuation characters, there's a hacker nickname (or more than one). I've heard "bang" and "shriek" for "!", "hook" for "?", and "splat" for "*".

The Jargon File has more detail: http://www.catb.org/jargon/html/B/bang.html

[+] troels|11 years ago|reply
That sub title is hilarious.
[+] pohl|11 years ago|reply
This little program walked out on stage. What it did next blew the compiler away.
[+] mehwoot|11 years ago|reply
Compilers hate him! Find out this one simple program....
[+] JadeNB|11 years ago|reply
Indeed, that is already (essentially) the subtitle of the article.
[+] htk|11 years ago|reply
English called, they are running out of exclamation points.
[+] antics|11 years ago|reply
This talk was submitted to a conference which literally requires exclamation points to be in the title. So that's where they come from.
[+] htk|11 years ago|reply
Maybe there are not as many Seinfeld fans here as I thought.
[+] glibgil|11 years ago|reply
We can borrow the extra ones from the beginning of Spanish sentences.