This is a very nice wiki. I am implementing an ML dialect myself, and studying the implementations of MLKit, MosML, and MLton has been very enlightening. I work with the maintainers of the former two, but my compiler has ended up being more like MLton in its design.
Oh wow, I had Professor Fluet in college; he taught me functional programming concepts in Haskell and Lisp. Cool to see his work on the front page of HN.
In addition to being a smart guy, he was a really good professor. I really enjoyed the class (a survey of many different paradigms), even if I did come away jaded, thinking that all the standard languages I have used in my career since are poor approximations of what they could be!
The highest-performing compiler for the ML family is MLton. The strongest on correctness is CakeML (https://cakeml.org), which is formally verified. Anyone looking for an interesting project in compilers and verification should consider porting some MLton techniques to CakeML.
I don't know compilers, but this seems like a lot more steps than I would have expected. Is it actually an unusually large number of transformations, or is that just how compilers are done?
Some are single-pass, like Turbo Pascal. They generate machine code directly while parsing (no AST!). Niklaus Wirth (the Pascal/Modula/Oberon guy) wrote a book following this approach [1].
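To make that concrete, here is a toy sketch of the single-pass idea (in OCaml rather than Wirth's Oberon; all names are invented): a recursive-descent parser that emits stack-machine instructions the moment it consumes tokens, never building an AST.

    type token = Num of int | Plus | Star

    let emit s = print_endline s

    (* factor := number; code is emitted as soon as the token is consumed *)
    let factor = function
      | Num n :: rest -> emit ("PUSH " ^ string_of_int n); rest
      | _ -> failwith "expected a number"

    (* term := factor ('*' factor)* *)
    let term toks =
      let rec loop = function
        | Star :: rest -> let rest = factor rest in emit "MUL"; loop rest
        | toks -> toks
      in
      loop (factor toks)

    (* expr := term ('+' term)* *)
    let expr toks =
      let rec loop = function
        | Plus :: rest -> let rest = term rest in emit "ADD"; loop rest
        | toks -> toks
      in
      loop (term toks)

    (* "1 + 2 * 3" comes out as PUSH 1; PUSH 2; PUSH 3; MUL; ADD,
       with no tree ever built in between *)
    let () = ignore (expr [Num 1; Plus; Num 2; Star; Num 3])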
Some have many passes, like Chez Scheme. It has many simple passes, each of which they call a "nanopass". Andrew Keep has a great talk on this approach [2].
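And a sketch of the nanopass flavor (toy AST and invented names, nothing from Chez itself): each pass does one tiny rewrite, and the compiler is just the composition of many such passes, each trivial to test on its own.

    type exp = Int of int | Var of string | Add of exp * exp

    (* pass 1: fold additions of two literals *)
    let rec fold_consts = function
      | Add (a, b) ->
          (match fold_consts a, fold_consts b with
           | Int x, Int y -> Int (x + y)
           | a', b' -> Add (a', b'))
      | e -> e

    (* pass 2: drop "+ 0", which pass 1 may have exposed *)
    let rec drop_zero = function
      | Add (a, Int 0) | Add (Int 0, a) -> drop_zero a
      | Add (a, b) -> Add (drop_zero a, drop_zero b)
      | e -> e

    (* the "compiler" is a pipeline of small passes *)
    let compile e = drop_zero (fold_consts e)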
In practice, most compilers today are multi-pass, though probably with not as many passes as Chez. If we look at Rust, it goes from AST -> HIR -> MIR -> LLVM IR -> machine code [3]. There is probably more going on between LLVM IR and machine code, but I'm not knowledgeable enough to comment on it.
I think the trade-off here is clear: fewer passes -> shorter compile times; more passes -> faster generated code and a more modular compiler. Martin Odersky (the Scala guy) has a paper attempting to get the best of both [4].

[1] http://www.ethoberon.ethz.ch/WirthPubl/CBEAll.pdf
[2] https://www.youtube.com/watch?v=Os7FE3J-U5Q
[3] https://blog.rust-lang.org/2016/04/19/MIR.html
[4] https://infoscience.epfl.ch/record/228518/files/paper.pdf
I see lots of comments about how the multiple passes have to do with limited resources like memory. In MLton's case (it's a whole-program optimizing compiler), the translation phases have nothing to do with resource contention. And in general, limited resources are typically handled by separate compilation, not by compiler passes.
The more important design choices here have to do with
1) Typed assembly languages (rather than untyped assembly languages)
2) Type directed translation (rather than syntax directed translation)
3) The ability to reason about the correctness of each step of the translation as the program is gradually lowered from SML to assembly.
For more on the typed assembly language work, check out the work of Morrisett and Harper that came out of CMU (specifically the TIL / TAL parts of the ConCert project). The XML type system was developed a little bit before that by Harper and Lillibridge. The general idea of a language being a mathematical object, with an elaboration and a semantics where you can reason about progress and preservation at each step, is a result of the vision of Robin Milner (there's a good overview here: http://homepages.inf.ed.ac.uk/stg/Milner2012/R_Harper-html4-...)
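To give a feel for what a typed intermediate language buys you, here is a tiny toy of my own (nothing like the real TIL/TAL, which type actual assembly), using OCaml GADTs. Because the lowering pass has type 'a src -> 'a ir, the metalanguage's type checker itself verifies that every translation step preserves the type of the program being rewritten.

    (* a typed source language and a typed IR, indexed by result type *)
    type 'a src =
      | SInt : int -> int src
      | SAdd : int src * int src -> int src
      | SPair : 'a src * 'b src -> ('a * 'b) src

    type 'a ir =
      | IConst : int -> int ir
      | IPrim : string * int ir list -> int ir
      | ITuple : 'a ir * 'b ir -> ('a * 'b) ir

    (* if this type-checks, the translation is type-preserving by
       construction: returning, say, an int ir for a pair would be a
       compile-time error in the compiler itself *)
    let rec lower : type a. a src -> a ir = function
      | SInt n -> IConst n
      | SAdd (a, b) -> IPrim ("add", [lower a; lower b])
      | SPair (a, b) -> ITuple (lower a, lower b)

That's the "reason about the correctness of each step" point in miniature: a bug in a pass shows up when you build the compiler, not when you run the compiled program.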
It depends on the language: how far away is it from machine code? (among other things)
C is very close to machine code, so it takes few passes to write a simple non-optimizing C compiler. I'm pretty sure that, after parsing, the original C compilers were one pass. We talked about them a bit here [1].
ML is further from the machine model, so compiling it takes more work. For example, it has closures, while C doesn't, and that's something extra that you have to deal with, probably in multiple stages of the compiler.
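As a rough illustration of the closure point (invented names, not MLton's actual scheme): closure conversion rewrites a function that captures variables into a closed function plus an explicit environment record, which is essentially what you would have to write by hand in C.

    (* source-ish program: fun add x = fn y => x + y
       -- the inner fn captures x, which a C function can't do *)

    (* after closure conversion: a code pointer plus an environment *)
    type env = { x : int }

    let inner_code (e : env) (y : int) = e.x + y  (* closed: no free vars *)

    let add x = (inner_code, { x })               (* allocate the closure *)

    (* a call applies the code pointer to its own environment *)
    let apply (code, e) y = code e y

    let () = assert (apply (add 40) 2 = 42)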
A common theme in functional languages like ML and Lisp is "desugaring passes" or "lowering passes", which basically means turning a language construct into a more basic one so that later stages can treat things more uniformly. The more language features you have, the more potential for desugaring/lowering. OCaml and Haskell in particular have a ton of features that can be treated this way.

Slide 14 here has a (partial) diagram of the Scala compiler, which is extremely deep: https://www.slideshare.net/Odersky/compilers-are-databases

[1] https://news.ycombinator.com/item?id=16610938
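To make the desugaring idea concrete, here is a toy OCaml sketch (invented AST, not any real compiler's). SML itself defines andalso/orelse as derived forms of if, so a front end can rewrite them away early and every later pass only has to know about If.

    type exp =
      | Bool of bool
      | If of exp * exp * exp
      | Andalso of exp * exp  (* surface-level sugar *)
      | Orelse of exp * exp

    (* e1 andalso e2  ==>  if e1 then e2 else false
       e1 orelse e2   ==>  if e1 then true else e2 *)
    let rec desugar = function
      | Andalso (a, b) -> If (desugar a, desugar b, Bool false)
      | Orelse (a, b) -> If (desugar a, Bool true, desugar b)
      | If (c, t, f) -> If (desugar c, desugar t, desugar f)
      | Bool _ as e -> e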
There are many different approaches. Chez Scheme does a gazillion steps using the nanopass framework, but is still fast enough to compile a metric shit-tonne of code without any noticeable delay.
I have seen some work on a real-time version of MLton. I would love to see more of this work, because nothing would please me more than being able to do embedded systems with an ML language.