top | item 13294949

CakeML – A Verified Implementation of ML

163 points| setori88 | 9 years ago |cakeml.org

37 comments

order

vog|9 years ago

I find it interesting that CakeML, like many other developments in this area, is based on SML (Standard ML) and not OCaml (Objective Caml). Moreover, whenever I read something about ML languages, it seems most people in the academic field talk about SML.

Yet, it seems that OCaml is more popular among programmers and real-world projects. Even though these programmers come from the academic field, given the niche existence that OCaml still is. For example, the astonishing MigrageOS project chose OCaml instead of SML.

So my question is:

How is that? Why is OCaml so much popular, despite having just one implementation and no real spec? Why is SML with its real spec and multiple implementations not as least equally popular?

EDIT: Here are two possible answers that I don't think apply:

1. OCaml may be "good enough", which, combined with network effects, make choosing OCaml over SML a self-fulfilling prophecy. I don't think it is that simple, because OCaml users and projects come mostly from the academic field. They are deeply concerned with correctness of code. Which would mean they should all have favored SML over OCaml. In fact, sometimes correctness seems to be the sole motivation. For example, the author(s) of OCaml-TLS didn't just want to create yet another TLS library in a hip language. They are concerned with the state of the OpenSSL and similar libraries, and wanted to create a 100% correct, bullet-proof, alternative.

2. Although one could attribute this to the "O" in Objective Caml, I don't think it is that simple, because it seems the object-oriented extensions are almost unused, and wherever I saw them being used (e.g. LablGTK, an OCaml wrapper for the GTK UI library) I don't see that much value, and that sticking to plain OCaml Modules and Functors would have led to a better interface.)

Athas|9 years ago

I work with a bunch of people who were very involved in SML, and I've asked this question many times. They usually say that SML (as a language) stagnated because of personal conflicts in the standardisation committee. A bit of a "worse is better" microcosmos, really, with a focus on on discussing theoretical perfection instead of how to get things into the hands of users. You can partially see this in play today in the Successor-ML[0] discussions, where there's not really any clear vision or leadership driving things forward, just a lot of (often very good) language suggestions.

My own extrapolation is that a multitude of SML implementations meant that programmers couldn't take advantage of implementation-specific extensions, as that would prevent their code from running elsewhere. This hindered real-world experimentation (as opposed to academic investigations) and organic growth of the language.

But, I should say that as starting point for creating new functional languages, SML is an exceedingly clean design. I'm using it myself for my own language design efforts.

[0]: https://github.com/SMLFamily/Successor-ML/issues

mhd|9 years ago

The old ML compilers were more deeply rooted in academia. It was almost popular to write a new one to fiddle with a new idea (like concurrency). Performance and more pragmatic considerations were a bit secondary and that included OS interfaces.

Caml had these, and thus was used to build some software somewhat popular on Unix systems (e.g. Unison, MLDonkey, a flash compiler). And while these days it might seem like we've got plenty of freely available, fast compiling, native languages to choose from, just a few years the situation was very different. So some just chose the language for that reason, never mind syntax or semantics. As long as it wasn't C.

So you just got the feeling of more dedicated, pragmatic language maintainers and actual community use. People who really care about purism went over to Haskelly anyway.

(Personally, I like the idea of languages with more than one compiler, if only the "Standard" part of "SML" would be a bit bigger)

fmap|9 years ago

I can add a few more data points. First regarding CakeML:

- CakeML is written in HOL4, which is using ML.

- SML, unlike Ocaml has a well-defined semantics and so there is a good starting point for a verified compiler.

Second, regarding Ocaml, the easiest reason for choosing Ocaml over ML is because there is only one implementation of the language that everybody is using. This makes it a lot more likely that the ocaml compiler will still be around a few decades from now. There are several modern ML compilers, most of which are more or less unmaintained. PolyML, which everybody seems to be using these days, is substantially developed by a single person.

tempodox|9 years ago

I can't speak for others, but from a purely practical and non-academic point of view, I chose OCaml over SML because it's painless and easy to produce a standalone binary executable. Most SML implementations seem to be able to do that too, but require me to write compiler-specific glue code, IIRC. I don't want to have to dive into compiler internals to produce an executable. Also, I appreciate that the OCaml maintainers are not afraid to provide useful syntactic sugar although it may not seem “real FP”, like the `for` loop for instance. I seldom use the OO extension, but support for virtual methods can be very handy at times. You can implement the mechanics for those with functors too, but it's just more code to write that way.

DanielBMarkham|9 years ago

It may be similar to the reason for adoption for F# -- it does objects

Doesn't matter if you actually code with objects. The important point for adoption is that it does them.

So for a noob .NET programmer you say, hey, look at F#! You can code it just like C#. Kinda.

But as soon as they start coding, you tell them nope, all of these types of things are actually antipatterns.

rwmj|9 years ago

We use OCaml for the virt tools because it (a) generates efficient, standalone binaries and (b) lets us call out to C libraries really easily. We use OCaml over (say) C or C++ because it allows us to write code with far fewer bugs and headaches. You can find one of these large, real world OCaml programs here: https://github.com/libguestfs/libguestfs/tree/master/v2v

I think network effects do matter much more than you say. There is virtually no community around SML.

mafribe|9 years ago

Summary: CakeML is the first verified optimising compiler that bootstraps.

Side note: Cake stands for CAmbridge KEnt, which is where (most of) CakeML's verification was carried out.

The pioneering project in this space was X. Leroy's CompCert. This was the first verified optimising compiler. More precisely, a realistic, moderatly-optimising compiler for a large subset of the C language down to PowerPC and ARM assembly code.

aisofteng|9 years ago

Could you expand on the significance of the first, for those of us not familiar with formal verification?

Is this is a first because it is is theoretically difficult to do, or because it requires a lot of implementation time? What are some key points to read up and understand in order to properly appreciate this result, past the Wikipedia article on formal verification [1]?

Thank you in advance for any elaboration.

[1] https://en.wikipedia.org/wiki/Formal_verification

nickpsecurity|9 years ago

Everyone with formal method background interested in this work should consider taking on one of their posted projects that would improve it. Especially Ocaml to CakeML translator.

https://cakeml.org/projects

Just email them first in case someone has done the work already. Academics sometimes are slow to update web sites due to digging deep into their research. ;) The best uses I can think of for CakeML are:

A reference implementation to do equivalence checks against with main language, a ML or not, being something optimized.

Someone to build other tools in that need high assurance of correctness. Prototype it to get the algorithm right using any amount of brains and tooling that already exist with an equivalent CakeML program coming out. Then, that turns into vetted object code.

A nice language for writing low-level interpreters, assemblers, or compilers that bootstrap others in a high-confidence way. Idea being in verifiable or reproducible builds where you want a starting point that can be verified by eye. They can look at the CakeML & assembly output with some extra assurance on top of hand-doing it. One might even use the incremental compilation paper on building up a Scheme to end up with a powerful, starting language plus assurance binary matches code.

gravypod|9 years ago

Go into the compiler explorer [0] and type the following

    val num = 10
Then take a look at the x86 generation.

What is all of that. It doesn't look like executable data needed. Is that just implicit functions or something baked into the language? If it is, why isn't it being tree-shook?

junke|9 years ago

It seems you need to add a semi-colon after the line (the AST was empty). A minimal program is simply "();", and compiles into something even more complex. A big part of it looks like runtime setup/cleanup code, common for all programs.

Twirrim|9 years ago

What is ML in this context? Neither CakeML nor Standard ML site appear to actually define it, and it's an acronym with a few definitions in tech (e.g. Machine Learning)

more_original|9 years ago

It's Meta Language. The ML language was developed in the early 1970s as a meta-language for the LCF theorem prover. ML was developed as a language for programming proof tactics. The strong type system and type soundness guarantees of ML were important to guarantee that such tactics could only prove correct theorems.

https://en.wikipedia.org/wiki/ML_(programming_language)

fithisux|9 years ago

What is the difference between CakeML and StandardML in terms of Syntax and Semantics?