
Corrode: C to Rust translator written in Haskell

387 points| adamnemecek | 9 years ago |github.com

122 comments

[+] tinco|9 years ago|reply
Absolutely blown away by the detail of the documentation. The main logic of this project is in a literate haskell file you can easily read on GitHub.

https://github.com/jameysharp/corrode/blob/master/src/Langua...

I wonder how readable it is to someone who isn't experienced in Haskell. To me it reads like a breeze, but I have a project using the exact same parsing library, so maybe that puts me at an advantage.

The language-c library he uses is an excellent one, it's a fully spec compliant C parser that's well maintained. I've based my C compiler on it and I haven't encountered any C code it couldn't parse yet. One time I upgraded to a new OSX and Apple added some stupid thing to their headers that broke the parser and a fix was merged within days. This means it takes away the entire headache of parsing C leaving just the actual compiling.

[+] fractalsea|9 years ago|reply
The documentation is amazing. One problem I find with rendered literate Haskell is that it quickly becomes unclear how the indentation across blocks of code fits together. It would be nice if there were some kind of renderer that kept the indentation in the docs, or had some kind of "indentation guide".
[+] eridius|9 years ago|reply
By "some stupid thing to their headers" are you referring to nullability annotations? I'm not sure what else Apple would have added to pure C headers (as opposed to obj-c) any time in the past few years.
[+] pierrec|9 years ago|reply
I was curious about how this worked so I looked into the source a little (even though Haskell isn't exactly my cup of tea), and WOW... This is just amazing. The most important part of the source is highly educative literate haskell:

https://github.com/jameysharp/corrode/blob/master/src/Langua...

[+] vvanders|9 years ago|reply
That's an incredibly brilliant idea, there's some serious craftsmanship going on in that source file.

Kinda taking Rust's doc unit tests format to a whole new level.

[+] ctb_mg|9 years ago|reply
As someone who knows zero haskell, and little markdown, can someone explain how this works?

haskell.org [1] says there are "bird style" and "LaTeX style" ways of marking off code vs. documentation, and I see neither in the linked file. Is it the "```haskell" blocks?

[1] https://wiki.haskell.org/Literate_programming#Haskell_and_li...

[+] kerkeslager|9 years ago|reply
This is the coolest thing I've seen on HN in a long time, and useful to boot. Hopefully this will be a very big help to people moving over to Rust from C for its safety and type-checking. In general I don't support rewrites because, as many experienced programmers have pointed out, rewrites often make many of the same mistakes as the program they're rewriting. But transpilation allows us to keep the code with all the fixes to those mistakes.

In theory I'm a big supporter of Rust. I strongly feel that we should be using stronger-typed languages than C for developing security-critical applications, and from the outside, it looks like Rust solves a lot of my criticisms of C without giving up any benefits of C. A transition over to Rust could be a big win for security and reliability of many systems.

However, I'm reluctant to devote time to learning Rust, primarily because it's not supported by GCC (or any other GPL compiler that I know of). I hope the next cool thing that the Rust community does is to continue the work done by Philip Herron[1] on a Rust front-end for GCC. I know the classic response to this is, "Do it yourself!" but there are too many other areas of Open Source that are higher priorities for me, so sadly this will have to be done by someone else if it happens at all.

[1] https://github.com/redbrain/gccrs

[+] jjnoakes|9 years ago|reply
I also would love to see a gcc front-end for rust, but my reasons are not related to the license.

I have to support platforms that gcc has a backend for, but LLVM does not.

[+] trajing|9 years ago|reply
I'm curious why you have an issue with the mostly-MIT licensed main Rust compiler. Could you explain?
[+] nategri|9 years ago|reply
This is probably the most "Hacker News" thing I've ever seen.
[+] valarauca1|9 years ago|reply
We won't reach peak HN until somebody writes an ARC to Node.js transpiler in Haskell.
[+] wink|9 years ago|reply
I do get that Haskell is a useful tool for these kinds of code transformations (at least I have seen quite a few of those), but I am always a bit surprised that people would start such a project in a language that has, per se, nothing to do with either the source or the target language. I know, I know, it doesn't always have to be this way, but I am very much of the opinion that every time good tools in an ecosystem are written in the language of said ecosystem, you get a lot more (and more meaningful) contributions.

Best examples: rake (and everything in the ruby ecosystem basically), the amount of people touching ruby c code is very small compared to all the 'standard tools', or cargo.

[+] masklinn|9 years ago|reply
> I do get that Haskell is useful to be taken as a tool for these kind of code transformations (at least I have seen quite a few of those) but I am always a bit surprised that people would start such a project in a language that has -per se- nothing to do with either the source or the target language.

The author explained this in a blog post[0]: Haskell has a very complete C parser library with a nice API[1] which the author already knows, and Rust doesn't. Furthermore, since one of the project's goals is to be as syntax-directed as possible, the translator is straightforward and should be understandable with very little understanding of Haskell (which can be bootstrapped from understanding Rust).

[0] http://jamey.thesharps.us/2016/07/translating-c-to-rust-and-...

[1] http://hackage.haskell.org/package/language-c

[+] duaneb|9 years ago|reply
Haskell is very good at writing correct parsers easily; that's one thing. This is part of the reason it was chosen as an early perl6 test bed via pugs. Rust is getting there (I'm a big fan of the lalrpop library), but the ecosystem is nowhere near as mature as Haskell's for feature-complete libraries. The pace of development of the Rust ecosystem is mind-boggling, though. I never guessed that Rust would have taken off in popularity as much as it has.
[+] posterboy|9 years ago|reply
might be easier to do something between clang and the llvm.
[+] DaGardner|9 years ago|reply
As with many "transpilers" / compilers, whatever you might name them, it lacks example input and output.

I want to see what my new Rust code base looks like. Does it compile with some heuristics, or just map C 1:1 to Rust primitives?

[+] DanWaterworth|9 years ago|reply
Here you go. It didn't like my stdio.h. Apparently enums and unions aren't supported, but:

    extern int printf(char *, ...);

    int main(int argc, char argv[]) {
        printf("Hello, world!\n");
        return 0;
    }
Was turned into:

    extern {
        fn printf(arg1 : *mut u8, ...) -> i32;
    }
    #[no_mangle]
    pub unsafe fn main(mut argc : i32, mut argv : *mut u8) -> i32 {
        printf(b"Hello, world!\n\0".as_ptr() as (*mut u8));
        0i32
    }
edit: Also worth noting, it removes all comments. I believe this to be a limitation of language-c [1]

[1] https://hackage.haskell.org/package/language-c

[+] steveklabnik|9 years ago|reply

  > Because the project is still in its early phases, it is not yet
  > possible to translate most real C programs or libraries.
It is currently trying to port over semantics exactly, so the Rust code is far from idiomatic Rust. Doesn't mean it's not useful, just saying that it's trying to be 1:1.
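
To make the 1:1 point concrete, here is a rough hand-written sketch. It is not actual Corrode output, and the function names `sum_raw` and `sum_slice` are made up for illustration. The first function is what a literal translation of a C pointer-and-length loop could plausibly look like; the second is what someone might write by hand in a cleanup pass.

```rust
// Translation-style Rust: mirrors a C loop over (pointer, length).
// Hypothetical example, NOT real Corrode output.
// The caller must guarantee that `p` points to at least `n` valid i32 values.
pub unsafe fn sum_raw(mut p: *const i32, mut n: i32) -> i32 {
    let mut total: i32 = 0i32;
    while n > 0i32 {
        // Wrapping add avoids a panic on overflow in debug builds.
        total = total.wrapping_add(*p);
        p = p.offset(1);
        n = n - 1i32;
    }
    total
}

// Hand-written idiomatic version: the slice carries its own length,
// so no `unsafe` is needed and out-of-bounds reads are impossible.
pub fn sum_slice(xs: &[i32]) -> i32 {
    xs.iter().fold(0i32, |acc, &x| acc.wrapping_add(x))
}
```

Both compute the same result for valid input, but only the second lets the compiler check the memory accesses, which is presumably the kind of manual cleanup the README has in mind.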
[+] loeg|9 years ago|reply
Has anyone tried it on some real-world codebases? How about kernel code? It would be very exciting to improve real-world crash safety and security by e.g. converting popular drivers quickly and automatically, followed by a manual pass applying safer Rust semantics.
[+] steveklabnik|9 years ago|reply
The README explicitly states that this is far too early for real programs. It doesn't come close to supporting all of C yet.
[+] codygman|9 years ago|reply
Trying to get to a computer to do exactly this.
[+] serge2k|9 years ago|reply
> Partial automation for migrating legacy code that was implemented in C. (This tool does not fully automate the job because its output is only as safe as the input was; you should clean up the output afterward to use Rust features and idioms where appropriate.)

This was my immediate concern. Is there any chance this tool can produce anything close to clean, safe, idiomatic Rust code?

[+] nemaar|9 years ago|reply
> Is there any chance this tool can produce anything close to clean, safe, idiomatic, rust code?

That is generally not possible, unless the C code only uses specific patterns known by the converter tool. That's very unlikely, considering that people write C code to be 'quick' and usually use all kinds of tricks.

[+] haimez|9 years ago|reply
Here we have it gentlemen: HN bingo.
[+] eutectic|9 years ago|reply
I wonder if there would be any point in using this to fuzz the Rust compiler.

On the one hand, you could use CSmith with a C compiler as a convenient oracle, but on the other you would only be covering a very limited subset of e.g. the type system.
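
For anyone unfamiliar with the oracle idea, it is plain differential testing. A minimal sketch, assuming the two closures stand in for "the program as built by the C compiler" and "the translated program as built by rustc" (all names here, `Lcg` and `first_mismatch`, are made up, and the toy generator stands in for CSmith):

```rust
/// Tiny deterministic pseudo-random generator (an LCG), so the sketch
/// needs no external crates. A stand-in for a real program generator.
struct Lcg(u64);

impl Lcg {
    fn next_u32(&mut self) -> u32 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        // Use the high bits, which are the better-distributed ones in an LCG.
        (self.0 >> 32) as u32
    }
}

/// Runs both functions on `cases` generated inputs and returns the first
/// input on which they disagree, or `None` if they always agree.
fn first_mismatch<F, G>(reference: F, candidate: G, seed: u64, cases: usize) -> Option<u32>
where
    F: Fn(u32) -> u32,
    G: Fn(u32) -> u32,
{
    let mut rng = Lcg(seed);
    (0..cases)
        .map(|_| rng.next_u32())
        .find(|&x| reference(x) != candidate(x))
}
```

Any mismatch would then point at a bug in one of the two compilers or in the translator, which is also why the limited coverage matters: you only exercise what the generated C can express.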

[+] felixangell1024|9 years ago|reply
The name "Corrode" doesn't seem very positive given the purpose of this program...
[+] Mathnerd314|9 years ago|reply
The code has a lot of special cases. Could these be eliminated using machine translation techniques?
[+] cannonpr|9 years ago|reply
I understand the wordplay, but perhaps it's a misunderstanding of the origin of Rust's name? https://www.reddit.com/r/rust/comments/27jvdt/internet_archa... It's named after a fungus: https://en.wikipedia.org/wiki/Rust_(fungus)
[+] kevinastone|9 years ago|reply
Their logo is a rust colored sprocket. Rust is full of iron/corrosion/oxidation references.
[+] steveklabnik|9 years ago|reply
This is in itself a misunderstanding of the origin: that's one of many reasons it's called Rust, not the only one. There's no single reason for the name.
[+] Rusky|9 years ago|reply
Rust has had iron oxide puns for names for a very long time, even from people who know where Graydon got the name.
[+] wspeirs|9 years ago|reply
It's too bad this is written in Haskell. I don't have anything against Haskell, it is just not as popular a language as others.[1] Any ANTLR target language would have been a solid choice.[2] This way more of the community could contribute. This is an invaluable tool if we're truly going to see a shift from C (or C++) to Rust.

[1] http://pypl.github.io/PYPL.html

[2] http://www.antlr.org/download.html

[+] DanWaterworth|9 years ago|reply
I think people interested in compilers are disproportionately Haskell inclined.
[+] vvanders|9 years ago|reply
I've used ANTLR in anger a few times and for some reason it's always left a bad taste in my mouth. I always seem to spend more time debugging how ANTLR works rather than doing the work I set out to do.

Granted, most of my use cases were building a simple DSL, so it might be different when talking about whole-source conversion.

[+] Retra|9 years ago|reply
Haskell is very popular. It's in the top 40 in a field of thousands. It's more popular than Rust.