
Porting the Go compiler from C to Go

212 points| sqs | 12 years ago |gophercon.sourcegraph.com | reply

93 comments

[+] beliu|12 years ago|reply
Author of the post here. Happy to answer any questions I can, and FYI, we (Sourcegraph) are liveblogging all of GopherCon at http://gophercon.sourcegraph.com. Let us know if you have any questions or find it useful!
[+] codezero|12 years ago|reply
You might want to clarify that you aren't Russ Cox and that this post is a summary of a presentation he gave.
[+] Shamanmuni|12 years ago|reply
It's great that they are aiming for an automated conversion from C to Go. It's clear they aspire to convert their code which was written in a certain way. But I think it would be a huge boost in Go usage if they could eventually aim to transpile any C code into Go code.

A little dream of mine would be if in the future when Rust is stable Mozilla developed a transpiler from C++ to Rust. That would be brilliant.

By the way, all the other talks at GopherCon seem pretty interesting, I hope someone uploads videos of them soon.

[+] jerf|12 years ago|reply
It is unclear how one would compile arbitrary C code into useful Go. The stereotyped conventions of well-written compiler code allow for a more idiomatic translation than a general translator could ever aspire to.

C++ to Rust would be even crazier.

(Transpile is a silly word. It's "compile". Compiling already trans-es.)

[+] mwsherman|12 years ago|reply
The use case is to bootstrap the conversion from C to Go, if one has made a decision to do so. You pick a cutoff time, and say “now we're going to do our work in Go”.

What comes out the other end, as Russ phrases it, is a C program in Go syntax. The next phase of work is to turn it into “Go” as a human might write it.

The translated program is not meant for deployment, really. It’s to give the humans a starting point.

[+] kodablah|12 years ago|reply
"a transpiler from C++ to Rust"

I think the best approach for something like this is a disassembler for LLVM IR. I am not extremely familiar with the LLVM IR, but I figure there are common patterns or debug symbols clang produces that could let you get a decent idea of what the higher level gave you. If not targeted to clang output too much, you can then transpile any LLVM-able language to Rust.

I think, but am not sure, that the problem with Rust being a target for these other languages (my pipe dream is to write a JVM in it w/ a JIT using the bootstrapped rustc lib) is that you have to use unsafe code all the time (unless there's 0 performance penalty for ARC) because the original code is not written with ownership/borrowing semantics.

[+] Someone|12 years ago|reply
Reading https://docs.google.com/document/d/1P3BLR31VA8cvLJLfMibSuTdw..., I don't see that they are aiming for an automated conversion; they are aiming for an automation-assisted conversion.

They want to go quite a step up from doing some global regex replaces, but in the end, step 3 has "cleaning up and documenting the code, and adding unit tests as appropriate.". I don't think they aim to automate anything there.

[+] NateDad|12 years ago|reply
It has been stated that there will be videos available, but I'd not expect them until next week sometime (when the organizers get home and have recuperated).
[+] stcredzero|12 years ago|reply
Automated rewrite FTW! This can help you avoid freezing a project while it is being ported. Also, if you have a code base with its own idioms, then those idioms can be matched and translated, which can produce cleaner target code.
[+] adient|12 years ago|reply
Go 1.0 spec is already frozen and will not change any time soon.
[+] micro_cam|12 years ago|reply
I'm reminded of f2c, the Fortran-to-C compiler that was used to produce pure C implementations of lots of libraries like LAPACK.

I'm curious how they will handle things like pointer arithmetic and memory safety in C vs Go. If they manage to do so in a performant way, I could see translating lots of numerical or computationally intensive code to Go so that it could be run in a shared cloud environment without worries about memory safety and without having to resort to VMs for separation.

[+] mjcohen|12 years ago|reply
If you took arbitrary Fortran, especially with I/O, the results were an unreadable mess - but they compiled and ran.

For a project I had (in the early 90s iirc), I had to extensively modify the Fortran and make multiple versions before the generated C was readable. Still, much easier than a rewrite.

[+] thinkpad20|12 years ago|reply
> There are currently 1032 goto statements in the Go compiler jumping to 241 labels.

Wow, that's really striking. I know that goto statements have their uses but for something written in the last couple of years to have over a thousand of them is very surprising (it might not be surprising at all for those who write C code all the time). I guess they're mostly just for error handlers?

[+] awda|12 years ago|reply
As others have pointed out, this is actually idiomatic C. You have a portion of your function below a label which is "clean up everything and return this variable (usually error status)." Then at any point where you have an error, you goto that label and everything is released and the error gets returned. It's sort of a manual 'defer' pattern.

"goto considered harmful" is taken out of context and bemoaned mostly by people unfamiliar with idiomatic C, I think. (Edit: sorry, I mean just for C. For other languages, e.g. Go, there may be more appropriate patterns like the 'defer' keyword.)

[+] daeken|12 years ago|reply
I'm not surprised to see them in a compiler. When you're parsing and want to treat it as a big state machine, gotos are your best friend.
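To make the state-machine point concrete, here is a minimal sketch in Go (which also has goto). It is not taken from the Go compiler; `countTokens`, `isLetter`, and `isDigit` are hypothetical names made up for illustration. Each label plays the role of one scanner state, and each goto is a state transition.

```go
package main

import "fmt"

func isLetter(c byte) bool { return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') }
func isDigit(c byte) bool  { return c >= '0' && c <= '9' }

// countTokens counts identifiers and integer literals in s.
// Labels act as states; goto moves between them.
func countTokens(s string) (idents, numbers int) {
	i := 0
	var c byte // declared up front so no goto jumps over a declaration

start:
	if i >= len(s) {
		return idents, numbers
	}
	c = s[i]
	switch {
	case isLetter(c):
		idents++
		goto inIdent
	case isDigit(c):
		numbers++
		goto inNumber
	}
	i++ // skip anything that does not start a token
	goto start

inIdent:
	i++
	if i < len(s) && (isLetter(s[i]) || isDigit(s[i])) {
		goto inIdent
	}
	goto start

inNumber:
	i++
	if i < len(s) && isDigit(s[i]) {
		goto inNumber
	}
	goto start
}

func main() {
	idents, numbers := countTokens("x1 + 42 foo")
	fmt.Println(idents, numbers)
}
```

The same logic can be written with a loop and a state variable; the goto form just keeps each state's code next to its label, which is roughly what generated parsers tend to do.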
[+] coldtea|12 years ago|reply
>but for something written in the last couple of years to have over a thousand of them is very surprising

Hardly surprising. Gotos will be found by the hundreds or thousands in C projects like compilers, kernels etc.

It's used for localized consolidated error handling, but also for stuff like parsing code (bison generates tons of gotos IIRC), state machines, etc.

If anything it's the old "Goto statement considered harmful" that's a little too naive.

[+] azernik|12 years ago|reply
Yeah, there are a lot of them in e.g. Linux kernel code for jumping to end-of-function cleanup. I suspect that in the process of translation to Go most will be replaced by the defer keyword.
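For readers who have not seen the two patterns side by side, a rough sketch of how end-of-function cleanup looks with defer. The `resource` type and the `acquire`, `release`, and `process` names are hypothetical, invented for this example; nothing here comes from the compiler source.

```go
package main

import (
	"errors"
	"fmt"
)

// resource is a hypothetical stand-in for a file, lock, or buffer.
type resource struct {
	name string
	log  *[]string
}

func acquire(name string, log *[]string) *resource {
	*log = append(*log, "acquire "+name)
	return &resource{name: name, log: log}
}

func (r *resource) release() {
	*r.log = append(*r.log, "release "+r.name)
}

// process mirrors a C function that does `goto out;` on every error path.
// Deferred calls run on every return, in reverse order of registration.
func process(fail bool, log *[]string) error {
	a := acquire("a", log)
	defer a.release()

	b := acquire("b", log)
	defer b.release()

	if fail {
		return errors.New("step failed") // in C: err = -1; goto out;
	}
	return nil
}

func main() {
	var log []string
	err := process(true, &log)
	fmt.Println(err, log)
}
```

The early return still releases b and then a, which is the ordering a hand-written cleanup label would normally give you.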
[+] im3w1l|12 years ago|reply
I saw one goto that could cleanly be replaced with an else (goto fp).

I don't think the authors were very goto-averse.

[+] hrjet|12 years ago|reply
This might become a nice benchmark for the Go language; the same code base implemented in C and Go! It may not be fine-tuned for optimisation in either C or Go, but may still give a good ball-park estimate.
[+] lazyjones|12 years ago|reply
Indeed, but it would be much more useful and impressive if they rewrote the compiler in Go by hand. I'm disappointed that they are aiming for an automatic translation instead - some people are going to ask themselves whether Go actually isn't that much fun to program in. An independent reimplementation is better for correctness too: they can compare outputs and find bugs in both implementations instead of porting over old bugs and adding new bugs where the translation goes wrong.
[+] AYBABTME|12 years ago|reply
I'm really thankful for the liveblogging, as I couldn't manage to get my body to the conference.

I understand the desire to promote the Sourcegraph app by doing the blogging, and I think it's effective. However, the blog is really annoying to browse, as every (prominent?) link points to Sourcegraph the app instead of the blog.

[+] AYBABTME|12 years ago|reply
Just to clarify, because I think my original comment is unbalanced.

I'm REALLY thankful for sourcegraph's liveblogging. The above comment was a suggestion as I thought they might want to know that (at least for me), the navigation of the blog was confusing.

[+] kristianp|12 years ago|reply
"They’re deciding to automatically convert the Go compiler written in C to Go, because writing from scratch would be too much hassle."

When transcribing a talk, there isn't any need to write "They're". Just use the same pronoun the presenter used, otherwise it stands out like a sore thumb.

[+] pohl|12 years ago|reply
3) Go has turned out to be a nice general purpose language and the compiler won’t be an outsize influence on the language design.

In what sort of ways does self-hosting early influence a language design? Were they hoping to avoid something in particular by delaying self-hosting?

[+] gizmo686|12 years ago|reply
The general way that self hosting influences language design is that the compiler is often one of the first major projects to be built using a language. This does not give it more influence than other major projects, but if your goal is to have a language designed for use case X, it is generally best to have your early projects with it be for X. Additionally, self hosting may encourage a language design that makes bootstrapping easier (such as a stricter divide between the stage-1 language and the general language).
[+] rdc12|12 years ago|reply
"Note: There’s a book written about converting goto code to code without goto in general, but this is a sledgehammer and not necessary here."

Anyone have any idea what the title of that book is?

[+] ANTSANTS|12 years ago|reply

  >A Union is like a struct, but you’re only supposed to use one value
  >(they all occupy the same space in memory). It’s up to the programmer to know which variable to use.
  >  There’s a joke in some of the original C code:
  >      #define struct union /* Great space saver */
  >  This inspired a solution:
  >      #define union struct /* keeps code correct, just wastes some space */
Somewhere in Scotland, a sum type sheds a single tear.
[+] piokuc|12 years ago|reply

  >      #define union struct /* keeps code correct, just wastes some space */
Not always, though...
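One case where the swap plausibly goes wrong is code that writes one union member and reads another to reinterpret the bits. A small sketch of the effect, written in Go since that is the target language here. The `asStruct` type and `readAfterWrite` function are hypothetical, and `math.Float32bits` stands in for what the aliased union member would have held.

```go
package main

import (
	"fmt"
	"math"
)

// asStruct mimics what `#define union struct` produces:
// f and u become two independent fields instead of sharing storage.
type asStruct struct {
	f float32
	u uint32
}

// readAfterWrite stores v in f and then reads u, the way code that
// relied on union aliasing would. It returns what the struct version
// sees and, for comparison, the actual bit pattern of v.
func readAfterWrite(v float32) (structView, unionView uint32) {
	var s asStruct
	s.f = v
	return s.u, math.Float32bits(v)
}

func main() {
	structView, unionView := readAfterWrite(1.0)
	// The bit pattern of float32(1.0) is 0x3f800000; the struct field u is never written.
	fmt.Printf("struct: %#x, aliased: %#x\n", structView, unionView)
}
```

Code that only ever reads back the member it last wrote is unaffected, which is presumably why the trick was good enough for the compiler's own source.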