C as an intermediate language (2012)

133 points| yinso | 8 years ago |yosefk.com

60 comments

[+] WalterBright|8 years ago|reply
I considered compiling D to C. The trouble is when you need to do things like exception handling, adjustor thunks, define things that are implementation defined or undefined in C, etc. I figured I'd spend too much time struggling with that, and besides, it would make compilation slow.
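
To make the exception-handling difficulty concrete, here is a minimal sketch of how a compiler that targets portable C might lower try/throw/catch onto `setjmp`/`longjmp`. All names (`handler_frame`, `rt_throw`, `safe_div`) are hypothetical; this is not how D or any particular compiler does it, only one common approach under the assumption that exceptions are plain integer codes.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical runtime support for a source language with try/catch. */
struct handler_frame {
    jmp_buf env;                  /* where to resume when something is thrown */
    struct handler_frame *prev;   /* enclosing try block, if any */
};

static struct handler_frame *handler_top = NULL;  /* innermost active try */
static int pending_error = 0;                     /* stand-in for an exception object */

/* Lowered form of `throw code;` */
static void rt_throw(int code)
{
    assert(handler_top != NULL && "uncaught exception");
    pending_error = code;
    longjmp(handler_top->env, 1);
}

/* Lowered form of a source function that may throw. */
static int checked_div(int a, int b)
{
    if (b == 0)
        rt_throw(42);
    return a / b;
}

/* Lowered form of:
       try { return checked_div(a, b); } catch (e) { return -e; }   */
int safe_div(int a, int b)
{
    struct handler_frame frame;

    frame.prev = handler_top;     /* push this try block */
    handler_top = &frame;

    if (setjmp(frame.env) == 0) {
        int quotient = checked_div(a, b);
        handler_top = frame.prev; /* normal exit: pop the try block */
        return quotient;
    }

    handler_top = frame.prev;     /* exceptional exit: pop, then run the catch */
    return -pending_error;
}
```

Under these assumptions `safe_div(10, 2)` yields 5 and `safe_div(1, 0)` yields -42. Note that every entry into a try block pays for a `setjmp` call even when nothing is thrown, and destructors or finally blocks would need additional generated code, which hints at why this route is laborious.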

> Without doubt, today the answer is C++ and Objective-C – languages whose first compilers emitted C code.

I wrote the first native C++ compiler for the PC. It was a huge win over the cfront based PC compilers, which were nearly unworkable.

[+] WalterBright|8 years ago|reply
Another problem I failed to mention is you don't have control over the C compiler the customer is using. It may behave differently, have bugs in it, etc., and the customer will blame you. Being unable to fix the C compiler means you'll be fixing the problem in the wrong place, making things pretty hard on you.

I run into these sorts of problems with D's C++ interface. In particular, how "long long" and "long" are mangled when they are the same size, and when "long long" is used vs when "long" is used. It's an endless whack-a-mole problem.
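
A small sketch of why this is awkward: C (like C++) treats `long` and `long long` as distinct types even on targets where they have the same size, and the Itanium C++ ABI assigns them different mangling codes (`l` and `x`). The macro and helper names below are hypothetical and only illustrate the distinction; they say nothing about how D resolves it.

```c
#include <stddef.h>

/* Itanium C++ ABI builtin type codes: int is 'i', long is 'l' and
   long long is 'x', regardless of how wide each type happens to be. */
#define ITANIUM_CODE(expr) \
    _Generic((expr), int: 'i', long: 'l', long long: 'x', default: '?')

/* Hypothetical helper: do the two types share a size on this target? */
static int long_and_long_long_same_size(void)
{
    return sizeof(long) == sizeof(long long);
}
```

Even where `long_and_long_long_same_size()` returns 1, `ITANIUM_CODE(0L)` and `ITANIUM_CODE(0LL)` differ, so a foreign-language binding has to know which spelling the C++ header actually used. Fixed-width typedefs make this worse: `int64_t` is commonly `long` on 64-bit Linux and `long long` on macOS.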

[+] stingraycharles|8 years ago|reply
> I figured I'd spend too much time struggling with that, and besides, it would make compilation slow.

Is that only because of the extra compilation step, or are there other things that make it slower?

[+] rdc12|8 years ago|reply
If you were writing D from scratch today, would you go for a full back end solution or would you target the LLVM?
[+] quicklime|8 years ago|reply
The Mercury compiler has a backend which compiles to C. This is currently described as "high-level" C, because the generated code sort-of (relatively speaking) looks like code that a C programmer might write. But back in the early days (mid 90s?) it emitted "low-level" GNU (not ANSI/ISO) C, which basically looked like assembly code for an abstract machine. Kind of like the "compilation strategy" described in TFA, taken a number of steps further.

One of the nice things about GNU C is that various GCC-specific extensions can be taken advantage of. For example ISO C does not require a compiler to optimise tail recursive calls, which are heavily used in functional (and logic) programming languages, where recursion is used instead of loops. It also lets you access CPU registers directly, and write inline assembly code.
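
As an illustration of the kind of extension meant here, the sketch below uses GNU C's "labels as values" (`&&label` and `goto *`, accepted by GCC and Clang but not ISO C) to turn source-level tail calls into plain jumps. The function and its even/odd predicates are hypothetical; this is not Mercury's actual generated code, only the general shape of "assembly for an abstract machine" written in C.

```c
/* Hypothetical lowering of two mutually tail-recursive predicates,
   is_even(n) and is_odd(n), into a single C function. Each tail call
   becomes a jump through a label address, so the C stack never grows,
   whatever the C compiler's optimiser decides to do. */
int is_even_lowered(unsigned long n)
{
    /* "Code addresses" of the abstract machine (GNU C extension). */
    void *even_entry = &&even;
    void *odd_entry = &&odd;
    void *next = even_entry;    /* abstract program counter */

    goto *next;

even:
    if (n == 0)
        return 1;
    n -= 1;
    next = odd_entry;           /* tail call: is_odd(n - 1) */
    goto *next;

odd:
    if (n == 0)
        return 0;
    n -= 1;
    next = even_entry;          /* tail call: is_even(n - 1) */
    goto *next;
}
```

A call such as `is_even_lowered(1000000UL)` runs in constant stack space, whereas the naive recursive C translation would depend on the compiler choosing to eliminate the tail calls.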

[+] mpweiher|8 years ago|reply
This becomes difficult when you want to do something that C doesn't really handle (well), for example accessing the processor flags, or passing control to another function while leaving the stack intact (as objc_msgSend does).

If you can then insert some asm, that's good, otherwise you hit a brick wall.

See: https://cr.yp.to/qhasm/20050129-portable.txt

[+] lisper|8 years ago|reply
This. C is a great intermediate language for languages that have semantics close to C. Otherwise, not so much.
[+] pjmlp|8 years ago|reply
Another language that famously took this route is Eiffel.

EiffelStudio uses a VM-like environment for the develop-compile-debug cycle, and then compiles to native code via the platform's system C compiler for deployment.

[+] vram22|8 years ago|reply
Eiffel seemed like a really good language. I wish it had succeeded more and become mainstream. I've tried it out some with EiffelStudio earlier. Their support for developing GUI apps, at least on Windows, also seemed good, based on what I tried of it.

The book "Object-Oriented Software Construction" by Bertrand Meyer, Eiffel's creator, is also very good. I read most of it some years ago.

[+] jmiserez|8 years ago|reply
I remember compilation being rather slow though, and deleting the EIFGENS folder frequently when something stopped working. This meant all the C code had to be regenerated and then recompiled, which took a while.

I still think it was a great language for learning programming.

[+] shmolyneaux|8 years ago|reply
In addition to the ones already mentioned here, pypy, a tracing jit for python2 and python3, is another project which gets compiled to C. It's written in RPython, which gets compiled down to C.

RPython includes a JIT generator that can be used to speed up new languages written in it. Pixie [0] is an example of another language written in RPython.

[0]: https://github.com/pixie-lang/pixie

[+] tburmeister|8 years ago|reply
Cython is another Python-like language that gets compiled to C; it's a great tool for getting big performance bumps for very little effort.
[+] Scarbutt|8 years ago|reply
Pixie was abandoned by the author mostly because he realized he couldn't achieve the performance that Clojure has on the JVM.
[+] frankpf|8 years ago|reply
> [...] pypy, a tracing jit for python2 and python3, is another project which gets compiled to C.

That's interesting. How does it work? I think most JIT compilers emit assembly directly and then execute it. Does PyPy generate C code while your program is running?

[+] throwaway7645|8 years ago|reply
There was a big Hacker News thread on Pixie a while back with the author, Tim Baldridge, about why he eventually stopped the project. A neat idea.
[+] pankajdoharey|8 years ago|reply
I have always wondered why C is used as an IL rather than something like C--, which is more portable and less system-call specific. I think LLVM IR already serves this purpose well and is really easy to write by hand.
[+] jokoon|8 years ago|reply
I'm still learning how to build a language, and compiling to C seems like the easiest thing to do, but I'm curious whether it's actually more difficult.

If my language is close to C (functions, scope, variables, types), can I take advantage of that and let the C compiler catch errors, so there is less work in my compiler, or must I write a full parser?

All I want to do is add pythonic indenting, range loops, maps, and geometric types with their operators (a little similar to shader languages).

[+] lucozade|8 years ago|reply
As described, you’ll almost certainly need to write a parser. However, if your type system is very similar to C’s you may be able to just lower from the AST to C and let the C compiler worry about semantic analysis and code generation.

The biggest loss is that a lot of error and debug info will be relative to the C rather than your language. This means the user will be exposed to the internals.

Probably the biggest win is that it’s almost trivial to integrate C, and potentially C++, libraries into your language. This has been hugely beneficial to Nim, for example. They have written a full front end though, so they don’t have the error issue above.

Like most things, it’s a trade-off.
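
The "lower from the AST to C" step can be sketched roughly as follows. The AST layout and the `emit_expr`/`emit_function` names are hypothetical, the generated function always takes a single `int x`, and the sketch simply asserts instead of handling buffer overflow. It shows the idea, not a real front end.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical AST for a tiny expression language. */
enum node_kind { NODE_INT, NODE_VAR, NODE_BINOP };

struct node {
    enum node_kind kind;
    int value;                      /* used by NODE_INT */
    const char *name;               /* used by NODE_VAR */
    char op;                        /* used by NODE_BINOP: '+', '-', '*', '/' */
    const struct node *lhs, *rhs;   /* used by NODE_BINOP */
};

/* Write the C spelling of `n` into buf. Binary operations are fully
   parenthesised so C's precedence rules never change the meaning. */
static void emit_expr(const struct node *n, char *buf, size_t cap)
{
    int written = 0;

    switch (n->kind) {
    case NODE_INT:
        written = snprintf(buf, cap, "%d", n->value);
        break;
    case NODE_VAR:
        written = snprintf(buf, cap, "%s", n->name);
        break;
    case NODE_BINOP: {
        char lhs[128], rhs[128];
        emit_expr(n->lhs, lhs, sizeof lhs);
        emit_expr(n->rhs, rhs, sizeof rhs);
        written = snprintf(buf, cap, "(%s %c %s)", lhs, n->op, rhs);
        break;
    }
    }
    /* Sketch only: treat truncation as a bug instead of recovering. */
    assert(written >= 0 && (size_t)written < cap);
}

/* Wrap an expression in a C function definition taking one int `x`. */
void emit_function(const char *name, const struct node *body,
                   char *buf, size_t cap)
{
    char expr[256];
    int written;

    emit_expr(body, expr, sizeof expr);
    written = snprintf(buf, cap, "int %s(int x) { return %s; }\n", name, expr);
    assert(written >= 0 && (size_t)written < cap);
}
```

For the expression `x * 2 + 1` this produces `int f(int x) { return ((x * 2) + 1); }`. If the source program had used an undeclared variable, the resulting diagnostic would come from the C compiler and talk about the generated C, which is exactly the error-reporting loss described above.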

[+] exDM69|8 years ago|reply
I would say that emitting LLVM IR is easier than C. I have done a few toy compiler projects with LLVM and it is very nice. I've never done a compiler that emits C, but I can't imagine it being any easier. At least you'd have to write more boilerplate code.

You also can't create a JIT-compiled REPL as easily using C.

[+] dfox|8 years ago|reply
You can implement reasonably C-like languages (e.g. Objective-C and various C-derived DSLs in many Lisp implementations) with a simple text-transforming preprocessor and get reasonable error messages from the C backend (#line is your friend for achieving that).

But for anything more complex you are better off implementing a complete parser with your own error handling and reporting.

Edit: one of the clearest signs of a language implemented in this way is the use of the '@' character as part of syntax extensions, as it is the only printable ASCII character that has no syntactic meaning in C (the other such "unused character" is '$', but its meaning is implementation-defined; for example, gcc has an option that causes it to be accepted as part of identifiers, which is quite obviously the default on VMS)
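
For illustration, C's `#line` directive is the mechanism a text-transforming preprocessor can use to make the C compiler report positions in the original source file. The fragment below is what a hypothetical translator might emit for a file called `example.toy`; the file name and function names are made up.

```c
#include <string.h>

/* Generated from a hypothetical source file "example.toy". The directive
   tells the C compiler that the line following it is line 41 of that
   file, so diagnostics, __LINE__ and __FILE__ refer to the original
   source instead of the generated C. */
#line 41 "example.toy"
int toy_line(void) { return __LINE__; }

/* True when the compiler believes this code came from example.toy. */
int toy_from_original_source(void)
{
    return strcmp(__FILE__, "example.toy") == 0;
}
```

Here `toy_line()` returns 41, and a type error on that line would be reported as `example.toy:41`, which is what makes the error messages usable to someone who never sees the generated C.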

[+] qznc|8 years ago|reply
Using LLVM is a little bit harder but gives you advantages like better debugging. Long term that pays off.

If you only play around, using C is fine.

[+] dtech|8 years ago|reply
> let the c compiler catch errors, or must I rewrite a full parser?

In the long run you always want to do it specifically for your language, because otherwise giving usable error messages and debugging information becomes impossible.

[+] d33|8 years ago|reply
I heard that GCC supports more architectures than LLVM. Perhaps Rust having a C backend would make sense? On the other hand, I have no idea how tightly coupled to LLVM it is.
[+] Rusky|8 years ago|reply
The language itself isn't coupled to LLVM outside of some underspecified details around things like unsafe and the memory model, which will eventually be better-defined.

The compiler has historically been fairly strongly tied to LLVM. This is changing for several reasons as the compiler is refactored, which should enable alternative backends like directly generating C or even plugging directly into GCC.

There is also an LLVM backend for generating C, at various levels of maintenance, that might be made to work.

[+] adrianN|8 years ago|reply
LLVM is used for optimizations in Rust. LLVM is more than just a nice way to target multiple backends, it's a very sophisticated set of building blocks for compilers.
[+] __s|8 years ago|reply
Rust has tried not to couple its design to LLVM.

For a while, LLVM could target C.

[+] cordite|8 years ago|reply
One I knew of was Vala. I’ve not used it, but the elementary-os devs seem to like it. https://wiki.gnome.org/Projects/Vala
[+] ktpsns|8 years ago|reply
Vala is interesting because it adds syntactic sugar for the GObject OOP library [1], which is a foundation of GTK+ GUI programming (but can also be used independently). All this was (is?) part of the idea that C++, Qt (with the moc) and finally KDE are bloated and broken from the roots. From my understanding, the main argument is the (in many cases unnecessary) complexity of C++ vs. the simplicity of C. In the Gnome desktop world, from my experience, many devs moved on to Python-based GUI programming. But I still think there are purists who prefer C/GLib due to its plain design.

[1] https://en.wikipedia.org/wiki/GObject
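
To give a feel for what such syntactic sugar hides, here is a minimal sketch of object-oriented code written by hand in plain C: a struct of function pointers acting as a class, and an instance struct that embeds its base. This is deliberately not the GObject API (which adds type registration, reference counting, signals and more); all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical plain-C "class" with one virtual method. */
struct shape;

struct shape_class {
    double (*area)(const struct shape *self);
};

struct shape {
    const struct shape_class *klass;   /* per-class method table */
};

/* "Subclass": the base must be the first member so that a pointer to
   struct rect can be treated as a pointer to struct shape. */
struct rect {
    struct shape base;
    double w, h;
};

static double rect_area(const struct shape *self)
{
    const struct rect *r = (const struct rect *)self;
    return r->w * r->h;
}

static const struct shape_class rect_class = { rect_area };

/* Hand-written "constructor". */
void rect_init(struct rect *r, double w, double h)
{
    assert(r != NULL);
    r->base.klass = &rect_class;
    r->w = w;
    r->h = h;
}

/* What a source-level call like `s.area()` would lower to. */
double shape_area(const struct shape *s)
{
    return s->klass->area(s);
}
```

Every class written this way repeats the same boilerplate (method table, cast, constructor), which is the part a language that compiles to C can generate automatically.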