I considered compiling D to C. The trouble is when you need to do things like exception handling, adjustor thunks, define things that are implementation defined or undefined in C, etc. I figured I'd spend too much time struggling with that, and besides, it would make compilation slow.
> Without doubt, today the answer is C++ and Objective-C – languages whose first compilers emitted C code.
I wrote the first native C++ compiler for the PC. It was a huge win over the cfront-based PC compilers, which were nearly unworkable.
Another problem I failed to mention is you don't have control over the C compiler the customer is using. It may behave differently, have bugs in it, etc., and the customer will blame you. Being unable to fix the C compiler means you'll be fixing the problem in the wrong place, making things pretty hard on you.
I run into these sorts of problems with D's C++ interface. In particular, how "long long" and "long" are mangled when they are the same size, and when "long long" is used vs when "long" is used. It's an endless whack-a-mole problem.
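To make the "long" vs "long long" issue concrete, here is a minimal sketch, assuming the Itanium C++ ABI that GCC and Clang follow (other ABIs, such as MSVC's, use a different scheme). The helper `mangle_free_function` and its lookup table are hypothetical names for illustration, and only non-template free functions with builtin parameter types are covered. It shows that the two types get different codes even on platforms where they have the same size, so a binding that picks the wrong one fails to link.

```python
# Itanium C++ ABI codes for a few builtin types (the scheme GCC and Clang use).
ITANIUM_BUILTIN_CODES = {
    "void": "v",
    "int": "i",
    "long": "l",
    "unsigned long": "m",
    "long long": "x",
    "unsigned long long": "y",
}


def mangle_free_function(name, param_types):
    """Mangle a non-template free function in the global namespace.

    Hypothetical helper: only builtin parameter types are handled, so this
    is a teaching sketch rather than a full implementation of the ABI.
    """
    if not param_types:
        param_types = ["void"]  # an empty parameter list is encoded as 'v'
    codes = "".join(ITANIUM_BUILTIN_CODES[t] for t in param_types)
    return f"_Z{len(name)}{name}{codes}"


print(mangle_free_function("foo", ["long"]))       # _Z3fool
print(mangle_free_function("foo", ["long long"]))  # _Z3foox
```

The two symbols differ only in the final character, which is why the choice between the two spellings has to match exactly what the C++ side declared.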
The Mercury compiler has a backend which compiles to C. This is currently described as "high-level" C, because the generated code sort-of (relatively speaking) looks like code that a C programmer might write. But back in the early days (mid 90s?) it emitted "low-level" GNU (not ANSI/ISO) C, which basically looked like assembly code for an abstract machine. Kind of like the "compilation strategy" described in TFA, taken a number of steps further.
One of the nice things about GNU C is that you can take advantage of various GCC-specific extensions. For example, ISO C does not require a compiler to optimise tail-recursive calls, which are heavily used in functional (and logic) programming languages, where recursion is used instead of loops. GNU C also lets you access CPU registers directly and write inline assembly code.
Compiling to C becomes difficult when you want to do something that C doesn't really handle (well), for example reading the processor flags, or passing control to another function while leaving the stack intact (as objc_msgSend does).
If you can then insert some asm, that's good, otherwise you hit a brick wall.
Another language that famously took this route is Eiffel.
EiffelStudio uses a VM-like environment for the develop-compile-debug cycle, and then compiles to native code via the platform's system C compiler for deployment.
Eiffel seemed like a really good language. I wish it had been more successful and become mainstream. I tried it out a bit with EiffelStudio some time ago. Their support for developing GUI apps, at least on Windows, also seemed good, based on what I tried.
The book "Object-Oriented Software Construction" by Bertrand Meyer, Eiffel's creator, is also very good. I read most of it some years ago.
I remember compilation being rather slow though, and deleting the EIFGENS folder frequently when something stopped working. This meant all the C code had to be regenerated and then recompiled, which took a while.
I still think it was a great language for learning programming.
In addition to the ones already mentioned here, pypy, a tracing jit for python2 and python3, is another project which gets compiled to C. It's written in RPython, which gets compiled down to C.
RPython includes a JIT generator that can be used to speed up new languages written in it. Pixie [0] is an example of another language written in RPython.
> [...] pypy, a tracing jit for python2 and python3, is another project which gets compiled to C.
That's interesting. How does it work? I think most JIT compilers emit assembly directly and then execute it. Does PyPy generate C code while your program is running?
I have always wondered why people use C as an IL and not something like C--, which is more portable and less system-call specific. I think LLVM IR already serves this purpose well and is really easy to write by hand.
I'm still learning how to build a language, and compiling to C seems like the easiest thing to do, but I'm curious whether it's actually more difficult.
If my language is close to C (functions, scope, variables, types), can I take advantage of that and let the C compiler catch errors, so there is less work in my compiler, or must I write a full parser?
All I want to do is add pythonic indenting, range loops, maps, and geometric types with their operators (a little similar to shader languages).
As described, you’ll almost certainly need to write a parser. However, if your type system is very similar to C’s, you may be able to just lower from the AST to C and let the C compiler worry about semantic analysis and code generation.
The biggest loss is that a lot of error and debug info will be relative to the C rather than your language. This means the user will be exposed to the internals.
Probably the biggest win is that it’s almost trivial to integrate C, and potentially C++, libraries into your language. This has been hugely beneficial to Nim, for example. They have written a full front end, though, so they don’t have the error issue above.
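As a rough illustration of "lowering from the AST to C", here is a minimal sketch of how a range loop could be turned into C source text. Everything in it is an assumption made for illustration: the node classes `Call` and `RangeLoop`, the `lower` function, and the `print_int` callee are hypothetical, and expressions are passed through as plain strings so that the C compiler does the type checking.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Call:
    """A call statement such as `print_int(i)`; arguments are plain C expressions."""
    func: str
    args: List[str]


@dataclass
class RangeLoop:
    """`for var in range(start, stop): body` in the hypothetical source language."""
    var: str
    start: str
    stop: str
    body: list


def lower(node, indent=0):
    """Return the list of C source lines for one AST node."""
    pad = "    " * indent
    if isinstance(node, Call):
        return [f"{pad}{node.func}({', '.join(node.args)});"]
    if isinstance(node, RangeLoop):
        lines = [f"{pad}for (int {node.var} = {node.start}; "
                 f"{node.var} < {node.stop}; ++{node.var}) {{"]
        for stmt in node.body:
            lines.extend(lower(stmt, indent + 1))
        lines.append(f"{pad}}}")
        return lines
    raise TypeError(f"unsupported node: {node!r}")


loop = RangeLoop("i", "0", "n", [Call("print_int", ["i"])])
print("\n".join(lower(loop)))
```

Because the expressions are emitted unchecked, a mistake such as an undeclared `n` is reported by the C compiler against the generated code, which is exactly the error-message trade-off described above.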
I would say that emitting LLVM IR is easier than emitting C. I have done a few toy compiler projects with LLVM and it is very nice. I've never written a compiler that emits C, but I can't imagine it being any easier. At the very least you'd have to write more boilerplate code.
You also can't create a JIT-compiled REPL as easily using C.
You can implement reasonably C-like languages (e.g. Objective-C and various C-derived DSLs in many Lisp implementations) with a simple text-transforming preprocessor and still get reasonable error messages from the C backend (#line is your friend for achieving that).
But for anything more complex you are better off implementing a complete parser with your own error handling and reporting.
Edit: one of the clearest signs of a language implemented this way is the use of the '@' character in its syntax extensions, as it is one of the few printable ASCII characters that has no syntactic meaning in C (another such "unused" character is '$', but its meaning is implementation-defined; for example, GCC has an option that causes it to be accepted as part of identifiers, which is quite obviously the default on VMS).
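The directive that maps C compiler diagnostics back to the original source is `#line`, which is part of standard C. Here is a minimal sketch of a generator that emits it; the function name `emit_c_with_line_markers`, the file name `shapes.mylang`, and the sample statements are all made up for illustration.

```python
def emit_c_with_line_markers(statements, source_file):
    """Prefix each generated C statement with a #line directive.

    `statements` is a list of (source_line_number, c_code) pairs, where the
    line number refers to the original (non-C) source file.
    """
    out = []
    for source_line, c_code in statements:
        # After this directive the C compiler reports the following line
        # as `source_line` of `source_file`.
        out.append(f'#line {source_line} "{source_file}"')
        out.append(c_code)
    return "\n".join(out)


generated = emit_c_with_line_markers(
    [(3, "int total = 0;"), (7, "total += width * height;")],
    "shapes.mylang",
)
print(generated)
```

With markers like these, an error in the second statement would be reported against line 7 of `shapes.mylang` rather than against a line of the generated C file, though the message text itself is still phrased in C terms.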
> let the c compiler catch errors, or must I rewrite a full parser?
In the long run you always want to do it specifically for your language, because otherwise giving usable error messages and debugging information becomes impossible.
I heard that GCC supports more architectures than LLVM. Perhaps Rust having a C backend would make sense? On the other hand, I have no idea how tightly coupled to LLVM it is.
The language itself isn't coupled to LLVM outside of some underspecified details around things like unsafe and the memory model, which will eventually be better-defined.
The compiler has historically been fairly strongly tied to LLVM. This is changing for several reasons as the compiler is refactored, which should enable alternative backends like directly generating C or even plugging directly into GCC.
There is also an LLVM backend for generating C, at various levels of maintenance, that might be made to work.
LLVM is used for optimizations in Rust. LLVM is more than just a nice way to target multiple backends, it's a very sophisticated set of building blocks for compilers.
Vala is interesting because it adds syntactic sugar for the GObject OOP library [1], which is a foundation of GTK+ GUI programming (but can also be used independently). All this was (is?) part of the idea that C++, Qt (with the moc) and finally KDE are bloated and broken from the roots. From my understanding, the main argument is the (in many cases unnecessary) complexity of C++ vs. the simplicity of C. In the Gnome desktop world, in my experience, many devs have moved on to Python-based GUI programming. But I still think there are purists who prefer C/GLib due to its plain design.
WalterBright|8 years ago
WalterBright|8 years ago
stingraycharles|8 years ago
Is that only because of the extra compilation step, or are there other things making this slower?
rdc12|8 years ago
quicklime|8 years ago
mpweiher|8 years ago
See: https://cr.yp.to/qhasm/20050129-portable.txt
lisper|8 years ago
pjmlp|8 years ago
vram22|8 years ago
jmiserez|8 years ago
shmolyneaux|8 years ago
[0]: https://github.com/pixie-lang/pixie
tburmeister|8 years ago
Scarbutt|8 years ago
frankpf|8 years ago
throwaway7645|8 years ago
1wd|8 years ago
And Bigloo Scheme. http://www-sop.inria.fr/indes/fp/Bigloo/doc/bigloo-3.html
And Gambit Scheme. http://gambitscheme.org/
There seems to be a theme here...
pankajdoharey|8 years ago
jokoon|8 years ago
lucozade|8 years ago
Like most things, it’s a trade-off.
exDM69|8 years ago
dfox|8 years ago
tyingq|8 years ago
qznc|8 years ago
If you only play around, using C is fine.
dtech|8 years ago
d33|8 years ago
Rusky|8 years ago
adrianN|8 years ago
__s|8 years ago
For a while, LLVM could target C.
cordite|8 years ago
ktpsns|8 years ago
[1] https://en.wikipedia.org/wiki/GObject