Compiling dynamic programming languages

[+] alangpierce|7 years ago|reply

Hmm, I think the performance results for the last example aren't actually valid. The example/tco.js program is a linear-time algorithm since it's only making one recursive call, not two, so at n=50 you should expect it to be pretty much instant. Much, much less than a millisecond to evaluate.

I think the "0.080 total" and "0.087 total" are just from node startup time (and the variability within that), not time actually executing the function. I just ran node on an empty file and the running time was "0.087 total".

I think when doing these sorts of perf measurements it's best to make the code run for multiple seconds to better account for startup time, JIT warmup, caches, and the various other subtle factors that come into play.
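
A minimal sketch of that advice, assuming Node.js. The `fib` function here is a hypothetical accumulator-style stand-in for example/tco.js, not the article's actual file, and `bench` is an illustrative helper rather than any library's API. It times the calls from inside the process, so node startup is excluded, and keeps looping until a time budget is used up.

```javascript
// Hypothetical stand-in for example/tco.js: one recursive call per step, so O(n).
function fib(n, a, b) {
  if (n === 0) return a;
  if (n === 1) return b;
  return fib(n - 1, b, a + b);
}

// Illustrative helper: call fn repeatedly for at least minMillis
// and report the mean time per call.
function bench(fn, minMillis) {
  let sink = 0; // accumulate results so the calls are harder for the engine to discard
  for (let i = 0; i < 10000; i++) sink += fn(); // warm-up, gives the JIT a chance to kick in
  const budget = BigInt(minMillis) * 1000000n; // time budget in nanoseconds
  const start = process.hrtime.bigint();
  let calls = 0;
  let elapsed = 0n;
  do {
    for (let i = 0; i < 1000; i++) sink += fn(); // batch calls between clock reads
    calls += 1000;
    elapsed = process.hrtime.bigint() - start;
  } while (elapsed < budget);
  return { calls, nsPerCall: Number(elapsed) / calls, sink };
}

// 200 ms keeps the demo quick; use several seconds for a real measurement.
const result = bench(() => fib(50, 0, 1), 200);
console.log(`${result.calls} calls, ~${result.nsPerCall.toFixed(1)} ns per call`);
```

The per-call figure is what is worth comparing between implementations; the wall-clock total of a single short run mostly reflects process startup.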

[+] snek|7 years ago|reply

I cannot agree more here. I strongly recommend benchmark.js (https://github.com/bestiejs/benchmark.js/)

[+] snek|7 years ago|reply

In V8, crossing the boundary between the public C++ API and "js land" is actually quite expensive. In most cases you will get more perf from writing your code in JS. This is why we write a lot of Node.js's core in JS instead of C++.

Operations you see in the compiled output like `Local<Function>::Cast(global_3->Get(String::NewFromUtf8(isolate, "Boolean")));` are extraordinarily expensive, and should pretty much be avoided at all costs.

[+] eatonphil|7 years ago|reply

Thanks for the info. jsc right now is little more than a PoC and I don't have a ton of hope for competing with V8 for performance long term (except perhaps when type inference or hinting enter the story).

The most immediate advantage I can think of for having a project like jsc long-term is for packaging/deployment. We use zeit's pkg tool at work today, but a robust, mature JavaScript-to-native compiler would be much more compelling.

[+] timruffles|7 years ago|reply

I'm doing this for JavaScript - finding it a lot of fun! https://github.com/timruffles/js-to-c

I'd recommend it as a project to anyone wanting to learn more about a particular language (implementing a language teaches you how it works in painstaking detail), and to get a better 'feel' for how languages and compilation work in general.

[+] sdegutis|7 years ago|reply

Ironic that your first initial and last name spell "truffles", considering that bridging between C and JS (and other languages) is possible if they're implemented via Graal's Truffle library: https://github.com/oracle/graal/tree/master/truffle

[+] 13of40|7 years ago|reply

Me too, except with Visual Basic 6. (It's a work thing.) Besides error handling, I think the gnarliest thing I've come across is this: VB has built-in file management commands it inherited from GW-BASIC. Built-in in the sense that they're first-class statements in the language. One of these, for renaming files, goes "NAME <old> AS <new>", where old and new can be arbitrarily complex expressions. Name isn't a reserved word, so you can have a variable or subroutine called "name". So consider these three statements:

name (...) as (...)

name (...) = (...)

name (...)

...where (...) represents an arbitrarily complex expression. The first one is a "name as" command, the second is an assignment to an array element, and the third is a procedure call, and you can't tell which until you've fully and accurately parsed the first arbitrarily huge expression.
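
A rough sketch of why the decision has to wait, assuming the statement has already been split into an array of token strings and has exactly the simplified shape shown above (one parenthesized group after `name`). The function `classifyNameStatement` is hypothetical and is not how VB6 does it; a real parser would have to parse the full expression rather than just balance parentheses.

```javascript
// Classify a statement whose first token is the identifier `name`.
// `tokens` is assumed to be an already-tokenized statement (array of strings).
function classifyNameStatement(tokens) {
  if (tokens.length === 0 || tokens[0].toLowerCase() !== "name") return "other";
  let i = 1;
  if (tokens[i] === "(") {
    // Skip the whole parenthesized group, however deeply it nests.
    let depth = 0;
    do {
      if (tokens[i] === "(") depth++;
      else if (tokens[i] === ")") depth--;
      i++;
    } while (depth > 0 && i < tokens.length);
    if (depth !== 0) return "error: unbalanced parentheses";
  }
  // Only the token after the group reveals what kind of statement this is.
  const next = tokens[i];
  if (next === undefined) return "call";
  if (next.toLowerCase() === "as") return "rename";
  if (next === "=") return "assignment";
  return "other";
}

console.log(classifyNameStatement(["name", "(", "a", "(", "1", ")", ")", "as", "(", "b", ")"]));
```

The deciding token sits after the closing parenthesis, so the amount of lookahead is unbounded.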

Also, the following:

name:

In BASIC a colon separates two statements on the same line, and an empty statement is valid, so is this a label or a subroutine call followed by an empty statement?

Edit - just remembered this gem: You can use a "With" block to save having to type the name of an object when referring to its members...

With SomeObject

    MsgBox .Name

End With

...will pop up a dialog showing SomeObject.Name. Now go ahead and tokenize that second line. If you're like me you got three tokens: [MsgBox][.][Name]. The problem is that's indistinguishable from...

MsgBox.Name

You could say that ".Name" should be one token. Hmm. So what if it's something like...

MsgBox .Name.Substring(1, 3).ToUpper()

Still one token? I think my hair's going to be white by the time I finish this project.
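
One possible way around this, sketched under the assumption that whitespace is what separates the two cases: keep the dot as its own token, but have the tokenizer record whether whitespace came before each token, and let the parser decide. `tokenize` and `isLeadingDot` are hypothetical names for illustration, and nothing here claims to match what VB6 itself does.

```javascript
// Split a line into identifiers, numbers, and single punctuation characters.
// Each token records whether whitespace preceded it.
function tokenize(line) {
  const tokens = [];
  const pattern = /(\s*)([A-Za-z_][A-Za-z0-9_]*|\d+|\S)/g;
  let match;
  while ((match = pattern.exec(line)) !== null) {
    tokens.push({ text: match[2], spaceBefore: match[1].length > 0 });
  }
  return tokens;
}

// Treat a dot as a With-block member access if it starts the line
// or follows whitespace; otherwise it is an ordinary member access.
function isLeadingDot(tokens, index) {
  return tokens[index].text === "." && (index === 0 || tokens[index].spaceBefore);
}

const withMember = tokenize("MsgBox .Name.Substring(1, 3).ToUpper()");
const plainMember = tokenize("MsgBox.Name");
console.log(isLeadingDot(withMember, 1), isLeadingDot(plainMember, 1));
```

With this, `.Name.Substring(1, 3).ToUpper()` stays a sequence of ordinary tokens, and only the first dot is flagged. A fuller rule would also look at the previous token, since a dot directly after "(" or an operator has no whitespace before it but still cannot belong to the preceding identifier.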

[+] UncleEntity|7 years ago|reply

I wonder if it's possible to just get the AST out of V8 and not have to maintain your own lexer/parser? I also wonder if abstract interpretation might be the way to go to get type information out of the JS code, which should simplify the generated C++.

That said, reading TFA and the linked BSDScheme one gave me some ideas for the next time I get around to playing with minischeme. Too many toys and not enough time...

[+] eatonphil|7 years ago|reply

I'm not using a parser I wrote. Though of course someone must maintain it.

I considered writing this in C++ to get more tooling (JS parser, C++ AST libraries, etc.) but in the short-term I do not see myself switching. I'd be a little more likely to switch back to D though because I find data structures in Rust annoying.

[+] bambataa|7 years ago|reply

It’s striking how unreadable the V8 output is. Is that due to V8 itself or just this particular use of it?

[+] eatonphil|7 years ago|reply

It's the product of whipping together some basic examples for proof of concept in a few weeks. :)

To some degree it will always be "messier". 30-60% of generated code will just be converting between C++ and V8 types.

But really, I just need to prioritize using more tmp values so I'm not shoving crazy amounts of logic in one line.

It can definitely be difficult to debug. Things would be easier if I were generating C++ ASTs and pretty-printing them (I should probably do this in the future) but for the PoC I'm just emitting strings of C++.

Chicken Scheme's generated C is more along the lines of what I'm aspiring to.

[+] benbristow|7 years ago|reply

I don't think the output really needs to be readable since they're going for speed over anything else.

Who is normally reading this stuff?

[+] jejones3141|7 years ago|reply

I'm disappointed. From the title it sounded like someone had created a language specifically for dynamic programming (https://en.wikipedia.org/wiki/Dynamic_programming).

[+] jakeinspace|7 years ago|reply

Plenty already exist which support dynamic programming "natively." For example, the 'memoize' function in Clojure.

[+] basil-rash|7 years ago|reply

Slap an @memoize decorator on a Python function and you're good to go.
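
The same idea sketched in JavaScript, the language the article compiles. `memoize` here is a hand-written helper for single-argument functions, assumed for illustration and not a built-in; it caches results in a `Map` so each subproblem is solved once.

```javascript
// Wrap a single-argument function so each distinct argument is computed once.
function memoize(fn) {
  const cache = new Map();
  return function (arg) {
    if (!cache.has(arg)) cache.set(arg, fn(arg));
    return cache.get(arg);
  };
}

let evaluations = 0; // counts how often the underlying function actually runs
const memoFib = memoize(function (n) {
  evaluations++;
  // The recursive calls go through the memoized wrapper, not the raw function.
  return n < 2 ? n : memoFib(n - 1) + memoFib(n - 2);
});

console.log(memoFib(50), evaluations); // prints 12586269025 51
```

Without the cache the two-branch recursion takes exponential time; with it, each of the 51 inputs from 0 to 50 is evaluated once.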

[+] znpy|7 years ago|reply

One might argue that this guy actually wrote a transpiler and not a true compiler as the target language is D and not assembly or machine language.

I don't want to diminish the work this guy has done, but one of the biggest challenges, unless you want to take credit for someone else's work, is to do all the various kinds of optimization that a mature compiler would do, at various levels.

So yeah, that's cool, but I was expecting more given the use of the word "compiling".

[+] n4r9|7 years ago|reply

I think the use of the word is fine. Whilst "compile" is often used to refer to programs whose target language is assembly/machine code, it generally refers to any translation from one computer language to another [0].

A "transpiler" is therefore a specific type of compiler.

[0] http://www.compilers.net/paedia/compiler/index.htm

[+] peterkelly|7 years ago|reply

I consider the term "compiler" to be very generic, and to cover a wide spectrum of tools ranging from those which compile from one high-level programming language to a slightly lower-level language (e.g. TypeScript), right down to those which produce machine code (e.g. GCC/Clang+LLVM). In fact several major compilers (such as GHC and the first C++ compiler, cfront) started out by using C as their backend.

"Transpiler", which is arguably somewhat of a buzzword, typically refers to compilers where the source and target language are either the same, or at a very similar level of abstraction; I agree with n4r9 in considering them a particular type of compiler. Even so, the implementation described in the article compiles from a high-level language, JavaScript, to a considerably lower-level language, C++; so even if we exclude transpilers from our definition of what constitutes a true compiler I still think it's a valid term to apply in this case.

[+] gnufx|7 years ago|reply

I don't remember much argument about the term "compiler" when compilation to C started appearing. That goes back at least as far as Scheme->C [1] and f2c [2]. (f2c was "just" a re-targeting of the first Fortran 77 implementation to a C backend.)

1. http://www.hpl.hp.com/techreports/Compaq-DEC/WRL-89-1.pdf

2. http://www.netlib.org/f2c/
