I tried this out on my system last night. Compile time was quite long (about an hour and a half on my Ryzen 3600X). I use the DOOM Emacs config and was surprised to find most things working out of the box with native compilation. I noticed no difference in startup time. The speed boost was surprisingly noticeable, however, when e.g. opening a buffer that causes a language server to start. Starting ccls was... instant. It's usually quite quick, but this was noticeably faster.
does it do anything to change/fix Emacs shitting itself on large buffers?
last time i gave it a try opening a large file would completely kill performance, and iirc in particular really long lines (ex. 1000+ chars) would make the thing chug even if the actual file wasn't super big or anything.
I have a Gentoo ebuild [0] almost working for this (I think I'm missing the step where the eln files are loaded before dumping the base image). The compile times are ... substantial. However they don't affect the development process for new code since the interpreter is always there, and I am excited to see what performance gains we will see.
From an engineering perspective this is an excellent example of a direct path from interpreted to compiled code. The trade-offs are clear (heck, they are numbered 0-3), and while there is complexity, all the engineering time has been effectively concentrated inside a single project rather than forced upon tens of thousands of maintainers and users. Bravo. I wonder what other bytecode interpreters could benefit from this toolchain.
Compile times from qlop.
On an Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz:
Without native-comp: `2020-05-03T15:33:27 >>> app-editors/emacs: 6′40″`
With native-comp: `2020-05-03T18:35:43 >>> app-editors/emacs: 2:11:45`
As an exclusive Emacs user for the last 20 years, I'm quite excited; my main complaint about Emacs is its slowdown with some of the more sophisticated packages. I wonder if it improves magit performance with large codebases.
One reason magit is slow (that last I checked still hasn't been fixed) is that it spawns a large number of processes calling out to git for each operation. Most of these are redundant and could/should go away with either a redesign or a working caching scheme. This issue is more than obvious on platforms (e.g. some macOS versions) where fork/vfork are not as fast as one would expect. I like the paradigm behind magit, but the implementation leaves a lot to be desired. So the pig in this case is definitely not Emacs.
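The caching scheme suggested above could be sketched in elisp roughly as follows; `my/git-cached` and its hash-table cache are hypothetical illustrations, not magit's actual code:

```elisp
;; Hypothetical sketch of memoizing repeated git invocations; not magit's
;; implementation. Each distinct argument list spawns a git process once.
(defvar my/git-cache (make-hash-table :test #'equal)
  "Maps git argument lists to their cached output strings.")

(defun my/git-cached (&rest args)
  "Run git with ARGS once per distinct ARGS list, caching the output."
  (or (gethash args my/git-cache)
      (puthash args
               (with-temp-buffer
                 (apply #'call-process "git" nil t nil args)
                 (buffer-string))
               my/git-cache)))

;; e.g. (my/git-cached "status" "--porcelain") forks only on the first
;; call; later calls with the same arguments hit the cache.
```

Invalidating such a cache after each mutating git operation is the hard part, which is presumably where the redesign effort would go.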
Here are some rough, non-scientific benchmark statistics.
It took me 124.98 minutes to build the native-comp branch on a 4-core/8-thread i7-4790K with 32GB RAM in an LXC instance, while the master branch took 244 seconds. Both branches were built from the latest git snapshots available as of 2020-05-04 20:20 CDT.
The following function is passed to (benchmark-run-compiled 10 ...) for each run.
gc-cons-threshold is set to 268435456 (~256MB). Before each run, (garbage-collect) is called.
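The setup can be sketched like this; `my-bench-fn` is a placeholder, since the actual benchmark function isn't reproduced here:

```elisp
;; Sketch of the measurement harness; `my-bench-fn' is a stand-in for
;; the benchmark function, which isn't shown in this comment.
(setq gc-cons-threshold 268435456)  ; ~256MB, the threshold used in these runs
(garbage-collect)                   ; run before each measurement
;; `benchmark-run-compiled' returns (ELAPSED-TIME GC-RUNS GC-ELAPSED).
(benchmark-run-compiled 10
  (my-bench-fn))
```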
With Emacs's built-in core Lisp functions, the cl-lib functions, and the benchmark function all native-compiled, it takes 0.5823 seconds to complete.
With Emacs built-ins and cl-lib native-compiled but the benchmark function byte-compiled, it takes 0.6411 seconds to complete.
With byte-compiled Emacs built-in/cl-lib/benchmark functions, it takes 1.3574 seconds to complete.
With byte-compiled Emacs built-in/cl-lib functions and an interpreted benchmark function, it takes 78.054 seconds, with one GC taking 75.094 seconds, which implies the execution itself takes roughly 2.96 seconds.
I also ran the same benchmark on a 4-core A10-6800K and observed similar ratios on the builds from 2020-05-03.
That's an interesting question. My other guess would have been Gimp, and based on a quick glance at Debian's popcon, I'd say Gimp might be slightly in the lead.
Then again, like Open Firmware deployed Forth on millions of computers, right under our noses, it would not surprise me at all if there were a simple Lisp implementation hidden on every computer in the world.
Looks really cool! One thing I noticed was the generated code seemed to have fairly poor register allocation, looking more like it was just pulling things straight out from locals into registers and immediately storing them back. From the talk, it looks like that was what was being provided to libgccjit, but surely it could optimize that further?
Compilers are "just" a series of transformations/translations from higher-level code to lower-level code. The top is code like C, Python, elisp, whatever, and the bottom is machine code for amd64, ARMv7, whatever. All the in-between code is in some intermediate representation (IR).
Each successive step takes care of different optimizations, modifying the code as it goes down. At the last step, he converts LIMPLE to an IR that libgccjit understands and hands it off to GCC for native compilation.
Could you just start with elisp and emit amd64 machine code in one step? Absolutely, but it would be hell to maintain, and then you lose out on all the pluggability of modern compilers. If you (consume and/or) emit standard(-ish) IRs, you get to participate in a pretty amazing ecosystem.
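For what it's worth, the tail end of that pipeline can be poked at from Lisp itself; a rough sketch, assuming the names the feature eventually shipped with in Emacs 28 (`native-comp-available-p`, `native-compile`, `subr-native-elisp-p`), which may differ on the branch discussed here:

```elisp
;; Rough sketch using the Emacs 28 names; the development branch may differ.
(when (native-comp-available-p)       ; t only when built with libgccjit
  (defun my/add1 (n) (1+ n))
  ;; elisp -> LIMPLE -> libgccjit IR -> native code (.eln)
  (let ((f (native-compile 'my/add1)))  ; returns the compiled function
    (subr-native-elisp-p f)))           ; t if it's now a native subr
```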
Did anyone successfully compile it for x86 32-bit target? The docker image is 64-bit and the compilation seems to be hitting the gcc-i386 limit of 3GiB.
This seems convoluted compared to, say, moving the Lisp implementation from Emacs Lisp to Common Lisp, of which several native-code-compiling implementations exist.
It brings GCC into the address space via libgccjit. GCC is not robust enough to be integrated into applications that stay running, and be repeatedly invoked.
I assume you mean writing an Emacs lisp interpreter in Common Lisp — which could be done with a small lexer, a lot of macros, and some library support. That would probably be a win.
poidos | 5 years ago
Great job all around!
tincholio | 5 years ago
I haven't had any stability issues with it, either; it just works.
pmiller2 | 5 years ago
Here is the corresponding paper: https://arxiv.org/pdf/2004.02504.pdf
0. https://github.com/tgbugs/tgbugs-overlay/blob/master/app-edi...
gpderetta | 5 years ago
Pigs fly just fine with enough thrust.
metroholografix | 5 years ago
Example:
register | 5 years ago
No reaction at all. I am really curious to understand what I did wrong at the time.
ken | 5 years ago
Is there anything more recent than DSSSL?