I tried this out on my system last night. Compile time was quite long (about an hour and a half on my Ryzen 3600X). I use the DOOM Emacs config and was surprised to find most things working out of the box with native compilation. I noticed no difference in startup time. The speed boost was surprisingly noticeable, however, when e.g. opening a buffer that causes a language server to start. Starting ccls was... instant. It's usually quite quick, but this was noticeably faster.
does it do anything to change/fix Emacs shitting itself on large buffers?
last time i gave it a try opening a large file would completely kill performance, and iirc in particular really long lines (ex. 1000+ chars) would make the thing chug even if the actual file wasn't super big or anything.
I have a Gentoo ebuild [0] almost working for this (I think I'm missing the step where the eln files are loaded before dumping the base image). The compile times are ... substantial. However they don't affect the development process for new code since the interpreter is always there, and I am excited to see what performance gains we will see.
From an engineering perspective this is an excellent example of a direct path from interpreted to compiled code. The trade-offs are clear (heck, they are numbered 0-3), and while there is complexity, all the engineering time has been effectively concentrated inside a single project rather than forced upon tens of thousands of maintainers and users. Bravo. I wonder what other bytecode interpreters could benefit from this toolchain.
Compile times from qlop.
On an Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz:
Without native-comp: `2020-05-03T15:33:27 >>> app-editors/emacs: 6′40″`
With native-comp: `2020-05-03T18:35:43 >>> app-editors/emacs: 2:11:45`
As an exclusive Emacs user for the last 20 years, I'm quite excited; my main complaint about Emacs is its slowdown with some of the more sophisticated packages. I wonder if it improves magit performance with large codebases.
One reason magit is slow (that last I checked still hasn't been fixed) is that it spawns a large number of processes calling out to git for each operation. Most of these are redundant and could/should go away with either a redesign or a working caching scheme. This issue is more than obvious on platforms (e.g. some macOS versions) where fork/vfork are not as fast as one would expect. I like the paradigm behind magit, but the implementation leaves a lot to be desired. So the pig in this case is definitely not Emacs.
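The caching scheme suggested above could be sketched in elisp roughly as follows; `my/git-cached` and its hash-table cache are hypothetical illustrations, not magit's actual code:

```elisp
;; Hypothetical sketch of memoizing repeated git invocations; not magit's
;; implementation. Each distinct argument list spawns a git process once.
(defvar my/git-cache (make-hash-table :test #'equal)
  "Maps git argument lists to their cached output strings.")

(defun my/git-cached (&rest args)
  "Run git with ARGS once per distinct ARGS list, caching the output."
  (or (gethash args my/git-cache)
      (puthash args
               (with-temp-buffer
                 (apply #'call-process "git" nil t nil args)
                 (buffer-string))
               my/git-cache)))

;; e.g. (my/git-cached "status" "--porcelain") forks only on the first
;; call; later calls with the same arguments hit the cache.
```

Invalidating such a cache after each mutating git operation is the hard part, which is presumably where the redesign effort would go.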
Here are some rough, non-scientific benchmark statistics.
It took me 124.98 minutes to build the native-comp branch on a 4-core/8-thread i7-4790K with 32GB RAM in an LXC instance, while the master branch took 244 seconds. Both branches were built from the latest git snapshots available as of 2020-05-04 20:20 CDT.
The following function is passed to (benchmark-run-compiled 10 ...) for each run.
gc-cons-threshold is set to 268435456 (~256MB). Before each run, (garbage-collect) is called.
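The setup can be sketched like this; `my-bench-fn` is a placeholder, since the actual benchmark function isn't reproduced here:

```elisp
;; Sketch of the measurement harness; `my-bench-fn' is a stand-in for
;; the benchmark function, which isn't shown in this comment.
(setq gc-cons-threshold 268435456)  ; ~256MB, the threshold used in these runs
(garbage-collect)                   ; run before each measurement
;; `benchmark-run-compiled' returns (ELAPSED-TIME GC-RUNS GC-ELAPSED).
(benchmark-run-compiled 10
  (my-bench-fn))
```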
With Emacs's built-in core Lisp functions, the cl-lib functions, and the benchmark function all native-compiled, it takes 0.5823 seconds to complete.
With Emacs built-ins and cl-lib native-compiled but the benchmark function byte-compiled, it takes 0.6411 seconds to complete.
With byte-compiled Emacs built-in/cl-lib/benchmark functions, it takes 1.3574 seconds to complete.
With byte-compiled Emacs built-in/cl-lib functions and an interpreted benchmark function, it takes 78.054 seconds, with one GC taking 75.094 seconds, which implies the execution itself takes roughly 2.96 seconds.
I also ran the same benchmark on a 4-core A10-6800K and observed similar ratios on the builds from 2020-05-03.
That's an interesting question. My other guess would have been Gimp, and based on a quick glance at Debian's popcon, I'd say Gimp might be slightly in the lead.
Then again, like Open Firmware deployed Forth on millions of computers, right under our noses, it would not surprise me at all if there were a simple Lisp implementation hidden on every computer in the world.
Looks really cool! One thing I noticed was the generated code seemed to have fairly poor register allocation, looking more like it was just pulling things straight out from locals into registers and immediately storing them back. From the talk, it looks like that was what was being provided to libgccjit, but surely it could optimize that further?
Compilers are "just" a series of transformations/translations from higher-level code to lower-level code. The top is code like C, Python, elisp, whatever, and the bottom is machine code for amd64, ARMv7, whatever. All the in-between code is in some intermediate representation (IR).
Each successive step takes care of different optimizations, modifying the code as it goes down. At the last step, he converts LIMPLE to an IR that libgccjit understands and hands it off to GCC for native compilation.
Could you just start with elisp and emit amd64 machine code in one step? Absolutely, but it would be hell to maintain, and then you lose out on all the pluggability of modern compilers. If you (consume and/or) emit standard(-ish) IRs, you get to participate in a pretty amazing ecosystem.
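For what it's worth, the tail end of that pipeline can be poked at from Lisp itself; a rough sketch, assuming the names the feature eventually shipped with in Emacs 28 (`native-comp-available-p`, `native-compile`, `subr-native-elisp-p`), which may differ on the branch discussed here:

```elisp
;; Rough sketch using the Emacs 28 names; the development branch may differ.
(when (native-comp-available-p)       ; t only when built with libgccjit
  (defun my/add1 (n) (1+ n))
  ;; elisp -> LIMPLE -> libgccjit IR -> native code (.eln)
  (let ((f (native-compile 'my/add1)))  ; returns the compiled function
    (subr-native-elisp-p f)))           ; t if it's now a native subr
```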
Did anyone successfully compile it for x86 32-bit target? The docker image is 64-bit and the compilation seems to be hitting the gcc-i386 limit of 3GiB.
This seems convoluted compared to, say, moving the Lisp implementation from Emacs Lisp to Common Lisp, of which several native-code-compiling implementations exist.
It brings GCC into the address space via libgccjit. GCC is not robust enough to be integrated into applications that stay running, and be repeatedly invoked.
I assume you mean writing an Emacs lisp interpreter in Common Lisp — which could be done with a small lexer, a lot of macros, and some library support. That would probably be a win.
poidos | 5 years ago
Great job all around!
tincholio | 5 years ago
I haven't had any stability issues with it, either; it just works.
pmiller2 | 5 years ago
Here is the corresponding paper: https://arxiv.org/pdf/2004.02504.pdf
0. https://github.com/tgbugs/tgbugs-overlay/blob/master/app-edi...
gpderetta | 5 years ago
Pigs fly just fine with enough thrust.
metroholografix | 5 years ago
Example:
register | 5 years ago
No reaction at all. I am really curious to understand what I did wrong at the time.
ken | 5 years ago
Is there anything more recent than DSSSL?