nnethercote | 2 years ago
- https://nnethercote.github.io/2022/10/27/how-to-speed-up-the...
- https://nnethercote.github.io/2023/03/24/how-to-speed-up-the...
It's hard work. Small AST changes often require hundreds of changes to the code. The required changes usually make the AST less ergonomic to work with. And the perf benefits I obtained were very small, even after shrinking `ast::Expr` (by far the most common AST node kind) from over 100 bytes to 64 bytes on 64-bit platforms.
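The kind of shrink described above can be sketched in ordinary Rust: boxing a rare, large variant moves its payload behind a pointer, so the enum itself stays small. This is a generic illustration with made-up names, not rustc's actual `ast::Expr` definition:

```rust
use std::mem::size_of;

// Hypothetical AST node (made-up names, not rustc's real ast::Expr):
// one rare variant carries a 64-byte payload.
#[allow(dead_code)]
struct MatchArms {
    arms: [u64; 8], // 64 bytes
}

#[allow(dead_code)]
enum FatExpr {
    Lit(u64),
    Match(MatchArms), // stored inline: every FatExpr pays for the largest variant
}

#[allow(dead_code)]
enum SlimExpr {
    Lit(u64),
    Match(Box<MatchArms>), // boxed: the payload moves behind a pointer
}

fn main() {
    // Boxing the large variant shrinks the enum to roughly tag + pointer-sized payload.
    println!("FatExpr:  {} bytes", size_of::<FatExpr>());
    println!("SlimExpr: {} bytes", size_of::<SlimExpr>());
    assert!(size_of::<SlimExpr>() < size_of::<FatExpr>());
}
```

The trade-off is exactly the ergonomic cost mentioned above: every use site of the boxed variant now has an extra indirection, and construction requires a heap allocation.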
The linked Zig PR shows very impressive reductions in wall time, cycles, etc., but if you read closely, the measurements are restricted to parsing alone, which is usually a very small slice of total compile time, at least for rustc. My experience with these kinds of changes was disappointing. I concluded: "I'd love to be proven wrong, but it doesn't feel like this work is taking place in a good part of the effort/benefit curve."
Rusky | 2 years ago
Of course Zig is a very different language and its compiler handles a rather different workload. It's totally possible that their approach makes more sense in younger codebases, or with a different source language design, or whatever else. But I also don't think node size tells the whole story: there's a synergy between memory usage, memory layout, and memory access patterns. For example, Cranelift gets a lot of mileage from tweaking its algorithms in combination with its data structures, e.g. the "half-move" design mentioned in https://cfallin.org/blog/2022/06/09/cranelift-regalloc2/#per...
AndyKelley | 2 years ago
Perf was a bit of a wash on this one, but it means we can serialize most of the compiler's state with a single pwritev syscall. For a 300,000-line codebase, this data takes up 30 MiB and uses the same format on disk as in memory. On my laptop, 30 MiB can be written from memory to disk in 25ms. This is one puzzle piece for incremental compilation. More puzzle pieces are listed in the PR description here: https://github.com/ziglang/zig/pull/16917