top | item 32329062

eddyb | 3 years ago

No, at most we would've invested more in e.g. ensuring¹ you cannot accidentally (or intentionally) mix-and-match shared objects not built by the same build process, or in ways to reduce the cost² of safe ("full relro") dynamic linking.

¹ via forcing the dynamic linker to eagerly resolve some symbols, global constructors, etc. - see https://github.com/rust-lang/rust/issues/73917#issuecomment-... for some recent discussion

² such as exporting a single base symbol per dylib (with a hash of code/data contents in the symbol name, for integrity), and statically resolving imports (to constant offsets from that base symbol) when linking object files into dylibs/executables - because position-independent code indirects through the "GOT", in practice this would mean the dynamic linker only needs to look up one symbol per dependency .so instead of one per import (of which there can be hundreds or thousands)

Also, "dynamic linking" in Rust was never really "dynamic" or about "drop-in replacement to avoid rebuilds"; it was only about the storage reuse of "shared objects", with the "late-binding" semantics inherent in dynamic linking being a negative, not a positive.

For example, rustc, rustdoc, clippy, miri, etc. are all relatively small executables that link against librustc_driver-*.so - on a recent nightly that's 120MiB, with only static linking that'd be half a GiB of executables instead. The relationship is "one .so shared between N executables" not "one .so per library". Also, if we could remove runtime symbol lookup but keep the separate .so (like ² above), we totally would.

---

At the same time, Rust's continued evolution depends on being able to (almost) "never ask for permission" when changing internals that were never guaranteed to be one thing or another.

Rust recently fixed most platforms to use simpler and more performant locks based on futex(-like) APIs (see https://github.com/rust-lang/rust/issues/93740 and PRs like https://github.com/rust-lang/rust/pull/95035), and this meant that e.g. Mutex<T> has a 32-bit (atomic) integer where a pointer used to be.
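The shape of such a futex-style lock can be sketched in miniature - a portable toy that spins instead of actually making futex() syscalls, with names of my own invention (`RawLock`, `Counter`), not std's internals:

```rust
use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicU32, Ordering};
use std::{hint, sync::Arc, thread};

// Toy futex-style lock: the entire lock state is one AtomicU32
// (0 = unlocked, 1 = locked). std's real locks park waiting threads
// via futex()-like syscalls; this sketch just spins.
struct RawLock {
    state: AtomicU32,
}

impl RawLock {
    const fn new() -> Self {
        RawLock { state: AtomicU32::new(0) }
    }
    fn lock(&self) {
        // Try to flip 0 -> 1; on failure a real impl would futex_wait.
        while self
            .state
            .compare_exchange(0, 1, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            hint::spin_loop();
        }
    }
    fn unlock(&self) {
        // Real impl: futex_wake one waiter if the state says any exist.
        self.state.store(0, Ordering::Release);
    }
}

// Lock plus protected data, to exercise it from multiple threads.
struct Counter {
    lock: RawLock,
    value: UnsafeCell<u64>,
}
unsafe impl Sync for Counter {}

fn main() {
    // The whole lock is 4 bytes - no pointer, no separate allocation.
    assert_eq!(std::mem::size_of::<RawLock>(), 4);

    let c = Arc::new(Counter { lock: RawLock::new(), value: UnsafeCell::new(0) });
    let handles: Vec<_> = (0..4)
        .map(|_| {
            let c = Arc::clone(&c);
            thread::spawn(move || {
                for _ in 0..1000 {
                    c.lock.lock();
                    unsafe { *c.value.get() += 1 };
                    c.lock.unlock();
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    println!("counter = {}", unsafe { *c.value.get() }); // 4000
}
```

The point of the real change is exactly what the 4-byte assertion shows: the lock no longer needs a heap allocation reached through a pointer, so `Mutex<T>` got smaller and `const`-constructible.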

An even more aggressive change was decoupling Ipv{4,6}Addr/SocketAddrV{4,6} from the C types they used to wrap internally (https://github.com/rust-lang/rust/pull/78802) - that one actually required library authors to fix their code in a few cases (that were very incorrectly reinterpreting the std types as the C ones, without this ever being officially supported or condoned).
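For affected code, the supported path was always the documented conversion API rather than reinterpreting std's memory layout - a quick illustration using only std (the variable names are mine):

```rust
use std::net::Ipv4Addr;

fn main() {
    let addr = Ipv4Addr::new(127, 0, 0, 1);

    // Layout-independent conversions that keep working across the
    // internal-representation change:
    let raw: [u8; 4] = addr.octets();
    assert_eq!(raw, [127, 0, 0, 1]);
    assert_eq!(Ipv4Addr::from(raw), addr);

    // Big-endian integer form, e.g. for FFI boundaries that want an
    // in_addr-style u32 (convert at the boundary, don't transmute):
    let as_u32 = u32::from(addr);
    assert_eq!(as_u32, 0x7f00_0001);

    println!("{addr} -> {raw:?} -> {as_u32:#010x}");
}
```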

Imagine trying to do that in C++. The stories I keep hearing from people involved in the C++ standard are tragic: more and more otherwise-uncontroversial improvements are being held back by poorly-motivated stubbornness around ABI stability. Large-scale users of C++, like Google, have long since migrated from C++'s standard library to their own replacements (e.g. abseil) for many of their codebases, and I kept hearing Google was "forking C++" (over the frozen ABI debacle specifically), long before it came out as "Carbon".

---

The technology to coherently³ avoid today's rebuild-the-world cost in compiled languages⁴ isn't here yet IMO.

³ as in, guarantee identical behavior to the full rebuild from scratch, without introducing inconsistencies that could cause logic errors, memory corruption, etc.

⁴ some/most C libraries of course being the biggest exception - and while it's possible to intentionally design libraries like this in any language that can do FFI, it's fundamentally antithetical to "zero-cost abstractions" and compiler optimizations - nowadays the latter are forced even on otherwise-patchable C dependencies via LTO, in the interest of performance

What I'm referring to is incremental recompilation with the following properties:

...

[oops, hit comment size limit! I'll post the rest separately]


mwcampbell | 3 years ago

Has the Rust compiler team considered using a busybox-style approach (single binary, multiple entry points) for rustc and friends? Even Windows supports hard linking, so AFAIK this should be feasible.

eddyb | 3 years ago

We explicitly support building against librustc_driver-*.so for both "custom drivers" (what we call those binaries I mentioned) and general "rustc as a library" use cases. We should maybe rename it to librustc and remove as much potentially-confusing "driver" terminology as possible.

Pinning a nightly and installing the "rustc-dev" rustup component are both necessary (the former because rustc's internal APIs aren't stable), but it's a supported use case.

Both clippy and miri are developed out-of-tree like that, and synced using `git subtree` (or `git submodule`, though we want to move everything to subtree).

eddyb | 3 years ago

[continued from above due to size limit]

What I'm referring to is incremental recompilation with the following properties:

1. automatic correctness

` - that is, a compiler change not explicitly updating anything related to dependency tracking should at most result in conservative coarser-grained behavior, not unsoundly cause untracked dependencies

` - manual cache invalidation logic is a clear indicator a compiler isn't this - e.g. this rules out https://gcc.gnu.org/wiki/IncrementalCompiler (maybe also Roslyn/C# and the Swift compiler, but I'm not sure - they might be hybrid w/ automated coarse-grained vs manual fine-grained?)

` - the practical approach to this is to split the workload into work units (aka tasks/queries/etc.) and then force information flow through centralized "request"/"query" APIs that automatically track dependencies - see https://github.com/salsa-rs/salsa for more information

` - research along the lines of ILC (Incremental Lambda Calculus) might yield more interesting results long-term

` - outside of a compiler, the only examples I'm aware of are build systems like tup (https://gittup.org/tup/) or RIKER (https://github.com/curtsinger-lab/riker) which use filesystem sandboxing (FUSE for tup, ptrace/seccomp-BPF for RIKER) for "automatically correct" build dependencies

2. fine-grained enough

` - at the very least, changes within function bodies should be isolated from other bodies, with only IPOs ("interprocedural optimizations", usually inlining) potentially introducing additional dependencies between function bodies

` - one extreme here is a "delta-minimizing" mode that attempts to (or at least warns the developer when it can't) reduce drift during optimizations and machine code generation, such that some fixes can end up being e.g. "patch a dozen bytes in 200 distro packages" and get quickly distributed to users

` - but even with history-agnostic incremental recompilation (i.e. output only depends on the current input, and past compilations only affect performance, not other behavior), function-level incremental linking (with one object file per function symbol) could still be employed at the very end, to generate a binary patch that redirects every function that grew in size, to additional segments added at the end of the file, without having to move any other function

3. propagating only effective "(query) output" changes

` - also called "firewalling" because it blocks irrelevant details early (instead of redoing all work transitively)

` - example 1: editing one line changes the source byte offsets of everything lower down in the file, but debuginfo doesn't need to be updated since it tracks lines not source byte offsets

` - example 2: adding/removing explicit types in a function should always cause type checking/inference to re-run, but nothing using those type inference results should re-run if the types haven't changed

4. extending all the way "down" (to machine code generation/object files/linking)

` - existing optimizing compilers may have a hard time with this because they didn't design their IRs/passes to be trackable in the first place (e.g. an LLVM call instruction literally contains a pointer to the function definition, with no context/module/pass-manager required to "peek" at the callee body)

` - can theoretically be worked around with one object file per function, which https://gcc.gnu.org/wiki/IncrementalCompiler does mention (under "Code Generation" / "Long term the plan is ..."), but that alone isn't sufficient if you want optimizations (I've been meaning to write a rustc proposal about this, revolving around LLVM's ThinLTO having a more explicit split between "summary" and "full definition")

5. extending all the way "up" (to lexing/parsing/macros)

` - this is probably the least necessary in terms of reducing output delta, but it can still impede practical application if it becomes the dominant cost - AFAICT it's one of the factors that doomed the GCC incremental experiment ("I’m pretty much convinced now that incremental preprocessing is a necessity" - http://tromey.com/blog/?p=420)

` - outside of "rebuild the world", this also makes a compiler much more suitable for IDE usage (as e.g. an LSP server)
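Properties 1. and 3. can be sketched in miniature: a toy "query database" where a derived query caches its value together with the revision it last *changed* at, so an edit whose intermediate result is unchanged never re-runs downstream queries. This is the early-cutoff idea behind salsa and rustc's red-green algorithm, but every name here is made up for illustration:

```rust
// Toy query database: one input (`source`) and two derived queries.
// `line_count`'s cache entry stores (verified_at, changed_at, value):
// the last revision we checked it, and the last revision its value
// actually changed.
struct Db {
    revision: u64,
    source: String,
    line_count: Option<(u64, u64, usize)>,
    report: Option<(u64, String)>, // (dep changed_at it was built from, value)
    line_count_runs: u32,
    report_runs: u32,
}

impl Db {
    fn new(src: &str) -> Db {
        Db {
            revision: 1,
            source: src.to_owned(),
            line_count: None,
            report: None,
            line_count_runs: 0,
            report_runs: 0,
        }
    }

    fn set_source(&mut self, src: &str) {
        self.revision += 1; // every input change bumps the revision
        self.source = src.to_owned();
    }

    // Derived query 1: depends on the `source` input.
    // Returns (changed_at, value).
    fn line_count(&mut self) -> (u64, usize) {
        if let Some((verified, changed, v)) = self.line_count {
            if verified == self.revision {
                return (changed, v); // cache still valid for this revision
            }
        }
        self.line_count_runs += 1;
        let v = self.source.lines().count();
        // Early cutoff ("firewalling"): if the recomputed value is equal,
        // keep the old changed_at so downstream sees no change at all.
        let changed = match self.line_count {
            Some((_, old_changed, old_v)) if old_v == v => old_changed,
            _ => self.revision,
        };
        self.line_count = Some((self.revision, changed, v));
        (changed, v)
    }

    // Derived query 2: depends only on line_count's *output*.
    fn report(&mut self) -> String {
        let (dep_changed, n) = self.line_count();
        if let Some((built_from, s)) = &self.report {
            if *built_from == dep_changed {
                return s.clone(); // the dependency's value never changed
            }
        }
        self.report_runs += 1;
        let s = format!("{} lines", n);
        self.report = Some((dep_changed, s.clone()));
        s
    }
}

fn main() {
    let mut db = Db::new("fn main() {\n}");
    assert_eq!(db.report(), "2 lines");

    // An edit that changes byte offsets but not the line count:
    db.set_source("fn main( ) {\n}");
    assert_eq!(db.report(), "2 lines");
    assert_eq!(db.line_count_runs, 2); // re-validated...
    assert_eq!(db.report_runs, 1); // ...but `report` never re-ran

    db.set_source("fn main() {\n    x();\n}");
    assert_eq!(db.report(), "3 lines");
    assert_eq!(db.report_runs, 2);
}
```

Real systems generalize this by recording dependency edges automatically as queries call each other, which is what makes property 1. hold without manual invalidation logic.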

My experience is mostly with the Rust compiler, rustc, whose incremental support:

- started off as coarse skipping of generating/optimizing LLVM IR

- has since evolved to cover 1./2./3.

- while many passes in the "middle-end" were already both "local" and "on-demand", sometimes allowing incrementalization by changing only a dozen lines or so, 4. is indefinitely blocked by LLVM (at best we can try to work around it), and 5. (incremental "front-end") has been chipped at for years with several factors conspiring to make it orders of magnitude more difficult:

` - macro expansion and name resolution being intertwined (with the former mutating the AST in-place while the latter being a globally stateful algorithm)

` - "incremental front-end" used to be seen as essential for IDE usage, and that notion started falling apart in 2018 or so (see rust-analyzer aka "RA" below, though RA is not itself directly responsible, nor do I hold it against them - it's a long messy story)

` - this work (without the IDE focus, AFAICT) has mostly been driven by random volunteer work, last I checked (i.e. without much "official" organization, or funding, though I'm not sure what the latter would even be)

- its design has been reimagined into the "salsa" framework (https://github.com/salsa-rs/salsa - also linked above)

` - most notable user I'm aware of is the rust-analyzer (aka "RA") LSP, which hits 1./2./3./5. (4. isn't relevant, as RA stops at IDE-oriented analysis, no machine code output) - without getting too into the weeds, RA is a reimplementation of "Rust {front,middle}-end", and ideally eventually rustc should be able to also be just as good at 5. (see above why it's taking so long)

I am not aware of any other examples of industrial compilers having both 1. and 2. (or even academic ones but I suspect some exist for simple enough languages) - every time someone says "incremental compilation? X had that Y decades ago!", it tends to either be manually approximating what's safe to reuse (i.e. no 1.), or have file-level granularity (i.e. no 2. - the "more obviously correct" ones are close to the "separate compilation" of C/C++/many others, where a build system handles parallel and/or incremental rebuilds by invoking the compiler once per file), or both.

While 2./5. are enough to be impressive for IDE usage all by themselves, the interactive nature also serves to hide issues of correctness (lack of 1. may only show up as rare glitches that go away with more editing) or inefficiencies (lack of 3. leading to all semantic analyses being redone, may still be fast if only requested for the current function).

Another thing I haven't seen is distro build servers talking about keeping incremental caches around (and compiling incrementally in the first place), for builds of Rust "application" packages (most of them CLI tools like ripgrep, I would assume) - it's probably too early for the smaller stuff, but a cost/benefit analysis (i.e. storage space vs time saved rebuilding) would still be interesting.