item 22267341

Rust: Zero-Cost Abstraction in Action

103 points | i_dursun | 6 years ago | idursun.com | reply

44 comments

[+] bdd|6 years ago|reply
These are not about abstractions but about compile-time optimizations. They are also not implemented in the Rust compiler but in LLVM, so any language frontend on top of LLVM would yield the same optimizations. According to Wikipedia the list is:

> [...] variety of front ends: languages with compilers that use LLVM include ActionScript, Ada, C#, Common Lisp, Crystal, CUDA, D, Delphi, Dylan, Fortran, Graphical G Programming Language, Halide, Haskell, Java bytecode, Julia, Kotlin, Lua, Objective-C, OpenGL Shading Language, Ruby, Rust, Scala, Swift, Xojo, and Zig.

Sometimes I think a mention of Rust in the title just gets upvotes here, without anyone even reading the article.

[+] devit|6 years ago|reply
Only those that emit LLVM code that can be optimized like this.

In poorly performing languages like Python and Ruby, calling a method on a value (like sum() or even the += operator) essentially means looking up the method name in a hash table and then doing a dynamic function call on the resulting function pointer. This sort of optimization cannot be done in an ahead-of-time compiler unless all objects are created locally and thus have known method dictionaries (and even then, it depends on whether they are available to the compiler in IR form, whether the lookup code is inlined, and whether the code can actually be simplified).
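The same visibility distinction can be sketched in Rust itself (a hypothetical example; the function names are mine): a statically dispatched closure is transparent to the optimizer, while a call through a trait object is an opaque indirect call, roughly analogous to the dynamic method lookup described above.

```rust
// Static dispatch: the closure's concrete type is known at compile time,
// so the compiler can inline it into the loop and optimize the whole body.
fn sum_static<F: Fn(u64, u64) -> u64>(n: u64, f: F) -> u64 {
    (1..n).fold(0, f)
}

// Dynamic dispatch: `f` is an indirect call through a vtable. The optimizer
// cannot see through the call, so no closed-form rewrite is possible here.
fn sum_dynamic(n: u64, f: &dyn Fn(u64, u64) -> u64) -> u64 {
    (1..n).fold(0, f)
}
```

Both compute the same result; only the first gives the optimizer a chance to eliminate the loop entirely.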

[+] habitue|6 years ago|reply
Actually, there's a very good chance that this particular optimization happens thanks to MIRI, which allows very sophisticated constant evaluation of MIR (a Rust intermediate representation):

https://rust-lang.github.io/rustc-guide/miri.html

So while, yes, in general pretty much all Rust optimizations are actually due to LLVM, this one in particular might not be.

[+] JoeCamel|6 years ago|reply
I think it's both about abstraction and compile-time optimization. The point is that by default you can write code that uses higher-level abstractions but compiles to the same code as an imperative-style loop. Most widely used languages, or their default implementations, don't use LLVM; moreover, they are interpreted or JIT-compiled, so there might be a performance difference between e.g. for-loops and forEach with a closure.
[+] pjmlp|6 years ago|reply
Even worse, the lack of proper compiler design courses in many universities and "engineer" bootcamps means that many attribute to language X what is actually a compiler backend capability, as you explain quite well.
[+] thcz|6 years ago|reply
The C# compiler uses LLVM? Is that for something like Xamarin? I was under the impression that it was self-contained. Does anyone know the details?
[+] lasagnaphil|6 years ago|reply
This isn’t really about Rust; it’s actually about the optimization capabilities of LLVM (the compiler backend, which quite a lot of languages use, such as Clang for C++, Rust, Swift, Julia, Zig, ...). These languages all have a similar chance of performing those same optimizations, at least in simple cases like these.

What I’m interested in is how well-optimizable the IR emitted by each of those language frontends is for LLVM (in practical situations, not just a few lines of simple numerical code). I’ve heard that you need to be careful to encode the IR in the right way so that LLVM doesn’t generate needless memcpys, but I’m not much of a compiler expert...

[+] kibwen|6 years ago|reply
rustc does indeed deliberately seek to emit LLVM IR that resembles what Clang would emit, in order to benefit from the same sorts of optimizations that default LLVM is tuned for.
[+] kbenson|6 years ago|reply
It's both; it's just complicated by the fact that LLVM is optimizing it to a known formula.

The zero cost abstraction is the fact that you get the same output, special optimization and all, whether you write the original loop, the "(1..n).fold(0, |x,y| x + y)" variant, or the "(1..n).sum()" variant. Rust converts those to intermediate code that is the same, or close enough, that LLVM is able to apply the same optimization.

Would it have been better if some sample code had been chosen that wasn't special-cased to the degree that the entire algorithm was replaced wholesale? Probably. That doesn't invalidate the premise, though, just slightly obscures it.
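The three variants discussed above, spelled out as compilable functions (a sketch; I've picked u64 and the function names myself):

```rust
// Three ways to sum 1..n. rustc lowers all of them to near-identical
// LLVM IR, and LLVM then replaces the loop with the closed-form
// triangle-number formula.
pub fn sum_loop(n: u64) -> u64 {
    let mut total = 0;
    for i in 1..n {
        total += i;
    }
    total
}

pub fn sum_fold(n: u64) -> u64 {
    (1..n).fold(0, |x, y| x + y)
}

pub fn sum_sum(n: u64) -> u64 {
    (1..n).sum()
}
```

All three produce the same result, and on a release build all three typically compile to the same loop-free code.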

[+] tiziano88|6 years ago|reply
There is nothing about zero cost abstractions in this article, just basic compiler optimizations.
[+] kibwen|6 years ago|reply
The OP doesn't explicitly demonstrate any zero cost abstractions, but these "basic compiler optimizations" are only feasible to achieve statically (i.e. without a JIT) in the presence of sufficiently transparent abstractions, which takes a good deal of language design effort to enable.
[+] microtonal|6 years ago|reply
Iterators in Rust are (usually) zero-cost abstractions of loops. Bjarne Stroustrup's definition:

> What you don’t use, you don’t pay for. And further: What you do use, you couldn’t hand code any better.

Rust iterators fit in the second part.
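A sketch of that second part of the definition (hypothetical functions of my own): an iterator pipeline composed of several adapters against the hand-written loop you might write instead. In release builds the pipeline is typically compiled down to code at least as good as the manual version.

```rust
// Composed abstractions: filter + map + sum over a range.
pub fn sum_even_squares_iter(n: u64) -> u64 {
    (0..n).filter(|x| x % 2 == 0).map(|x| x * x).sum()
}

// The hand-coded equivalent. Per Stroustrup's definition, the iterator
// version should not be beatable by this.
pub fn sum_even_squares_loop(n: u64) -> u64 {
    let mut total = 0;
    let mut i = 0;
    while i < n {
        if i % 2 == 0 {
            total += i * i;
        }
        i += 1;
    }
    total
}
```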

[+] Ericson2314|6 years ago|reply
Well, functions are abstractions too, but yes, to me the interesting part of "zero cost abstraction" is "zero cost data structure composition", and that indeed is not covered.
[+] drej|6 years ago|reply
This is not Rust-specific; it's a compiler thing. Both LLVM and GCC can detect sums and generate closed-form formulas instead. There are other fun algorithm detections, e.g. if you try to do bit counts yourself, LLVM will use popcnt instead.

Compilers are awesome, check out Matt Godbolt's talk on this very topic: https://www.youtube.com/watch?v=nAbCKa0FzjQ
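The bit-count case mentioned above, as a Rust sketch (my own example; whether LLVM recognizes this exact pattern and emits popcnt can vary by compiler version and target):

```rust
// A naive bit-counting loop. LLVM's idiom recognition can often compile
// a loop like this down to a single popcnt instruction on x86-64.
pub fn count_bits(mut x: u64) -> u32 {
    let mut count = 0;
    while x != 0 {
        count += (x & 1) as u32; // add the lowest bit
        x >>= 1;                 // shift it away
    }
    count
}
```

The standard library already exposes the intrinsic directly as `u64::count_ones`, so in real code you would just call that.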

[+] univerio|6 years ago|reply
I assume it's doing `(N-2)(N-3)/2 + 2N - 3` instead of `N(N-1)/2` due to overflow concerns? But couldn't `(N-2)(N-3)` also possibly overflow, just supporting a larger range of `N`?
[+] devit|6 years ago|reply
In this assembly code it cannot overflow because N is a 32-bit integer and the multiplication gives a 64-bit result, which is converted to 32-bit only after shifting.

I can't figure out why it doesn't use the simpler formula (other than the optimizer being bad).
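For what it's worth, the two closed forms are algebraically identical, so the optimizer's choice is only about how the arithmetic is staged, not about correctness. A quick sketch checking this in i64 (function names are mine) to sidestep the 32-bit overflow question:

```rust
// The formula visible in the emitted assembly:
//   (n-2)(n-3)/2 + 2n - 3 = (n^2 - 5n + 6 + 4n - 6)/2 = n(n-1)/2
// Both divisions are exact, since the product of two consecutive
// integers is always even.
fn asm_formula(n: i64) -> i64 {
    (n - 2) * (n - 3) / 2 + 2 * n - 3
}

fn triangle(n: i64) -> i64 {
    n * (n - 1) / 2
}
```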

[+] Ericson2314|6 years ago|reply
The earlier transformations are actually more impressive. Constant folding almost always has a good RoI, so lots of compilers do it, and it's simple because, well, it's a constant: there are no variables and no partial evaluation needed. The others require lots of inlining before the final rule fires, and so are more ambitious.

Seeing what

  pub fn sum3(n: i32) -> i32 {
      (1..n).sum::<i32>() + (1..2 * n).sum::<i32>() + (1..(n + 2)).sum::<i32>()
  }
does would be more interesting to me.

Also, while all the inlining is rustc's doing, I assume the "triangle number trick" is LLVM's.

[+] ojosilva|6 years ago|reply
I wonder if the const syntax introduced in ES6 will ever result in constant folding, or any JIT optimizations, for code running in JS runtimes. Apparently, from what I've read, const is ignored as far as optimizations go and is only used to prevent the developer from ever reassigning certain variables.
[+] pkilgore|6 years ago|reply
The OCaml compiler does constant folding too, and I consistently look at the output (usually JavaScript, because of BuckleScript) in amazement when it finds shit like that. Unrolling all my tail recursion is great too.
[+] pjmlp|6 years ago|reply
For anyone who wants to learn about optimizations in AOT-compiled ML languages, have a look at:

"The Implementation of Functional Programming Languages"

"Compiling with Continuations"

"Modern Compiler Implementation in ..." (C, Java and ML variants)

Although oriented towards Lisp, "Lisp in Small Pieces" is a classic as well, with many optimizations like inlining of lambda calls across multiple call levels.

[+] incadenza|6 years ago|reply
Are compiler optimizations like summing a series done on an ad-hoc basis? Surely the compiler couldn't have discovered or inferred (not sure what term to use) that formula on its own, could it?

Just generally curious.

[+] kibwen|6 years ago|reply
Those optimizations are happening on LLVM's end (though the higher-level language will have to expose sufficiently transparent abstractions to make these optimizations possible). I'd love to read a book on the optimization techniques that are implemented in LLVM/GCC to make transformations like this possible.
[+] shmerl|6 years ago|reply
> he was very disappointed because the Rust version was twice as fast as the C version, which was hand-optimised by pulling off all the tricks he knew to make it perform well.

Why disappointed? It just highlights the quality of Rust's approach.

[+] ncmncm|6 years ago|reply
Evidently the tricks were what made it slow.

This happens all too often: once the code changes enough that the optimizer doesn't recognize the pattern anymore, it throws up its hands, and you're on your own. Some people call this optimizer roulette.

It's not just compilers, either. CPUs have their own peephole optimizers and patterns they recognize, or don't, and it can easily make a 2x difference in your run time depending on whether it cottons to what you're trying or not.

[+] yahyaheee|6 years ago|reply
This is neat, but there is still a cost in compile time.
[+] JoeCamel|6 years ago|reply
Usually in Rust community zero cost means zero runtime cost. Obviously, there are many other costs you could define.