Sccache, Mozilla’s distributed compiler cache, now written in Rust

[+] zeotroph|9 years ago|reply

Running this without Amazon seems to require replacing the hardcoded "{}://{}.s3{}.amazonaws.com" URL with the location where the S3-compatible service is running on the local network. Now if that were just a default and could be overridden with an env var as well... :)

And minio [1] seems to be the easiest pseudo-s3 there is ($ ./minio server CacheDir/, done?), or are there better alternatives by now?

1: https://github.com/minio/minio

[+] tedmielczarek|9 years ago|reply

I'd take a patch to allow overriding the URL! I actually could use that for testing the S3 storage.

The S3 code went through a few revisions. I was originally using Rusoto (https://github.com/rusoto/rusoto), which is nice but just didn't quite meet my needs, so then I borrowed some code from the crates.io codebase and then rewrote most of it.

You can also run it using a local disk cache, similar to ccache, but it doesn't have any code to limit the size of the cache so it's not very good right now. (It's used in all the tests, though.) Fixing that specific issue is next on my plate.

[+] buckhx|9 years ago|reply

Rust's format macro only takes string literals as a template, so you can't provide a runtime string template. The solution seems to be adding an external templating lib, which seems like overkill for a single template.
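For this particular case a runtime template may not be needed at all: the code can check for an override first and only fall back to the literal `format!` template when none is set. A minimal sketch, assuming a made-up environment variable name (`EXAMPLE_S3_ENDPOINT`) and a hypothetical helper `bucket_url`; neither comes from sccache itself, and the meaning given to the three placeholders is a guess:

```rust
use std::env;

/// Hypothetical variable name, chosen only for this sketch.
const ENDPOINT_VAR: &str = "EXAMPLE_S3_ENDPOINT";

/// Returns the override if one was supplied; otherwise builds the default
/// Amazon URL from the literal template quoted earlier in the thread.
/// The parameter names are assumptions about what the placeholders hold.
fn bucket_url(
    override_url: Option<String>,
    scheme: &str,
    bucket: &str,
    region_suffix: &str,
) -> String {
    match override_url {
        Some(url) => url,
        None => format!("{}://{}.s3{}.amazonaws.com", scheme, bucket, region_suffix),
    }
}

fn main() {
    // `env::var` returns Err when the variable is unset, so `.ok()` maps
    // "unset" to None and the default template is used.
    let url = bucket_url(env::var(ENDPOINT_VAR).ok(), "https", "my-bucket", "-us-west-2");
    println!("{}", url);
}
```

With the variable set to something like `http://localhost:9000`, the whole URL is replaced, so no templating library is involved.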

[+] actuallyalys|9 years ago|reply

Congrats to Mozilla for not only creating a programming language research project that engaged the community, but also growing it into a successful language that's useful and robust for real-world projects.

Mozilla sometimes gets flak for its experiments (or abandoning them), sometimes deservedly, but by doing so many of them and not being afraid to cancel them, they occasionally get big wins like Rust.

[+] jtille|9 years ago|reply

I've been using distcc for quicker distributed builds without issue for many years. What extra features does sccache bring to the table other than it being rewritten in Rust?

[+] tedmielczarek|9 years ago|reply

distcc is great, and as I mentioned in that Reddit comment some of my colleagues are using icecream to great success.

I won't repeat my entire comment from Reddit, but one other notable point is that tools like ccache/distcc don't generally support MSVC, and we build Firefox for Windows with MSVC, so that's pretty important to us. And frankly, our Windows builds are slow enough that we can use all the build time wins we can get.

[+] Sean1708|9 years ago|reply

This comment[1] goes into some of the reasoning.

[1]: https://www.reddit.com/r/rust/comments/5e654a/sccache_mozill...

[+] tedmielczarek|9 years ago|reply

Just as a neat data point, we hadn't been using sccache for Firefox builds in our new CI infrastructure (Taskcluster) for various reasons. After rolling out the sccache rewrite I fixed that, and build times in that environment dropped by 32-38%: https://treeherder.mozilla.org/perf.html#/alerts?id=4335

[+] jononor|9 years ago|reply

No tests? :(

EDIT: my bad, they were inside src/

[+] cesarb|9 years ago|reply

In Rust, it's very common to put tests right next to the code; all functions tagged with #[test] will be run as tests, and examples within the function documentation comments will also be run as tests by default.

For instance, randomly looking at the first file in that repository, cache/cache.rs, you can see at the end a submodule tagged with #[cfg(test)] (meaning it will be compiled only when running the tests), and within it several functions tagged with #[test].
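As a rough illustration of that layout (this is not code from the sccache repository; `double` and the module name are made up for the example), a source file with its tests next to the code might look like:

```rust
/// Doubles a number.
pub fn double(x: i32) -> i32 {
    x * 2
}

// Compiled only when building the test harness (e.g. via `cargo test`).
#[cfg(test)]
mod double_tests {
    use super::*;

    // Each function tagged with #[test] is run as a separate test.
    #[test]
    fn doubles_positive_numbers() {
        assert_eq!(double(21), 42);
    }

    #[test]
    fn doubles_negative_numbers() {
        assert_eq!(double(-4), -8);
    }
}

fn main() {
    println!("{}", double(21));
}
```

In a normal build the test module is skipped entirely, so it adds nothing to the shipped binary.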

[+] tedmielczarek|9 years ago|reply

It does have a bunch of tests, although probably not enough. I've been meaning to add code coverage stats to its CI, although I've only used kcov for Rust code coverage so far, and since that only works on Linux it wouldn't reflect reality very well. I've done a lot of work on test automation at Mozilla over the years so automated testing is pretty important to me. :)

I've tried to write unit tests where they made sense; they're generally placed within the relevant source file in the tree, like the cache tests here: https://github.com/mozilla/sccache/blob/6eadccc9d6752747766d...

I also wrote some higher-level tests that test the server functionality: https://github.com/mozilla/sccache/blob/master/src/test/test...

I rolled my own set of traits and structs to mock process execution for those tests so they would run independent of the compilers installed on the system: https://github.com/mozilla/sccache/blob/master/src/mock_comm...

I actually like how that code turned out, and I've thought about cleaning it up and publishing it as a standalone crate. Rust doesn't have a real test mocking solution yet, and even if it did this is a special case since the standard library's process execution types don't implement a trait that could be mocked.
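To give a flavor of the general idea (this is only a sketch of the technique, not the code linked above; the trait `CommandRunner`, both structs, and `compiler_version` are hypothetical names), process execution can be hidden behind a trait so tests can substitute canned results:

```rust
use std::io;
use std::process::Command;

/// Hypothetical trait: the one operation callers need from a process.
trait CommandRunner {
    /// Runs `program` with `args`, returning (exit code, stdout bytes).
    fn run(&mut self, program: &str, args: &[&str]) -> io::Result<(i32, Vec<u8>)>;
}

/// Real implementation backed by std::process::Command.
#[allow(dead_code)]
struct RealRunner;

impl CommandRunner for RealRunner {
    fn run(&mut self, program: &str, args: &[&str]) -> io::Result<(i32, Vec<u8>)> {
        let out = Command::new(program).args(args).output()?;
        // `code()` is None if the process was killed by a signal.
        Ok((out.status.code().unwrap_or(-1), out.stdout))
    }
}

/// Mock implementation: records each call and returns a canned result.
struct MockRunner {
    calls: Vec<String>,
    canned: (i32, Vec<u8>),
}

impl CommandRunner for MockRunner {
    fn run(&mut self, program: &str, args: &[&str]) -> io::Result<(i32, Vec<u8>)> {
        self.calls.push(format!("{} {}", program, args.join(" ")));
        Ok(self.canned.clone())
    }
}

/// Code under test is generic over the trait, so it never knows
/// whether a real compiler was involved.
fn compiler_version<R: CommandRunner>(runner: &mut R, compiler: &str) -> io::Result<String> {
    let (code, stdout) = runner.run(compiler, &["--version"])?;
    if code != 0 {
        return Err(io::Error::new(io::ErrorKind::Other, "compiler failed"));
    }
    Ok(String::from_utf8_lossy(&stdout).trim().to_string())
}

fn main() {
    let mut mock = MockRunner {
        calls: Vec::new(),
        canned: (0, b"fakecc 1.0\n".to_vec()),
    };
    let version = compiler_version(&mut mock, "fakecc").unwrap();
    println!("{} (calls: {:?})", version, mock.calls);
}
```

The test never touches a real compiler, which is the property described above: the same test passes regardless of what is installed on the machine.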

As development went on I realized I didn't have automated tests that tested the whole program, especially running against real compilers (which is important given that the tool is a compiler wrapper), so I wrote some "system" tests that run the actual binary with an actual compiler and local disk cache and verify that it works as expected: https://github.com/mozilla/sccache/blob/master/src/test/syst...

[+] luckydude|9 years ago|reply

I didn't understand his comment on Rust's match expression. Other than it returning a value (and insisting a default clause) it's just like C's switch, right?

[+] Tarean|9 years ago|reply

It is much more like Haskell's pattern matching. Notably it handles tagged unions (enums, in Rust). Here is a good example from the [book](https://doc.rust-lang.org/book/match.html):

    enum Message {
        Quit,
        ChangeColor(i32, i32, i32),
        Move { x: i32, y: i32 },
        Write(String),
    }

    fn quit() { /* ... */ }
    fn change_color(r: i32, g: i32, b: i32) { /* ... */ }
    fn move_cursor(x: i32, y: i32) { /* ... */ }

    fn process_message(msg: Message) {
        match msg {
            Message::Quit => quit(),
            Message::ChangeColor(r, g, b) => change_color(r, g, b),
            Message::Move { x: x, y: y } => move_cursor(x, y),
            Message::Write(s) => println!("{}", s),
        };
    }

[+] masklinn|9 years ago|reply

> and insisting a default clause

It does not insist on a default clause; it requires match completeness, which depending on the matched value may require a default clause (e.g. it does for integers[0], it does not for enums as you can match each case individually).

> it's just like C's switch, right?

Only in its most simplistic form (though even then it never falls through, whether by default or optionally, is type-safe, and requires match completeness). match performs refutable pattern matching on possibly complex values and allows additional per-match conditionals.

    match foo {
        // complex destructuring and multiple patterns for a case
        Some((42, _)) | Some((55, _)) => { println!("1") }
        // simple destructuring + conditional
        Some(a) if a.0 % 5 == 0 => { println!("2") }
        // matching + wildcard
        Some(..) => { println!("3") }
        // trivial matching
        None => { println!("4") }
    }

or

    match (i % 3, i % 5) {
        (0, 0) => {}
        (0, _) => {}
        (_, 0) => {}
        _ => {}
    }

[0] because the completeness checking doesn't really handle non-enum values currently
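For anyone who wants to run the two snippets above, here is a self-contained version. The wrapper functions `classify` and `fizzbuzz`, the tuple element types, and the values returned from each arm are additions made for illustration; the patterns themselves are the ones shown above.

```rust
/// Wraps the first match: returns the number of the arm that matched.
fn classify(foo: Option<(u32, u32)>) -> u32 {
    match foo {
        // complex destructuring and multiple patterns for a case
        Some((42, _)) | Some((55, _)) => 1,
        // simple destructuring + conditional (a guard)
        Some(a) if a.0 % 5 == 0 => 2,
        // matching + wildcard
        Some(..) => 3,
        // trivial matching
        None => 4,
    }
}

/// Wraps the second match: matching on a tuple of two remainders.
/// The match is an expression, so its value is the function's result.
fn fizzbuzz(i: u32) -> String {
    match (i % 3, i % 5) {
        (0, 0) => "FizzBuzz".to_string(),
        (0, _) => "Fizz".to_string(),
        (_, 0) => "Buzz".to_string(),
        _ => i.to_string(),
    }
}

fn main() {
    println!("{}", classify(Some((10, 1))));
    for i in 1..16 {
        println!("{}", fizzbuzz(i));
    }
}
```

Arms are tried top to bottom, so `Some((55, 0))` hits the first arm even though its first element is also divisible by 5.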

[+] stormbrew|9 years ago|reply

All the other answers are correct, in so far as they go, but they kind of fail to explain the bigger picture imo. A match expression like Rust's is part of a larger concept where types are an integral part of control flow.

C/C++ switch makes no assertion about what is or isn't valid in its branches. That is, you might be switching on a tag field of a union or something like that, and then in one (or more) of the switch branches you may act on that information. But there is no compiler constraint on the correctness of that decision.

Pattern matching insists that the code inside a particular branch matches the type expectations asserted in its case clause. If you're in the branch for enum-type Blah, you can only act on Blah and not on Blorp. The compiler will force this on you.

To put this in practical terms, one area I have found this incredibly valuable (in Swift, but it applies here too) is in state machines. If you represent your state machine as an enum/union with fields for the information any particular state needs, every iteration through the machine you can be sure you are acting on the correct information. The compiler won't let you do otherwise.
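A small Rust sketch of that state machine idea, with all names (`Download`, `Event`, `step`) invented for the example: each state carries only the data that is valid in that state, and a transition can only read the fields of the state it actually matched.

```rust
#[derive(Debug, PartialEq)]
enum Download {
    Idle,
    InProgress { bytes_done: u64, total: u64 },
    Finished { total: u64 },
    Failed(String),
}

enum Event {
    Start { total: u64 },
    Chunk(u64),
    Error(String),
}

/// Consumes the current state and an event, and returns the next state.
fn step(state: Download, event: Event) -> Download {
    match (state, event) {
        (Download::Idle, Event::Start { total }) => {
            Download::InProgress { bytes_done: 0, total }
        }
        // `bytes_done` and `total` exist only in this arm; there is no way
        // to read them while the machine is Idle or Failed.
        (Download::InProgress { bytes_done, total }, Event::Chunk(n)) => {
            let done = bytes_done + n;
            if done >= total {
                Download::Finished { total }
            } else {
                Download::InProgress { bytes_done: done, total }
            }
        }
        // An error moves any state to Failed.
        (_, Event::Error(msg)) => Download::Failed(msg),
        // Any other combination leaves the state unchanged.
        (state, _) => state,
    }
}

fn main() {
    let s = step(Download::Idle, Event::Start { total: 10 });
    let s = step(s, Event::Chunk(4));
    let s = step(s, Event::Chunk(6));
    println!("{:?}", s);
}
```

The final catch-all arm is a design choice for brevity; listing every (state, event) pair explicitly would let the compiler flag any combination that was forgotten when a new state is added.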

[+] kibwen|9 years ago|reply

Differences from C's switch:

1. No fallthrough-by-default.

2. Checked for exhaustiveness. This is different from "insisting on a default clause", because if you see a match with no default clause, then you know that the compiler is verifying that every possible case is being handled.

3. The cases of the match can be arbitrary patterns, not just integers. This allows you to perform very natural and powerful conditional control flow, especially when using tagged unions.

4. Can return a value.

Finally, remember the context of this post: Python doesn't have a C-style switch statement at all. :P

[+] fjh|9 years ago|reply

match also lets you destructure values, which is useful when you're dealing with enums. It also doesn't have C's fall-through behaviour.

Edit: match doesn't insist on a default clause, it enforces exhaustiveness. Which can be achieved by having a default clause, but quite often you just have an explicit branch for every possible case.

[+] contras1970|9 years ago|reply

> it's just like C's switch, right?

not at all, absolutely not. consider:

    int n = (count + 7) / 8;  /* number of passes through the unrolled loop */
    switch (count % 8) {
    case 0: do { *to = *from++;
    case 7:      *to = *from++;
    case 6:      *to = *from++;
    case 5:      *to = *from++;
    case 4:      *to = *from++;
    case 3:      *to = *from++;
    case 2:      *to = *from++;
    case 1:      *to = *from++;
            } while (--n > 0);
    }

https://en.wikipedia.org/wiki/Duff's_device

[+] stonemetal|9 years ago|reply

You can also have guards in Rust's match expression. They must be bool-typed expressions, but other than that they can be as complex as necessary.

[+] tedmielczarek|9 years ago|reply

I think other commenters have explained it pretty well, but suffice to say that if you get comfortable writing Rust you grow to really like using match, and most languages don't have an equivalent. :)

[+] throwaway40483|9 years ago|reply

When I see these rewrites, I'm always left wondering about the speedup achieved. How much was due to:

1) Going from programming language X to Y

2) Rewriting the algorithm in language Y

I suspect (without any proof) that too much is attributed to 1) and not enough to 2).

[+] kibwen|9 years ago|reply

I'm not sure what this comment is implying. The author isn't complaining about Python's performance; rather, he's noting that concurrency in Python isn't as painless as it is in Rust (which isn't a controversial statement; Rust is explicitly designed for robust concurrency). It also isn't controversial that Rust code should end up faster than Python, considering that Rust is designed to prioritize runtime performance. This isn't a case of comparing the performance of two dynamic languages (as we so often had with all the "I switched to Ruby" or "I switched to Node" posts in prior years); nobody is going to hold up this blog post as proof that Rust is generally faster than Python, because nobody in the world argues otherwise (and I say this as a prolific Python user, not just as a Rust user).

[+] sangnoir|9 years ago|reply

3) Removing "unnecessary cruft" during the rewrite, measuring the speedup and then gradually adding the "cruft" (features) back one by one as previously unknown boundary conditions are encountered.

Restart the process again after a few years, as is tradition.

[+] chriswarbo|9 years ago|reply

The article specifically mentions that the program gets invoked over and over again, e.g. by "configure" scripts checking for compiler features.

I don't think anyone would dispute that interpreters like CPython have longer startup times than compiled machine code.

[+] tedmielczarek|9 years ago|reply

For the record, I wasn't actually looking for any speedups with this rewrite. Most of it is a straight port of the Python code. The build time speedups were just a nice bonus. (I was measuring just to make sure I didn't regress build times.)

[+] Ericson2314|9 years ago|reply

Parts of Mozilla also use Nix so...

[+] gmfawcett|9 years ago|reply

...so what, in particular? Don't leave us hanging!

[+] bpicolo|9 years ago|reply

That's some tiny font.

[+] nilved|9 years ago|reply

It's using one of Phu Ly's WordPress themes from 10 years ago so that may be why. I'd recognize his themes anywhere. Back then 8pt was standard and flexible units were just catching on. I feel old.

[+] Matthias247|9 years ago|reply

Yep, nearly impossible to read without a big zoom. The culprit: font-size: 62.5%;

[+] tedmielczarek|9 years ago|reply

Sorry, I don't remember if I've ever actually configured the WordPress settings on that blog.

[+] kibwen|9 years ago|reply

A good time to try out your browser's Reader Mode. :P
