top | item 37714020

(no title)

grokys | 2 years ago

Only tangentially related to the post, but I don't see it mentioned there: what do people use to run benchmarks on CI? If I understand correctly, standard OSS GH Actions/Azure Pipelines runners aren't going to be uniform enough to provide useful benchmark results. What does the rust project use? What do other projects use?

discuss

capableweb|2 years ago

> what do people use to run benchmarks on CI?

Typically, you purchase/rent a server that does nothing but sequentially run queued benchmarks (and the size/performance of this server doesn't really matter, as long as the performance is consistent), then sends the report somewhere for hosting and processing. Of course, this could be triggered by something running in CI, and the CI job could wait for the results, if benchmarking is an important part of your workflow. Or if your CI setup allows it, you tag one of the nodes as a "benchmarking" node which only run jobs tagged as "benchmark", but I don't think a lot of the hosted setups allow this, mostly seen this in self-hosted CI setups.

But CI and benchmarks really shouldn't be run on the same host.

> What does the rust project use?

It's not clear exactly where the Rust benchmark "perf-runner" is hosted, but here are the specifications of the machine at least: https://github.com/rust-lang/rustc-perf/blob/414230abc695bd7...

> What do other projects use?

Essentially what I described above, a dedicated machine that runs benchmarks. The Rust project seems to do it via GitHub comments (as I understand https://github.com/rust-lang/rustc-perf/tree/master/collecto...), others have API servers that respond to HTTP requests done from CI/chat, others have remote GUIs that triggers the runs. I don't think there is a single solution that everyone/most are using.

weinzierl|2 years ago

Do I really need dedicated hardware? How bad is a VPS? I mean it makes sense but has anyone measure how big the variance is on a VPS?

JoshTriplett|2 years ago

Rust uses a dedicated consistent server that runs exclusively benchmark loads, so that nothing else is interfering with the benchmark results.

shepmaster|2 years ago

A solution is mentioned in the article, but perhaps obliquely:

> while I also wanted to measure hardware counters

As I understand it, hardware counters would remain consistent in the face of the normal noisy CI runner.

The article talks about using Cachegrind (via the iai crate) and Linux perf events.

I use iai in one of my projects to run performance diffs for each commit.

the8472|2 years ago

> As I understand it, hardware counters would remain consistent in the face of the normal noisy CI runner.

With cloud CI runners you'd still have issues with hardware differences, e.g. different CPUs counting slightly differently. even memcpy behavior is hardware-dependent! And if you're measuring multi-threaded programs then concurrent algorithms may be sensitive to timing. Also microcode updates for the latest CPU vulnerabilities. And that's just instruction counts. Other metrics such as cycle counts, cache misses or wall-time are far more sensitive.

To make sure we're not slowly accumulating <1% regressions hidden in the noise and to be able to attribute regressions to a specific commit we need really low noise levels.

So for reliable, comparable benchmarks dedicated is needed.

capableweb|2 years ago

The thing is that things like Cachegrind are supposed to be used as complements to time-based profilers, not to replace them.

If you're getting +-20% different for each time based benchmark, it might just be noisy neighbors but could also be some other problem that actually manifests for users too.

IshKebab|2 years ago

I've looked into this before and there are very few tools for this. The only vaguely generic one I've found is Codespeed: https://github.com/tobami/codespeed

However it's not very good. Seems like most people just write their own custom performance monitoring tooling.

As for how you actually run it, you can get fairly low noise runtimes by running on a dedicated machine on Linux. You have to do some tricks like pinning your program to dedicated CPU cores and making sure nothing else can run on them. You can get under 1% variance that way, but in general I found you can't really get low enough variance on wall time to be useful in most cases, so instruction count is a better metric.

I think you could do better than instruction count though but it would be a research project - take all the low noise performance metrics you can measure (instruction count, branch misses etc), measure a load of wall times for different programs and different systems (core count, RAM size etc.). Feed it into some kind of ML system and that should give you a decent model to get a low noise wall time estimate.

Good tips here:

https://llvm.org/docs/Benchmarking.html

https://easyperf.net/blog/2019/08/02/Perf-measurement-enviro...

vlovich123|2 years ago

Surely it’s possible to build some benchmark to demonstrate the difference right? Otherwise, what’s the point of making that improvement in the first place?

I think what you’re saying though is that having benchmarks/micro benchmarks that are cheap to run is valuable and in those instruction counts may be the only way to measure a 5% improvement (you’d have to run the test for a whole lot longer to prove that a 5% instruction count improvement is a real 1% wall clock improvement and not just noise). Even criterion gets real iffy about small improvements and it tries to build a statistical model.

GolDDranks|2 years ago

On my workplace we use self-hosted GitLab and GitLab CI. The CI allows you to allocate dedicated server instances to specific CI tasks. We run a e2e test battery on CI, and it's quite resource heavy compared to normal tests, so we have some dedicated instances for this. I'd imagine the same strategy would work for benchmarks, but I'm not sure whether cloud instances fit the bill. I think that the CI also allows you to bring your own hardware although I don't have experience taking it that far.

capableweb|2 years ago

> I'd imagine the same strategy would work for benchmarks, but I'm not sure whether cloud instances fit the bill. I think that the CI also allows you to bring your own hardware although I don't have experience taking it that far.

Typically you use the solution between cloud hosted VPS and your own hardware, dedicated servers :)

unknown|2 years ago

[deleted]