nelhage | 11 months ago

(author here)

> The problem is 95% about laying out the instruction dispatching code for the branch predictor to work optimally.

A fun fact I learned while writing this post is that that's no longer true! Modern branch predictors can pretty much accurately predict through a single indirect jump, if the run is long enough and the interpreted code itself has stable behavior!

Here's a paper that studied this (for both real hardware and a certain simulated branch predictor): https://inria.hal.science/hal-01100647/document

My experiments on this project anecdotally agree; they didn't make it into the post but I also explored a few of the interpreters through hardware CPU counters and `perf stat`, and branch misprediction never showed up as a dominant factor.

vkazanov|11 months ago

Yes, this was already becoming true around the time I was writing the linked article. And I also read the paper. :-) I also remember comparing a pre-Haswell-era Intel CPU against something a bit more recent, and I could see that the more complicated dispatcher no longer made as much sense.

Conclusion: the rise of popular interpreter-based languages led to CPUs with smarter branch predictors.

What's interesting is that a token threaded interpreter dominated my benchmark (https://github.com/vkazanov/bytecode-interpreters-post/blob/...).

This trick is meant to simplify dispatching logic and also spread branches in the code a bit.
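For readers who haven't seen the technique, here is a minimal sketch of token-threaded dispatch in C. It is not taken from the linked repository: the opcode set, the `run_threaded` name, and the stack layout are all made up for illustration, and it relies on the GCC/Clang "labels as values" extension rather than standard C. The point it shows is that every handler ends with its own indirect jump through the handler table, so the branches are spread across the code instead of funnelling through one dispatch site.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy bytecode; the opcode names are illustrative only. */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

/* Token-threaded dispatch via the GCC/Clang "labels as values" extension.
 * No bounds or validity checks: this is a sketch, not a hardened VM. */
int64_t run_threaded(const uint8_t *code) {
    /* The table is indexed by opcode ("token"), one handler address each. */
    static void *const handlers[] = {
        [OP_PUSH] = &&op_push,
        [OP_ADD]  = &&op_add,
        [OP_MUL]  = &&op_mul,
        [OP_HALT] = &&op_halt,
    };
    int64_t stack[64];
    size_t sp = 0;              /* index of the next free stack slot */
    const uint8_t *ip = code;

/* Each expansion of DISPATCH is a separate indirect jump in the source. */
#define DISPATCH() goto *handlers[*ip++]

    DISPATCH();

op_push:
    stack[sp++] = (int8_t)*ip++;    /* one-byte signed immediate operand */
    DISPATCH();
op_add:
    sp--;
    stack[sp - 1] += stack[sp];
    DISPATCH();
op_mul:
    sp--;
    stack[sp - 1] *= stack[sp];
    DISPATCH();
op_halt:
    return stack[sp - 1];

#undef DISPATCH
}
```

Whether the generated machine code keeps one jump per handler is up to the compiler, so it is worth checking the disassembly before drawing conclusions from a benchmark.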

celeritascelery|11 months ago

How do you reconcile that with the observation that moving to a computed goto style provides better codegen in Zig[1]? They claim that their “labeled switch” (which is essentially computed goto) lets you have multiple branches, which improves branch predictor performance. They even get a 13% speedup in their parser by moving from a switch to this style. If modern CPUs are good at predicting through a single branch, I wouldn’t expect this feature to make any difference.

[1] https://ziglang.org/download/0.14.0/release-notes.html#Code-...
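To make the "single branch" case concrete, here is a rough sketch in C of the plain loop-plus-switch style being compared against. The opcodes and the `run_switch` name are invented for illustration and are not from Zig's parser or any project mentioned here. In the source, every instruction goes back through the one `switch`, which a compiler will often (though not necessarily) lower to a single shared indirect jump.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy bytecode; the opcode names are illustrative only. */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

/* Classic loop-plus-switch interpreter. No bounds checks: a sketch only. */
int64_t run_switch(const uint8_t *code) {
    int64_t stack[64];
    size_t sp = 0;              /* index of the next free stack slot */
    const uint8_t *ip = code;

    for (;;) {
        /* The only dispatch point in the source: every handler breaks
         * out of the switch and loops back here. */
        switch (*ip++) {
        case OP_PUSH:
            stack[sp++] = (int8_t)*ip++;    /* one-byte signed immediate */
            break;
        case OP_ADD:
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_MUL:
            sp--;
            stack[sp - 1] *= stack[sp];
            break;
        case OP_HALT:
            return stack[sp - 1];
        default:
            return -1;          /* unknown opcode; arbitrary choice here */
        }
    }
}
```

An optimizer may duplicate that dispatch into each case or merge separate dispatches back together, so the source-level shape does not by itself tell you how many indirect branches the CPU sees.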

dwattttt|11 months ago

While it's unlikely to be as neat as this, the blog post we're all commenting on is an "I thought we had a 10-15% speedup, but it turned out to be an LLVM optimisation misbehaving" story. And Zig (for now) uses LLVM for optimised builds too.