nelhage | 11 months ago
> The problem is 95% about laying out the instruction dispatching code for the branch predictor to work optimally.
A fun fact I learned while writing this post is that that's no longer true! Modern branch predictors can predict through a single indirect jump with high accuracy, provided the run is long enough and the interpreted code itself has stable behavior!
Here's a paper that studied this (for both real hardware and a certain simulated branch predictor): https://inria.hal.science/hal-01100647/document
My experiments on this project anecdotally agree; they didn't make it into the post but I also explored a few of the interpreters through hardware CPU counters and `perf stat`, and branch misprediction never showed up as a dominant factor.
vkazanov | 11 months ago
Conclusion: the rise of popular interpreter-based languages led to CPUs with smarter branch predictors.
What's interesting is that a token threaded interpreter dominated my benchmark (https://github.com/vkazanov/bytecode-interpreters-post/blob/...).
This trick is meant to simplify dispatching logic and also spread branches in the code a bit.
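For readers who have not seen the technique, here is a minimal sketch of token-threaded dispatch in C. It is not the code from the linked benchmark: the opcode set, the `run_token_threaded` name, and the one-byte immediate encoding are all made up for illustration, and it assumes GCC or Clang, since the `&&label` / `goto *` syntax is a GNU C extension rather than standard C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcode set for a tiny stack machine (illustration only). */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

/* Run `code` and return the value on top of the stack at OP_HALT. */
static int64_t run_token_threaded(const uint8_t *code)
{
    /* One label address per opcode token; `&&label` is a GNU C extension. */
    static void *const dispatch_table[] = {
        [OP_PUSH] = &&do_push,
        [OP_ADD]  = &&do_add,
        [OP_MUL]  = &&do_mul,
        [OP_HALT] = &&do_halt,
    };
    int64_t stack[64];
    size_t sp = 0;            /* number of values currently on the stack */
    const uint8_t *ip = code; /* points at the next token to execute */

/* Every handler ends with its own copy of this indirect jump, so the
 * branches are spread through the code instead of sharing one site. */
#define DISPATCH() goto *dispatch_table[*ip++]

    DISPATCH();

do_push:
    assert(sp < 64);
    stack[sp++] = (int8_t)*ip++;  /* one-byte signed immediate operand */
    DISPATCH();
do_add:
    assert(sp >= 2);
    sp--;
    stack[sp - 1] += stack[sp];
    DISPATCH();
do_mul:
    assert(sp >= 2);
    sp--;
    stack[sp - 1] *= stack[sp];
    DISPATCH();
do_halt:
    assert(sp >= 1);
    return stack[sp - 1];

#undef DISPATCH
}
```

Calling it on the byte sequence `OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PUSH, 4, OP_MUL, OP_HALT` evaluates (2 + 3) * 4. A plain `switch` inside a loop would instead funnel every opcode through a single shared indirect jump, which is the case the paper linked above studies.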
celeritascelery | 11 months ago
[1] https://ziglang.org/download/0.14.0/release-notes.html#Code-...
dwattttt | 11 months ago