tehmillhouse | 3 years ago

As a compiler guy, I'd appreciate some look at the layers of abstraction in-between (so, ASM). Microbenchmarks are famously jittery on modern CPUs due to cache effects, branch prediction, process eviction, data dependencies, pipeline stalls, OoO execution, instruction level parallelism, etc.

You have to be really careful to ensure the numbers you're getting are really coming from the thing you're trying to benchmark, and aren't corrupted beyond recognition by your benchmarking harness.

Some of these questions would surely be trivial if I actually knew any Go, but I'm left wondering:

* What does the machine code / assembly look like for this? What does the cast compile down to?

* What's `int` an alias for? I assume 64-bit-signed-integer?

* Are integer casts checked in Go? Would an overflowing cast fault?
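One way to guard against the harness concern raised above is to make sure the result of the measured work is actually used. The following is a minimal sketch, not the original poster's benchmark: `truncateSum` and `sink` are hypothetical names, and the workload is only assumed to resemble the cast being discussed. Writing to a package-level variable is a common idiom for discouraging the compiler from discarding the loop body as dead code.

```go
package main

import (
	"fmt"
	"testing"
)

// sink is a package-level variable. Storing each result here is meant to
// keep the compiler from treating the benchmarked call as dead code.
var sink int32

// truncateSum is a hypothetical workload: it converts every int to int32
// and accumulates the converted values.
func truncateSum(xs []int) int32 {
	var total int32
	for _, x := range xs {
		total += int32(x)
	}
	return total
}

func main() {
	xs := make([]int, 1024)
	for i := range xs {
		xs[i] = i
	}

	// testing.Benchmark runs the function with increasing b.N until the
	// timing is stable, without needing the `go test` command.
	res := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = truncateSum(xs)
		}
	})
	fmt.Println(res)
}
```

Even with a sink, the number only means something once the generated code has been checked. Building with `go build -gcflags=-S` should print the compiler's assembly output, which shows what the conversion actually compiles to.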

titzer|3 years ago

All are good points. Also, loop unrolling and induction variable analysis. LLVM is particularly aggressive at trying to optimize inductions. It will literally turn a "sum of i from 0 to N" into "n(n+1)/2", among others.

It's really important to look at the actual machine code.
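To make the induction example concrete, here is a small sketch of the two forms side by side, written in Go only for illustration. The function names are hypothetical, and nothing here claims that the Go compiler performs this rewrite; it only shows why a loop-based microbenchmark can end up measuring a couple of multiplications instead of N iterations. The input is a `uint32` and the arithmetic is done in `uint64`, so the product `n(n+1)` cannot overflow before the division.

```go
package main

import "fmt"

// sumLoop adds 0 + 1 + ... + n one term at a time.
func sumLoop(n uint32) uint64 {
	var total uint64
	for i := uint64(0); i <= uint64(n); i++ {
		total += i
	}
	return total
}

// sumClosed computes the same value with the closed form n(n+1)/2.
// Widening to uint64 first keeps the intermediate product in range.
func sumClosed(n uint32) uint64 {
	m := uint64(n)
	return m * (m + 1) / 2
}

func main() {
	for _, n := range []uint32{0, 1, 100, 65535} {
		fmt.Println(n, sumLoop(n), sumClosed(n))
	}
}
```

If an optimizer replaces the first function with the second, timing the loop says nothing about the cost of the loop body.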

zasdffaa|3 years ago

An aside.

> It will literally turn a "sum of i from 0 to N" into "n(n+1)/2", among others[1]

Yeah, seen that on a godbolt youtube vid. Question is, should it do this? Or should it force you to use a library, by reporting what you're trying to do and telling you there's an easier way ("sum of 1 to n is <formula>, instead of a loop use library function 'sum1toN()'")?

I think getting too clever risks hurting the user by not letting them know there's a better way.

[1] actually it seems to do a slightly different version of this to prevent risk of overflow, but same result.

silvestrov|3 years ago

micro benchmarks are especially problematic in real-world code where you load stuff from random addresses in memory.

If the code after the cast is blocked on a memory load, then you have a lot of free instructions while the CPU is waiting for the memory load to complete. In this case it doesn't matter if the cast is free or takes a handful of instructions.

Sometimes code becomes faster by using more instructions to make the data more compact so more of the data stays in the caches.
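As a rough illustration of that trade-off, the sketch below stores the same values once as `int64` and once as `int32`. The narrow version needs a widening conversion on every element but occupies half the bytes. The function names and the slice length are assumptions made for the example, and it only reports sizes; whether the narrow version is actually faster depends on the machine and the access pattern, which is why it needs measuring.

```go
package main

import (
	"fmt"
	"unsafe"
)

// sumWide reads 8-byte elements and needs no conversion.
func sumWide(xs []int64) int64 {
	var total int64
	for _, x := range xs {
		total += x
	}
	return total
}

// sumNarrow reads 4-byte elements and widens each one before adding.
func sumNarrow(xs []int32) int64 {
	var total int64
	for _, x := range xs {
		total += int64(x)
	}
	return total
}

func main() {
	const n = 1 << 20 // hypothetical element count
	wide := make([]int64, n)
	narrow := make([]int32, n)
	for i := range wide {
		wide[i] = int64(i)
		narrow[i] = int32(i)
	}

	wideBytes := uintptr(len(wide)) * unsafe.Sizeof(wide[0])
	narrowBytes := uintptr(len(narrow)) * unsafe.Sizeof(narrow[0])
	fmt.Println("wide bytes:", wideBytes, "narrow bytes:", narrowBytes)
	fmt.Println("same sum:", sumWide(wide) == sumNarrow(narrow))
}
```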

Cthulhu_|3 years ago

Microbenchmarks are only valid for the one function under test, and only on the current machine; they're all right for optimizing on particular hardware, but not so much to go out into the world and go "X is faster than Y".

That said, I did like this website where you could set up JS benchmarks; they would run on your own machine and you could compare how it ran on other people's systems. It wasn't perfect, but it gave a decent indication if X was faster than Y. Of course, it's a snapshot in time; JS engines have gone through tons of optimizations over the years.

anonymoushn|3 years ago

int is defined to be the pointer width (or something like this), so probably int64 where the OP is running their code.

integer casts are unchecked.

stargrazer|3 years ago

And is it aligned properly?