item 37538616

lysium | 2 years ago

Why do execution times drop so drastically with an increasing number of iterations? Shouldn't the caches be filled after one iteration already? There is no JIT in C++, is there?


xcvb|2 years ago

I only had a quick look at the code, but it looks like it's timing memory allocation. For example, the sprintf part uses std::string str(100, '\0'). I'm not a C++ expert, but I believe this essentially does a malloc and memset of 100 bytes for every call to sprintf, so this is probably a poorly set-up benchmark.

lysium|2 years ago

I think that might be it. Too bad, the results of this kind of benchmark would have been interesting.

Quekid5|2 years ago

Your CPU is effectively a virtual machine with stuff like branch prediction, speculative execution w/rollback, pipelining, implicit parallelism, etc. etc.

Of course, it isn't able to do quite as much as a VM running in software (because fixed buffers for everything, etc.), but even so...

deaddodo|2 years ago

> There is no JIT in C++, is there?

This question doesn't make sense for the context*. C++ is Ahead of Time, by design; there is nothing to "just in time" compile.

JIT (as a concept) only makes sense if you are, in some way, abstracting your code from native machine code (usually via some sort of VM, like Python's or Java's), which the "systems" languages (C, Rust, Zig, C++, etc.) do not.

What I think you are actually asking about is "runtime optimizations"; in which case, the answer is probably no. Base and standard-library C++ are pretty conservative about what they put into the runtime. Extended runtimes like Windows' and glibc might do some conditional optimizations, however.

* Yes, some contrarian is going to point out a project like Cling or C++/CLI. This is why I'm being very clear about "context".

mananaysiempre|2 years ago

> C++ is Ahead of Time, by design; there is nothing to "just in time" compile.

Can I talk to you about our Lord and Savior the CPU trace cache[1]?

That is to say, I know next to nothing about how modern CPUs are actually designed and hardly more about JITs, but a modern CPU’s frontend with a microop cache sure looks JITy to me. The trace cache on NetBurst looks even more classically JITy, but by itself it was a miserable failure, so meh.

In any event, a printf invocation seems like it should be too large for the cache to come into play; on the other hand, all the predictors learning stuff over the iterations might make a meaningful impact?

Seems to me that such learning, if present, would make the benchmark less interesting, not more: a real application of string formatting is unlikely to format the same (kind of) thing, and nothing else, in a tight loop.

[1] https://chipsandcheese.com/2022/06/17/intels-netburst-failur...

ergeysay|2 years ago

Could be dynamic frequency scaling. To minimize its impact when benchmarking, one can pin the process to a single core and warm it up before running the benchmark.

IshKebab|2 years ago

Branch predictor maybe.

mike_hock|2 years ago

That was my guess. Training the branch predictor on all those virtual calls.