xcvb|2 years ago
Why do execution times drop so drastically with an increasing number of iterations? Shouldn't the caches already be filled after one iteration? There is no JIT in C++, or is there?
lysium|2 years ago
I only had a quick look at the code, but it looks like it's timing memory allocation. For example, the sprintf part uses std::string str(100, '\0'). I'm not a C++ expert, but I believe this is essentially doing a malloc and a memset of 100 bytes for every call to sprintf. So this is probably a poorly set up benchmark.
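The benchmark's actual code isn't shown here, but the pattern being described can be sketched as follows (function names are hypothetical, not from the benchmark): the first version pays for an allocation and a zero-fill on every call, while the second reuses one buffer across iterations.

```cpp
#include <cstdio>
#include <string>

// Hypothetical reconstruction of the pattern described above: each call
// constructs a fresh 100-byte std::string (an allocation plus a memset),
// so a loop over this times allocator traffic as much as formatting.
std::string format_per_call(int value) {
    std::string str(100, '\0');  // allocate + zero-fill on every call
    int n = std::snprintf(str.data(), str.size(), "value=%d", value);
    str.resize(n);
    return str;
}

// Reusing one caller-owned buffer removes the per-call allocation cost.
std::string& format_reused(std::string& buf, int value) {
    buf.resize(100);
    int n = std::snprintf(buf.data(), buf.size(), "value=%d", value);
    buf.resize(n);
    return buf;
}
```

(Writing through `str.data()` requires C++17 or later.)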
Quekid5|2 years ago
Your CPU is effectively a virtual machine, with stuff like branch prediction, speculative execution with rollback, pipelining, implicit parallelism, and so on.
Of course, it isn't able to do quite as much as a VM running in software (because of fixed buffers for everything, etc.), but even so...
deaddodo|2 years ago
This question doesn't make sense for the context*. C++ is Ahead of Time by design; there is nothing to "just in time" compile.
JIT (as a concept) only makes sense if you are, in some way, abstracting your code from native machine code (usually via some sort of VM, like Python's or Java's), which the "system" languages (C, Rust, Zig, C++, etc.) do not.
What I think you are trying to reference are "runtime optimizations"; in which case, the answer is probably no. Base and standard-library C++ are pretty conservative about what they put into the runtime. Extended runtimes like Windows' and glibc might do some conditional optimizations, however.
* Yes, some contrarian is going to point out a project like Cling or C++/CLI. This is why I'm being very clear about "context".
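To illustrate the distinction being drawn: the "conditional optimizations" in a library like glibc amount to selecting one of several precompiled implementations once, at load time, based on CPU features (via IFUNC resolvers); no machine code is generated at runtime. This is a portable, simplified approximation of that idea, not glibc's actual mechanism; all names and the feature probe are stand-ins.

```cpp
#include <cstddef>

// Scalar implementation: count zero bytes in a buffer.
static std::size_t count_scalar(const unsigned char* p, std::size_t n) {
    std::size_t c = 0;
    for (std::size_t i = 0; i < n; ++i) c += (p[i] == 0);
    return c;
}

// Stand-in feature probe; a real library would query CPUID or similar.
static bool cpu_has_simd() { return false; }

using count_fn = std::size_t (*)(const unsigned char*, std::size_t);

// Decided once at startup, like an IFUNC resolver. The compiled code
// itself never changes afterwards, which is why this is dispatch, not JIT.
static count_fn resolve_count() {
    if (cpu_has_simd()) { /* a SIMD variant would be returned here */ }
    return count_scalar;
}

static const count_fn count_zeros = resolve_count();
```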
mananaysiempre|2 years ago
> C++ is Ahead of Time, by design; there is nothing to "just in time" compile.
Can I talk to you about our Lord and Savior the CPU trace cache[1]?
That is to say, I know next to nothing about how modern CPUs are actually designed and hardly more about JITs, but a modern CPU's frontend with a micro-op cache sure looks JITy to me. The trace cache on NetBurst looks even more classically JITy, but by itself it was a miserable failure, so meh.
In any event, a printf invocation seems like it should be too large for the cache to come into play; on the other hand, all the predictors learning stuff over the iterations might make a meaningful impact?
Seems to me like that learning, if present, would make the benchmark less interesting, not more, as an actual prospective application of string formatting seems unlikely to format the same (kind of) thing and nothing else in a tight loop.
Could be dynamic frequency scaling. To minimize its impact when benchmarking, one can pin the process to a single core and warm it up before running the benchmark.
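The warm-up part of that advice can be sketched as below (a minimal illustration, not a full benchmark harness): run the workload untimed first so the core can ramp to steady-state frequency and the caches and predictors settle, then time the real run. Core pinning is OS-specific and not shown; on Linux one option is launching the binary via `taskset -c 0`.

```cpp
#include <chrono>

// Run `work` a few times untimed (warm-up), then return the mean
// per-iteration wall time of the timed run, in seconds.
template <typename Fn>
double time_after_warmup(Fn&& work, int warmup_iters, int timed_iters) {
    for (int i = 0; i < warmup_iters; ++i) work();  // untimed warm-up
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < timed_iters; ++i) work();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count() / timed_iters;
}
```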
[1] https://chipsandcheese.com/2022/06/17/intels-netburst-failur...