Much of this can be learned hands-on, at a reasonable scale, by getting yourself a microcontroller. Not an RTOS or Linux or any of that, just a bare microcontroller without an OS. Learn it, learn its internal fetch architecture, get comfortable with its timings, and watch the latency numbers climb when you introduce external memory such as SD cards and the like. Knowing how to read the assembly output and see how the instruction cycles add up in the pipeline also helps, because then you at least know what is actually happening. That makes it much easier to apply the same careful mentality here, which is ultimately what this whole optimization game is about: optimizing where time is spent, with what data. Otherwise, someone telling you that so-and-so takes nanoseconds or microseconds will sound alien to you, because you've never been exposed to an environment where you regularly count clock cycles. So consider this a learning opportunity.
simonask|2 months ago
A lot of code can be pessimized by golfing instruction counts, hurting instruction-level parallelism and microcode optimizations by introducing false data dependencies.
Compilers outperform humans here almost all the time.
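To make the dependency point concrete, here is an illustrative sketch in plain C (not from this comment): a reduction through a single accumulator chains every add through the previous one, while independent accumulators let a superscalar core overlap them. Compilers often apply this transform themselves at higher optimization levels, which is exactly why hand-golfed code can get in their way.

```c
#include <stddef.h>

/* One accumulator: each addition depends on the previous result, so the
 * loop runs at roughly one add per add-latency regardless of core width. */
long long sum_chained(const int *a, size_t n) {
    long long s = 0;
    for (size_t i = 0; i < n; i++) s += a[i];
    return s;
}

/* Four independent accumulators: four dependency chains that the CPU can
 * execute in parallel. Optimizing compilers frequently do this (or
 * vectorize) on their own for integer sums. */
long long sum_unrolled(const int *a, size_t n) {
    long long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    long long s = s0 + s1 + s2 + s3;
    for (; i < n; i++) s += a[i];  /* leftover tail */
    return s;
}
```

Both functions compute the same sum; the difference only shows up in throughput, and only on data large enough to keep the pipeline busy.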
Pannoniae|2 months ago
However, that doesn't mean looking at the generated asm, or even writing some, is useless! Just because you can't globally outperform the compiler doesn't mean you can't do it locally! If you know where the bottleneck is, and you make those few functions great, that's a force multiplier for you and your program.
jesse__|2 months ago
I'm going to be annoying and nerd-snipe you here. It's, generally, really easy to beat the compiler.
https://scallywag.software/vim/blog/simd-perlin-noise-i
dardeaup|2 months ago
Can you explain what this phrase means?
barfoure|2 months ago
Compilers make mistakes too, and they can output genuinely erroneous code. But that's a different topic.