marginalia_nu|1 year ago
I don't think this is correct. The difference between well-optimized and unoptimized code on the CPU is frequently at least an order of magnitude in performance.
The reason it doesn't seem that way is that the CPU is so fast we often bottleneck on I/O first. For compute-bound workloads like inference, however, it really does matter.
consteval|1 year ago
While this is true, the most effective optimizations are ones you don't do yourself; the compiler or runtime does them, and they get the low-hanging fruit. You can optimize further by hand, but unless your design is fundamentally bad, you're going to be micro-optimizing.
gcc at -O0 versus -O2 shows a HUGE performance gain. We don't really have anything to auto-magically do this for models yet. Compilers are intimately familiar with x86.