(no title)
dosshell | 1 year ago
When talking about not assuming optimizations...
32bit float is slower than 64bit float on reasonable modern x86-64.
The reason is that 32bit float is emulated by using 64bit.
Of course if you have several floats you need to optimize against cache.
jcranmer|1 year ago
x86-64 requires the hardware to support SSE2, which has native single-precision and double-precision instructions for floating-point (e.g., scalar multiply is MULSS and MULSD, respectively). Both the single precision and the double precision instructions will take the same time, except for DIVSS/DIVSD, where the 32-bit float version is slightly faster (about 2 cycles latency faster, and reciprocal throughput of 3 versus 5 per Agner's tables).
You might be thinking of x87 floating-point units, where all arithmetic is done internally using 80-bit floating-point types. But all x86 chips in like the last 20 years have had SSE units--which are faster anyways. Even in the days when it was the major floating-point units, it wasn't any slower, since all floating-point operations took the same time independent of format. It might be slower if you insisted that code compilation strictly follow IEEE 754 rules, but the solution everybody did was to not do that and that's why things like Java's strictfp or C's FLT_EVAL_METHOD were born. Even in that case, however, 32-bit floats would likely be faster than 64-bit for the simple fact that 32-bit floats can safely be emulated in 80-bit without fear of double rounding but 64-bit floats cannot.
dosshell|1 year ago
tombert|1 year ago
sgerenser|1 year ago
dosshell|1 year ago
SIMD/MIMD will benefit of working on smaller width. This is not only true because they do more work per clock but because memory is slow. Super slow compared to the cpu. Optimization is alot about cache misses optimization.
(But remember that the cache line is 64 bytes, so reading a single value smaller than that will take the same time. So it does not matter in theory when comparing one f32 against one f64)