Unrelated, do FPUs on modern CPUs use FMAs to both multiply and add or do they use mul/add-only units?

I don't think there is a generally optimal design. There are pros and cons to using the same homogeneous FMA units for adds, multiplies and FMAs, even at the cost of making adds slower (the design is simpler, and having all instructions share the same latency greatly simplifies scheduling). IIRC Intel went from 4-cycle FMA, add and mul, to 4-cycle add and mul with 5-cycle FMA, and then to a design with a dedicated 3-cycle add.

The optimal design depends a lot on the rest of the microarchitecture, the loads the core is being optimized for, the target frequency, the memory latency, etc.
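
To make the "same FMA unit for everything" idea concrete, here is a minimal sketch of how a single fused multiply-add can stand in for a plain add or a plain multiply. It does not model any real CPU: `fma_emulated`, `add_via_fma` and `mul_via_fma` are hypothetical names, and the fused operation is emulated with exact rational arithmetic followed by one rounding. It assumes finite inputs and ignores signed zeros and overflow.

```python
from fractions import Fraction


def fma_emulated(a: float, b: float, c: float) -> float:
    """Emulate a fused multiply-add: a*b + c computed exactly, rounded once.

    Assumption: finite inputs only. Fractions represent finite floats
    exactly, so the only rounding is the final conversion back to float.
    """
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def add_via_fma(a: float, c: float) -> float:
    # An add is an FMA with the multiplier fixed to 1.0.
    return fma_emulated(a, 1.0, c)


def mul_via_fma(a: float, b: float) -> float:
    # A multiply is an FMA with the addend fixed to 0.0
    # (signed-zero corner cases are ignored in this sketch).
    return fma_emulated(a, b, 0.0)


if __name__ == "__main__":
    print(add_via_fma(0.1, 0.2) == 0.1 + 0.2)
    print(mul_via_fma(0.1, 3.0) == 0.1 * 3.0)
    # Fused vs. unfused differ because the product is not rounded first.
    print(fma_emulated(0.1, 10.0, -1.0), 0.1 * 10.0 - 1.0)
```

The trade-off mentioned above shows up here: the add path drags a multiplier along that it does not need, which is why a dedicated adder can have lower latency.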

bonzini|1 year ago

Probably FMA units are used to do multiplies, as the extra add is basically free. Standalone adds are cheaper.

thesz|1 year ago

Adds are cheaper only for fixed-point computations. Floating-point addition needs to denormalize one of its arguments (shift its significand so both exponents match), perform an (integer) addition and then normalize the result.

Usually FP adds take a cycle or two longer than FP multiplication.
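
The three steps (align, integer add, normalize) can be sketched in software roughly like this. This is an illustration under loud assumptions, not how any particular FPU is built: it only handles positive normal doubles, the helper names `decompose` and `fp_add` are made up, and it aligns by widening the larger operand in an unbounded Python integer, whereas hardware would shift the smaller operand right and keep a few guard/sticky bits.

```python
import struct


def decompose(x: float):
    """Split a positive normal double into (biased exponent, 53-bit significand)."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    exponent = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)
    # Assumption: no zeros, subnormals, infinities, NaNs or negative values.
    assert bits >> 63 == 0 and 0 < exponent < 0x7FF, "positive normal doubles only"
    return exponent, fraction | (1 << 52)  # restore the implicit leading 1


def fp_add(a: float, b: float) -> float:
    ea, ma = decompose(a)
    eb, mb = decompose(b)
    if ea < eb:
        ea, ma, eb, mb = eb, mb, ea, ma

    # Step 1: align both significands to the smaller exponent.
    ma <<= ea - eb
    exponent = eb

    # Step 2: plain integer addition.
    total = ma + mb

    # Step 3: normalize back to 53 bits, rounding to nearest, ties to even.
    # The sum of two values >= 2**52 always has at least 54 bits, so excess >= 1.
    excess = total.bit_length() - 53
    remainder = total & ((1 << excess) - 1)
    half = 1 << (excess - 1)
    total >>= excess
    exponent += excess
    if remainder > half or (remainder == half and total & 1):
        total += 1
        if total >> 53:  # rounding carried into a 54th bit
            total >>= 1
            exponent += 1

    assert exponent < 0x7FF, "overflow is not handled in this sketch"
    bits = (exponent << 52) | (total & ((1 << 52) - 1))
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


if __name__ == "__main__":
    for x, y in [(0.1, 0.2), (1.5, 2.25), (1.0, 2.0 ** -60), (1e300, 1e-300)]:
        print(x, y, fp_add(x, y) == x + y)
```

A multiply, by contrast, needs no alignment shift before the integer operation, only the normalize/round step afterwards; the variable-distance shifts in steps 1 and 3 are what fixed-point addition avoids.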