jhj | 1 month ago

These flops are not comparable. The 2013 phone figures are fp32, and the A13 figures look to be fp32 as well (I'm not entirely sure), while the Cray numbers, like those in the rest of the HPC industry, are fp64. The Cray-1 predates what would become IEEE 754 binary64, so it isn't exactly the same arithmetic, but it is similar in dynamic range and precision.

A modern Nvidia GB200, for instance, only does about 40 TFLOP/s in fp64. You can emulate higher-precision or wider-dynamic-range arithmetic with multiple passes over lower-precision arithmetic, but without an insane number of instructions the result won't meet all of the IEEE 754 guarantees.
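
To make the emulation point concrete, here is a minimal sketch of "double-single" addition, where a value is carried as an unevaluated sum of two fp32 numbers. It is only an illustration, not what any vendor ships: the helper names (`f32`, `two_sum`, `fast_two_sum`, `ds_add`, `demo`) are made up for this example, and fp32 is simulated on the host by rounding Python floats through `struct`. The pair gives roughly 48 bits of significand but still has fp32's exponent range and none of binary64's rounding guarantees, which is the limitation described above.

```python
import struct


def f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def two_sum(a, b):
    """Knuth's TwoSum in simulated fp32: returns (s, err) with s + err == a + b."""
    s = f32(a + b)
    bb = f32(s - a)
    err = f32(f32(a - f32(s - bb)) + f32(b - bb))
    return s, err


def fast_two_sum(a, b):
    """Dekker's FastTwoSum in simulated fp32; assumes |a| >= |b|."""
    s = f32(a + b)
    err = f32(b - f32(s - a))
    return s, err


def ds_add(x, y):
    """Add two (hi, lo) fp32 pairs. This is the simple variant, not the most accurate one."""
    s, e = two_sum(x[0], y[0])
    e = f32(e + f32(x[1] + y[1]))
    return fast_two_sum(s, e)  # renormalize so hi carries the leading bits


def demo(n=1024, tiny=2.0 ** -30):
    """Add `tiny` to 1.0 n times, once in plain fp32 and once as an fp32 pair."""
    naive = 1.0
    pair = (1.0, 0.0)
    for _ in range(n):
        naive = f32(naive + tiny)         # tiny is below half an ulp, so it is lost
        pair = ds_add(pair, (tiny, 0.0))  # the lo word keeps the lost bits
    return naive, pair
```

With the defaults, `demo()` leaves the plain fp32 accumulator stuck at 1.0, while the pair sums to exactly 1 + 2^-20. Note that a single emulated add already costs around ten fp32 operations here, and multiplication and division are more expensive still.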

Certainly, if Nvidia wanted to dedicate much more chip area to fp64 they could go a lot higher, but fp64 FMA units alone would likely be >30 times larger than their fp16 cousins and probably hundreds of times larger than fp4 versions.
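
As a rough sanity check on those ratios, one common rule of thumb (an assumption here, not a measurement) is that the multiplier array in an FMA unit grows with the square of the significand width. The sketch below applies that to the usual significand sizes, counting the implicit leading bit, and assumes the fp4 format is E2M1. The function name `multiplier_area_ratio` is hypothetical.

```python
# Significand widths in bits, including the implicit leading bit.
# The fp4 entry assumes the E2M1 layout (1 explicit mantissa bit).
SIGNIFICAND_BITS = {"fp64": 53, "fp32": 24, "fp16": 11, "fp4 (E2M1)": 2}


def multiplier_area_ratio(wide, narrow):
    """Estimate relative multiplier area, assuming area ~ (significand bits)^2."""
    return (SIGNIFICAND_BITS[wide] / SIGNIFICAND_BITS[narrow]) ** 2


if __name__ == "__main__":
    for fmt in ("fp32", "fp16", "fp4 (E2M1)"):
        print(f"fp64 vs {fmt}: ~{multiplier_area_ratio('fp64', fmt):.1f}x")
```

Under that model the multiplier alone comes out around 23x for fp16 and around 700x for fp4. It ignores exponent logic, alignment shifters, and normalization, so it only shows the order of magnitude rather than confirming the exact figures.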
