TF32 is not IEEE-754 float32; it is a reduced-precision format designed for machine-learning use cases. The correct spec-sheet number for FP32 throughput on the H100 (and for FP64, which is the relevant precision here) is more like 60 TFLOP/s, so your number is off by roughly an order of magnitude.
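To make the precision difference concrete, here is a minimal sketch that simulates TF32's reduced mantissa: TF32 keeps FP32's 8 exponent bits but only 10 mantissa bits (FP32 has 23), so this simply truncates the low 13 mantissa bits of a float32 value. `round_to_tf32` is a hypothetical helper for illustration, using truncation rather than the hardware's actual rounding mode:

```python
import struct

def round_to_tf32(x: float) -> float:
    # TF32 keeps FP32's 8 exponent bits but only 10 of its 23
    # mantissa bits. Simulate that by zeroing the low 13 mantissa
    # bits of the float32 encoding (truncation, not round-to-nearest,
    # so this is an approximation of the hardware behavior).
    bits = struct.unpack('>I', struct.pack('>f', x))[0]
    bits &= ~((1 << 13) - 1)
    return struct.unpack('>f', struct.pack('>I', bits))[0]

# 1 + 2^-10 fits in TF32's 10 mantissa bits and survives:
print(round_to_tf32(1.0 + 2**-10))  # -> 1.0009765625
# 1 + 2^-11 needs an 11th mantissa bit and is lost:
print(round_to_tf32(1.0 + 2**-11))  # -> 1.0
```

This is why TF32 is fine for many ML workloads but unacceptable for HPC code that assumes full float32 (let alone float64) precision.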