SuchAnonMuchWow | 1 year ago
For fp8, the estimated gate count is 296 for a standard fp8 multiplier vs. 157 with their technique, so the power gain on the multipliers will be much lower (roughly 50% is a more reasonable estimate). And again, for fp8 the additions in the dot products make up a large share of the operations.
Overall, it's really disingenuous to claim an 80% power gain and a small drop in accuracy when the power gain is only for fp32 operations and the small accuracy drop is only for fp8 operations. They don't analyze the accuracy drop in fp32, and they don't present the power saved for fp8 dot products.
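As a rough sanity check on the numbers above, here's a minimal back-of-envelope sketch. The 296/157 gate counts are from the comment; the assumptions that power scales linearly with gate count and that adders consume about half of a fp8 MAC's power are mine, purely for illustration:

```python
# Back-of-envelope: if power scales ~linearly with gate count, how much does
# the multiplier itself save, and what remains once the (unchanged)
# dot-product adders are included?

baseline_mul_gates = 296  # estimated gates, standard fp8 multiplier (from the comment)
reduced_mul_gates = 157   # estimated gates, their technique (from the comment)

mul_saving = 1 - reduced_mul_gates / baseline_mul_gates
print(f"multiplier-only power saving: {mul_saving:.0%}")  # ~47%, i.e. roughly 50%

# Assumption: adders account for about half of the power in an fp8 MAC.
# Only the multiplier half gets cheaper, so the whole-MAC saving shrinks.
adder_fraction = 0.5  # assumed share of MAC power spent in additions
mac_saving = (1 - adder_fraction) * mul_saving
print(f"whole-MAC power saving (assumed 50% adder share): {mac_saving:.0%}")  # ~23%
```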
reissbaker | 1 year ago
Every major inference provider other than Hyperbolic Labs runs Llama 3.1 405b at FP8, FWIW (e.g. Together, Fireworks, Lepton), so comparing against FP32 is misleading, to say the least. Even Hyperbolic runs it at BF16.
Pretraining is typically done in FP32, although some labs (e.g. Character AI, RIP) apparently train in INT8: https://research.character.ai/optimizing-inference/
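For scale, a minimal sketch of why serving 405b at FP8 rather than FP32 matters in practice. This counts weight memory only (KV cache and activations ignored); the parameter count and per-format byte widths are the only inputs:

```python
# Back-of-envelope weight-memory footprint of Llama 3.1 405B at the
# precisions mentioned above (weights only; KV cache/activations ignored).

params = 405e9  # parameters in Llama 3.1 405B

bytes_per_param = {"FP32": 4, "BF16": 2, "FP8": 1}

for fmt, nbytes in bytes_per_param.items():
    gib = params * nbytes / 2**30
    print(f"{fmt}: ~{gib:,.0f} GiB of weights")

# FP32: ~1,509 GiB; BF16: ~754 GiB; FP8: ~377 GiB — which is a big part of
# why essentially every provider quantizes for inference.
```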