At what quantization? And if it is in fact quantized below FP8, how does that affect performance across the various benchmarks?

antonvs|3 months ago

They claim they don't use quantization.

The reason for their speed is this chip: https://www.cerebras.ai/chip