(no title)
frontierkodiak | 1 year ago
Typically this means you simulate an 8-bit forward pass so the model becomes robust to quantization 'noise'. You still use FP16/32 for the backward pass & weight updates for numerical stability.
It's just a way to optimize the model in anticipation of future quantization. The experience of using an 8-bit Nemo quant should more closely mirror that of using the full-fat bf16 model than if they hadn't used QAT.
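For anyone curious about the mechanics, here's a toy PyTorch sketch of that idea (my own illustration, not the actual NeMo/TensorRT code): the forward pass rounds weights to int8 and dequantizes them, while a straight-through estimator lets gradients and weight updates stay in full precision.

```python
# Illustrative QAT-style "fake quantization" (not the real NeMo implementation).
# Forward pass sees int8-rounded weights; backward pass uses a straight-through
# estimator so gradients/weight updates stay in FP16/32.
import torch

class FakeQuant8(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        # Symmetric per-tensor int8 quantize -> dequantize
        scale = x.abs().max().clamp(min=1e-8) / 127.0
        return torch.clamp(torch.round(x / scale), -127, 127) * scale

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through estimator: treat quantization as identity for gradients
        return grad_output

class QATLinear(torch.nn.Linear):
    def forward(self, x):
        # Forward uses quantization-"noised" weights; the stored FP weights
        # still receive full-precision gradient updates.
        w_q = FakeQuant8.apply(self.weight)
        return torch.nn.functional.linear(x, w_q, self.bias)
```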