
mluo | 1 year ago

Quantization has a very big impact on small models: scores can drop by as much as 10% on AIME. Our model does best in bfloat16 ;)

Come check out our repo at: https://github.com/agentica-project/deepscaler
