item 43018586


mluo | 1 year ago

Hi, one of the lead authors for this work.

We recommend using bfloat16 (not fp16); quantization for small models can really hurt performance!
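To illustrate the bfloat16-vs-fp16 distinction: bfloat16 keeps fp32's 8-bit exponent (trading mantissa precision for range), so values that overflow fp16's ±65504 range stay finite. A minimal sketch of the format difference (`to_bf16` is an illustrative helper, not code from this work):

```python
import struct

def to_bf16(x: float) -> float:
    # bfloat16 is the top 16 bits of fp32: same 8-bit exponent,
    # mantissa truncated from 23 bits down to 7
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFF0000))[0]

FP16_MAX = 65504.0  # largest finite float16 value

x = 1e5  # magnitude that can occur in activations/optimizer states
print(to_bf16(x))    # finite, within ~0.2% of 1e5
print(x > FP16_MAX)  # True: fp16 would overflow this value to inf
```

The trade-off is the reverse of fp16's: bfloat16 has coarser precision per value but never loses fp32's dynamic range, which is why it behaves more predictably for model weights.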


CamperBob2 | 1 year ago

Have you compared it to the 1.58 bit dynamic quant model based on the original R1 (i.e., not a distillation)? Whatever unsloth did, it doesn't seem to be giving up much reasoning performance over the full Q8 version.

mluo | 1 year ago

It's simply because the model is small (1.5B), which makes it sensitive to weight perturbations.

newman314 | 1 year ago

Is there an MLX version that can be added to the fullmoon iOS app?