item 43018586


mluo | 1 year ago

Hi, one of the lead authors for this work.

We recommend using bfloat16 (not fp16); quantization for small models can really hurt performance!
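To illustrate the bfloat16-vs-fp16 distinction: bfloat16 keeps fp32's 8-bit exponent (trading mantissa precision for range), so values that overflow fp16's ±65504 range stay finite. A minimal sketch of the format difference (`to_bf16` is an illustrative helper, not code from this work):

```python
import struct

def to_bf16(x: float) -> float:
    # bfloat16 is the top 16 bits of fp32: same 8-bit exponent,
    # mantissa truncated from 23 bits down to 7
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFF0000))[0]

FP16_MAX = 65504.0  # largest finite float16 value

x = 1e5  # magnitude that can occur in activations/optimizer states
print(to_bf16(x))    # finite, within ~0.2% of 1e5
print(x > FP16_MAX)  # True: fp16 would overflow this value to inf
```

The trade-off is the reverse of fp16's: bfloat16 has coarser precision per value but never loses fp32's dynamic range, which is why it behaves more predictably for model weights.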


CamperBob2 | 1 year ago

Have you compared it to the 1.58 bit dynamic quant model based on the original R1 (i.e., not a distillation)? Whatever unsloth did, it doesn't seem to be giving up much reasoning performance over the full Q8 version.

mluo | 1 year ago

It's simply because the model is small (1.5B), which makes it sensitive to weight perturbations.

newman314 | 1 year ago

Is there an MLX version that can be added to the fullmoon iOS app?