item 43018586 (Hacker News)

mluo | 1 year ago
Hi, one of the lead authors for this work. We recommend using bfloat16 (not fp16); quantization for small models can really hurt performance!

  CamperBob2 | 1 year ago
  Have you compared it to the 1.58-bit dynamic quant model based on the original R1 (i.e., not a distillation)? Whatever Unsloth did, it doesn't seem to be giving up much reasoning performance over the full Q8 version.

    mluo | 1 year ago
    It's simply because the model is small (1.5B), making it sensitive to weight perturbations.

  simonw | 1 year ago
  Is there a GGUF version of your model anywhere that you recommend? I'm on a Mac.

    mluo | 1 year ago
    I think some people have made GGUFs as branches of our model; try them out! https://huggingface.co/models?other=base_model:quantized:age...

  newman314 | 1 year ago
  Is there an MLX version that could be added to the fullmoon iOS app?
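A minimal pure-Python sketch (not from the thread) of why bfloat16 is safer than fp16 for inference: bfloat16 is essentially float32 with the low 16 mantissa bits dropped, so it keeps float32's 8-bit exponent range, while fp16 has only a 5-bit exponent and overflows above 65504. The `to_bf16` / `to_fp16` helpers below are illustrative, not anyone's production kernel.

```python
import struct

def to_bf16(x: float) -> float:
    """Simulate bfloat16 by truncating a float32 to its top 16 bits.

    (Real kernels round rather than truncate; truncation is enough to
    show the dynamic-range behaviour.)
    """
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFF0000))[0]

def to_fp16(x: float) -> float:
    """Round-trip through IEEE half precision (struct 'e' format).

    Values above fp16's max (65504) raise OverflowError when packed,
    which we report as infinity, mirroring an overflowed activation.
    """
    try:
        return struct.unpack(">e", struct.pack(">e", x))[0]
    except OverflowError:
        return float("inf")

big_activation = 1.0e5  # larger than fp16's max finite value
print(to_bf16(big_activation))  # finite, within ~0.4% of 1e5
print(to_fp16(big_activation))  # inf: fp16 cannot represent it
```

Weight quantization below 16 bits is a different trade-off again: as mluo notes, a 1.5B-parameter model has little redundancy, so the same perturbation that a large model absorbs can visibly hurt it.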