I generally download the safetensors and make my own GGUFs, usually at Q8_0.
Is there any measurable benefit to your dynamic quants at that quant level?
I looked at your dynamic quant 2.0 page, but all the charts and graphs appear to cut off at Q4.
danielhanchen|7 months ago
Oh, the blog at https://docs.unsloth.ai/basics/unsloth-dynamic-2.0-ggufs does talk about the 1-, 2-, 3-, 4-, 5-, 6-, and 8-bit dynamic GGUFs as well!
There definitely is a benefit to dynamically selecting layers to be at different bit rates - I wrote about the difference between naively quantizing and selectively quantizing here: https://unsloth.ai/blog/deepseekr1-dynamic
DrPhish|7 months ago
Thanks Daniel. I know you upload them, but I was hoping for some solid numbers on your dynamic Q8 versus a naive quant. Neither of those links seems to show any improvement at those quant levels.
My gut feeling is that there's not enough benefit to outweigh the risk of putting a middleman in the chain of custody from the original model to my NVMe.
However, I can't know for sure without more testing than I have the time or inclination for, which is why I was hoping there had been some analysis you could point me to.
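The benefit Daniel describes - spending more bits on quantization-sensitive layers and fewer on robust ones - can be sketched with a toy model. This is an illustration only, not Unsloth's actual method: the two "layers", their outlier values, and the downstream sensitivity weights are all invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize(x, bits):
    """Symmetric round-to-nearest quantization at the given bit width."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / qmax
    return np.round(x / scale) * scale

# Two toy "layers": one with outliers that blow up its quantization scale
# (and a large influence on model output), one well-behaved and less influential.
sensitive = np.concatenate([rng.normal(0.0, 1.0, 4096), [8.0, -8.0]])
robust = rng.normal(0.0, 1.0, 4096)

def output_error(bits_sensitive, bits_robust):
    # Weight each layer's mean quantization error by how much it matters
    # downstream (the 10:1 weighting is made up for illustration).
    return (10.0 * np.abs(sensitive - quantize(sensitive, bits_sensitive)).mean()
            + 1.0 * np.abs(robust - quantize(robust, bits_robust)).mean())

naive = output_error(4, 4)    # 4 bits everywhere (8 bits across both layers)
dynamic = output_error(5, 3)  # same total budget, spent where it matters
print(f"naive={naive:.3f} dynamic={dynamic:.3f}")
```

Under this toy error model, the uneven 5/3-bit split beats uniform 4/4 bits at the same total budget, which is the intuition behind selectively quantizing layers; whether that gap is still measurable at Q8_0 is exactly the open question in the thread above.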