At 4 bits that model won't fit into 128GB so you're spilling over into swap which kills performance. I've gotten great results out of glm-4.5-air which is 4.5 distilled down to 110B params which can fit nicely at 8 bits or maybe 6 if you want a little more ram left over.
hasperdi|2 months ago
- https://huggingface.co/unsloth/GLM-4.6-GGUF/blob/main/GLM-4.... - 84GB, Q1 - https://huggingface.co/unsloth/GLM-4.6-REAP-268B-A32B-GGUF/t... - 92GB, Q2
I ensure that there are enough RAM leftover ie limited context window setting, so no swapping.
As for GLM-4.5-Air, I run that daily, switching between noctrex/GLM-4.5-Air-REAP-82B-A12B-MXFP4_MOE-GGUF and kldzj/gpt-oss-120b-heretic
andai|2 months ago
I can't tell if it's some bug regarding message formats or if it's just genuinely giving up, but it failed to complete most tasks I gave it.