jwitthuhn | 2 months ago

At 4 bits that model won't fit into 128GB, so you're spilling over into swap, which kills performance. I've gotten great results out of glm-4.5-air, which is 4.5 distilled down to 110B params. It fits nicely at 8 bits, or maybe 6 if you want a little more RAM left over.
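For anyone wanting to sanity-check the sizing, here is a rough back-of-the-envelope sketch. It assumes the weights dominate memory use and that every parameter is stored at the same bit width, which real quant formats only approximate. The function names are made up for illustration, and 1 GB is taken as 1e9 bytes.

```python
def weight_footprint_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    # params_billion * 1e9 params * (bits / 8) bytes each, divided by 1e9 bytes per GB
    return params_billion * bits_per_weight / 8


def fits_in_ram(params_billion: float, bits_per_weight: float,
                ram_gb: float, headroom_gb: float = 0.0) -> bool:
    """True if the weights plus some headroom (OS, context, etc.) fit in RAM."""
    return weight_footprint_gb(params_billion, bits_per_weight) + headroom_gb <= ram_gb


if __name__ == "__main__":
    # 110B params, as mentioned above for glm-4.5-air
    for bits in (8, 6, 4):
        size = weight_footprint_gb(110, bits)
        print(f"110B @ {bits} bits: ~{size:.1f} GB, fits in 128 GB: {fits_in_ram(110, bits, 128)}")
```

This ignores the KV cache and runtime overhead, so treat the result as a lower bound on what you actually need.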

hasperdi|2 months ago

Correction: my GLM-4.6 models are not Q4. I can only run lower quants, e.g.:

- https://huggingface.co/unsloth/GLM-4.6-GGUF/blob/main/GLM-4.... - 84GB, Q1
- https://huggingface.co/unsloth/GLM-4.6-REAP-268B-A32B-GGUF/t... - 92GB, Q2

I make sure there is enough RAM left over, i.e. by limiting the context window setting, so there is no swapping.

As for GLM-4.5-Air, I run that daily, switching between noctrex/GLM-4.5-Air-REAP-82B-A12B-MXFP4_MOE-GGUF and kldzj/gpt-oss-120b-heretic.
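To put rough numbers on the "limit the context window so nothing swaps" approach, here is a minimal sketch of a memory budget. It assumes a plain attention KV cache (one K and one V vector per layer per KV head per token) stored at 16 bits with no cache quantization. The layer count, KV head count, head dimension and the 8 GB reserve in the example are hypothetical placeholders, not the real GLM-4.6 config, so plug in the values from your model's metadata.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   n_ctx: int, bytes_per_elem: int = 2) -> int:
    """KV cache size in bytes: K and V tensors for every layer and token."""
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem


def max_context_tokens(ram_gb: float, weights_gb: float, reserve_gb: float,
                       n_layers: int, n_kv_heads: int, head_dim: int,
                       bytes_per_elem: int = 2) -> int:
    """Largest context whose KV cache fits in the RAM left after weights and reserve."""
    free_bytes = (ram_gb - weights_gb - reserve_gb) * 1e9
    if free_bytes <= 0:
        return 0  # weights plus reserve already exceed RAM, so this would swap
    per_token = kv_cache_bytes(n_layers, n_kv_heads, head_dim, 1, bytes_per_elem)
    return int(free_bytes // per_token)


if __name__ == "__main__":
    # 128 GB machine and the 92 GB Q2 file from this thread;
    # the architecture numbers and the 8 GB reserve are made up for illustration.
    tokens = max_context_tokens(ram_gb=128, weights_gb=92, reserve_gb=8,
                                n_layers=90, n_kv_heads=8, head_dim=128)
    print(f"Context budget: about {tokens} tokens")
```

Compute buffers and other runtime overhead are not modeled here, so in practice you'd want to set the context somewhat below whatever this prints.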

andai|2 months ago

Are you getting any agentic use out of gpt-oss-120b?

I can't tell if it's some bug regarding message formats or if it's just genuinely giving up, but it failed to complete most tasks I gave it.