disiplus | 1 month ago
Yeah, there is no way to run 4.7 on 32 GB of VRAM. This Flash model is something I'm also waiting to try later tonight.
omneity | 1 month ago
Why not? Run it with the latest vLLM and enable 4-bit quantization with bnb (bitsandbytes); it will quantize the original safetensors on the fly and fit your VRAM.
disiplus | 1 month ago
Because of how huge GLM 4.7 is: https://huggingface.co/zai-org/GLM-4.7
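
The disagreement above comes down to arithmetic: quantizing on the fly shrinks each weight, but the total still has to fit. A rough sketch of that check follows. The parameter count is an assumption for illustration (the thread does not state it; check the linked model card), and the estimate covers weights only, ignoring KV cache, activations, and quantization overhead.

```python
# Back-of-the-envelope check: do the quantized weights alone fit in VRAM?
# ASSUMED_PARAMS is a placeholder, not a figure taken from the thread.

GIB = 2**30


def weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory needed for the weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / GIB


def max_params_that_fit(vram_gib: float, bits_per_weight: float) -> float:
    """Largest parameter count whose weights alone fit in the given VRAM."""
    return vram_gib * GIB * 8 / bits_per_weight


ASSUMED_PARAMS = 355e9  # assumption: order of magnitude for a very large MoE model
VRAM_GIB = 32           # the card size mentioned in the thread

if __name__ == "__main__":
    for bits in (16, 8, 4):
        need = weight_gib(ASSUMED_PARAMS, bits)
        verdict = "fits" if need <= VRAM_GIB else "does not fit"
        print(f"{bits:>2}-bit: ~{need:.0f} GiB of weights, {verdict} in {VRAM_GIB} GiB")

    limit = max_params_that_fit(VRAM_GIB, 4)
    print(f"4-bit ceiling for {VRAM_GIB} GiB: ~{limit / 1e9:.0f}B parameters")
```

Under that assumed size, even 4-bit weights are several times larger than 32 GiB, which is disiplus's point. 4-bit quantization is what makes a model in the tens of billions of parameters fit on such a card, not one in the hundreds of billions.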