iamnotagenius | 6 months ago

Not quite true. It depends on the number of KV heads. GLM4 32B at an IQ4 quant, with the context (KV cache) quantized to Q8, can run at full context with only 20 GiB of VRAM.
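To see why the KV head count matters, here is a rough sketch of the KV cache size: every layer stores one K and one V vector per KV head per token. The layer count, head dimension, context length and KV head counts below are hypothetical illustrative values, not read from GLM4's config. The Q8 cost assumes llama.cpp's Q8_0 layout (32 int8 values plus one fp16 scale per block, i.e. 34 bytes per 32 elements).

```python
# Rough KV-cache size estimate for a decoder-only transformer (sketch).
# Assumption: each layer caches one K and one V vector per KV head per token,
# stored at a fixed number of bytes per element.

GIB = 1024 ** 3

# Assumed storage costs: fp16 = 2 bytes/element; llama.cpp-style Q8_0 =
# 34 bytes per block of 32 elements (32 int8 values + one fp16 scale).
BYTES_PER_ELEM = {
    "f16": 2.0,
    "q8_0": 34 / 32,
}


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   n_ctx: int, cache_type: str = "f16") -> float:
    """Bytes needed to hold K and V for n_ctx tokens across all layers."""
    elems = 2 * n_layers * n_kv_heads * head_dim * n_ctx  # factor 2 = K and V
    return elems * BYTES_PER_ELEM[cache_type]


if __name__ == "__main__":
    # Hypothetical config in the rough range of a 32B model
    # (illustrative only, not any specific model's real parameters).
    layers, head_dim, ctx = 61, 128, 32768
    for kv_heads in (2, 8, 48):  # few KV heads (GQA) vs. one per query head
        for ctype in ("f16", "q8_0"):
            gib = kv_cache_bytes(layers, kv_heads, head_dim, ctx, ctype) / GIB
            print(f"kv_heads={kv_heads:2d} cache={ctype:5s} -> {gib:6.2f} GiB")
```

With these made-up numbers, 2 KV heads need about 1.9 GiB in fp16 (about 1.0 GiB at Q8), while 48 KV heads need 24 times as much. That is why a model with few KV heads can fit a long context next to its weights.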