jplusequalt | 12 days ago
https://apxml.com/models/glm-5
To run GLM-5 you need access to many, many consumer-grade GPUs, or multiple data-center-class GPUs.
>They will likely get cheaper to run over time as well (better hardware).
Unless they magically solve the problem of chip scarcity, I don't see this happening. VRAM is king, and to get more of it you have to pay a lot more. Take the RTX 3090 as an example. The card is ~6 years old now, yet it still runs you around $1.3k. If you wanted to run GLM-5 at I4 quantization (the lowest listed in the link above) with a 32k context window, you would need *32 RTX 3090s*. That's ~$42k spent on obsolete silicon. If you wanted to run this on newer hardware, you could reasonably expect to double that number.
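To make the arithmetic explicit, here's a quick back-of-the-envelope sketch. The total VRAM footprint is an assumption chosen to roughly match the ~32-card figure above, not a number taken from the linked page; the card price is the rough market figure I quoted.

```python
import math

# Back-of-the-envelope: how many 24 GB cards it takes to hold a given
# VRAM footprint, and what that costs. The footprint is an ASSUMPTION
# (4-bit weights plus a 32k-token KV cache), not a figure from apxml.com.

TOTAL_VRAM_GB = 760      # assumed footprint for GLM-5 at I4 quant + 32k context
GPU_VRAM_GB = 24         # RTX 3090
GPU_PRICE_USD = 1300     # rough current price per used card

cards = math.ceil(TOTAL_VRAM_GB / GPU_VRAM_GB)
print(f"cards needed: {cards}")                    # -> 32
print(f"total cost:   ${cards * GPU_PRICE_USD:,}") # -> $41,600
```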
RGamma | 12 days ago
Also, how much bang for the buck do those 3090s actually give you compared to enterprise-grade products?