jplusequalt | 12 days ago
https://apxml.com/models/glm-5
To run GLM-5 you need access to many, many consumer-grade GPUs, or multiple data-center-class GPUs.
>They will likely get cheaper to run over time as well (better hardware).
Unless they magically solve the problem of chip scarcity, I don't see this happening. VRAM is king, and to get more of it you have to pay a lot more. Take the RTX 3090 as an example. The card is ~6 years old now, yet it still runs you around $1.3k. If you wanted to run GLM-5 at I4 quantization (the lowest listed in the link above) with a 32k context window, you would need *32 RTX 3090s*. That's ~$42k spent on obsolete silicon. If you wanted to run this on newer hardware, you could reasonably expect to double that number.
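To make the arithmetic explicit, here's a quick back-of-the-envelope sketch. The total VRAM footprint is an assumption chosen to roughly match the ~32-card figure above, not a number taken from the linked page; the card price is the rough market figure I quoted.

```python
import math

# Back-of-the-envelope: how many 24 GB cards it takes to hold a given
# VRAM footprint, and what that costs. The footprint is an ASSUMPTION
# (4-bit weights plus a 32k-token KV cache), not a figure from apxml.com.

TOTAL_VRAM_GB = 760      # assumed footprint for GLM-5 at I4 quant + 32k context
GPU_VRAM_GB = 24         # RTX 3090
GPU_PRICE_USD = 1300     # rough current price per used card

cards = math.ceil(TOTAL_VRAM_GB / GPU_VRAM_GB)
print(f"cards needed: {cards}")                    # -> 32
print(f"total cost:   ${cards * GPU_PRICE_USD:,}") # -> $41,600
```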
RGamma | 12 days ago
Also, how much bang for the buck do those 3090s actually give you compared to enterprise-grade products?