holoduke | 23 hours ago

Your Gemini or Opus question got sent to a Texas datacenter, where it got queued and processed by a subunit of 80 H200 140GB 1000W cards running a many-billion- or trillion-parameter model. It took less than 200ms to process a single request. Your Claude client decided to spawn 30 sub-agents and iterated over a total of 90 requests, totalling about 45,000ms. Now compare that to your 100-billion-transistor CPU doing something similar. Yes, that would be slow.
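
A quick back-of-envelope on those figures (a sketch only; the 200ms compute time and 45,000ms total are the estimates above, so the compute/overhead split is an inference, not a measurement):

    # Split the quoted wall time into compute vs. everything else.
    requests = 90               # requests the Claude client issued
    total_wall_ms = 45_000      # quoted end-to-end wall time
    compute_ms = 200            # quoted datacenter processing per request

    per_request_ms = total_wall_ms / requests      # ~500 ms each
    overhead_ms = per_request_ms - compute_ms      # ~300 ms queueing/network

    print(f"avg per request: {per_request_ms:.0f} ms")
    print(f"implied queueing/network overhead: {overhead_ms:.0f} ms")

If those numbers hold, most of the wall time is orchestration overhead around the model, not the forward pass itself.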

mstaoru | 20 hours ago

Right, it was more of a rhetorical question :) My point being: how are these local models really useful to me now? Is the Only Way™ to sell my house and build an 8x5090 monster? How does that compare to $20/month Opus? (Privacy aside.)

The second-order thought from this is... will we get value-based price levelling soon? If the alternative to a hosted LLM is building a $10-20k+ machine with $500+ monthly energy bills, will hosted prices asymptotically climb to reflect that reality?
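
As a rough sketch (the capex midpoint, energy bill, and three-year depreciation window below are all assumptions, not quotes):

    # All figures are this thread's estimates; adjust to taste.
    hosted_per_month = 20          # $20/month hosted plan
    local_capex = 15_000           # midpoint of the $10-20k build
    local_energy_per_month = 500   # estimated monthly power bill
    months = 36                    # assumed 3-year depreciation window

    local_total = local_capex + local_energy_per_month * months   # $33,000
    hosted_total = hosted_per_month * months                      # $720

    print(f"local:  ${local_total:,} over {months} months")
    print(f"hosted: ${hosted_total:,} over {months} months")
    print(f"ratio:  {local_total / hosted_total:.0f}x")           # ~46x

On those assumptions the local build runs roughly 46x the hosted cost, which is the headroom hosted pricing could in principle climb into.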

Something to think about.

regularfry | 16 hours ago

Looked at from the other end of the telescope, the other factor is how fast low-end local models can gain capability. This 35B model is absolutely fine on a 4090 in a machine that cost about £3000 when I bought it three years ago. Where will what you can run on a 4090, or a 5090, be in six months? That's the interesting question, but we're already well past the point where the uses to which you can put a local model increase dramatically within the depreciation lifespan of the hardware.

etyhhgfff | 19 hours ago

We would need a super-high-end AI accelerator with specialised cooling for less than 3k bucks to make it happen; consumer gaming graphics cards won't fit the bill. The problem is that all TSMC capacity is already booked for years to come by the big players, who are building datacenter-grade hardware with price tags and setup requirements out of consumer reach.