item 44776312

mustyoshi | 7 months ago

Yeah, this is the thing people miss a lot. 7B–32B models work perfectly fine for a lot of things, and run on previously high-end consumer hardware.

But we're still in the hype phase; people will come to their senses once large-model performance starts to plateau.


_heimdall | 7 months ago

I expect people to come to their senses when LLM companies stop subsidizing costs and start charging customers what it actually costs to train and run these models.

gunalx | 7 months ago

I mean, there is no reason for an inference provider of open models to subsidize you. And pricing there is usually cheaper than Claude API pricing.

zamadatix | 7 months ago

People don't want to guess which size of model is right for a task, and current systems are neither good nor efficient at estimating that automatically. I see only the power users tweaking more and more as performance plateaus, while the average user only changes models when it happens automatically.

bakugo | 7 months ago

> 7B–32B models work perfectly fine for a lot of things

Like what? People always talk about how amazing it is that they can run models on their own devices, but rarely mention what they actually use them for. For most use cases, small local models will always perform significantly worse than even the most inexpensive cloud models like Gemini Flash.

totaa | 7 months ago

Gemma 3n E4B has been crazy good for me - a fine-tune running on Google Cloud Run via Ollama, completely avoiding token-based pricing at the cost of throughput limitations.
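For anyone curious what that setup looks like from the client side, here's a minimal sketch of calling Ollama's `/api/generate` endpoint with a non-streaming request. The base URL and model tag are assumptions (you'd point it at your own Cloud Run service URL, and the tag depends on how your fine-tune is named in Ollama):

```python
import json
import urllib.request

# Assumption: replace with your Cloud Run service URL fronting Ollama.
OLLAMA_URL = "http://localhost:11434"


def build_generate_request(model: str, prompt: str) -> dict:
    """Build the JSON body for Ollama's /api/generate endpoint."""
    # stream=False asks the server for one complete JSON response
    # instead of a stream of partial chunks.
    return {"model": model, "prompt": prompt, "stream": False}


def generate(model: str, prompt: str, base_url: str = OLLAMA_URL) -> str:
    """Send a single completion request to an Ollama server and return the text."""
    body = json.dumps(build_generate_request(model, prompt)).encode()
    req = urllib.request.Request(
        f"{base_url}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


# Example (model tag is a guess for a Gemma 3n E4B variant):
# print(generate("gemma3n:e4b", "Summarize why local models avoid per-token fees."))
```

Since you pay Cloud Run per instance-second rather than per token, long prompts cost the same as short ones; the trade-off is that throughput is capped by the instances you provision.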