top | item 43755836

aazo11 | 10 months ago

By "too hard" I do not mean getting started with them to run inference on a prompt. Ollama especially makes that quite easy. But as an application developer, I feel these platforms are too hard to build around. The main issues being: getting the correct small enough task specific model and how long it takes to download these models for the end user.

thot_experiment | 10 months ago

I guess it depends on expectations. If your expectation is a CRUD app that opens in 5 seconds, then sure, it's definitely tedious. People do install big things, though: the companion app for DJI action cameras is 700 MB (which is an abomination, but still), and modern games run past 100 GB on the high end. Downloading 8-16 GB of tensors one time is nbd. You mentioned that there are 663 different variants of DeepSeek-R1 7B on Hugging Face; sure, but if you want that model on ollama it's just `ollama run deepseek-r1`.

As a developer, the amount of effort I'm likely to spend on the infra side of getting the model onto the user's computer and running is now FAR FAR below the amount of time I'll spend developing the app itself or putting together a dataset to tune the model I want, etc. Inference is solved enough. "Getting the correct small enough model" is something I would spend a day or two thinking about and testing when building anything regardless. It's not hard to check how much VRAM someone has and pick the right model; the decision tree for that will have like 4 branches. It's just so little effort compared to everything else you'll have to do to deliver something of value, especially among the set of users who have a good reason to run locally.
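The "four-branch decision tree" idea above can be sketched in a few lines. This is a hypothetical illustration, not the commenter's actual code: the VRAM thresholds are assumptions, and the tags are the published size variants of `deepseek-r1` on the ollama registry.

```python
# Hypothetical sketch of a VRAM-based model picker, as described in the
# comment above. Thresholds are illustrative guesses, not benchmarks;
# the tags correspond to deepseek-r1 size variants on ollama.

def pick_model(vram_gb: float) -> str:
    """Map an available-VRAM budget (in GB) to an ollama model tag."""
    if vram_gb >= 48:
        return "deepseek-r1:70b"
    if vram_gb >= 20:
        return "deepseek-r1:32b"
    if vram_gb >= 10:
        return "deepseek-r1:14b"
    if vram_gb >= 6:
        return "deepseek-r1:7b"
    # Low-VRAM / CPU fallback: smallest distilled variant.
    return "deepseek-r1:1.5b"

if __name__ == "__main__":
    for vram in (4, 8, 16, 24, 80):
        print(f"{vram} GB -> {pick_model(vram)}")
```

In practice you'd read the VRAM figure from something like `nvidia-smi` or a platform API, then hand the chosen tag to `ollama run`; the branching itself is the trivial part.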