radarsat1 | 20 days ago

It's cool, but do I really want a single browser tab downloading 2.5 GB of data, only for it to sit in a temporary cache and eventually be thrown away? I know the internet is fast now and disk space is cheap, but I have trouble bringing myself around to this way of doing things. It feels so inefficient. I do like the idea of client-side compute, but I feel like a model (or anything) this big belongs on the server.

tyushk|19 days ago

I don't think local inference in browsers, as it stands, will take off, simply because of the lead time of downloading the model, but a new web API for LLMs could change that. I mean some standard API to communicate with the user's preferred model, abstracting over local inference (like what Chrome does with Gemini Nano (?)) and remote inference (LM Studio or calling out to a provider). This way, every site that wants a language model just has to ask the browser for it, and they'd all share weights on disk across sites.
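
To make that concrete, here's a rough TypeScript sketch of what such a broker could look like. Every name in it (`ModelBroker`, `requestModel`, the `local`/`remote` backends, the model ID) is made up for illustration; no browser exposes this API, and the `complete` method is a stub rather than real inference.

```typescript
// Hypothetical sketch only: no browser ships an API with these names.
type Backend = "local" | "remote";

interface LanguageModel {
  readonly modelId: string;
  readonly backend: Backend;
  complete(prompt: string): string;
}

class ModelBroker {
  private readonly modelId: string;
  private readonly backend: Backend;
  private shared: LanguageModel | undefined;
  // Counts how many times weights had to be fetched to disk.
  downloads = 0;
  // Origins that have asked for a model so far.
  readonly origins = new Set<string>();

  constructor(modelId: string, backend: Backend) {
    this.modelId = modelId;
    this.backend = backend;
  }

  // A site asks for "a language model"; the user's preference decides which one.
  requestModel(origin: string): LanguageModel {
    this.origins.add(origin);
    if (this.shared === undefined) {
      // Only local inference needs weights on disk; remote only needs an endpoint.
      if (this.backend === "local") this.downloads += 1;
      const { modelId, backend } = this;
      this.shared = {
        modelId,
        backend,
        // Stub: a real backend would run inference here.
        complete: (prompt: string) => `[${modelId} via ${backend}] ${prompt}`,
      };
    }
    return this.shared;
  }
}

const broker = new ModelBroker("user-preferred-model", "local");
const a = broker.requestModel("https://a.example");
const b = broker.requestModel("https://b.example");
console.log(a === b, broker.downloads); // prints: true 1
```

Two different sites end up with the same model object and the weights are fetched once, which is the whole point: the download cost is paid per user, not per site.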

radarsat1|17 days ago

It sounds good, but I'm not sure that in practice sites will want to "let go" of control this way, knowing that some random model could be used. Usually sites with chatbots want a lot of control over the model's behaviour, and spend a lot of time working on how it answers, be it through context control, guardrails, fine-tuning or base model selection. Unless everyone standardizes on a single awesome model that everyone agrees is the best for everything, which I don't see happening any time soon, I think this idea is DOA.

Now I could imagine such an API letting a site request a specific model, from Hugging Face for example, and caching it long term, just like LM Studio does. But doing this because some external resource requested it, rather than because you did it on purpose, has major security implications. It also doesn't really get around the lead time problem you mention whenever a new model is requested.
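
A rough TypeScript sketch of that variant, to show where the lead time and the consent question come in. All names here (`ConsentingModelCache`, `request`, the model IDs, the consent callback) are hypothetical, and the 2.5 GB size is just the figure from this thread applied to both made-up models.

```typescript
// Hypothetical sketch: the consent callback and cache are stand-ins, not a real API.
type ConsentPrompt = (origin: string, modelId: string) => boolean;

class ConsentingModelCache {
  private readonly cached = new Set<string>();
  private readonly askUser: ConsentPrompt;
  private readonly sizeBytes: Map<string, number>;

  constructor(askUser: ConsentPrompt, sizeBytes: Map<string, number>) {
    this.askUser = askUser;
    this.sizeBytes = sizeBytes;
  }

  // Returns how many bytes had to be downloaded to satisfy this request.
  request(origin: string, modelId: string): number {
    // Cache hit: no download and no lead time, whichever site asks.
    if (this.cached.has(modelId)) return 0;
    const size = this.sizeBytes.get(modelId);
    if (size === undefined) throw new Error(`unknown model: ${modelId}`);
    // A site-triggered download is gated on the user explicitly agreeing.
    if (!this.askUser(origin, modelId)) {
      throw new Error(`user declined ${modelId} for ${origin}`);
    }
    this.cached.add(modelId);
    return size;
  }
}

const GB = 1e9;
const sizes = new Map<string, number>([
  ["model-a", 2.5 * GB],
  ["model-b", 2.5 * GB],
]);
const cache = new ConsentingModelCache(() => true, sizes);

console.log(cache.request("https://a.example", "model-a") / GB); // prints: 2.5
console.log(cache.request("https://b.example", "model-a") / GB); // prints: 0
console.log(cache.request("https://b.example", "model-b") / GB); // prints: 2.5
```

The second site gets `model-a` for free, but the first request for `model-b` pays the full download again. And in this sketch even the free cache hit tells a site whether the user already has a given model, which is the kind of thing that would need thinking through.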

xandrius|19 days ago

There will always be someone unhappy about literally any aspect of something new. If 2.5 GB for a local LLM counts as problematic in 2026, I really can't think of what is safe from complaint anymore.

We went from impossible to centralised to local in a couple of years, and the "cost" is 2.5 GB of hard drive space.

radarsat1|17 days ago

I didn't say that 2.5 GB is unreasonable for an LLM. I said it's an unreasonable payload size for a website. Not the same thing.