noodletheworld|15 days ago

Nice idea, but isn't this kind of daft?

There are basically no useful models that run on phone hardware.

> Results vary by model size and quantization.

I bet they do.

Look, if you can't run models on your desktop, there's no way in hell they'll run on your phone.

The problem with all of these self-hosting solutions is that the actual models you can run on them aren't any good.

Not like, “ChatGPT a year ago” not good.

Like, “it's a potato, pop pop” not good.

Unsloth has a good guide on running Qwen3 (1), and the tl;dr is basically that it's not really good unless you run a big version.

The iPhone 17 Pro has 12 GB of RAM.

That is, to be fair, enough to run some small Stable Diffusion models, but it isn't enough to run a decent quant of Qwen3.

You need about 64 GB for that.
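
To make that concrete, here's a rough back-of-the-envelope estimator (a sketch only; the ~4.5 bits/weight, KV-cache, and overhead figures are my assumptions, and real usage varies by runtime and context length):

    # Rough RAM estimate for a quantized LLM: weights + KV cache + overhead.
    def model_ram_gb(params_b, bits_per_weight, kv_cache_gb=1.0, overhead=1.1):
        weights_gb = params_b * bits_per_weight / 8  # billions of params -> GB
        return (weights_gb + kv_cache_gb) * overhead

    for name, params in [("Qwen3-4B", 4), ("Qwen3-8B", 8), ("Qwen3-32B", 32)]:
        print(f"{name} @ ~4.5 bpw: ~{model_ram_gb(params, 4.5):.1f} GB")
    # Qwen3-4B:  ~3.6 GB  -> fits in a 12 GB phone
    # Qwen3-8B:  ~6.1 GB  -> fits, barely, next to the OS
    # Qwen3-32B: ~20.9 GB -> already past any phone's RAM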

So… I dunno. This feels like a bunch of empty promises; yes, technically it can run some models, but how useful is that actually?

Self-hosting needs next-gen hardware.

This generation of desktop hardware isn't even remotely good enough to compare to server API options.

Running on mobile devices is probably still a long way off.

(1) - https://unsloth.ai/docs/models/qwen3-how-to-run-and-fine-tun...


resonious|15 days ago

The app is basically just a wrapper that makes it super easy to set this up, which I'm very thankful for. I sometimes want to toy with this stuff but the amount of tinkering and gluing things together needed to just get a chat going is always too much for me. The fact that the quality of the AI isn't good is just the models not being quite there yet. If the models get better, this app will be killer.

If there's a similar app for desktop that can set up the stronger models for me, I'd love to hear about it.

ali_chherawalla|15 days ago

LM Studio does it well. Along with being a system integrator for SD and text models, I've tried to create a very good chat experience, so there's some sauce over there: prompt enhancements, auto-detection of images, English transcription support, etc.

noodletheworld|14 days ago

> If the models get better, this app will be killer.

Any random thing might happen in the future.

That doesn't have any bearing on how useful this is right now.

All we can do is judge how it compares, right now, to what it promises.

K0balt|15 days ago

Yeah. The solution if you want to have your own AI is to put a box online or rent cloud inference, and access it over a browser or a phone app.

We have on-prem AI for my microgrid community, but it's a nascent effort and we can only run <100B models. Still, models at that size are extremely useful for most stuff, and we have a selection of models to choose from on OpenAI-/Ollama-compatible API endpoints.
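
For anyone curious what that looks like in practice, a minimal sketch (host, port, and model name here are placeholders; any OpenAI-compatible server, e.g. Ollama's /v1 endpoint, behaves the same):

    # Point the standard OpenAI client at a self-hosted endpoint.
    from openai import OpenAI

    client = OpenAI(base_url="http://ai.local:11434/v1", api_key="unused")
    resp = client.chat.completions.create(
        model="qwen3:32b",  # whatever the box has loaded
        messages=[{"role": "user", "content": "Summarize this week's grid log."}],
    )
    print(resp.choices[0].message.content)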

ali_chherawalla|15 days ago

I actually think you should give it a spin. IMO you don't need Claude-level performance for a lot of day-to-day tasks. Qwen3 8B, or even 4B quantized, is actually quite good. Take a look. You can offload to the GPU as well, which should really help with speed; there's a setting for it.
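
For reference, GPU offload in a llama.cpp-style runtime looks roughly like this (a sketch with llama-cpp-python, not this app's actual setting; the GGUF filename is hypothetical):

    # n_gpu_layers controls how many transformer layers run on the GPU/Metal.
    from llama_cpp import Llama

    llm = Llama(
        model_path="qwen3-4b-q4_k_m.gguf",  # hypothetical local file
        n_gpu_layers=-1,  # -1 = offload every layer the backend can take
        n_ctx=8192,
    )
    out = llm("Q: What is 2 + 2? A:", max_tokens=8)
    print(out["choices"][0]["text"])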

noodletheworld|14 days ago

> Qwen3 8B, or even 4B quantized is actually quite good.

No, it’s not.

Trust me, I don't write this from a position of vague hand waving.

I've tried a lot of self-hosted models at a lot of sizes; those small models are not good enough, and they don't have long enough context to be useful for most everyday operations.

iddan|15 days ago

I think if people knew how accessible it is to run local LLMs on their device, they'd consider buying devices with more memory that can run better models. Local LLMs in the long run are game changers.

ali_chherawalla|15 days ago

I agree. I mean, mobile devices have only been getting more and more powerful.

jeroenhd|15 days ago

> The iphone 17 pro has 12GB of ram.

I'm surprised Apple is still cheaping out on RAM on their phones, especially with the effort they've been putting into running AI locally and all of their NPU marketing.

ali_chherawalla|14 days ago

With the Metal infra it's actually quite good. Agreed, you can't run really large models, but inference is very fast and TTFT (time to first token) is very low. It's a beautiful experience.
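
If you want to measure TTFT yourself, here's a quick sketch against any OpenAI-compatible local server (endpoint and model name are placeholders); it times from request to first streamed token:

    import time
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")
    start = time.perf_counter()
    stream = client.chat.completions.create(
        model="qwen3:4b",
        messages=[{"role": "user", "content": "Hello"}],
        stream=True,
    )
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            print(f"TTFT: {time.perf_counter() - start:.2f}s")
            break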

ImPostingOnHN|15 days ago

It seems like a good solution for those living under a regime that censors communication, free information flow, and LLM usage. Especially with a model that contains useful information.