On-device inference moves all compute cost (including electricity) to the consumer. As of 2025, that means much shorter battery life, a much warmer device, and higher electricity bills. Unless the M-series can do substantially more with less, this is a dead end.
WatchDog|2 months ago
The reason local LLMs are unlikely to displace cloud LLMs comes down to memory footprint and search. The most capable models require hundreds of GB of memory, which is impractical for consumer devices.
I run Qwen 3 2507 locally using llama.cpp [0]. It's not a bad model, but I still use cloud models more, mainly because they have good search RAG. There are local tools for this, but they don't work as well. That might continue to improve, but I don't think it will ever surpass the Google/Bing API integrations that cloud models use.
[0]: https://github.com/ggml-org/llama.cpp/discussions/4508
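To make the "run it locally" step concrete, here is a minimal sketch using the llama-cpp-python bindings to llama.cpp. The GGUF filename and the sampling settings are assumptions for illustration, not the commenter's actual setup:

    # Minimal local chat completion via llama-cpp-python
    # (pip install llama-cpp-python). The model path is a
    # hypothetical example; point it at whatever GGUF
    # quantization of the model you have downloaded.
    from llama_cpp import Llama

    llm = Llama(
        model_path="./qwen3-2507-q4_k_m.gguf",  # assumed filename
        n_ctx=8192,        # context window; larger uses more RAM
        n_gpu_layers=-1,   # offload all layers to the GPU/Metal backend
    )

    out = llm.create_chat_completion(
        messages=[{"role": "user",
                   "content": "Summarize llama.cpp in one sentence."}],
        max_tokens=128,
        temperature=0.7,
    )
    print(out["choices"][0]["message"]["content"])

Note this covers inference only; the search/RAG integration the comment describes is a separate layer on top, which is exactly where local tooling currently lags.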
wooger|2 months ago
* If you trust the OS vendor, why wouldn't you trust them to handle AI queries in a responsible, privacy-respecting manner?
* If you don't trust your OS vendor, you have a bigger problem than just privacy. Stop using it.
What makes people think that queries processed on-device can't be logged and sent off for analysis anyway?
reaperducer|2 months ago
I envy your very simple, sedentary life where you are never outside a high-speed Wi-Fi bubble.
Look at almost every Apple ad: it's people climbing rocks, surfing, skiing, enjoying majestic vistas, and all those things that very often come with reduced or zero connectivity.
Apple isn't trying to reach couch potatoes.