On-device inference moves all compute cost (including electricity) to the consumer. As of 2025, that means much shorter battery life, a much warmer device, and higher electricity bills. Unless the M-series can do substantially more with less, this is a dead end.
WatchDog|2 months ago
The reason local LLMs are unlikely to displace cloud LLMs comes down to memory footprint and search. The most capable models require hundreds of GB of memory, which is impractical for consumer devices.
I run Qwen 3 2507 locally using llama.cpp [0]. It's not a bad model, but I still use cloud models more, mainly because they have good search RAG. There are local tools for this, but they don't work as well. That might continue to improve, but I don't think it will ever surpass the Google/Bing API integrations that cloud models use.
[0]: https://github.com/ggml-org/llama.cpp/discussions/4508
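To make the "run it locally" step concrete, here is a minimal sketch using the llama-cpp-python bindings to llama.cpp. The GGUF filename and the sampling settings are assumptions for illustration, not the commenter's actual setup:

    # Minimal local chat completion via llama-cpp-python
    # (pip install llama-cpp-python). The model path is a
    # hypothetical example; point it at whatever GGUF
    # quantization of the model you have downloaded.
    from llama_cpp import Llama

    llm = Llama(
        model_path="./qwen3-2507-q4_k_m.gguf",  # assumed filename
        n_ctx=8192,        # context window; larger uses more RAM
        n_gpu_layers=-1,   # offload all layers to the GPU/Metal backend
    )

    out = llm.create_chat_completion(
        messages=[{"role": "user",
                   "content": "Summarize llama.cpp in one sentence."}],
        max_tokens=128,
        temperature=0.7,
    )
    print(out["choices"][0]["message"]["content"])

Note this covers inference only; the search/RAG integration the comment describes is a separate layer on top, which is exactly where local tooling currently lags.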
wooger|2 months ago
* If you trust the OS vendor, why wouldn't you trust them to handle AI queries in a responsible, privacy-respecting manner?
* If you don't trust your OS vendor, you have a bigger problem than just privacy. Stop using it.
What makes people think that queries processed on-device can't be logged and sent off for analysis anyway?
reaperducer|2 months ago
I envy your very simple, sedentary life where you are never outside a high-speed Wi-Fi bubble.
Look at almost every Apple ad: it's people climbing rocks, surfing, skiing, enjoying majestic vistas, and all those things that very often come with reduced or zero connectivity.
Apple isn't trying to reach couch potatoes.