top | item 45490820

randomname4325 | 4 months ago

The bubble pops when Apple releases an iPhone that runs an LLM locally that's good enough for most things. At that point cloud hardware investment will plateau (unless some new GPU-melting use case comes along), and investors will move from Nvidia and AMD into Apple.


jeswin|4 months ago

As a local LLM enthusiast, I can tell you that it's useless for most real work - even on desktop form factors. Phones catching up is even farther out.

randomname4325|4 months ago

Based on the recently released graph of how people are using ChatGPT, ~80% of use cases (practical guidance, seeking information, writing) could presumably run on a local model.

aswegs8|4 months ago

What's the advantage of that, exactly? Why would you want something very compute-intensive to run on your phone instead of just using an API to data centers with great economies of scale?

randomname4325|4 months ago

My assumption is that most users won't actually care whether the LLM is in the cloud or on the device. That said, quite a few folks have iPhones, and Apple's only way into the AI race is to play to its strength: 1B+ hardware devices that they design the silicon for. They will produce a phone that runs a local LLM and market it as private and secure. People upgrade every couple of years (phones get lost or break), so this will drive adoption. I'm not saying people will vibe code on their iPhones.

lelanthran|4 months ago

Price, for one. I don't mind running a local model at half the speed if all it costs is electricity.

A local model basically allows me to experiment with running an agent 24x7, 365 days a year with continuous prompting.

SaaS won't be able to match that.
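The price argument can be made concrete with a back-of-envelope calculation. All the figures below (power draw, electricity price, token volume, API rate) are illustrative assumptions, not measurements:

```python
# Rough yearly cost sketch for continuous local agent use vs. a metered API.
# Every number here is an illustrative assumption.

WATTS_LOCAL = 300            # assumed draw of a local GPU box under load
ELECTRICITY_PER_KWH = 0.15   # assumed electricity price in $/kWh
HOURS_PER_YEAR = 24 * 365

# Running an agent continuously for a year on local hardware costs only power:
local_cost = WATTS_LOCAL / 1000 * HOURS_PER_YEAR * ELECTRICITY_PER_KWH

# The same continuous prompting through a metered API, assuming the agent
# consumes 2M tokens/day at an illustrative $3 per 1M tokens:
api_cost = 2 * 3 * 365

print(f"local electricity/yr: ${local_cost:,.0f}")  # a few hundred dollars
print(f"API tokens/yr:        ${api_cost:,.0f}")    # a couple thousand
```

Under these assumptions the local box wins by roughly 5x per year, before counting its purchase price - which is the real variable in the comparison.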

whitehexagon|4 months ago

Or just a mini configured default 128GB or 256GB.

I've been using Qwen3:32b on a 32GB M1 (Asahi) and it does most of what I need, albeit a bit slow, but not slow enough that I'd pay monthly for remote ad delivery.

I suspect this huge splurge of hardware spending is partially an attempt to starve the market of cheap RAM and thus limit companies releasing 128GB/256GB standalone LLM boxes.
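For context on why a 32B model fits on a 32 GB machine at all: weight memory is roughly parameter count times bytes per weight, and 4-bit quantization is what makes it fit. A rough sketch that ignores KV cache and runtime overhead:

```python
# Approximate model weight memory: params * bits_per_weight / 8 bytes.
# Ignores KV cache, activations, and runtime overhead.

def weight_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate weight memory in GB for a model of the given size."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

print(weight_gb(32, 16))  # fp16: 64 GB - won't fit in 32 GB of RAM
print(weight_gb(32, 4))   # 4-bit quantized: 16 GB - fits, with room for context
```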

simianwords|4 months ago

Why do you think LLMs will get good enough that they can run locally, but the ones requiring Nvidia GPUs will not get better?

nerdix|4 months ago

The models running on $50k GPUs will get better but the models running on commodity hardware will hit an inflection point where they are good enough for most use cases.

If I had to guess I would say that's probably 10 or 15 years away for desktop class hardware and longer for mobile (maybe another 10 years).

Maybe the frontier models of 2040 are being used for more advanced things like medical research and not generating CRUD apps or photos of kittens. That would mean that the average person is likely using the commodity models that are either free or extremely cheap to use.

vachina|4 months ago

OK, you could technically upload all your photos to Google Cloud for the same semantic labeling features as the iOS Photos app, but having local, always-available, and fast inferencing is arguably more useful and valuable to the end user.

TiredOfLife|4 months ago

The new iPhones barely got 12 GB of RAM. The way Apple is going, iPhones will have enough RAM for LLMs in about 100 years.
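The same rough weight-memory arithmetic (parameters times bits per weight) shows why 12 GB is tight: after reserving RAM for the OS and apps, even aggressive 4-bit quantization caps the model size well below desktop-class models. The reserved-RAM figure is an assumption:

```python
# Largest model (in billions of parameters) whose weights fit in phone RAM,
# using the approximation weights_bytes = params * bits / 8.

def max_params_billion(ram_gb: float, reserved_gb: float, bits_per_weight: int) -> float:
    """Upper bound on model size, ignoring KV cache and runtime overhead."""
    usable_gb = ram_gb - reserved_gb
    return usable_gb * 8 / bits_per_weight

# Assuming ~4 GB reserved for OS and apps, and 4-bit quantized weights:
print(max_params_billion(12, 4, 4))  # 16B parameters at best
```

So a 12 GB phone tops out around a 16B model even before accounting for context - far from the 32B-plus models people run on desktops.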

vachina|4 months ago

Trying to compare RAM sizes and CPU cores is so yesterday. Apple owns the entire stack - they can make anything fit into their silicon if they so desire.

baq|4 months ago

that's... some years... from now

kcb|4 months ago

What's the benefit of running LLMs locally? The data is already remote, and LLM inferencing isn't particularly constrained by Internet latency. So you get worse models, performance, and battery life. Local compute on a power-constrained mobile device is warranted for applications that need low latency or significant data throughput, and LLM inferencing needs neither.

fkyoureadthedoc|4 months ago

> What's the benefit to running LLMs locally?

At work:

That I don't rent $30,000 a month of PTUs from Microsoft. That I can put more restricted data classifications into it.

> LLM inferencing isn't particularly constrained by Internet latency

But user experience is

mrweasel|4 months ago

The data you need is mostly not remote. A friend works at a software development company; they can use LLMs, but only local ones (local as in their own datacenter), and those can only be trained on their code base. Customer service LLMs need to be trained on in-house material, not generic Internet sources.

The general advantage is that you know you're not leaking information, because there's nowhere to leak it to. You know the exact input, because you provided it. You also get the benefit of on-device encryption - the data is no good to anyone in the datacenter if it's encrypted.