The bubble pops when Apple releases an iPhone that runs an LLM that's good enough for most things locally. At that point cloud hardware investment will plateau (unless some new GPU-melting use case comes along). Investors will move from Nvidia and AMD into Apple.
lelanthran|4 months ago
A local model basically allows me to experiment with running an agent 24x7, 365 days a year with continuous prompting.
SaaS won't be able to match that.
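For what it's worth, the loop itself is trivial. A minimal sketch, assuming an Ollama server on its default local port and hypothetical check_inbox()/act_on() stand-ins for whatever the agent watches and does; the point is that polling forever costs nothing per token:

    # Sketch of an always-on local agent loop. Assumes an Ollama server at
    # its default address; check_inbox() and act_on() are hypothetical
    # stand-ins for whatever the agent watches and does.
    import time
    import requests

    OLLAMA_URL = "http://localhost:11434/api/generate"
    MODEL = "qwen3:32b"  # assumption: any locally pulled model

    def ask_local(prompt: str) -> str:
        """Send one prompt to the local model, return the full response."""
        resp = requests.post(
            OLLAMA_URL,
            json={"model": MODEL, "prompt": prompt, "stream": False},
            timeout=600,
        )
        resp.raise_for_status()
        return resp.json()["response"]

    def check_inbox() -> list[str]:
        """Hypothetical: new items (mail, tickets, feeds) to triage."""
        return []

    def act_on(item: str, decision: str) -> None:
        """Hypothetical: do something with the model's output."""
        print(item, "->", decision)

    while True:  # 24x7, 365 days a year
        for item in check_inbox():
            act_on(item, ask_local("Classify and summarise:\n\n" + item))
        time.sleep(60)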
whitehexagon|4 months ago
I've been using Qwen3:32b on a 32GB M1 (Asahi) and it does most of what I need, albeit a bit slow, but not slow enough that I'd pay monthly for remote ad delivery.
I suspect this huge splurge of hardware spending is partially an attempt to starve the market of cheap RAM and thus limit companies releasing 128GB/256GB standalone LLM boxes.
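For a sense of scale, a back-of-the-envelope sketch of the weight footprint of a dense 32B-parameter model at common quantisation levels (weights only; the KV cache and the OS come on top):

    # Rough weight footprint for a dense 32B-parameter model.
    params = 32e9
    for name, bytes_per_param in [("fp16", 2.0), ("8-bit", 1.0), ("4-bit", 0.5)]:
        print(f"{name:>5}: ~{params * bytes_per_param / 2**30:.0f} GiB")

    # fp16 : ~60 GiB  -> wants one of those 128GB boxes
    # 8-bit: ~30 GiB  -> doesn't realistically fit in 32GB once the OS takes its share
    # 4-bit: ~15 GiB  -> comfortable on a 32GB M1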
nerdix|4 months ago
If I had to guess I would say that's probably 10 or 15 years away for desktop class hardware and longer for mobile (maybe another 10 years).
Maybe the frontier models of 2040 are being used for more advanced things like medical research and not generating CRUD apps or photos of kittens. That would mean that the average person is likely using the commodity models that are either free or extremely cheap to use.
fkyoureadthedoc|4 months ago
At work:
That I don't rent $30,000 a month of PTUs (provisioned throughput units) from Microsoft, and that I can put more restricted data classifications into it.
> LLM inferencing isn't particularly constrained by Internet latency
But user experience is
mrweasel|4 months ago
The general advantage is that you know you're not leaking information, because there's nowhere to leak it to. You know the exact input, because you provided it. You also get the benefit of on-device encryption: the data is no good in the datacenter if it's encrypted.
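As a minimal sketch of that last point, assuming Python's cryptography package: documents stay encrypted at rest, and plaintext only ever exists in memory on the device that also runs the model.

    # Sketch: keep documents encrypted at rest; decrypt only in memory on the
    # device that also runs the model, so plaintext never leaves it.
    # Assumes the `cryptography` package (pip install cryptography).
    from cryptography.fernet import Fernet

    key = Fernet.generate_key()   # in practice, kept in the OS keychain
    box = Fernet(key)

    # Only the ciphertext ever gets synced or backed up off the device...
    ciphertext = box.encrypt(b"patient notes, contracts, quarterly numbers")

    # ...and plaintext exists only transiently, as input to the local model.
    prompt = "Summarise this document:\n\n" + box.decrypt(ciphertext).decode()
    # local_model.generate(prompt)  # hypothetical local inference call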