When output quality is good enough, other considerations become more important. Most people on this planet cannot afford even an AI subscription, and the cost of tokens is prohibitive for many low-margin businesses. Privacy and personalization matter too, and data sovereignty is a hot topic. Besides, we already see how the focus has shifted to orchestration, which can be done on a CPU and is cheap; software optimizations may compensate for hardware deficiencies, so it's not going to be frozen. I think the market for local inference hardware is bigger than the one for clouds, and it's going to repeat the Android vs. iOS story.
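To make the "orchestration is cheap CPU work" claim above concrete, here is a minimal sketch. Everything in it (`Task`, `generate`, `orchestrate`) is a hypothetical stand-in, not any real framework's API; the point is just that routing, retries, and prompt assembly are ordinary branching and string work, while only `generate()` would need accelerated hardware:

```python
# Illustrative sketch only: all names here are made up.
# The orchestration layer is plain CPU-bound control flow;
# the expensive part is confined to generate().

from dataclasses import dataclass


@dataclass
class Task:
    prompt: str
    kind: str  # e.g. "code" or "chat"


def generate(prompt: str) -> str:
    """Stub for a local model call -- the only hardware-hungry step."""
    return f"<model output for: {prompt}>"


def orchestrate(task: Task) -> str:
    # Routing: cheap branching on task metadata.
    if task.kind == "code":
        prompt = "Write code. " + task.prompt
    else:
        prompt = task.prompt
    draft = generate(prompt)
    # A trivial retry pass: call the model again if the output is too short.
    if len(draft) < 10:
        draft = generate(prompt + " (more detail)")
    return draft


print(orchestrate(Task(prompt="sort a list", kind="code")))
```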
This is the same justification that was used to ship the (now almost entirely defunct) NPUs on Apple and Android devices alike.
The A18 iPhone chip has 15b transistors for the GPU and CPU; the Taalas ASIC has 53b transistors dedicated to inference alone. If it's anything like NPUs, almost all vendors will bypass the baked-in silicon to use GPU acceleration past a certain point. It makes much more sense to ship a CUDA-style flexible GPGPU architecture.
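A quick back-of-envelope on those transistor counts. The 15b and 53b figures come from the comment above; the ~6 transistors per SRAM bit is a generic assumption for illustration, not a claim about how the Taalas part actually stores weights:

```python
# Back-of-envelope using the figures quoted above.
a18_gpu_cpu = 15e9   # A18: ~15b transistors for CPU + GPU combined
taalas_asic = 53e9   # Taalas: ~53b transistors for inference alone

print(f"ratio: {taalas_asic / a18_gpu_cpu:.1f}x")  # prints "ratio: 3.5x"

# If weights lived in SRAM-like cells (~6 transistors per bit -- an
# assumption, not Taalas's disclosed design), the whole budget caps out at:
bits = taalas_asic / 6
print(f"~{bits / 8e9:.1f} GB of weights, before any compute logic")
```

So even a die 3.5x the A18's logic budget holds on the order of a gigabyte of SRAM-resident weights under this assumption, which is part of why baking one model into silicon is such a committed bet.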
Is progress still exponential? It feels like it's flattening to me. It's hard to quantify, but if you could get Opus 4.2 to work at the speed of the Taalas demo and run locally, I feel like I'd get an awful lot done.
Bake in a Genius Bar employee, trained on your model's hardware, whose entire reason for existence is to fix your computer when it breaks. If it takes an extra 50 cents of die space but saves Apple a dollar of support costs over the lifetime of the device, it's worth it.
Yeah, the space moves so quickly that I would not want to couple the hardware to a model that might be outdated in a month. There are some interesting talking points, but a general-purpose programmable ASIC makes more sense to me.
ivan_gammel|8 days ago
bigyabai|8 days ago
wmf|7 days ago
padjo|8 days ago
sowbug|8 days ago
r0b05|8 days ago
RobertDeNiro|8 days ago
selcuka|8 days ago
Planned obsolescence? /s
Jokes aside, they can make the "LLM chip" removable. I know almost nothing is replaceable in MacBooks, but this could be an exception.