ajnin|7 hours ago
I’m 100% sure the future consists of many models running on device. LLMs will be the mobile apps of the future (or some different architecture will, but it will still be on-device intelligence).
pizzafeelsright|5 hours ago
1. Generic model that calls other highly specific, smaller, faster models.
2. Models loaded on demand, some black box and some open.
3. There will be a model specifically for Rust (or whatever language) tasks.
In about 5-8 years we will have personalized models, built on all our previous social/medical/financial data, that respond as we would: a clone, capable of making similar decisions steered toward desired outcomes.
The big remaining blocker is that generic model: one that can be imprinted with specifics and rebuilt nightly. Not the training material itself, but the decision-making, recall, and evaluation model. I am curious whether someone is working on that extracted portion, something that could serve as just a 'thinking' interface.
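A minimal sketch of points 1 and 2 above, assuming the generic model only decides where a prompt goes while small specialists are loaded lazily. Every class, domain, and model name here is invented for illustration; a real system would load quantized model weights rather than closures.

```python
from typing import Callable, Dict

# A "model" here is just a function from prompt to answer, standing in
# for a real small on-device model.
Model = Callable[[str], str]

class SpecialistRegistry:
    """Lazily loads small specialist models on demand and caches them."""

    def __init__(self) -> None:
        self._loaders: Dict[str, Callable[[], Model]] = {}
        self._loaded: Dict[str, Model] = {}  # models already in memory

    def register(self, domain: str, loader: Callable[[], Model]) -> None:
        self._loaders[domain] = loader

    def get(self, domain: str) -> Model:
        # Load on first use, then keep resident (point 2: on demand).
        if domain not in self._loaded:
            self._loaded[domain] = self._loaders[domain]()
        return self._loaded[domain]

registry = SpecialistRegistry()
# A language-specific specialist (point 3) and a generic fallback.
registry.register("rust", lambda: lambda prompt: f"[rust-specialist] {prompt}")
registry.register("general", lambda: lambda prompt: f"[generalist] {prompt}")

def answer(prompt: str) -> str:
    # Stand-in for the generic model's routing decision (point 1).
    domain = "rust" if "borrow checker" in prompt else "general"
    return registry.get(domain)(prompt)
```

The point of the cache is that only the specialists a user actually exercises ever occupy memory, which is what makes "many models on device" plausible on consumer hardware.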
latchkey|6 hours ago
If anything, memory ain't getting cheaper, disks aren't either, and as for graphics cards, forget it.
coldtea|4 hours ago
People won't be competing with even a current 2026 SOTA from their home LLM anytime soon. Even the actual SOTA LLM providers aren't competing either - they're losing money on energy and costs, hoping to make it up on market capture and win the IPO races.
OtherShrezzing|4 hours ago
I don’t think anyone needs to compete with the LLM SOTA to get the benefits of these technologies on-device.
Consumers don’t need a 100k context window oracle that knows everything about both T-Cells and the ancient Welsh Royal lineage. We need small, focused models that are specialised, and then we need a good query router.
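A toy version of the "query router" idea: score each query against keyword profiles for a few specialist models and dispatch to the best match. The domains and keywords are invented for illustration; a real router would use a small classifier or embedding similarity rather than keyword overlap.

```python
import string
from collections import Counter

# Hypothetical keyword profiles, one per specialist model.
PROFILES = {
    "immunology": {"immune", "antibody", "antigen", "lymphocyte"},
    "history": {"royal", "lineage", "dynasty", "medieval"},
    "code": {"rust", "compile", "function", "borrow"},
}

def route_query(query: str) -> str:
    """Return the specialist domain whose profile best matches the query."""
    # Normalise: lowercase, strip punctuation, split into words.
    words = {w.strip(string.punctuation) for w in query.lower().split()}
    scores = Counter({domain: len(words & kw) for domain, kw in PROFILES.items()})
    domain, score = scores.most_common(1)[0]
    return domain if score > 0 else "general"  # fall back to the generic model
```

For example, `route_query("trace the royal lineage of medieval Wales")` lands on the history specialist, while a query matching no profile falls through to the generic model.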