mstaoru | 1 day ago
So far Opus 4.6 and Gemini Pro are very satisfactory, producing great answers fairly fast. Gemini is very fast at 30-50 sec; Opus is very detailed and comes in at about 2-3 minutes.
Today I ran the question against local qwen3.5:35b-a3b - it puffed for 45 (!) minutes, produced a very generic answer with errors, and made my laptop sound like it was going to take off any moment.
I wonder what I'm doing wrong... How am I supposed to use this for any agentic coding on a large enough codebase? It would take days (and a 3M Peltor X5A) to produce anything useful.
lm28469|1 day ago
You're comparing ~100B-parameter open models running on a consumer laptop vs. private models with, at the very least, 1T parameters running on racks of bleeding-edge professional GPUs.
Local agentic coding is closer to "shit me the boilerplate for an Android app" than "deep research questions", especially on your machine.
vlovich123|1 day ago
Speculation is that the frontier models are all below 200B parameters, but a 2x size difference wouldn't fully explain the task-performance differences.
zozbot234|1 day ago
But if you've got that kind of equipment, you aren't using it to support a single user. It gets the best utilization by running very large batches with massive parallelism across GPUs, so you're going to do that. There is such a thing as a useful middle ground that may not give you the absolute best performance but will be found broadly acceptable and still be quite viable for a home lab.
adam_patarino|14 hours ago
The reality in ML is that small models can outperform large ones on a narrow problem set.
The key is the narrow problem set. Opus can write you a poem, create a shopping list, and analyze your massive code base.
We trained our model to focus only on coding with our specific agent harness, tools, and context engine. And it's small enough to fit on an M2 16GB. It's as good as Sonnet 4.5 and way better than qwen3.5:35b-a3b.
Our beta will be out soon / rig.ai
wolvoleo|1 day ago
Even on servers this can happen. At work we have a 2U server with two 250W-class GPUs, and I found that by pinning the case fans at 100% I can get 30% more performance out of GPU tasks, which translates to several days saved for our use case. It does mean I can literally hear the fans screaming in the hallway outside the equipment room, but ok lol. Who cares. A laptop just can't compare, though.
Something with a desktop GPU, or even better something with HBM3, would run much better. Local models get slow when you use a ton of context, and the memory bandwidth of a MacBook Pro, while better than a PC's, is still not amazing.
And yeah, the heaviest tasks are not great on local models. I tend to run the low-hanging fruit locally and the stuff where I really need the best in the cloud. I don't agree that local models are on par, but I don't think they really need to be for a lot of tasks.
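A back-of-envelope sketch of the bandwidth point above: token generation is typically memory-bandwidth-bound, so a rough ceiling on decode speed is bandwidth divided by the bytes of weights read per token. All the numbers below (bandwidth figures, ~3B active parameters for an a3b-style MoE, 4-bit quantization) are illustrative assumptions, not measurements:

```python
def decode_tokens_per_sec(bandwidth_gb_s: float,
                          active_params_b: float,
                          bytes_per_param: float) -> float:
    """Rough upper bound on decode speed for a (possibly MoE) model:
    every generated token must stream the active weights from memory."""
    bytes_per_token = active_params_b * 1e9 * bytes_per_param
    return bandwidth_gb_s * 1e9 / bytes_per_token

# Hypothetical MoE with ~3B active params at 4-bit (~0.5 bytes/param):
macbook = decode_tokens_per_sec(400, 3, 0.5)    # ~400 GB/s unified memory
hbm_gpu = decode_tokens_per_sec(1000, 3, 0.5)   # ~1 TB/s HBM-class GPU

print(f"MacBook-class ceiling: ~{macbook:.0f} tok/s")
print(f"HBM-class ceiling:     ~{hbm_gpu:.0f} tok/s")
```

Real throughput is much lower than these ceilings once a long context forces the KV cache to be re-read on every token, which is part of why local models feel fine on short prompts and crawl on big-codebase tasks.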
meatmanek|23 hours ago
I'm too GPU-poor to run it, but r/LocalLLaMa is full of people using it.
boutell|15 hours ago
On the plus side, it did figure out the question even without the first sentence that's intended as a bit of a giveaway.
__mharrison__|1 day ago
Admittedly, I haven't tried these models on my Mac, but I have on my DGX Spark, and they ran fine. I didn't see the slowdown you're mentioning.
stavros|1 day ago
I really, really want open weights models to be great, but I've been disappointed with them. I don't even run them locally, I try them from providers, but they're never as good as even the current Sonnet.
mstaoru|11 hours ago
PS: I can understand that isolated "valuable" problems like sorting photo collection or feeding a cat via ESPHome can be solved with local models.
xtn|23 hours ago
On the other hand, if open-source models on MacBooks really could be as powerful as the SOTA models from Google, etc., then the stock prices of many companies would already have collapsed.
mstaoru|18 hours ago
The second-order thought from this is... will we get value-based price leveling soon? If the alternative to a hosted LLM is building a $10-20k+ machine with $500+ monthly energy bills, will hosted prices asymptotically climb to reflect this reality?
Something to think about.
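To make the comparison concrete, here is a rough amortized-cost sketch for a local rig. Every figure (hardware price, amortization window, power draw, electricity rate) is a hypothetical assumption for illustration:

```python
def local_monthly_cost(hw_price: float, amort_months: int,
                       power_kw: float, hours_per_day: float,
                       usd_per_kwh: float) -> float:
    """Amortized hardware cost plus energy cost per month."""
    hw = hw_price / amort_months
    energy = power_kw * hours_per_day * 30 * usd_per_kwh
    return hw + energy

# e.g. a $15k rig amortized over 36 months, drawing 1.5 kW
# for 8 h/day at $0.30/kWh:
cost = local_monthly_cost(15_000, 36, 1.5, 8, 0.30)
print(f"~${cost:.0f}/month")
```

Under these assumptions the local rig lands in the same ballpark as a top-tier hosted subscription plus API usage, which is exactly the leveling question the comment raises.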
rienko|1 day ago
If you are able to run something like mlx-community/MiniMax-M2.5-3bit (~100GB), my guess is the results are much better than 35b-a3b.
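A quick sketch of where the ~100GB figure comes from: quantized weight size scales as total_params × bits / 8, plus some layers (embeddings, norms) usually kept at higher precision. The 230B total-parameter figure below is an assumption for illustration (roughly MiniMax-M2's published size; M2.5's exact count is not stated here), as is the 10 GB overhead:

```python
def quantized_weight_gb(total_params_b: float, bits: int,
                        overhead_gb: float = 10.0) -> float:
    """Approximate on-disk/in-memory size of a quantized model.
    overhead_gb covers higher-precision layers and quantization scales
    (an assumed round number, not a measured value)."""
    weights_gb = total_params_b * 1e9 * bits / 8 / 1e9
    return weights_gb + overhead_gb

print(f"~{quantized_weight_gb(230, 3):.0f} GB")
```

The same arithmetic explains why the active-parameter count (what you stream per token) matters for speed, while the total count (what must fit in memory) gates whether you can run the model at all.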
CamperBob2|1 day ago
Also, performance on research-y questions isn't always a good indicator of how the model will do for code generation or agent orchestration.