I’m fascinated by this new paradigm. We’ve more or less perfected Mixture-of-Experts inside a single model, where routing happens between subnetworks. What GPT-5 auto (and this paper) are doing is a step further: “LLM routing” across multiple distinct models. It’s still rough right now, but it feels inevitable that this will get much better over time.
NitpickLawyer|6 months ago
Yeah, the signals they get will improve things over time. You can do a lot of heavy lifting with embedding models nowadays: get "satisfaction" signals from chats and adjust your router based on those. It will be weird at first and some people will complain, but at the end of the day, you don't need IMO-gold levels of thinking to write a fitness plan that the user most likely won't even follow :)
Signal gathering is likely the driver of most of the subsidised model offerings we see today.
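To make the "embeddings plus satisfaction signals" idea concrete, here is a minimal sketch of a router that learns from feedback. Everything in it is hypothetical: `embed` is a toy hashed bag-of-words stand-in for a real embedding model, the `Router` class and the model names are made up for illustration, and nothing here is claimed to be how any production router works.

```python
import math


def embed(text, dim=16):
    """Toy stand-in for a real embedding model (assumption): hashed bag of words, L2-normalized."""
    vec = [0.0] * dim
    for word in text.lower().split():
        # Deterministic bucket per word (avoids Python's randomized str hash).
        vec[sum(ord(c) for c in word) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class Router:
    """Hypothetical router: pick the cheapest model predicted to satisfy the user."""

    def __init__(self, models, dim=16, threshold=0.5, lr=0.5):
        # models: list of (name, cost) pairs; tried from cheapest to most expensive.
        self.models = sorted(models, key=lambda m: m[1])
        self.dim = dim
        self.threshold = threshold
        self.lr = lr
        # One linear satisfaction predictor per model, starting neutral (p = 0.5).
        self.weights = {name: [0.0] * dim for name, _ in self.models}

    def predict(self, name, x):
        """Predicted probability that `name` satisfies a prompt with embedding `x`."""
        z = sum(w * xi for w, xi in zip(self.weights[name], x))
        return 1.0 / (1.0 + math.exp(-z))

    def route(self, prompt):
        x = embed(prompt, self.dim)
        for name, _cost in self.models:
            if self.predict(name, x) >= self.threshold:
                return name
        # Nothing looks good enough: fall back to the most expensive model.
        return self.models[-1][0]

    def feedback(self, prompt, name, satisfied):
        """One online logistic-regression step on a satisfaction signal from a chat."""
        x = embed(prompt, self.dim)
        err = (1.0 if satisfied else 0.0) - self.predict(name, x)
        self.weights[name] = [w + self.lr * err * xi
                              for w, xi in zip(self.weights[name], x)]


if __name__ == "__main__":
    router = Router([("small", 1.0), ("large", 10.0)])
    print(router.route("write me a fitness plan"))
    hard = "prove this olympiad inequality"
    router.feedback(hard, "small", satisfied=False)  # user was unhappy with the small model
    print(router.route(hard))
```

With no feedback every prompt goes to the cheapest model; once the small model collects negative signals on a kind of prompt, similar prompts get escalated. A real system would swap in a proper embedding model and a less naive predictor.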
phi-go|6 months ago
nico|6 months ago
And then maybe you could just customize and optimize your own model for local use. Almost like mixing and matching different modules. It would be nice to have a model that only knows and does what you need it to.
mrbald|6 months ago
akavi|6 months ago
My understanding is that GPT-5 already does this by varying the quantity of CoT done (in addition to the kind of super-model-level routing described in the post), and I strongly suspect it's only going to get more sophisticated.
imtringued|6 months ago
This approach is much more efficient than the one in the paper from this HN submission, because request-based routing requires you to recalculate the KV cache from scratch every time you switch from model to model.
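A rough back-of-the-envelope sketch of that cost, under simplifying assumptions that are mine and not from the paper: KV caches cannot be shared between different models, a model's cache is discarded as soon as the conversation moves to another model, and each turn simply adds a fixed number of tokens to the context. The function name `prefill_tokens` is made up for illustration.

```python
def prefill_tokens(turn_lengths, model_per_turn):
    """Count tokens that must be prefilled over a conversation.

    Assumes (simplification) that switching models throws away the KV cache,
    so the new model has to process the whole context from scratch.
    """
    total = 0       # tokens prefilled so far
    context = 0     # tokens in the conversation so far
    cached = 0      # context tokens already in the current model's KV cache
    current = None  # model that served the previous turn
    for length, model in zip(turn_lengths, model_per_turn):
        context += length
        if model != current:
            cached = 0  # different model: nothing in the cache is reusable
            current = model
        total += context - cached
        cached = context
    return total


turns = [100, 100, 100, 100]  # tokens added per turn (made-up numbers)
same_model = prefill_tokens(turns, ["a", "a", "a", "a"])  # 400
switching = prefill_tokens(turns, ["a", "b", "a", "b"])   # 1000
print(same_model, switching)
```

With one model, each turn only prefills its new tokens (400 in total); switching on every turn reprocesses the full context each time (100 + 200 + 300 + 400 = 1000), so the overhead grows with conversation length.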
krackers|6 months ago
CuriouslyC|6 months ago