akavi | 6 months ago
My understanding is that GPT-5 already does this by varying the amount of CoT it performs (in addition to the kind of super-model-level routing described in the post), and I strongly suspect it will only get more sophisticated
imtringued | 6 months ago
This approach is much more efficient than the one in the paper this HN submission links, because request-level routing requires recomputing the KV cache from scratch every time you switch models.
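To make the cost concrete, here is a toy sketch (not any real serving stack; all names are made up for illustration) of why a model switch forces a full prefill: the KV cache is a function of a specific model's weights, so entries computed by one model cannot be reused by another.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Model:
    name: str
    def prefill(self, tokens):
        # Stand-in for attention key/value computation; real KV entries
        # depend on this model's weights, so they are model-specific.
        return [(self.name, t) for t in tokens]

@dataclass
class Session:
    tokens: list = field(default_factory=list)
    kv_cache: list = field(default_factory=list)
    cached_model: Optional[str] = None
    prefill_tokens_computed: int = 0  # running count of prefill work done

    def generate(self, model: Model, new_tokens):
        self.tokens += new_tokens
        if self.cached_model != model.name:
            # Model switch: the cached KVs belong to another model,
            # so the *entire* conversation history must be re-prefilled.
            self.kv_cache = model.prefill(self.tokens)
            self.prefill_tokens_computed += len(self.tokens)
            self.cached_model = model.name
        else:
            # Same model: only the new tokens need prefill.
            self.kv_cache += model.prefill(new_tokens)
            self.prefill_tokens_computed += len(new_tokens)

small, large = Model("small"), Model("large")
s = Session()
s.generate(small, list(range(100)))  # 100 tokens prefilled
s.generate(small, list(range(10)))   # +10: cache reused
s.generate(large, list(range(10)))   # +120: full 120-token history recomputed
print(s.prefill_tokens_computed)     # 230
```

Routing at the CoT-length level avoids this because the same model (and thus the same cache) serves the whole conversation; only the amount of reasoning it emits varies.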