top | item 40457231

(no title)

Interesting, do you have any hunch as to why this is? We've seen in more verticalized apps where the underlying model is hidden from the user (sales call agent, autopilot tool, support agent etc.) that trying to reach high quality on hard prompts and high speed on the remaining prompts makes routing an appealing option.

discuss

weird-eye-issue|1 year ago

We charge users different amounts of credits based on the model used. They also just generally have a personal preference for each model. Some people love Claude, some hate it, etc

For something like a support agent why couldn't the company just choose a model like GPT-4o and stick with one? Would they really trust some responses going to 3.5 (or similar)?

danlenton|1 year ago

Currently the motivation is mainly speed. For the really easy ones like "hey, how's it going?" or "sorry I didn't hear you, can you repeat?" you can easily send to Llama3 etc. Ofc you could do some clever caching or something, but training a custom router directly on the task to optimize the resultant performance metric doesn't require any manual engineering.

Still, I agree that routing in isolation is not thaaat useful in many LLM domains. I think the usefulness will increase when applying to multi-step agentic systems, and when combining with other optimizations such as end-to-end learning of the intermediate prompts (DSPy etc.)

Thanks again for diving deep, super helpful!