top | item 40455575

(no title)

Thanks for weighing in. I'm sure for your setup right now, our router in it's current form would not be useful for you. This is the very first version, and the scope is therefore relatively limited.

On our roadmap, we plan to support:

- an API which returns the neural scores directly, enabling model selection and model-specific prompts to all be handled on the client side

- automatic learning of intermediate prompts for agentic multi-step systems, taking a similar view as DSPy, where all intermediate LLM calls and prompts are treated as latent variables in an optimizable end-to-end agentic system.

With these additions, the subtleties of the model + prompt relationship would be better respected.

I also believe that LLMs will become more robust to prompt subtleties over time. Also, some tasks are less sensitive to these minor subtleties you refer to.

For example, if you have a sales call agent, you might want to optimize UX for easy dialgoue prompts (so the person on the other end isn't left waiting), and take longer thinking about harder prompts requiring the full context of the call.

This is just an example, but my point is that not all LLM applications are the same. Some might be super sensitive to prompt subtleties, others might not be.

Thoughts?

discuss

weird-eye-issue|1 year ago

I don't want/need any of that

It's already hard enough to get consistent behavior with a fixed model

If we need to save money we will switch to a cheaper model and adapt our prompts for that

If we are going more for quality we'll use and more expensive model and adapt our prompts for that

I fail to see any use case where I would want a third party choosing which model we are using at run time...

We are adding a new model this week and I've spent dozens of hours personally evaluating output and making tweaks to make it feasible

Making it sound like models are interchangeable is harmful

danlenton|1 year ago

Makes sense, however I would clarify that we don't need to make the final decision. If you're using the neural scoring function as an API, then you can just get predictions about how each model will likely perform on your prompt, and then use these predictions however you want (if at all). Likewise, the benchmarking platform [https://youtu.be/PO4r6ek8U6M] can be used to just assess different models on your prompts without needing to do any routing. Nonetheless, this perspective is very helpful.