In offline training of our router, we run extensive cross-domain evaluations to determine when a smaller model can handle a request without any quality loss relative to more powerful models. In an online setting like our chat app, there is probably more rigorous post-hoc analysis we could do on response quality, which could make for a good follow-up post.
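
To make the offline comparison a bit more concrete, here is a minimal sketch of a per-domain check. It is illustrative only and not the actual evaluation pipeline: the function name `routable_domains`, the `(domain, small_score, large_score)` record format, and the `tolerance` parameter are all assumptions made for this example, and the scores are made up.

```python
from collections import defaultdict
from statistics import mean


def routable_domains(records, tolerance=0.0):
    """Flag domains where the small model's mean score is within
    `tolerance` of the large model's mean score.

    `records` is an iterable of (domain, small_score, large_score)
    tuples. This format is hypothetical, chosen for illustration.
    """
    small = defaultdict(list)
    large = defaultdict(list)
    for domain, small_score, large_score in records:
        small[domain].append(small_score)
        large[domain].append(large_score)

    result = {}
    for domain in small:
        # Positive gap means the large model scored higher on average.
        gap = mean(large[domain]) - mean(small[domain])
        result[domain] = gap <= tolerance
    return result


# Made-up scores, purely to show the shape of the input and output.
records = [
    ("math", 0.60, 0.90),
    ("math", 0.70, 0.90),
    ("chitchat", 0.90, 0.90),
    ("chitchat", 0.95, 0.90),
]
decisions = routable_domains(records)
print(decisions)
```

With `tolerance=0.0` a domain is only marked routable when the small model matches or beats the large one on average, which mirrors the "without any quality loss" criterion. A real evaluation would also want enough samples per domain and some measure of variance before trusting the result.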