Always get the best LLM performance for your $?
3 points | romain_batlle | 8 months ago
Spent quite some time normalizing APIs, handling tool calls, and managing prompt caching, but the end result is very cool: you always get the absolute best value for your $ at the exact moment of inference.
Currently it runs perfectly on Roo and Cline forks, and on any OpenAI-compatible BYOK app (so, kind of everywhere). A sketch of what that integration typically looks like is below.
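For readers unfamiliar with "OpenAI-compatible BYOK": in practice it usually means you can point the standard OpenAI SDK at a different base URL with your own key. A minimal sketch of that pattern, assuming a hypothetical `https://api.makehub.ai/v1` endpoint and model identifier (neither is confirmed by this post, so check the actual docs):

```python
# Hypothetical sketch: using the official OpenAI Python SDK against an
# OpenAI-compatible router endpoint, BYOK-style. The base_url and model
# name below are assumptions for illustration, not documented values.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.makehub.ai/v1",  # assumed endpoint
    api_key="YOUR_MAKEHUB_KEY",            # bring your own key
)

response = client.chat.completions.create(
    # Assumed model identifier; the router would forward this to whichever
    # provider offers the best price/performance at inference time.
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "Summarize why model routing saves money."}
    ],
)

print(response.choices[0].message.content)
```

Because the request shape is plain OpenAI chat completions, any app that lets you override the base URL and supply your own key should work without code changes.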
Feedback very much welcome! Please tear it apart: https://makehub.ai