cxie | 1 year ago
Instead of one massive model trying to do everything, you'd have specialized models for OCR, code generation, image understanding, etc. Then a "router LLM" would direct queries to the appropriate specialized model and synthesize responses.
The efficiency gains could be substantial - why run a 1T-parameter model when your query just needs a lightweight OCR specialist? You could dynamically load only the models a given query actually needs.
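A toy sketch of what that routing layer might look like. Everything here is illustrative - `classify_query` is a keyword stand-in for what would really be a router LLM, and the specialist names are made up:

```python
def classify_query(query: str) -> str:
    """Toy stand-in for a router LLM: pick a specialist by keyword."""
    q = query.lower()
    if "scanned" in q or "extract text" in q:
        return "ocr"
    if "write a function" in q or "code" in q:
        return "codegen"
    return "general"

# Hypothetical specialist registry; each entry would be a real model endpoint.
SPECIALISTS = {
    "ocr": lambda q: f"[ocr-specialist] {q}",
    "codegen": lambda q: f"[codegen-specialist] {q}",
    "general": lambda q: f"[general-model] {q}",
}

def route(query: str) -> str:
    """Dispatch a query to the specialist the router selects."""
    return SPECIALISTS[classify_query(query)](query)

print(route("extract text from this scanned receipt"))
```

The point is that the heavy general model only runs on the fallback path; cheap queries never touch it.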
The challenge would be the communication protocol between models and managing the overall complexity. We'd need something like a "prompt bus" for inter-model communication, with standardized inputs and outputs.
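One possible shape for that standardized envelope: every specialist speaks the same request schema, so the router can chain them and trace a query across hops. The field names here are assumptions, not any existing standard:

```python
from dataclasses import dataclass, field
import json
import uuid

@dataclass
class BusMessage:
    """Hypothetical 'prompt bus' envelope shared by all specialists."""
    task: str       # e.g. "ocr", "codegen" - tells the bus where to route
    payload: dict   # task-specific inputs, schema owned by the specialist
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)

    def to_json(self) -> str:
        # Plain JSON on the wire keeps every model implementation-agnostic.
        return json.dumps(
            {"task": self.task, "payload": self.payload, "trace_id": self.trace_id}
        )

msg = BusMessage(task="ocr", payload={"image_uri": "s3://bucket/receipt.png"})
decoded = json.loads(msg.to_json())
print(decoded["task"])
```

Keeping the envelope dumb (task, payload, trace id) and pushing all intelligence into the router is the same trade-off message buses make in regular service architectures.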
Has anyone here started building infrastructure for this kind of model orchestration yet? This feels like it could be the Kubernetes moment for AI systems.