What are some of the better use cases of fast inference? From my experience using ChatGPT, I don't need it to generate faster than I can read, but waiting for code generation is painful because I'm waiting for the whole code block to format correctly, be available to copy or execute (in the case of code interpreter). Anything else fall under this pattern?
rfw300|2 years ago
lmeyerov|2 years ago
* reading: If you want it to do inference over a lot of context, you'll need multiple inference calls. If each call is faster, you can 'read' more in the same time on the same hardware
* thinking: a lot of analytical approaches essentially use writing as both memory & thinking. Imagine iterative summarization, or automatically iteratively refining code until it's right
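The "reading" point above can be sketched as map-reduce-style iterative summarization, where each chunk costs one inference call, so faster inference directly means more context read per wall-clock second. This is a minimal sketch; `call_llm` is a hypothetical stand-in for any completion API, and the chunk size is an arbitrary placeholder.

```python
def call_llm(prompt: str) -> str:
    # Placeholder: in practice this would call an inference endpoint.
    return prompt[:200]

def chunk(text: str, size: int = 4000) -> list[str]:
    # Split the context into windows that fit one inference call.
    return [text[i:i + size] for i in range(0, len(text), size)]

def summarize(text: str) -> str:
    # One inference per chunk: N chunks = N calls, so per-call speed
    # bounds how much context we can "read" in a fixed time budget.
    parts = [call_llm(f"Summarize:\n{c}") for c in chunk(text)]
    combined = "\n".join(parts)
    # Recurse until the combined summaries fit in one window.
    return combined if len(combined) <= 4000 else summarize(combined)
```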
For louie.ai sessions, that's meant there's a fascinating trade-off when doing the above:
* We can use smarter models like gpt-4 to do fewer iterations...
* ... or a faster but dumber model to get more iterations in the same amount of time
It's not at all obvious which wins. For example, on the humaneval leaderboard, gpt-4 for code is beaten by gpt-3.5 for code when the latter is run by a LATS agent: https://paperswithcode.com/sota/code-generation-on-humaneval . This highlights that the agent framework is really responsible for final result quality, so a model's ability to run many iterations in the same time window matters.
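The trade-off above can be made concrete as a generate-and-check loop under a fixed wall-clock budget: halving per-call latency roughly doubles the number of refinement attempts. This is a toy sketch, not the LATS algorithm; `generate` and `passes_tests` are hypothetical stand-ins, and the latencies are made-up numbers.

```python
import time

def solve(generate, passes_tests, latency: float, budget: float):
    # A faster model (smaller `latency`) gets more generate-check
    # iterations inside the same `budget` than a slower, smarter one.
    deadline = time.monotonic() + budget
    attempt = None
    while time.monotonic() + latency <= deadline:
        time.sleep(latency)          # stand-in for inference time
        attempt = generate()
        if passes_tests(attempt):
            return attempt           # success within budget
    return attempt                   # best effort when budget runs out
```

With a 0.5s-per-call model and a 10s budget you get ~20 attempts; at 1s per call, only ~10, which is why a dumber-but-faster model can come out ahead when the checker, not the model, is doing the quality control.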
jasonjmcghee|2 years ago
Most use cases outside of classic chat.
For example, I made an on-demand educational video project, and the slowest part was by far the content generation. RAG, TTS, image generation, text rendering, and video processing were all a drop in the bucket in comparison.
The gap would be even wider now: TTS runs faster than real time, and image generation can be single-step.
ClarityJones|2 years ago
- Hook LLM to VMs
- Ask for code that [counts to 10]
- Run code on VM
- Ask different LLM to Evaluate Results.
- Repeat for sufficient volume.
- Train.
The faster it can generate results, the faster those results can be tested against the real world, e.g. a VM, users on X, or other models with known accuracies.
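The steps above can be sketched as a collection loop: generate code, execute it in an isolated process (standing in for the VM), and keep only the triples a second model scores well, for later training. A minimal sketch, assuming `gen_code` and `judge` are hypothetical LLM calls; a subprocess is a weak substitute for a real sandbox.

```python
import subprocess
import sys

def collect(gen_code, judge, prompt: str, rounds: int):
    # Generate -> run on "VM" -> evaluate -> keep for training.
    dataset = []
    for _ in range(rounds):
        code = gen_code(prompt)
        run = subprocess.run(
            [sys.executable, "-c", code],       # isolated interpreter
            capture_output=True, text=True, timeout=10,
        )
        # Keep only runs that exit cleanly and that the second
        # model judges as a correct solution to the prompt.
        if run.returncode == 0 and judge(prompt, code, run.stdout):
            dataset.append((prompt, code, run.stdout))
    return dataset
```

Generation speed is the bottleneck here: the run and judge steps are cheap, so faster inference directly scales the volume of verified training examples.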
wedn3sday|2 years ago
dnnssl2|2 years ago
What's a good use case for an order of magnitude decrease in price per token? Web scale "analysis" or cleaning of unstructured data?
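Back-of-envelope arithmetic shows why an order of magnitude matters for web-scale cleaning: at corpus scale, per-token price is the whole cost model. The prices and corpus size below are illustrative placeholders, not real quotes.

```python
def corpus_cost(docs: int, tokens_per_doc: int, usd_per_mtok: float) -> float:
    # Total cost = tokens processed x price per million tokens.
    return docs * tokens_per_doc * usd_per_mtok / 1_000_000

# 100M documents at ~1k tokens each:
expensive = corpus_cost(100_000_000, 1_000, 10.0)  # $1,000,000 at $10/Mtok
cheap     = corpus_cost(100_000_000, 1_000, 1.0)   # $100,000 at $1/Mtok
```

A $1M pass over a corpus is a board-level decision; a $100k pass is a line item, which is what makes LLM cleaning of unstructured data viable at that scale.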