(no title)
seldo|1 year ago
For agentic use cases, where you might need several round-trips to the LLM to reflect on a query, improve a result, etc., fast inference means you can fit more round-trips in while still responding in a reasonable time. So basically any LLM use case is improved by having greater speed available, IMO.
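A minimal sketch of the point, assuming a hypothetical `call_llm` stand-in for whatever inference API you use: each reflection round costs extra LLM calls, so total latency grows linearly with the number of round-trips, and faster inference directly buys you more refinement rounds within the same response-time budget.

```python
import time

def call_llm(prompt: str) -> str:
    """Hypothetical placeholder for a real LLM API call."""
    time.sleep(0.1)  # stand-in for per-call inference latency
    return f"answer to: {prompt[:40]}..."

def answer_with_reflection(query: str, rounds: int = 2) -> str:
    # One initial call, then two calls (critique + revision) per round,
    # so total calls = 1 + 2 * rounds; latency scales accordingly.
    draft = call_llm(query)
    for _ in range(rounds):
        critique = call_llm(f"Critique this answer to '{query}':\n{draft}")
        draft = call_llm(f"Improve the answer using this critique:\n{critique}\n\n{draft}")
    return draft

start = time.perf_counter()
print(answer_with_reflection("What is the capital of France?"))
print(f"elapsed: {time.perf_counter() - start:.2f}s")  # ~0.5s here: 5 calls at 0.1s each
```

Halve the per-call latency and you can double `rounds` for the same wall-clock time, which is exactly why speed helps agentic workloads.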
freediver|1 year ago