tgtweak | 10 days ago
There are use cases for fast/ultrafast inference models - classifying text, scoring things, extracting information - but for coding and other knowledge tasks, you're not going to reach your solution faster at 16,000 tokens/s if the solution never comes (or is the wrong one).