jjcm | 9 days ago
10B daily tokens, growing at an average of 22% every week.
There are plenty of times I look to Groq for narrow-domain responses - these smaller models are fantastic for that, and there's often no need for something heavier. Getting response latency down means you can use LLM-assisted processing in a standard webpage load, not just in async processes. I'm really impressed by this, especially if it's a first showing.
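The "LLM in a page load" pattern described above comes down to giving the model call a hard latency budget and degrading gracefully when it misses. A minimal sketch of that idea, where `small_model_classify` is a hypothetical stand-in for a real fast-inference client (not any specific provider's API):

```python
import concurrent.futures

def small_model_classify(text: str) -> str:
    # Hypothetical stand-in for a call to a small, low-latency model;
    # in practice this would be a request to a fast-inference endpoint.
    return "spam" if "winner" in text.lower() else "ok"

def render_comment(comment: str, budget_s: float = 0.15) -> str:
    """Run an LLM-assisted step inline while rendering a page, but fall
    back to a neutral label if the model misses the latency budget."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(small_model_classify, comment)
        try:
            label = future.result(timeout=budget_s)
        except concurrent.futures.TimeoutError:
            label = "unreviewed"  # degrade gracefully; never block the page
    return f"<span class='label-{label}'>{comment}</span>"
```

The design point is the timeout: an async pipeline can tolerate seconds of model latency, but a synchronous page render cannot, so the fallback path has to be first-class.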
ethmarks | 9 days ago
For example, searching a database of tens of millions of text files. Very little "intelligence" is required, but cost and speed are very important. If you want to know something specific on Wikipedia but don't want to figure out which article to search for, you can just have an LLM read the entire English Wikipedia (7,140,211 articles) and compile a report. Doing that would be prohibitively expensive and glacially slow with standard LLM providers, but Taalas could probably do it in a few minutes or even seconds, and it would probably be pretty cheap.
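A rough sketch of that "read everything" pattern: a map step asks a cheap, fast model one narrow question per article, then a reduce step compiles the hits. `fast_llm` below is a toy stand-in for a real inference client, not a real API. The scale argument is simple arithmetic: roughly 7.1M articles at a few hundred tokens each is on the order of billions of input tokens per sweep, which is why per-token cost and speed dominate intelligence here.

```python
def fast_llm(prompt: str) -> str:
    # Toy stand-in for a cheap, fast model: answers a yes/no relevance
    # question about the article text appended after the "---" separator.
    question, _, article = prompt.partition("\n---\n")
    keyword = question.split("'")[1]
    return "YES" if keyword.lower() in article.lower() else "NO"

def map_step(topic: str, articles: dict[str, str]) -> list[str]:
    """Ask the model the same narrow question about every article."""
    hits = []
    for title, text in articles.items():
        prompt = f"Does this article mention '{topic}'?\n---\n{text}"
        if fast_llm(prompt) == "YES":
            hits.append(title)
    return hits

def reduce_step(topic: str, hits: list[str]) -> str:
    """Compile the per-article answers into a single report."""
    return f"{len(hits)} article(s) relevant to '{topic}': " + ", ".join(sorted(hits))
```

In a real deployment the map step would be fired in large parallel batches, which is exactly where raw throughput and latency, rather than model quality, set the wall-clock time.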
freakynit | 9 days ago
LLMs have opened up a natural-language interface to machines. This chip makes it realtime, and that opens up a lot of use cases.
SkyPuncher | 8 days ago
So many problems simply don't require a full LLM, but they do need more than traditional software can offer. Training a novel model isn't really a compelling option at most tech startups right now, so you need to find an LLM-native way to do things.