top | item 47087187

(no title)

jjcm | 9 days ago

A lot of naysayers in the comments, but there are so many uses for non-frontier models. The proof of this is in the openrouter activity graph for llama 3.1: https://openrouter.ai/meta-llama/llama-3.1-8b-instruct/activ...

10b daily tokens growing at an average of 22% every week.

There are plenty of times I look to groq for narrow domain responses - these smaller models are fantastic for that and there's often no need for something heavier. Getting the latency of reponses down means you can use LLM-assisted processing in a standard webpage load, not just for async processes. I'm really impressed by this, especially if this is its first showing.

discuss

jtr1|9 days ago

Maybe this is a naive question, but why wouldn't there be market for this even for frontier models? If Anthropic wanted to burn Opus 4.6 into a chip, wouldn't there theoretically be a price point where this would lower inference costs for them?

ethmarks|9 days ago

Because we don't know if this would scale well to high-quality frontier models. If you need to manufacture dedicated hardware for each new model, that adds a lot of expense and causes a lot of e-waste once the next model releases. In contrast, even this current iteration seems like it would be fantastic for low-grade LLM work.

For example, searching a database of tens of millions of text files. Very little "intelligence" is required, but cost and speed are very important. If you want to know something specific on Wikipedia but don't want to figure out which article to search for, you can just have an LLM read the entire English Wikipedia (7,140,211 articles) and compile a report. Doing that would be prohibitively expensive and glacially slow with standard LLM providers, but Taalas could probably do it in a few minutes or even seconds, and it would probably be pretty cheap.

spot5010|9 days ago

These seem ideal for robotics applications, where there is a low-latency narrow use case path that these chips can serve, maybe locally.

freakynit|9 days ago

Exactly. One easily relatable use-case is structured content extraction or/and conversion to markdown for web page data. I used to use groq for same (gpt-oss20b model), but even that used to feel slow when doing theis task at scale.

LLM's have opened-up natural language interface to machines. This chip makes it realtime. And that opens a lot of use-cases.

redman25|9 days ago

Many older models are still better at "creative" tasks because new models have been benchmarking for code and reasoning. Pre-training is what gives a model its creativity and layering SFT and RL on top tends to remove some of it in order to have instruction following.

SkyPuncher|8 days ago

I have such a deep need for something that's just a step above semantic search. These non-frontier models running blazingly fast can solve that.

So many problems simply don't require a full LLM, but more than traditional software. Training a novel model isn't really a compelling argument at most tech startups right now, so you need to find an LLM-native way to do things.