top | item 47146586

(no title)

estsauver | 6 days ago

I think there's clearly a "Speed is a quality of it's own" axis. When you use Cereberas (or Groq) to develop an API, the turn around speed of iterating on jobs is so much faster (and cheaper!) then using frontier high intelligence labs, it's almost a different product.

Also, I put together a little research paper recently--I think there's probably an underexplored option of "Use frontier AR model for a little bit of planning then switch to diffusion for generating the rest." You can get really good improvements with diffusion models! https://estsauver.com/think-first-diffuse-fast.pdf

discuss

order

refulgentis|6 days ago

I'm very worried for both.

Cerebras requires a $3K/year membership to use APIs.

Groq's been dead for about 6 months, even pre-acquisition.

I hope Inception is going well, it's the only real democratic target at this. Gemini 2.5 Flash Lite was promising but it never really went anywhere, even by the standards of a Google preview

nl|6 days ago

Taalas is interesting. 16,000 TPS for Llama on a chip.

https://taalas.com/

freeqaz|6 days ago

You can call Cerebras APIs via OpenRouter if you specify them as the provider in your request fyi. It's a bit pricier but it exists!

ainch|6 days ago

I don't think it's a good comparison given Inception work on software and Cerebras/Groq work on hardware. If Inception demonstrate that diffusion LLMs work well at scale (at a reasonable price) then we can probably expect all the other frontier labs to copy them quickly, similarly to OpenAI's reasoning models.

7thpower|6 days ago

What do you mean by Grow is dead since about 6 months ago? Not refuting your point, but I’m curious.

estsauver|6 days ago

I am currently using their APIs on a paygo plan, I think it might just be a capacity issue for new sign ups.

Leynos|6 days ago

Cerebras are on OpenRouter.

behnamoh|6 days ago

Once again, it's a tech that Google created but never turned into a product. AFAIK in their demo last year, Google showed a special version of Gemini that used diffusion. They were so excited about it (on the stage) and I thought that's what they'd use in Google search and Gmail.