top | item 45853975

cschneid | 3 months ago

so apparently they have custom hardware built around absolutely gigantic chips, fabricated at the scale of a whole wafer at a time. Presumably they keep the entire model right on chip, in what is effectively L3 cache, so the memory bandwidth is absurdly fast, allowing very fast inference.
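The intuition above is simple arithmetic: single-stream decoding tends to be memory-bandwidth bound, since every generated token has to stream the model weights once. The sketch below illustrates that with made-up round numbers; the bandwidth and model-size figures are assumptions for illustration, not Cerebras or Nvidia specs.

```python
# Back-of-the-envelope: decode speed for one request is roughly bounded by
# how fast the weights can be streamed from memory.
def max_tokens_per_sec(bandwidth_bytes_per_s: float, model_bytes: float) -> float:
    """Rough upper bound: each decoded token reads all weights once."""
    return bandwidth_bytes_per_s / model_bytes

# Illustrative assumptions only: a 70B-parameter model at 1 byte/weight,
# ~3.35 TB/s of off-chip HBM vs. vastly higher aggregate on-wafer SRAM bandwidth.
hbm = max_tokens_per_sec(3.35e12, 70e9)   # tens of tokens/s
sram = max_tokens_per_sec(2.0e16, 70e9)   # hundreds of thousands of tokens/s, in theory
```

The gap between those two numbers is the whole pitch: keeping weights in on-chip SRAM removes the off-chip bandwidth ceiling.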

It's more expensive to get the same raw compute as from a cluster of Nvidia chips, but those clusters can't match the same peak per-request throughput.

As for price, as a coder I'm giving a month of the $50 plan a shot. I haven't figured out how to adapt my workflow to the faster speeds yet (I'm also still learning and setting up opencode).

bigyabai | 3 months ago

For $50/month, it's a non-starter. I hope they can find a way to use all this excess bandwidth to put out a $10 equivalent to Claude Code instead of a 1000 tok/s party trick I can't use properly.

typpilol | 3 months ago

I feel the same, and it's also why I can't understand all these people using small local models.

Every local model I've used, and even most open-source ones, are just not good.

wyre | 3 months ago

Cerebras offers pay-per-token. What are you asking for? Claude Code starts at $100/month, or $15/Mtok. Cerebras is already much cheaper, but you want it to be even cheaper, at $10?

xadhominemx | 3 months ago

$600 per year is a trivial cost for a professional tool.