
Cerebras Trains Llama Models to Leap over GPUs

64 points | rbanffy | 1 year ago | nextplatform.com

33 comments

latchkey | 1 year ago

  1x MI300x has 192GB HBM3.

  1x MI325x has 256GB HBM3e.

They cost less, you can fit more into a rack, and you can buy/deploy at least the 300s today and the 325s early next year. AMD's software and library performance for AI is improving daily [0].

I'm still trying to wrap my head around how these companies think they are going to do well in this market without more memory.

[0] https://blog.vllm.ai/2024/10/23/vllm-serving-amd.html

krasin | 1 year ago
> I'm still trying to wrap my head around how these companies think they are going to do well in this market without more memory.

Cerebras and Groq provide the fastest inference by an order of magnitude. This is very useful for certain workloads that require low-latency feedback: audio chat with an LLM, robotics, etc.

Outside that narrow niche, AMD stuff seems to be the only contender to NVIDIA, at the moment.

wmf | 1 year ago
Groq and Cerebras only make sense at massive scale, which I guess is why they pivoted to being API providers: that way they can amortize the hardware over many customers.
YetAnotherNick | 1 year ago
2x 80GB A100s are better than an MI300x in all the metrics while being cheaper.
arisAlexis | 1 year ago
The article explains the issues with memory in depth. Did you read it through?
7e | 1 year ago
"So, the delta in price/performance between Cerebras and the Hoppers in the cloud when buying iron is 2.75X but for renting iron it is 5.2X, which seems to imply that Cerebras is taking a pretty big haircut when it rents out capacity. That kind of delta between renting out capacity and selling it is not a business model, it is a loss leader from a startup trying to make a point."

As always, it is about TCO, not who can make the biggest monster chip.
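Back-of-the-envelope, the two quoted ratios say how steep that discount is. A minimal sketch (the 2.75X and 5.2X figures come from the quote above; the "haircut" interpretation of their gap is my own reading, not a figure from the article):

```python
# Illustrative arithmetic only; the two ratios are from the quoted article.
buy_ratio = 2.75   # Cerebras vs. Hopper price/performance gap when buying hardware
rent_ratio = 5.2   # the same gap when renting capacity

# If Cerebras priced rentals in line with its hardware price/performance,
# the rental gap would match the purchase gap. The wider rental gap implies
# Cerebras rents capacity at a discount relative to purchase-parity pricing:
implied_discount = 1 - buy_ratio / rent_ratio
print(f"Implied rental haircut: {implied_discount:.0%}")  # ~47%
```

In other words, rental pricing recovers only about half of what the hardware price/performance would imply, which is the "loss leader" point the quote is making.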

asdf1145 | 1 year ago
clickbait title: inference is not training
mentalically | 1 year ago
The value proposition of Cerebras is that they can compile existing graphs to their hardware and allow inference at lower cost and higher efficiency. The title does not say anything about creating or optimizing new architectures from scratch.
7e | 1 year ago
"It would be interesting to see what the delta in accuracy is for these benchmarks."

^ the entirety of it

htrp | 1 year ago
Title is about training... article is about inference.
KTibow | 1 year ago
Why is nobody mentioning that there is no such thing as Llama 3.2 70B?
pk-protect-ai | 1 year ago
Wow, 44GB SRAM, not HBM3 or HBM3e, but actual SRAM ...
asdf1145 | 1 year ago
Did they release MLPerf data yet, or would that not help their IPO?