top | item 39434144

(no title)

The Groq demo was indeed impressive. I work with LLM alot in work, and a generation speed of 500+ tokens/s would definitely change how we use these products. (Especially considering it's an early-stage product)

But the "completely novel silicon architecture" and the "self-developed LPU" (claiming not to use GPUs)... makes me bit skeptical. After all, pure speed might be achievable through stacking computational power and model quantization. Shouldn't innovation at the GPU level be quite challenging, especially to achieve such groundbreaking speeds?

discuss

avivweinstein|2 years ago

I work at Groq. We arent using GPUs at all. This is a novel hardware architecture of ours that allows this high throughput and latency. Nothing sketchy about it.

Jensson|2 years ago

> Shouldn't innovation at the GPU level be quite challenging, especially to achieve such groundbreaking speeds?

GPUs are general purpose, a for purpose built chip that is better isn't that hard to make at all. Google didn't have to work hard at all to invent TPUs which is that idea as well, they said their first tests proved the idea worked so it didn't require anything near Nvidias scale or expertise.

ggnore7452|2 years ago

more on the LPU and data center: https://wow.groq.com/lpu-inference-engine/

price and speed benchmark: https://wow.groq.com/