leo_e | 3 months ago

Impressive numbers on paper, but looking at their site, this feels dangerously close to vaporware.

The bottleneck for inference right now isn't just raw FLOPS or even memory bandwidth—it's the compiler stack. The graveyard of AI hardware startups is filled with chips that beat NVIDIA on specs but couldn't run a standard PyTorch graph without segfaulting or requiring six months of manual kernel tuning.

Until I see a dev board and a working graph compiler that accepts ONNX out of the box, this is just a very expensive CGI render.

mg|3 months ago

Six months of one developer tuning the kernel?

That doesn't seem like much compared to the hundreds of billions of dollars US companies are currently investing in their AI stacks. OpenAI pays thousands of engineers and researchers full time.

SilverBirch|3 months ago

It is. The problem is latency. All these fields are moving very fast, so spending 6 months tuning something doesn't sound bad. But in reality, during those 6 months the guy who built the thing you're tuning has iterated 5 more times. What you started on 6 months ago is now much, much better than what you were handed, while simultaneously being much worse than what that person has in their hands today. If the field you're working in is relatively static, or your performance gap is large enough, it makes sense. But in most fields the performance gap is large in absolute terms but small in temporal terms. You could make something run 10x faster, but you can't build something that will run faster than what will be state of the art in 2 months.

NaomiLehman|3 months ago

more like 100 developers for 2 years

IshKebab|3 months ago

This 100x. I used to work for one of those startups. You need something crazy like a 10x performance advantage to get people to switch from Nvidia to some here-today-gone-tomorrow startup with a custom compiler framework that requires field engineer support to get anything to run.

The outcome is that most custom chips end up not being sold on the open market; instead their manufacturers run them themselves and sell LLM-as-a-service. E.g. Cerebras, SambaNova, and you could count Google's TPUs there too.

vlovich123|3 months ago

Inference accelerators are not where Nvidia is maintaining their dominance afaik.

m00dy|3 months ago

very good point leo_e

indeed, there's no mention of PyTorch on their website... honestly, it looks a bit scammy as well