top | item 39484276


abstractcontrol | 2 years ago

> I have quite a lot of concurrency so I think my ideal hardware is a whole lot of little CPU cores with decent cache and matmul intrinsics

Back in 2015 I thought this would be the dominant model in 2022. I thought that the AI startups challenging Nvidia would be about that. Instead, they all targeted inference rather than programmability. I thought Tenstorrent's hardware would be about what you are talking about - lots of tiny cores, local memory, message passing between them, AI/matmul intrinsics.

I've been hyped about Tenstorrent for a long time, but now that it is finally coming out with something, I can see that the Grayskulls are very overpriced. And if you look at the docs for their low-level kernel programming, you will see that the Tensix cores give you only four registers, no register spilling, and no support for function calls. What would one be able to program with that?

It would have been interesting had the Grayskull cards been released in 2018. But in 2024 I have no idea what the company wants to do with them. It's over five years behind what I was expecting.

My expectations for how the AI hardware wave would unfold were fit for another world entirely. If this is the best the challengers can do, the most we can hope for is that they depress Nvidia's margins somewhat so we can buy its cards cheaper in the future. As we go towards the Singularity, I've gone from expecting revolutionary new hardware from AI startups to hoping Nvidia can keep making GPUs faster and more programmable.

Ironically, that latter thing is one trend that I missed, and going from Maxwell cards to the last generation, the GPUs have gained a lot in terms of how general purpose they are. The range of domains they can be used for is definitely going up as time goes on. I thought that AI chips would be necessary for this, and that GPUs would remain as toys, but it has been the other way around.



cjbgkagh | 2 years ago

I wasn't as optimistic about broad adoption of some of the more advanced techniques I was working on, so I did figure back in 2013 that most people would stick to GEMMs and convs with rather simple loss functions - I had a hard enough time explaining BPR triplet loss to people. Now, with LLMs, people will be doubling down on GEMMs for the foreseeable future.
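For readers who haven't met it, BPR (Bayesian Personalized Ranking) pairwise/triplet loss is simple to state: score a positive and a negative item against an anchor embedding and penalize -log sigmoid(pos - neg). A minimal pure-Python sketch (function and variable names are illustrative, not from any particular library):

```python
import math

def bpr_triplet_loss(anchor, positive, negative):
    """BPR loss for one (anchor, positive, negative) triplet.

    Scores are dot products between the anchor embedding and the
    positive/negative item embeddings; the loss -log(sigmoid(pos - neg))
    is computed as softplus(neg - pos), i.e. log(1 + exp(-(pos - neg))).
    """
    pos_score = sum(a * p for a, p in zip(anchor, positive))
    neg_score = sum(a * n for a, n in zip(anchor, negative))
    diff = pos_score - neg_score
    return math.log1p(math.exp(-diff))  # small when pos >> neg, large otherwise
```

The point of the shape of this loss is that it only cares about the ranking margin between the positive and the negative item, not their absolute scores - which is also why it doesn't reduce to a plain GEMM-plus-elementwise pipeline as neatly as, say, softmax cross-entropy does.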

My customers won't touch non-commodity hardware, as they see it as a potential vector for vendors to screw them over, and they're not wrong about that. In a post-apocalyptic scenario they could just pull a graphics card out of a gaming computer to get things working again, which gives them a strong feeling of security. Having very capable GPU cards as a commodity means I can re-use the same ops for my training and inference, which roughly halves my workload.

My approach to hardware companies is that I'll believe it when I see it: I'll wait until something is publicly available that I can buy off the shelf before looking too closely at its architecture. Nvidia with their Tensor Cores got so good so quickly that I never really looked too closely at alternatives. I'm kind of hopeful that an AMD SoC would provide a good edge-compute option, so I might give that a go.

I had a look at tenstorrent given this article and the Grendel architecture seems interesting.

imtringued | 2 years ago

Grayskull shipped in 2020, and each Tensix core has five RISC-V cores. Get your basic facts right before you complain. The dev kit is just that, a dev kit. Groq sells their dev kit for $20k even though a single LPU is useless.

abstractcontrol | 2 years ago

> Groq sells their dev kit for $20k even though a single LPU is useless.

I find this a very questionable business decision.