(no title)
samsartor | 9 months ago
What I'm saying is, Tenstorrent couldn't find a more excitable third-party developer if they grew one in a lab. And you know what? I can't make heads or tails of all their various abstractions. I've tried! I've read the docs, I've read the examples, I've gone to meetups. I think OP is right that "one more abstraction bro" probably doesn't solve the problem.
At a guess, the problem isn't a technical one; it's an organizational one. They don't have anybody to stand in for me, or devs like me (e.g. dumb people). There is no product leadership on the API design, just a lot of really brilliant engineers obsessively tuning for their own use cases, unwilling to ever trade a hit in performance or expressivity for readability or writability.
liaopeiyuan | 9 months ago
I don't think anyone is seriously training an NN on TT hardware at the moment, and I think that's an issue. I think tinygrad works not only because geohot is one hell of an engineer but also because comma dogfoods it. TT's engineers are absolutely brilliant (from reading their commits), but I think they are stretched too thin. Bounties are not gonna work - you can't expect an outsider with no internal access/bandwidth/knowledge to suddenly make e.g. Mixtral work when the issue spans at least tt-xla and tt-mlir.

And to agree with ^, training is the kind of artifact where good CX can only be derived from strong leadership and a leaner view of the stack. NVIDIA accumulated that over decades, and the rest are trying to catch up by aggressive hiring (not that hiring alone is the answer) - e.g. Annapurna had a presence on the CMU campus when I was there, and has the Anthropic team to test things out.
I'm an incredibly excited third-party developer, as I think the pitch appeals a lot to grad students (who do model research) who need to run small experiments in the ~13B range and then scale them up enough to draw the first half of the scaling curve.
I lose too much productivity to abstractions and incomplete e2e support in TT's current shape. I'd love to give it another go in 6 months.
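The grad-student workflow described above - run small models, fit a curve, extrapolate - can be sketched in a few lines. This is a generic power-law fit in log-log space, nothing from TT's (or anyone's) stack, and the parameter counts and losses below are made-up illustrative numbers, not real benchmark results:

```python
import numpy as np

# Hypothetical results from small training runs: model size in billions
# of parameters vs. eval loss. Purely illustrative numbers.
params = np.array([0.125, 0.35, 1.3, 2.7, 6.7, 13.0])
loss = np.array([3.95, 3.62, 3.21, 3.02, 2.78, 2.61])

# Fit loss ~ a * N^b by linear regression in log-log space
# (the irreducible-loss term of Chinchilla-style fits is omitted for brevity).
b, log_a = np.polyfit(np.log(params), np.log(loss), 1)
a = np.exp(log_a)

def predicted_loss(n_params_billions):
    """Extrapolate the fitted curve to a larger model size."""
    return a * n_params_billions ** b

print(f"fit: loss = {a:.2f} * N^{b:.3f}")
print(f"predicted loss at 70B params: {predicted_loss(70.0):.2f}")
```

The point of "the first half of the scaling curve" is exactly this: the small runs pin down `a` and `b` cheaply, and the extrapolation tells you whether the big run is worth the compute before you commit to it.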
1024bees | 9 months ago
There are clear reasons why a hardware company would use a graph compiler -- they think such an approach yields higher performance, and makes Tenstorrent look better on performance per dollar when compared to competitors (read: NVDA).
There is some legitimate criticism of TT here: their hardware is composed of simple blocks that build up into a complex system (5 separate CPUs being programmed per Tensix tile, many tiles per chip), and that complexity has to be wrangled in the software stack. Paying for that complexity in hardware, so the software sees less of a VLIW-style model, might remove a few abstractions.
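A back-of-the-envelope makes the "complexity wrangled in software" point concrete. The 5 cores per tile comes from the comment above; the tile count is an assumed round number for illustration only (it varies by product):

```python
# How many independently programmed cores might a kernel author juggle?
cores_per_tile = 5    # per the comment: 5 CPUs programmed per Tensix tile
tiles_per_chip = 64   # assumption for illustration, not a spec'd figure

programmable_cores = cores_per_tile * tiles_per_chip
print(f"~{programmable_cores} separately programmed cores per chip")
```

At hundreds of cooperating programs per chip, some layer has to hide the orchestration; the disagreement is only about whether that layer lives in the compiler stack or in the hardware itself.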