1024bees | 9 months ago

There is a natural tension between developing an API that is nice to use and having a full-fledged graph compiler. Most graph compilers, and the hardware that requires them, are complex and difficult to approach. The "original sin" was pytorch vs tensorflow -- tensorflow captured the entire graph and then compiled it with XLA (or whatever preceded it; I'm probably mixing up tf1 and tf2 here), which made it an intractable mess to actually hack on (the runtime also had unapproachable complexity, from what I recall). This has probably changed, but pytorch won out because it was both nice to use and nice to develop on.
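The eager-vs-graph-capture split can be sketched in a few lines of plain Python -- a toy tracer and interpreter, not TF or PyTorch internals, with all names made up for illustration. Eager code runs (and can be debugged) op by op; the captured version records a graph that some separate runtime executes later, and in a real stack that graph goes through a compiler (e.g. XLA), which is where the hacking difficulty lives:

```python
def eager_square_sum(xs):
    # Eager: plain Python, each op executes immediately.
    total = 0
    for x in xs:
        total += x * x
    return total

class Tracer:
    """Records ops into a graph instead of executing them (toy sketch)."""
    def __init__(self):
        self.graph = []  # list of (op, lhs, rhs) nodes

    def mul(self, a, b):
        self.graph.append(("mul", a, b))
        return len(self.graph) - 1  # node id stands in for the value

    def add(self, a, b):
        self.graph.append(("add", a, b))
        return len(self.graph) - 1

def run_graph(graph, inputs):
    # A toy "runtime" that interprets recorded nodes; a real stack would
    # compile the graph instead of walking it like this.
    vals = []
    def fetch(ref):
        # int refs point at earlier nodes, str refs at named inputs
        return vals[ref] if isinstance(ref, int) else inputs[ref]
    for op, a, b in graph:
        x, y = fetch(a), fetch(b)
        vals.append(x * y if op == "mul" else x + y)
    return vals[-1]

# Capture x0*x0 + x1*x1 as a graph, then execute it separately.
t = Tracer()
out = t.add(t.mul("x0", "x0"), t.mul("x1", "x1"))
```

Both paths compute the same thing (`eager_square_sum([3, 4])` and `run_graph(t.graph, {"x0": 3, "x1": 4})` agree), but only the second gives a compiler a whole program to optimize -- at the cost of an extra layer you have to debug through.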

There are clear reasons why a hardware company would use a graph compiler -- they think such an approach is higher performance, and makes tenstorrent look better on performance per dollar when compared to competitors (read: nvda).

There is some legitimate criticism of TT here: their hardware is composed of simple blocks that compose into a complex system (5 separate CPUs being programmed per tensix tile, many tiles per chip), and that complexity has to be wrangled in the software stack. Paying that complexity in hardware, so there is less of a VLIW-style model in software, might remove a few abstractions.
