mlazos | 9 months ago
I'm a dev working on torch.compile at Meta (previously I worked on ML-focused FPGAs), and the approach I would take is: build a static graph compiler, use torch.compile (and probably JAX) as graph-extraction front-ends, and call it a day. I feel like hardware companies don't know how to handle the flexibility of PyTorch, so they develop their own APIs. That's mistake #1, and once you head down that path it becomes virtually impossible to get any market penetration, because nobody will ever rewrite their models for your hardware when they don't even know what perf they'll get; the risk is just too high. As a result, hardware companies offer inference APIs that hide all of this behind a REST endpoint, essentially papering over the lack of generality in the software/hardware interface. This is convenient, because then nobody actually knows the perf/$ and they can burn VC money for as long as they want. Whether this is a viable business model, we'll have to wait until they go public to see what their true inference costs are.
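For concreteness, here's a minimal sketch of what "torch.compile as a graph-extraction front-end" looks like. Dynamo hands a custom backend the captured FX graph as a torch.fx.GraphModule plus example inputs; a vendor's static graph compiler would plug in at the marked line. The backend name and the toy model are made up for illustration.

    import torch

    # A torch.compile backend is any callable taking the captured FX graph
    # and example inputs, and returning a compiled callable. This is the
    # hook where a vendor would lower the graph to their own compiler.
    # "my_hardware_backend" is a hypothetical name.
    def my_hardware_backend(gm: torch.fx.GraphModule, example_inputs):
        gm.graph.print_tabular()  # inspect the extracted static graph
        # Real hardware: lower `gm` here and return the compiled artifact.
        # Returning gm.forward just falls back to eager execution.
        return gm.forward

    model = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.ReLU())
    compiled = torch.compile(model, backend=my_hardware_backend)
    out = compiled(torch.randn(4, 8))

The point is that the extraction problem is already solved for you: you never ask users to rewrite anything, you just consume the graph PyTorch hands over.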
To sum it up: start from PyTorch and work your way down to your hardware. That's the only general path if you want to actually sell chips rather than constantly porting the model of the day to your hardware.