cygn | 2 years ago

You can just use Triton, which is basically TF Serving for TensorFlow, PyTorch, ONNX and more.
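For context, Nvidia Triton serves models out of a "model repository" directory, with one `config.pbtxt` per model. A minimal sketch for an ONNX model might look like this — the model name and tensor names here are made up for illustration:

```
model_repository/
└── my_model/
    ├── config.pbtxt
    └── 1/
        └── model.onnx
```

```
# config.pbtxt (hypothetical example)
name: "my_model"
platform: "onnxruntime_onnx"
max_batch_size: 8
input [
  { name: "input", data_type: TYPE_FP32, dims: [ 3, 224, 224 ] }
]
output [
  { name: "output", data_type: TYPE_FP32, dims: [ 1000 ] }
]
```

Point the server at the repository (`tritonserver --model-repository=/path/to/model_repository`) and it exposes the model over HTTP/gRPC.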

albertzeyer | 2 years ago

Can you explain that?

My understanding of Triton is that it's an alternative to CUDA: you write kernels directly in Python, at a slightly higher level, and it does a lot of optimizations automatically. So basically: Python -> Triton-IR -> LLVM-IR -> PTX.

https://openai.com/research/triton
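To make that concrete, here is roughly what a kernel in OpenAI's Triton looks like — a vector-add sketch in the style of its tutorials. It needs the `triton` package and a CUDA GPU to actually run, so treat it as illustrative:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide chunk of the vectors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard the tail of the array
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

x = torch.rand(4096, device="cuda")
y = torch.rand(4096, device="cuda")
out = torch.empty_like(x)
grid = (triton.cdiv(x.numel(), 1024),)
add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=1024)
```

The `@triton.jit` decorator is where the Python -> Triton-IR -> LLVM-IR -> PTX pipeline kicks in: the function body is compiled to a GPU kernel the first time it is launched.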

chillee | 2 years ago

It's confusing: there's OpenAI Triton (what you're thinking of) and Nvidia's Triton Inference Server (a different thing).