cygn | 2 years ago

You can just use Triton, which is basically TF Serving for TensorFlow, PyTorch, ONNX and more.
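For context, Nvidia Triton serves models out of a "model repository" directory, with one `config.pbtxt` per model. A minimal sketch for an ONNX model might look like this — the model name and tensor names here are made up for illustration:

```
model_repository/
└── my_model/
    ├── config.pbtxt
    └── 1/
        └── model.onnx
```

```
# config.pbtxt (hypothetical example)
name: "my_model"
platform: "onnxruntime_onnx"
max_batch_size: 8
input [
  { name: "input", data_type: TYPE_FP32, dims: [ 3, 224, 224 ] }
]
output [
  { name: "output", data_type: TYPE_FP32, dims: [ 1000 ] }
]
```

Point the server at the repository (`tritonserver --model-repository=/path/to/model_repository`) and it exposes the model over HTTP/gRPC.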

albertzeyer | 2 years ago

Can you explain that?

My understanding of Triton is that it's an alternative to CUDA: you write kernels directly in Python, at a slightly higher level, and it does a lot of optimizations automatically. So basically: Python -> Triton-IR -> LLVM-IR -> PTX.

https://openai.com/research/triton
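To make that concrete, here is roughly what a kernel in OpenAI's Triton looks like — a vector-add sketch in the style of its tutorials. It needs the `triton` package and a CUDA GPU to actually run, so treat it as illustrative:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide chunk of the vectors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard the tail of the array
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

x = torch.rand(4096, device="cuda")
y = torch.rand(4096, device="cuda")
out = torch.empty_like(x)
grid = (triton.cdiv(x.numel(), 1024),)
add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=1024)
```

The `@triton.jit` decorator is where the Python -> Triton-IR -> LLVM-IR -> PTX pipeline kicks in: the function body is compiled to a GPU kernel the first time it is launched.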

chillee | 2 years ago

It's confusing: there's OpenAI Triton (what you're thinking of) and Nvidia's Triton Inference Server (a different thing).