top | item 35078743

regecks | 3 years ago

ELI5 OpenXLA vs TensorRT? Are they solving the same problem, just that the former is not married to NVIDIA devices?

tzhenghao | 3 years ago

They're solving the same "high level problem", but with very different approaches.

TensorRT is proprietary to Nvidia and Nvidia hardware. You'd take a {PyTorch, TensorFlow, <insert some other ML framework>} model and "export / convert" it into essentially a binary. Assuming all goes well (and in practice it rarely does, at least on the first try - more on this later), you now automatically leverage other Nvidia card features such as Tensor Cores and can serve a model that runs significantly faster.

The problem is TensorRT being exclusive to Nvidia. The APIs for doing more advanced ML techniques like deep learning optimization require significant lock-in, if they are even available in the first place. And all of this assumes they work as documented.

OpenXLA (and other players in the ecosystem like TVM) aim to "democratize" this so there is broader support both upstream (# of supported ML frameworks) and downstream (# of hardware accelerators other than Nvidia's). It's yet another layer or two that ML compiler engineers will need to stitch together, but once implemented, in theory they can apply a lot of optimization techniques largely independent of the hardware targets underneath.
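
To make that concrete, here's a minimal sketch of the hardware-independent path using JAX (which uses XLA as its compiler; assumes JAX is installed). Note there's no Nvidia-specific export step - `jax.jit` hands the traced program to XLA, which compiles it for whichever backend (CPU, GPU, TPU) is present:

```python
import jax
import jax.numpy as jnp

# jax.jit traces the function once, then XLA compiles the traced
# program for the available backend -- the model code itself never
# mentions a specific hardware target.
@jax.jit
def predict(w, x):
    return jnp.tanh(x @ w)

w = jnp.ones((3, 2))  # toy "weights"
x = jnp.ones((4, 3))  # toy batch of inputs

print(predict(w, x).shape)  # -> (4, 2)
```

The same source runs unchanged on CPU or GPU; the compiler layer, not the model author, deals with the target.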

Note that further down in the article they mention other compiler frameworks like MLIR. You could then hypothetically lower (compiler terminology) the model to a TensorRT MLIR dialect that in turn runs on the Nvidia GPU.

lairv | 3 years ago

I still don't fully grasp what XLA is. Where does XLA sit relative to CUDA, ROCm, OpenVINO? To ONNX / ONNX Runtime? To OpenAI Triton?

mathisfun123 | 3 years ago

basically all correct but

>You can then hypothetically lower (compiler terminology) it to a TensorRT MLIR dialect that then in turn runs on the Nvidia GPU.

there's no tensorrt dialect (there are nvgpu and nvvm dialects), nor would there be, as tensorrt is primarily a runtime (although arguably dialects like omp and spirv basically model runtime calls).

BlamSthzusa | 3 years ago

OpenXLA is an open-source library for accelerating linear algebra computations on a variety of hardware platforms, while TensorRT is a proprietary library from NVIDIA that's specifically designed for optimizing neural network inference performance on NVIDIA GPUs.

mathisfun123 | 3 years ago

openxla is an ML-ish compiler ecosystem built primarily around mlir that can target nvidia devices (through the nvptx backend in llvm) and run on them (via iree). tensorrt is a runtime for cuda programs. certainly they have features in common, as a reflection of their common goals ("fast nn program training/inference"), but the scope of tensorrt is much narrower.
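
for the curious, you can actually look at the mlir that sits in the middle of this stack: recent jax releases can dump the stablehlo module that xla consumes, before any hardware-specific codegen happens. a quick sketch (the `lower`/`as_text` api names assume a reasonably recent jax version):

```python
import jax
import jax.numpy as jnp

def f(x):
    return jnp.sin(x) * 2.0

# .lower() stops after tracing + lowering to stablehlo (an mlir
# dialect), without compiling for any particular backend.
# .as_text() prints the mlir module -- you'll see ops like
# stablehlo.sine in the output.
print(jax.jit(f).lower(jnp.ones((4,))).as_text())
```

this is the hardware-independent artifact that then gets lowered further (nvptx via llvm, or whatever the target backend needs).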