SteveJS | 5 months ago
And it is a developer feature hidden from end users. In your Ollama example, for instance: does the developer ask end users to install Ollama? Does the dev redistribute Ollama and keep it updated?
The ONNX format is pretty much a boring de facto standard for ML model exchange. It is under the Linux Foundation.
The ONNX Runtime is a Microsoft thing, but it is an MIT-licensed runtime for cross-language use and cross-OS/HW deployment of ML models in the ONNX format.
That bit needs to support everything because Microsoft itself ships software on everything (Mac/Linux/iOS/Android/Windows).
ORT — https://onnxruntime.ai
Here is the Windows ML part of this: https://learn.microsoft.com/en-us/windows/ai/new-windows-ml/...
The primary value claims for Windows ML (for a developer using it): it eliminates the need to

• Bundle execution providers for specific hardware vendors
• Create separate app builds for different execution providers
• Handle execution provider updates manually
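For contrast, here is roughly what that per-hardware plumbing looks like when you drive plain ONNX Runtime yourself from Python. The provider names are real ORT identifiers, but "model.onnx" is a placeholder and the priority list is illustrative; treat it as a sketch of the manual workflow, not of the Windows ML API:

  import onnxruntime as ort

  # EPs compiled into this particular ORT build; varies per install.
  available = ort.get_available_providers()
  print(available)  # e.g. ['CUDAExecutionProvider', 'CPUExecutionProvider']

  # The developer has to decide, per hardware target, which EPs to request.
  preferred = [p for p in ("TensorrtExecutionProvider",
                           "CUDAExecutionProvider",
                           "DmlExecutionProvider",
                           "CPUExecutionProvider") if p in available]

  sess = ort.InferenceSession("model.onnx", providers=preferred)
  print(sess.get_providers())  # what ORT actually ended up using

Shipping the right build of ORT plus the right EP binaries for each machine is the packaging problem the bullets above say Windows ML absorbs.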
Since ‘EP’ is ultra-super-techno-jargon:
Here is what GPT-5 provides:
Intensional (what an EP is)
In ONNX Runtime, an Execution Provider (EP) is a pluggable backend that advertises which ops/kernels it can run and supplies the optimized implementations, memory allocators, and (optionally) graph rewrites for a specific target (CPU, CUDA/TensorRT, Core ML, OpenVINO, etc.). ONNX Runtime then partitions your model graph and assigns each partition to the highest-priority EP that claims it; anything unsupported falls back (by default) to the CPU EP.
Extensional (how you use them)

• You pick/priority-order EPs per session; ORT maps graph pieces accordingly and falls back as needed.
• Each EP has its own options (e.g., TensorRT workspace size, OpenVINO device string, QNN context cache).
• Common EPs: CPU, CUDA, TensorRT (NVIDIA), DirectML (Windows), Core ML (Apple), NNAPI (Android), OpenVINO (Intel), ROCm (AMD), QNN (Qualcomm).
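A minimal sketch of that per-session priority list with per-EP options, again in ORT's Python API. "model.onnx" is a placeholder, and the option keys (trt_max_workspace_size, device_id) are the TensorRT/CUDA ones as I recall them from the ORT docs, so double-check against the EP documentation:

  import onnxruntime as ort

  # Priority-ordered EP list: ORT assigns each graph partition to the
  # first listed EP that claims it; anything unsupported falls back to
  # the CPU EP at the end of the list.
  providers = [
      ("TensorrtExecutionProvider", {"trt_max_workspace_size": 2 * 1024**3}),
      ("CUDAExecutionProvider", {"device_id": 0}),
      "CPUExecutionProvider",
  ]

  sess = ort.InferenceSession("model.onnx", providers=providers)
  print(sess.get_providers())  # the EPs actually in effect for this session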