top | item 36224322

ikhatri | 2 years ago

You can use the ONNX CPU runtime from Python or C++ too; it doesn't have to be Rust. And if you want GPU support, you can even run models saved in the ONNX format on Nvidia GPUs with the TensorRT runtime.

Honestly, while ggml is super cool, it started as a hobby project and you probably shouldn't use it in production. ONNX has been the de facto standard for ML inference for years. What it's missing (compared to ggml) is 2-6 bit inference, which is helpful for large-scale transformers on edge devices (and is what helped ggml gain adoption so fast).

touisteur | 2 years ago

Intel OpenVINO is also quite punchy for CPU inference.

ikhatri | 2 years ago

Yeah, I've heard of it but never used it. Looks like they have a backend/runtime for ONNX models as well (https://pypi.org/project/onnxruntime-openvino/). Neat!
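
Since `onnxruntime` exposes backends as "execution providers", switching to OpenVINO is mostly a matter of provider selection. A small sketch (the helper name is made up; `"model.onnx"` is a placeholder path, and the OpenVINO provider only appears if `onnxruntime-openvino` is installed):

```python
def pick_providers(available):
    """Prefer the OpenVINO execution provider when it's available,
    keeping plain CPU as a fallback."""
    preferred = ["OpenVINOExecutionProvider", "CPUExecutionProvider"]
    chosen = [p for p in preferred if p in available]
    return chosen or ["CPUExecutionProvider"]

# Usage (assuming onnxruntime-openvino is installed):
# import onnxruntime as ort
# sess = ort.InferenceSession(
#     "model.onnx",
#     providers=pick_providers(ort.get_available_providers()))
```

The same pattern works for the TensorRT and CUDA providers; the session falls back down the list when a backend can't handle part of the graph.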

ONNX really is the universal format. If you can get your model exported to ONNX, running it on various platforms becomes much easier.*

*as long as every hardware platform supports the ops you use in your network and you're not doing anything too fancy/custom :P