mappu|1 month ago
(PyTorch also supports ROCm in general; it shows up as a CUDA device.)
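For example, a minimal sketch of what that looks like in practice: on a ROCm wheel of PyTorch, torch.version.hip is set and the GPU is still addressed through the regular "cuda" device type.

    import torch

    # On a ROCm build, torch.version.hip is a version string;
    # on a CUDA build it is None.
    print(torch.version.hip)

    if torch.cuda.is_available():
        # Reports the AMD GPU, even though the API says "cuda"
        print(torch.cuda.get_device_name(0))
        x = torch.ones(4, device="cuda")  # "cuda" maps to the HIP device
        print(x * 2)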
ikari_pl|1 month ago
sofixa|1 month ago
However, from experience with an AMD Strix Halo, a couple of caveats: vLLM is drastically slower than Ollama (tested over a few weeks, always using the official AMD vLLM nightly releases), and not all GPUs were supported for all models (though that has since been fixed).
bildung|1 month ago
If you want more performance, you could try running llama.cpp directly or using the prebuilt lemonade nightlies.
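For what it's worth, a minimal sketch of driving llama.cpp from Python via the llama-cpp-python bindings (assuming they were built with HIP/ROCm support; the model path is a placeholder, not a real file):

    from llama_cpp import Llama

    # n_gpu_layers=-1 offloads every layer to the GPU
    llm = Llama(
        model_path="./models/model.gguf",  # placeholder path
        n_gpu_layers=-1,
    )

    out = llm("Q: What is vLLM? A:", max_tokens=64)
    print(out["choices"][0]["text"])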