mappu|1 month ago
(PyTorch also supports ROCm in general; it shows up as a CUDA device.)
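For example, a minimal sketch of what that looks like in practice: on a ROCm wheel of PyTorch, torch.version.hip is set and the GPU is still addressed through the regular "cuda" device type.

    import torch

    # On a ROCm build, torch.version.hip is a version string;
    # on a CUDA build it is None.
    print(torch.version.hip)

    if torch.cuda.is_available():
        # Reports the AMD GPU, even though the API says "cuda"
        print(torch.cuda.get_device_name(0))
        x = torch.ones(4, device="cuda")  # "cuda" maps to the HIP device
        print(x * 2)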
ikari_pl|1 month ago
sofixa|1 month ago
However, from experience with an AMD Strix Halo, a couple of caveats: vLLM is drastically slower than Ollama (tested over a few weeks, always using the official AMD vLLM nightly releases), and not all GPUs were supported for all models (though that has since been fixed).
bildung|1 month ago
If you want more performance, you could try running llama.cpp directly or using the prebuilt lemonade nightlies.
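For what it's worth, a minimal sketch of driving llama.cpp from Python via the llama-cpp-python bindings (assuming they were built with HIP/ROCm support; the model path is a placeholder, not a real file):

    from llama_cpp import Llama

    # n_gpu_layers=-1 offloads every layer to the GPU
    llm = Llama(
        model_path="./models/model.gguf",  # placeholder path
        n_gpu_layers=-1,
    )

    out = llm("Q: What is vLLM? A:", max_tokens=64)
    print(out["choices"][0]["text"])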