androiddrew|8 days ago
Yeah, there is a lot of advantage to having this machine, because the CUDA stack is still king. My two AMD GPUs are suffering when it comes to working with the ROCm stack. I have forks of Ollama and vLLM that took many weekends to figure out.

Zetaphor|8 days ago
https://github.com/kyuz0/amd-strix-halo-toolboxes
It takes all the work out of it: you just start llama-server in the container context and you're off doing inference, without having to figure out dependencies.
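For anyone wanting to try it, a rough sketch of what the client side looks like once you're inside one of the toolboxes and have llama-server running: the server exposes an OpenAI-compatible /v1/chat/completions endpoint (port 8080 is its default), so a stdlib-only Python script is enough to do inference. The model path and launch flags in the comments are placeholders; check the repo README for the actual image tags and recommended flags.

    # Query a llama-server instance running inside the toolbox container.
    # Assumes the server was started with something like:
    #   llama-server -m model.gguf --port 8080 -ngl 99
    # (model path and flags are placeholders; see the toolboxes README)
    import json
    import urllib.request

    payload = {
        "messages": [{"role": "user", "content": "Say hello from Strix Halo."}],
        "max_tokens": 64,
    }
    req = urllib.request.Request(
        "http://127.0.0.1:8080/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # The response follows the OpenAI chat-completions shape.
    print(body["choices"][0]["message"]["content"])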