We run vLLM in certain production instances, and it is a pain for most non-NVIDIA architectures. After a bit of digging around, we realized that most of it is just a wrapper on top of PyTorch function calls. If we can do away with the batch processing that vLLM provides, we are fine; that is what we did here.
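The idea of dropping vLLM's batching and serving one request at a time can be sketched roughly as a plain greedy-decoding loop over direct model calls. This is a toy illustration, not vLLM's or any production code: `toy_logits` is a hypothetical stand-in for a real PyTorch model's forward pass, and all names here are made up for the sketch.

```python
# Hedged sketch: single-request greedy decoding via plain function calls,
# with no vLLM-style continuous batching or paged KV cache.

def toy_logits(tokens):
    """Stub for a model forward pass (hypothetical, illustration only).

    Scores the next token deterministically from the last token.
    """
    vocab_size = 8
    last = tokens[-1]
    return [1.0 if i == (last + 1) % vocab_size else 0.0 for i in range(vocab_size)]

def greedy_generate(prompt_tokens, max_new_tokens, eos_token=None):
    """Append the argmax next token one step at a time, one request at a time."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        logits = toy_logits(tokens)
        next_token = max(range(len(logits)), key=logits.__getitem__)
        tokens.append(next_token)
        if next_token == eos_token:
            break
    return tokens

print(greedy_generate([0], 3))  # → [0, 1, 2, 3]
```

With a real model you would swap `toy_logits` for the model's forward call; the point is only that per-request decoding needs no batching scheduler.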
fazkan|1 year ago
dhruvdh|1 year ago
Also, there is a Dockerfile.rocm at the root of vLLM's repo. How is it a pain?