steren | 6 months ago
> I would never want to use something like ollama in a production setting.

We benchmarked vLLM and Ollama on both startup time and tokens per second. Ollama came out on top. We hope to be able to publish these results soon.

    ekianjo | 6 months ago
    You need to benchmark against llama.cpp as well.

    apitman | 6 months ago
    Did you test multi-user cases?

        jasonjmcghee | 6 months ago
        Assuming this is equivalent to parallel sessions, I would hope so; this is the entire point of vLLM.

        sbinnee | 6 months ago
        vLLM and Ollama assume different settings and hardware. vLLM, backed by paged attention, expects many requests from multiple users, whereas Ollama is usually for a single user on a local machine.
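The multi-user question above boils down to measuring aggregate tokens per second under parallel sessions. A minimal sketch of such a harness is below; it stubs the completion call (`fake_generate`) since the thread does not name a specific API, and in practice you would replace the stub with a real client pointed at vLLM's or Ollama's HTTP endpoint. All names here are hypothetical, not from the thread.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_generate(prompt: str) -> int:
    """Stand-in for a completion request; returns the token count produced.

    Hypothetical stub: sleeps to simulate generation latency and pretends
    every request yields a fixed number of tokens.
    """
    time.sleep(0.01)
    return 32

def benchmark(generate, prompts, concurrency=8):
    """Run all prompts with `concurrency` parallel sessions.

    Returns aggregate throughput in tokens per second: total tokens
    generated across all requests divided by wall-clock time.
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        token_counts = list(pool.map(generate, prompts))
    elapsed = time.perf_counter() - start
    return sum(token_counts) / elapsed

if __name__ == "__main__":
    prompts = ["Hello"] * 64
    serial = benchmark(fake_generate, prompts, concurrency=1)
    parallel = benchmark(fake_generate, prompts, concurrency=16)
    print(f"serial:   {serial:.1f} tokens/sec")
    print(f"parallel: {parallel:.1f} tokens/sec")
```

With a server that batches well (vLLM's continuous batching with PagedAttention), the parallel number should scale far better than with a server tuned for a single local user, which is the distinction sbinnee draws.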