steren | 6 months ago
> I would never want to use something like ollama in a production setting.

We benchmarked vLLM and Ollama on both startup time and tokens per second. Ollama came out on top. We hope to be able to publish these results soon.

    ekianjo | 6 months ago
    You need to benchmark against llama.cpp as well.

    apitman | 6 months ago
    Did you test multi-user cases?

        jasonjmcghee | 6 months ago
        Assuming this is equivalent to parallel sessions, I would hope so; this is the entire point of vLLM.

        sbinnee | 6 months ago
        vLLM and Ollama assume different settings and hardware. vLLM, backed by paged attention, expects many requests from multiple users, whereas Ollama is usually for a single user on a local machine.
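The multi-user question above boils down to measuring aggregate tokens per second under parallel sessions. A minimal sketch of such a harness is below; it stubs the completion call (`fake_generate`) since the thread does not name a specific API, and in practice you would replace the stub with a real client pointed at vLLM's or Ollama's HTTP endpoint. All names here are hypothetical, not from the thread.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_generate(prompt: str) -> int:
    """Stand-in for a completion request; returns the token count produced.

    Hypothetical stub: sleeps to simulate generation latency and pretends
    every request yields a fixed number of tokens.
    """
    time.sleep(0.01)
    return 32

def benchmark(generate, prompts, concurrency=8):
    """Run all prompts with `concurrency` parallel sessions.

    Returns aggregate throughput in tokens per second: total tokens
    generated across all requests divided by wall-clock time.
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        token_counts = list(pool.map(generate, prompts))
    elapsed = time.perf_counter() - start
    return sum(token_counts) / elapsed

if __name__ == "__main__":
    prompts = ["Hello"] * 64
    serial = benchmark(fake_generate, prompts, concurrency=1)
    parallel = benchmark(fake_generate, prompts, concurrency=16)
    print(f"serial:   {serial:.1f} tokens/sec")
    print(f"parallel: {parallel:.1f} tokens/sec")
```

With a server that batches well (vLLM's continuous batching with PagedAttention), the parallel number should scale far better than with a server tuned for a single local user, which is the distinction sbinnee draws.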