
milansuk | 1 year ago

No need to use Ollama. llama.cpp has its own OpenAI-compatible server[0] and it works great.

[0] https://github.com/ggerganov/llama.cpp#web-server
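Because the server speaks the OpenAI chat-completions protocol, talking to it is just an HTTP POST. A minimal sketch, assuming the server is running at llama.cpp's default address of `http://localhost:8080` (started with something like `./llama-server -m model.gguf`); the actual request is shown commented out so the snippet doesn't require a live server:

```python
import json
import urllib.request

# llama.cpp's server accepts the same chat-completions payload
# as the OpenAI API, so no Ollama layer is needed in between.
payload = {
    "model": "local",  # the server runs whatever model it was started with
    "messages": [
        {"role": "user", "content": "Say hello in one word."}
    ],
    "temperature": 0.7,
}

req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)

# With a running server, the response parses like any OpenAI reply:
# body = json.load(urllib.request.urlopen(req))
# print(body["choices"][0]["message"]["content"])

print(req.full_url)
```

The same property means existing OpenAI client libraries work against it by pointing their base URL at the local server instead of api.openai.com.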


citizenpaul|1 year ago

Thanks, I didn't know that.

Do you happen to know the reason to use Ollama rather than the built-in server? How much work is required to get similar functionality? It looks like just downloading the models? I find it odd that Ollama took off so quickly if llama.cpp had the same functionality built in.

PhilippGille|1 year ago

Yes, I'm aware. I was contrasting the general use of an inference server vs calling llama.cpp directly (not via HTTP request).

And among servers Ollama seems to be more popular, so it's worth mentioning when talking about support for local LLMs.