I wonder if Ollama will or plans to have other "Supported backends" than llama.cpp. It's listed on the very last line of their readme as if the llama.cpp dependency is just incidental and a very minor detail rather than Ollama as a deployment mechanism for llama.cpp and gguf based models.
Yes, we are also looking at integrating MLX [1] which is optimized for Apple Silicon and built by an amazing team of individuals, a few of which were behind the original Torch [2] project. There's also TensorRT-LLM [3] by Nvidia optimized for their recent hardware.
All of this of course acknowledging that llama.cpp is an incredible project with competitive performance and support for almost any platform.
I don't think they will move away from llama.cpp until they are forced to. The number of people contributing to llama.cpp is quite significant [1] and it wouldn't make sense to use another backend given how quickly llama.cpp is iterating and growing.
Their behaviour around llama.cpp acknowledgement is very shady. Until the very recent, there was no mention of llama.cpp in their README at all and now it's tucked away all the way down. Compare that to the originally proposed PR for example: https://github.com/ollama/ollama/pull/3700
Ollama is great. I actually wish they would wrap OpenAI and Azure and generally act as as a proxy for third party APIs. Having a consistent, well thought out API which isn't tied to a single provider would be really good for the community.
Edit: this would be useful because in many cases some workloads can be local, but others cannot... e.g. if you really need gpt4 for specific queries.
Ollama is simply great! I was quite surprised how easy it is to integrate through their API. A simple chat using Ollama + llama3 is less than 40 lines of TypeScript: https://github.com/wiktor-k/llama-chat
I actually just benchmarked Llama3 70B coding with aider, and it did quite well. It scored similar to GPT 3.5.
You can use Llama3 70B with aider via Ollama [0]. It's also available for free via Groq [1] (with rate limits). And OpenRouter has it available [2] for low cost on their paid api.
Paul's benchmarks are excellent and they're the first thing I look for to get a sense of a new model performance :)
For those looking to create their own benchmarks, promptfoo[0] is one way to do this locally:
prompts:
- "Write this in Python 3: {{ask}}"
providers:
- ollama:chat:llama3:8b
- ollama:chat:phi3
- ollama:chat:qwen:7b
tests:
- vars:
ask: a function to determine if a number is prime
- vars:
ask: a function to split a restaurant bill given individual contributions and shared items
Jumping in because I'm a big believer in (1) local LLMs, and (2) evals specific to individual use cases.
Nice, I've been looking out for something like this! What's Jina AI and how is it local if I need an API key for it? Also, this is the first time I'm hearing about poetry. Might be worth including in the prerequisites (unless I can just stick with pip?)
They do, and I was using the "new" models before the update. Perhaps there is tuning or bug fixes for them? Or they just want to confirm that these are supported. There are some new models that do have different architectures, so sometimes an update is necessary.
Because the way they are quantized takes time to get bug-free when new architectures are released. If a model was quantized with a known bug in the quantizer, then it effectively makes those quantized versions buggy and they need to be requantized with a new version of llamacpp which has this fixed.
this looks very awesome. can someone tell me why there is no chatter about this? is there something else out there that blows this out of the water in terms of ease of use and access to sample many LLM's ?
Ollama is really organized - it relies on llama but the UX and organization it provides makes it legit. We recently made a one-click wizard to run Open WebUI and Ollama together, self hosted and remotely accessible but locally hosted [1]
LM Studio is a lot more user friendly, probably the easiest UI to use out there. No terminal nonsense, no manual to read. Just double click and chat. It even explains to you what the model names mean (eg diff between Q4_1 Q4_K Q4_K_M... For whatever reason all the other tools assume you know what it means).
Why do you think there is no chatter about this? There have been hundreds of posts about ollama on HN. This is a point release of an already well known project.
I use a mix of using llamacpp directly via my own python bindings and using it via llamacpp-python for function calling and full control over parameters and loading, but otherwise ollama is just great for ease of use. There's really not a reason not to use it, if just want to load gguf models and don't have any intricate requirements.
[+] [-] jerrygenser|1 year ago|reply
[+] [-] jmorgan|1 year ago|reply
All of this of course acknowledging that llama.cpp is an incredible project with competitive performance and support for almost any platform.
[1] https://github.com/ml-explore/mlx
[2] https://en.wikipedia.org/wiki/Torch_(machine_learning)
[3] https://github.com/NVIDIA/TensorRT-LLM
[+] [-] sdesol|1 year ago|reply
[1] https://devboard.gitsense.com/ggerganov?r=ggerganov%2Fllama....
Full disclosure: This is my tool
[+] [-] sh79|1 year ago|reply
[+] [-] bigfudge|1 year ago|reply
Edit: this would be useful because in many cases some workloads can be local, but others cannot... e.g. if you really need gpt4 for specific queries.
[+] [-] Cheer2171|1 year ago|reply
[+] [-] wiktor-k|1 year ago|reply
[+] [-] oulipo|1 year ago|reply
[+] [-] anotherpaulg|1 year ago|reply
You can use Llama3 70B with aider via Ollama [0]. It's also available for free via Groq [1] (with rate limits). And OpenRouter has it available [2] for low cost on their paid api.
[0] https://aider.chat/docs/llms.html#ollama
[1] https://aider.chat/docs/llms.html#groq
[2] https://aider.chat/docs/llms.html#openrouter
[+] [-] typpo|1 year ago|reply
For those looking to create their own benchmarks, promptfoo[0] is one way to do this locally:
Jumping in because I'm a big believer in (1) local LLMs, and (2) evals specific to individual use cases.[0] https://github.com/typpo/promptfoo
[+] [-] jkh1|1 year ago|reply
[+] [-] stephen37|1 year ago|reply
[+] [-] addandsubtract|1 year ago|reply
[+] [-] yjftsjthsd-h|1 year ago|reply
[+] [-] ynniv|1 year ago|reply
[+] [-] FieryTransition|1 year ago|reply
[+] [-] Sammi|1 year ago|reply
[+] [-] prometheon1|1 year ago|reply
[+] [-] arvinsim|1 year ago|reply
[+] [-] thedatamonger|1 year ago|reply
[+] [-] brrrrrm|1 year ago|reply
[+] [-] chadsix|1 year ago|reply
[1] https://github.com/ipv6rslimited/cloudseeder
[+] [-] gertop|1 year ago|reply
Built-in model recommendations are also handy.
Very friendly tool!
However it's not open-source.
[+] [-] Cheer2171|1 year ago|reply
[+] [-] FieryTransition|1 year ago|reply
[+] [-] CharlesW|1 year ago|reply
[+] [-] perrygeo|1 year ago|reply
[+] [-] throw03172019|1 year ago|reply