top | item 40191723

Ollama v0.1.33 with Llama 3, Phi 3, and Qwen 110B

192 points | ashvardanian | 1 year ago | github.com

64 comments

[+] jerrygenser|1 year ago|reply
I wonder if Ollama has, or plans to have, "supported backends" other than llama.cpp. It's listed on the very last line of their README, as if the llama.cpp dependency were incidental and a very minor detail, rather than Ollama being a deployment mechanism for llama.cpp and GGUF-based models.
[+] jmorgan|1 year ago|reply
Yes, we are also looking at integrating MLX [1], which is optimized for Apple Silicon and built by an amazing team of individuals, a few of whom were behind the original Torch [2] project. There's also TensorRT-LLM [3] by Nvidia, optimized for their recent hardware.

All of this of course acknowledging that llama.cpp is an incredible project with competitive performance and support for almost any platform.

[1] https://github.com/ml-explore/mlx

[2] https://en.wikipedia.org/wiki/Torch_(machine_learning)

[3] https://github.com/NVIDIA/TensorRT-LLM

[+] sdesol|1 year ago|reply
I don't think they will move away from llama.cpp until they are forced to. The number of people contributing to llama.cpp is quite significant [1] and it wouldn't make sense to use another backend given how quickly llama.cpp is iterating and growing.

[1] https://devboard.gitsense.com/ggerganov?r=ggerganov%2Fllama....

Full disclosure: This is my tool

[+] sh79|1 year ago|reply
Their behaviour around llama.cpp acknowledgement is very shady. Until very recently, there was no mention of llama.cpp in their README at all, and now it's tucked away all the way at the bottom. Compare that to the originally proposed PR, for example: https://github.com/ollama/ollama/pull/3700
[+] bigfudge|1 year ago|reply
Ollama is great. I actually wish they would wrap OpenAI and Azure and generally act as a proxy for third-party APIs. Having a consistent, well-thought-out API that isn't tied to a single provider would be really good for the community.

Edit: this would be useful because in many cases some workloads can be local, but others cannot... e.g. if you really need GPT-4 for specific queries.

[+] Cheer2171|1 year ago|reply
It is open source, so if you want to see this in ollama, pull requests are welcome. :)
[+] wiktor-k|1 year ago|reply
Ollama is simply great! I was quite surprised how easy it is to integrate through their API. A simple chat using Ollama + llama3 is less than 40 lines of TypeScript: https://github.com/wiktor-k/llama-chat
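For a sense of how small that integration is: the request/response shape of Ollama's documented /api/chat endpoint can be sketched in a few lines of stdlib Python (no SDK; assumes a local Ollama server on its default port, and the helper names here are mine, not from the linked repo):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # Ollama's default port

def build_chat_request(model, messages):
    """Build the JSON body for a non-streaming /api/chat call."""
    return {"model": model, "messages": messages, "stream": False}

def extract_reply(response):
    """Pull the assistant's text out of a non-streaming /api/chat response."""
    return response["message"]["content"]

def chat(model, messages):
    """Send one chat turn to a locally running Ollama server."""
    body = json.dumps(build_chat_request(model, messages)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return extract_reply(json.load(resp))

if __name__ == "__main__":
    print(chat("llama3", [{"role": "user", "content": "Say hi in one word."}]))
```

Set "stream": True instead to get newline-delimited JSON chunks back, which is how the linked TypeScript chat can print tokens as they arrive.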
[+] oulipo|1 year ago|reply
Nice! Would there be a way to do that streaming, with streaming voice input too?
[+] anotherpaulg|1 year ago|reply
I actually just benchmarked Llama3 70B coding with aider, and it did quite well. It scored similarly to GPT-3.5.

You can use Llama3 70B with aider via Ollama [0]. It's also available for free via Groq [1] (with rate limits), and OpenRouter has it available [2] at low cost on their paid API.

[0] https://aider.chat/docs/llms.html#ollama

[1] https://aider.chat/docs/llms.html#groq

[2] https://aider.chat/docs/llms.html#openrouter
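The Ollama route boils down to serving the model locally and pointing aider at it; roughly like this (a sketch based on aider's litellm-style `ollama/` model naming and `OLLAMA_API_BASE` convention; check the linked docs for the exact flags, which may have changed):

```shell
# Pull and serve the model locally
ollama pull llama3:70b
ollama serve &

# Tell aider where the Ollama API lives, then pick the model
export OLLAMA_API_BASE=http://127.0.0.1:11434
aider --model ollama/llama3:70b
```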

[+] typpo|1 year ago|reply
Paul's benchmarks are excellent, and they're the first thing I look for to get a sense of a new model's performance :)

For those looking to create their own benchmarks, promptfoo[0] is one way to do this locally:

  prompts:
    - "Write this in Python 3: {{ask}}"
  
  providers:
    - ollama:chat:llama3:8b
    - ollama:chat:phi3
    - ollama:chat:qwen:7b
    
  tests:
    - vars:
        ask: a function to determine if a number is prime
    - vars:
        ask: a function to split a restaurant bill given individual contributions and shared items

Jumping in because I'm a big believer in (1) local LLMs, and (2) evals specific to individual use cases.

[0] https://github.com/typpo/promptfoo
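If you save the config above as promptfooconfig.yaml, the comparison runs from promptfoo's CLI (commands per promptfoo's README; the eval produces a side-by-side matrix of each model's response to each test case):

```shell
npx promptfoo@latest eval   # run every prompt x provider x test combination
npx promptfoo@latest view   # browse the results in a local web UI
```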

[+] stephen37|1 year ago|reply
I love working with Ollama, I was really surprised at how easy it is to build a simple RAG system with it. For example: https://github.com/stephen37/ollama_local_rag
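The core of such a RAG loop is small: embed the documents (e.g. via Ollama's embeddings endpoint), retrieve the nearest chunks by similarity, and stuff them into the prompt. A minimal sketch of the retrieval and prompt-assembly steps, stdlib only, with toy embeddings (function names are mine, not from the linked repo):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def retrieve(query_vec, docs, k=2):
    """docs: list of (text, embedding) pairs; return the k most similar texts."""
    ranked = sorted(docs, key=lambda d: cosine(query_vec, d[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def build_prompt(question, context_chunks):
    """Assemble a grounded prompt from the retrieved chunks."""
    context = "\n".join(context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

In a real system the embeddings would come from an embedding model rather than being hand-written, and the assembled prompt would be sent to a chat model; the retrieval logic itself stays this simple.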
[+] addandsubtract|1 year ago|reply
Nice, I've been looking out for something like this! What's Jina AI, and how is it local if I need an API key for it? Also, this is the first time I'm hearing about Poetry. Might be worth including it in the prerequisites (unless I can just stick with pip?)
[+] yjftsjthsd-h|1 year ago|reply
[Why] do models require a new version? It can already take arbitrary gguf; I assumed they just had a registry online
[+] ynniv|1 year ago|reply
They do, and I was using the "new" models before the update. Perhaps there are tuning or bug fixes for them? Or they just want to confirm that these are supported. Some new models do have different architectures, so sometimes an update is necessary.
[+] FieryTransition|1 year ago|reply
Because the way they are quantized takes time to get bug-free when new architectures are released. If a model was quantized with a known bug in the quantizer, those quantized versions are effectively buggy and need to be requantized with a new version of llama.cpp that has the fix.
[+] Sammi|1 year ago|reply
Is there a copilot-like autocomplete vscode plugin that uses Ollama?
[+] prometheon1|1 year ago|reply
Yes, the continue.dev plugin can use Ollama as a backend
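Continue's config.json wires Ollama in for both chat and tab autocomplete; roughly like this (a sketch only: the config format has changed across versions, and the model names here are examples):

```json
{
  "models": [
    { "title": "Llama 3 8B (local)", "provider": "ollama", "model": "llama3:8b" }
  ],
  "tabAutocompleteModel": {
    "title": "Local autocomplete",
    "provider": "ollama",
    "model": "deepseek-coder:6.7b-base"
  }
}
```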
[+] arvinsim|1 year ago|reply
Download the deepseek-coder model via Ollama and connect to it using the CodeGPT plugin?
[+] thedatamonger|1 year ago|reply
this looks very awesome. can someone tell me why there is no chatter about this? is there something else out there that blows this out of the water in terms of ease of use and access to sample many LLMs?
[+] brrrrrm|1 year ago|reply
HN isn't really the best space for LLM news - r/LocalLlama and Twitter are much better. I think HN has some cultural issues with "AI" news
[+] chadsix|1 year ago|reply
Ollama is really organized - it relies on llama.cpp, but the UX and organization it provides make it legit. We recently made a one-click wizard to run Open WebUI and Ollama together, self-hosted and remotely accessible but locally hosted [1]

[1] https://github.com/ipv6rslimited/cloudseeder

[+] gertop|1 year ago|reply
LM Studio is a lot more user friendly, probably the easiest UI to use out there. No terminal nonsense, no manual to read. Just double-click and chat. It even explains what the model names mean (e.g. the difference between Q4_1, Q4_K, Q4_K_M... for whatever reason, all the other tools assume you know what that means).

Built-in model recommendations are also handy.

Very friendly tool!

However it's not open-source.

[+] Cheer2171|1 year ago|reply
Why do you think there is no chatter about this? There have been hundreds of posts about ollama on HN. This is a point release of an already well known project.
[+] FieryTransition|1 year ago|reply
I use a mix of llama.cpp directly via my own Python bindings and via llama-cpp-python for function calling and full control over parameters and loading, but otherwise Ollama is just great for ease of use. There's really no reason not to use it if you just want to load GGUF models and don't have any intricate requirements.
[+] CharlesW|1 year ago|reply
I can recommend LM Studio and Msty if you're looking for something with an integrated UX.
[+] perrygeo|1 year ago|reply
Opposite reaction here. I was just thinking, man I hear about Ollama every single day on HN. Not sure a point release is news :-)
[+] throw03172019|1 year ago|reply
Ollama has been brought up many times on HN. It's a great tool!