item 45602023

qwe----3|4 months ago

Just a paste of llama.cpp without attribution.

speedgoose|4 months ago

Ollama is more than a paste. But the support for GLM 4.6 is indeed coming from llama.cpp: https://github.com/ollama/ollama/issues/12505#issuecomment-3...

I don’t know how much Ollama contributes to llama.cpp.

CaptainOfCoit|4 months ago

> I don’t know how much Ollama contributes to llama.cpp

If nothing else, Ollama is free publicity for llama.cpp, at least when they acknowledge they're mostly using the work of llama.cpp, which has happened at least once! I found llama.cpp by first finding Ollama; then I figured I'd rather avoid the lock-in of Ollama's registry, so I ended up using llama.cpp for everything.

am17an|4 months ago

The answer is 0

swyx|4 months ago

I mean, they have attributed it, but it's also open source software. I guess the more meaningful question is: why didn't ggerganov build Ollama, if it was that easy? Or what is his company working on now?

monkmartinez|4 months ago

I can't answer for GG, but the early days of llama.cpp were crazy, and everything was very hacky. Remember, Textgen-webui was 'the way' to run models at first because it supported so many different quant types and file extensions. At the time, most people were juggling multiple quantization methods, and it was really hard to figure out objectively which performed better or worse.

GGUF/GGML was like the 4th iteration of quantization file format from llama.cpp, and I remember having to start consciously watching my bandwidth usage. Up to that point, I had never received an email from my ISP warning me that I was approaching my 2TB data cap. All for the same models, just in different formats. TheBloke was pumping out models like he had unlimited time and energy.

I say all that to say: llama.cpp was still trying, dare I say 'inventing', all of these things throughout those transitions. Ollama came in, building off of llama.cpp, to make the running part easier and less dependent on CLI flags. Awesome.

GG and company are down in the trenches of model architectures with CUDA, Vulkan, CPU, ROCm, etc. They are working on perplexity and token processing/generation; just look at the 'bin' folder when you compile the project. There are so many different pieces required to make the whole thing work as well as it does. It's amazing that we have llama-server at all, given the amount of work that has gone into llama.cpp.

All that to say, Ollama shit the bed on attribution. They were called out on r/localllama very early on for not really giving credit to llama.cpp, and, if I remember correctly, for not contributing back either. They have a soiled reputation with the people who participate in that subreddit, at least.

So it's not a matter of the "ease" of building what Ollama built... At least from the perspective of someone who has been paying close attention on r/localllama, the problem was/is simply the perception (right or wrong) of the meme: Person 1 builds a thing; Person 2 asks, "You built this?", takes it, holds it up, and declares, "I built this." A simple act that really pissed off the community in general.

homarp|4 months ago

>what is gg working on

supporting models so ollama can then 'support' them too

If you use the llama.cpp server, it's quite a nice experience. You can even download models directly from Hugging Face.
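For anyone curious, a minimal sketch of what that workflow looks like: llama-server has an `-hf` flag that pulls a GGUF model straight from a Hugging Face repo by name, and the server then exposes an OpenAI-compatible HTTP API. The repo name and port below are just illustrative examples, not anything from this thread.

```shell
# Assumes a local llama.cpp build with llama-server on PATH.
# -hf downloads the GGUF model from Hugging Face on first run
# (repo name is an example; substitute any GGUF repo you like):
llama-server -hf ggml-org/gemma-3-1b-it-GGUF --port 8080

# In another terminal, query the OpenAI-compatible endpoint:
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello"}]}'
```

No separate registry, no repackaging step: the model file is cached locally and served as-is.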