top | item 40773442


irfn | 1 year ago

Is there a performance comparison with Ollama? Do both use llama.cpp for serving?


anagri | 1 year ago

@irfn - that's an interesting idea. Will definitely try to create a benchmark using my local M2 machine and llama3-7b, just for comparison.

Yes, Ollama and Bodhi App both use llama.cpp, but our approaches are different. Ollama embeds the llama.cpp server binary within its own binary, copies it to a tmp folder, and runs it as a webserver. Any request that comes to Ollama is then forwarded to this server, and the reply is sent back to the client.

Bodhi embeds the llama.cpp server code directly, so no tmp binary is copied. When a request comes to Bodhi App, it invokes the llama.cpp code in-process and sends the response back to the client. So there is no request hopping.

Hope this approach does provide us with some benefits.
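To make the difference concrete, here is a minimal Rust sketch (all names hypothetical, and the real Ollama hop is an HTTP round-trip to a local server, simulated here as a plain function call): the proxied path relays the request through an extra layer before reaching the inference code, while the embedded path calls it directly.

```rust
// Stand-in for a llama.cpp inference entry point (hypothetical).
fn llama_complete(prompt: &str) -> String {
    format!("completion for: {prompt}")
}

// Ollama-style: the front-end forwards the request to a second local
// webserver (the copied llama.cpp binary) and relays its reply back.
// The HTTP round-trip is simulated by the extra relay step here.
fn proxied_request(prompt: &str) -> String {
    let forwarded = prompt.to_string(); // serialize + forward over localhost
    llama_complete(&forwarded)          // second server handles the request
}

// Bodhi-style: the server invokes the embedded inference code directly,
// with no intermediate process or extra network hop.
fn embedded_request(prompt: &str) -> String {
    llama_complete(prompt)
}

fn main() {
    // Both paths produce the same reply; only the hop count differs.
    assert_eq!(proxied_request("hi"), embedded_request("hi"));
    println!("{}", embedded_request("hi"));
}
```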

Also, Bodhi uses Rust as its programming language. IMHO Rust has an excellent interface to C/C++ libraries, so the C code is invoked over the C FFI bridge. And given Rust's memory safety, fearless concurrency, and zero-cost abstractions, this should provide some performance benefit to Bodhi's approach.
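As a small illustration of what the C FFI bridge looks like (using libc's `strlen` as a stand-in for a llama.cpp entry point, since the actual bindings aren't shown here): Rust declares the C symbol and calls it in-process, with no subprocess or serialization in between.

```rust
use std::ffi::CString;
use std::os::raw::c_char;

// Declare a C symbol from libc; llama.cpp functions are bound the same way.
extern "C" {
    fn strlen(s: *const c_char) -> usize;
}

fn c_string_length(text: &str) -> usize {
    // CString adds the NUL terminator C expects.
    let c_text = CString::new(text).expect("input must not contain NUL bytes");
    // Safety: c_text is a valid NUL-terminated string for the call's duration.
    unsafe { strlen(c_text.as_ptr()) }
}

fn main() {
    println!("{}", c_string_length("llama.cpp")); // prints 9
}
```

The call crosses from Rust into C with only the usual function-call overhead, which is why the embedded approach avoids the extra hop entirely.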

Will get back to you once I have results for these benchmarks. Thanks for the idea.

Hope you try Bodhi and have some equally valuable feedback on the app.

Cheers.