Translationaut | 2 years ago
Here with Go, it seems to make sense to use an abstraction. Though don't you lose a lot of flexibility?
lolinder | 2 years ago
* You have a machine with a GPU that can do the inference but you need to run the application code on a smaller device. In my case, that's a Raspberry Pi with a touchscreen.
* You have multiple applications that all need to use LLM inference for different purposes. Loading the models exactly once is more efficient than loading them with huggingface for each application.
* Your python script is ephemeral and run on demand. You don't want to load the model each execution because it's slow, so you need a daemon. Ollama can serve that role without you having to write anything other than the script.
If you're writing Python and only need one application that's going to run on a powerful machine, you're probably better off running the models directly, but I'd venture a guess that that's a minority case.
As for flexibility, there are probably some things that are easier to do when you're running the inference yourself, but Ollama's API is powerful enough for most use cases.
rybosome | 2 years ago
This is exactly my use case for it; I invoke a Python binary which uses the Ollama API and get a model response within seconds because it’s already resident in memory.
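A minimal sketch of the kind of ephemeral script described above, assuming a local Ollama daemon listening on its default port (11434) and using its `/api/generate` endpoint. The model name and the helper names `build_request`/`generate` are illustrative, not from the thread:

```python
import json
import urllib.request

# Assumes an Ollama daemon is already running locally with the model loaded,
# so repeated invocations of this script skip the slow model-load step.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(model: str, prompt: str) -> dict:
    # stream=False asks the daemon for one complete JSON response
    # instead of a stream of partial tokens.
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    payload = json.dumps(build_request(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # The daemon returns a JSON object whose "response" field
        # holds the generated text.
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    print(generate("llama3", "Say hello in one word."))
```

Because the daemon keeps the model resident in memory, the script itself stays a thin stateless client, which is what makes the "invoke and get a response within seconds" pattern work.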