l33tman | 11 months ago
I don't really care about insane "full kitchen sink" things that feature 100 plugins for every existing cloud AI service, etc. I just want to run the released models the way they're intended, on a web server...
flipflipper | 11 months ago
https://ollama.com/
https://github.com/open-webui/open-webui
lastLinkedList | 11 months ago
https://github.com/likelovewant/ollama-for-amd/wiki#demo-rel...
As someone who is technical but isn't proficient with building from source (yet!), I specifically recommend the method where you grab the patched rocblas.dll for your card model and replace the one Ollama is using.
dunb | 11 months ago
rahimnathwani | 11 months ago
You could offload some of the layers to the CPU and use the 4-bit 27b model, but inference would be much slower.
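With Ollama, partial CPU offload can be set by capping the number of layers sent to the GPU via the `num_gpu` parameter in a Modelfile. A minimal sketch, assuming the `gemma2:27b` tag; the layer count of 20 is an arbitrary example you would tune to your VRAM:

```
# Modelfile: send only 20 layers to the GPU, run the rest on CPU
FROM gemma2:27b
PARAMETER num_gpu 20
```

Then build and run it with `ollama create gemma27b-partial -f Modelfile` followed by `ollama run gemma27b-partial`.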
genewitch | 11 months ago
Or just use the LM Studio front end; it's better than anything else I've used for desktop use.
I get 35 t/s on Gemma 15b Q8 (I have a 3090, that's why); you'll need a smaller one, probably Gemma 3 15b Q4_K_L.
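LM Studio also exposes an OpenAI-compatible local server (on port 1234 by default), so you can script against a loaded model. A minimal sketch; the model name is a placeholder, and the server must already be running with a model loaded:

```python
import json
import urllib.request

def build_chat_request(prompt, model="gemma-2-9b-it"):
    # Model name is a placeholder; LM Studio uses whatever model is loaded.
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }

def ask(prompt, url="http://localhost:1234/v1/chat/completions"):
    # POST an OpenAI-style chat completion request to the local server.
    req = urllib.request.Request(
        url,
        data=json.dumps(build_chat_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Any OpenAI-compatible client library should work the same way; only the base URL changes.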
mfro | 11 months ago