l33tman | 11 months ago
I don't really care about insane "full kitchen sink" things that feature 100 plugins for every existing cloud AI service, etc. I just want to run the released models the way they're intended, on a web server...
flipflipper | 11 months ago
https://ollama.com/
https://github.com/open-webui/open-webui
lastLinkedList | 11 months ago
https://github.com/likelovewant/ollama-for-amd/wiki#demo-rel...
As someone who is technical but isn't proficient with building from source (yet!), I specifically recommend the method where you grab the patched rocblas.dll for your card model and replace the one Ollama is using.
dunb | 11 months ago
rahimnathwani | 11 months ago
You could offload some of the layers to the CPU and use the 4-bit 27b model, but inference would be much slower.
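With Ollama, partial CPU offload can be set by capping the number of layers sent to the GPU via the `num_gpu` parameter in a Modelfile. A minimal sketch, assuming the `gemma2:27b` tag; the layer count of 20 is an arbitrary example you would tune to your VRAM:

```
# Modelfile: send only 20 layers to the GPU, run the rest on CPU
FROM gemma2:27b
PARAMETER num_gpu 20
```

Then build and run it with `ollama create gemma27b-partial -f Modelfile` followed by `ollama run gemma27b-partial`.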
genewitch | 11 months ago
Or just use the LM Studio front end; it's better than anything else I've used for desktop use.
I get 35 t/s on Gemma 15b Q8 (I have a 3090, that's why); you'll need a smaller one, probably Gemma 3 15b Q4_K_L.
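LM Studio also exposes an OpenAI-compatible local server (on port 1234 by default), so you can script against a loaded model. A minimal sketch; the model name is a placeholder, and the server must already be running with a model loaded:

```python
import json
import urllib.request

def build_chat_request(prompt, model="gemma-2-9b-it"):
    # Model name is a placeholder; LM Studio uses whatever model is loaded.
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }

def ask(prompt, url="http://localhost:1234/v1/chat/completions"):
    # POST an OpenAI-style chat completion request to the local server.
    req = urllib.request.Request(
        url,
        data=json.dumps(build_chat_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Any OpenAI-compatible client library should work the same way; only the base URL changes.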
mfro | 11 months ago