top | item 43815835


zkiihne | 10 months ago

I want this but an LLM.


notarealllama | 10 months ago

OpenWebUI, and you can run quantized or low-end models (e.g. Llama 3.2 3B or Gemma 3 4B) on a 4-6 GB graphics card.

It's a game changer to run locally (no usage caps for a weekend blitz project).
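A minimal sketch of how a local setup like this can be queried. OpenWebUI and backends such as Ollama typically expose an OpenAI-compatible chat completions endpoint; the URL, port, and model name below are assumptions, not part of the comment:

```python
import json
import urllib.request

# Hypothetical local endpoint -- OpenWebUI / Ollama-style backends
# commonly expose an OpenAI-compatible chat completions API.
BASE_URL = "http://localhost:11434/v1/chat/completions"  # assumption

def build_chat_request(model: str, prompt: str, temperature: float = 0.7) -> dict:
    """Build an OpenAI-style chat completion payload for a local model."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }

def ask_local_model(model: str, prompt: str) -> str:
    """POST the request to the local server and return the reply text."""
    payload = build_chat_request(model, prompt)
    req = urllib.request.Request(
        BASE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.loads(resp.read())
    return body["choices"][0]["message"]["content"]

# Example (requires a running local server):
# print(ask_local_model("gemma-3-4b-it-qat", "Summarize quantization in one line."))
```

Because the endpoint is OpenAI-compatible, the same payload shape works unchanged whether the server is OpenWebUI, Ollama, or llama.cpp's server mode.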

drittich | 10 months ago

I played with gemma-3-4b-it-qat recently using a mid-tier graphics card and a few things stood out to me:

1. It was very fast, between 35 and 70 tokens per second, with initial response in under 200ms. That kind of speed is a feature.

2. It was very useful. I had a brainstorming session with it that was both fluid and fruitful.

3. I can't wrap my head around so much knowledge being contained in about 3GB of data. It seems to know something about everything. Imperfect, but very useful.
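The ~3 GB figure checks out with simple arithmetic: a 4-billion-parameter model quantized to roughly 4 bits per weight needs about 2 GB for the weights alone, and embeddings plus file overhead push the total toward 3 GB. A small sketch of that back-of-envelope math (the helper names are illustrative):

```python
def model_size_gb(n_params: float, bits_per_weight: int) -> float:
    """Approximate size of a model's weights in gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9

def tokens_per_second(n_tokens: int, seconds: float) -> float:
    """Generation throughput: tokens produced divided by wall-clock time."""
    return n_tokens / seconds

# 4B parameters at 4-bit quantization -> ~2 GB of raw weights;
# metadata and runtime overhead bring the on-disk total toward 3 GB.
print(model_size_gb(4e9, 4))   # -> 2.0
# Unquantized half precision (16-bit) would need four times as much:
print(model_size_gb(4e9, 16))  # -> 8.0
```

The same arithmetic explains why these models fit on a 4-6 GB card at all: quantization cuts the memory footprint to a quarter of half precision, leaving room for the KV cache and activations.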