top | item 45772938

(no title)

lreeves | 4 months ago

I sometimes still code with a local LLM but can't imagine doing it on a laptop. I have a server that has GPUs and runs llama.cpp behind llama-swap (letting me switch between models quickly). The best local coding setup I've been able to do so far is using Aider with gpt-oss-120b.

I guess you could get a Ryzen AI Max+ with 128GB RAM to try and do that locally but non-nVidia hardware is incredibly slow for coding usage since the prompts become very large and take exponentially longer but gpt-oss is a sparse model so maybe it won't be that bad.

Also just to point it out, if you use OpenRouter with things like Aider or roocode or whatever you can also flag your account to only use providers with a zero-data retention policy if you are truly concerned about anyone training on your source code. GPT5 and Claude are infinitely better, faster and cheaper than anything I can do locally and I have a monster setup.

discuss

fm2606|4 months ago

gpt-oss-120b is amazing. I created a RAG agent to hold most of GCP documentation (separate download, parsing, chunking, etc). ChatGPT finished a 50 question quiz in 6 min with a score of 46 / 50. gpt-oss-120b took over an hour but got 47 / 50. All the other local LLMs I tried were small and performed way worse, like less than 50% correct.

I ran this on an i7 with 64gb of RAM and an old nvidia card with 8g of vram.

EDIT: Forgot to say what the RAG system was doing which was answering a 50 question multiple choice test about GCP and cloud engineering.

embedding-shape|4 months ago

> gpt-oss-120b is amazing

Yup, I agree, easily best local model you can run today on local hardware, especially when reasoning_effort is set to "high", but "medium" does very well too.

I think people missed out on how great it was because a bunch of the runners botched their implementations at launch, and it wasn't until 2-3 weeks after launch that you could properly evaluate it, and once I could run the evaluations myself on my own tasks, it really became evident how much better it is.

If you haven't tried it yet, or you tried it very early after the release, do yourself a favor and try it again with updated runners.

whatreason|4 months ago

What do you use to run gpt-oss here? ollama, vLLM, etc

unknown|4 months ago

[deleted]

rovr138|4 months ago

> I created a RAG agent to hold most of GCP documentation (separate download, parsing, chunking, etc)

If you share the scripts to gather the GCP documentation this, that'd be great. Because I have had an idea to do something like this, and the part I don't want to deal with is getting the data

giorgioz|4 months ago

on what hardware you manate to run gpt-oss-120b locally?

lacoolj|4 months ago

you can run the 120b model on an 8GB GPU? or are you running this on CPU with the 64GB RAM?

I'm about to try this out lol

The 20b model is not great, so I'm hoping 120b is the golden ticket.

gkfasdfasdf|4 months ago

What were you using for RAG? Did you build your own or some off the shelf solution (e.g. openwebui)

adastra22|4 months ago

What quantization settings?

neilv|4 months ago

https://github.com/mostlygeek/llama-swap