What would be a good coding model to run on an M3 Pro (18GB) to get a Codex-like workflow and quality? Essentially, I run out of quota quickly when using Codex-High in VS Code on the $20 ChatGPT plan, and I'm looking for cheaper or free alternatives (even if a little slower, but of the same quality). Any pointers?
duffyjp|1 month ago
I gave one of the GPUs to my kid to play games on.
medvezhenok|1 month ago
If you had more like 200GB of RAM, you might be able to run something like MiniMax M2.1 and get last-gen performance at something resembling usable speed, but it's still a far cry from Codex on high.
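For scale, here is the back-of-envelope arithmetic behind that RAM figure (my sketch, not from the thread): weight memory is roughly parameter count times bits per weight, before KV cache and runtime overhead are added on top. The parameter counts below are illustrative assumptions, not official specs.

```python
# Rough RAM needed just to hold a model's weights at a given quantization.
# Ignores KV cache and runtime overhead, which add more on top.

def weight_ram_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate GB of RAM needed for the weights alone."""
    total_bytes = params_billions * 1e9 * (bits_per_weight / 8)
    return total_bytes / 1e9

# A hypothetical ~230B-parameter model at 4-bit quantization:
print(f"{weight_ram_gb(230, 4):.0f} GB")  # ~115 GB -> needs a ~200GB box
# A 30B model at 4-bit fits a 32GB machine, but not comfortably in 18GB
# once the OS and KV cache take their share:
print(f"{weight_ram_gb(30, 4):.0f} GB")   # ~15 GB
```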
evilduck|1 month ago
I guess you could technically run the huge leading open-weight models by using a large disk as swap and get close to the "same quality," but at "heat death of the universe" speeds.
tosh|1 month ago
With 32GB of RAM:
Qwen3-Coder and GLM 4.7 Flash are both impressive ~30B-parameter models.
They're not on the level of GPT-5.2 Codex, but they're small enough to run locally (with 32GB RAM, 4-bit quantized) and quite capable; see the wiring sketch below.
But I think it's just a matter of time until we get quite capable coding models that can run with less RAM.
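A hedged sketch of what "run locally" looks like in practice, not from the thread: common local runners (Ollama, LM Studio, llama.cpp's server) expose an OpenAI-compatible endpoint, so the standard openai Python client can talk to a locally hosted Qwen3-Coder. The port and model tag below are assumptions; match them to whatever your runner reports.

```python
from openai import OpenAI

# Point the client at a local OpenAI-compatible server.
# 11434 is Ollama's default port (assumption); adjust for your runner.
client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="unused",  # local servers typically ignore the key
)

resp = client.chat.completions.create(
    model="qwen3-coder",  # hypothetical local model tag; use your own
    messages=[{"role": "user",
               "content": "Write a Python function that reverses a linked list."}],
)
print(resp.choices[0].message.content)
```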
adam_patarino|1 month ago
Current test version runs in 8GB at ~60 tok/s. Let me know if you want to join our early tester group!
Mashimo|1 month ago
The best option would be GLM 4.7 Flash, and I doubt it's close to what you want.
atwrk|1 month ago
If remote models are OK, you could have a look at MiniMax M2.1 (minimax.io), GLM from z.ai, or Qwen3 Coder. You should be able to use all of these with your existing OpenAI-compatible tooling.
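A minimal sketch of that swap, under the assumption that the provider exposes an OpenAI-compatible API (all three advertise one): the openai SDK can be redirected with its standard environment variables, so tools built on it often need no code changes. The endpoint URL and model id below are placeholders; check each provider's docs for the real values.

```python
import os
from openai import OpenAI

# The official openai Python SDK reads these if set, so anything built
# on it can often be pointed at a compatible provider without edits:
os.environ["OPENAI_BASE_URL"] = "https://api.example-provider.com/v1"  # placeholder
os.environ["OPENAI_API_KEY"] = "your-provider-key"                     # placeholder

client = OpenAI()  # picks up the env vars above

resp = client.chat.completions.create(
    model="glm-4.7-flash",  # placeholder model id; use the provider's real one
    messages=[{"role": "user", "content": "Explain this stack trace: ..."}],
)
print(resp.choices[0].message.content)
```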