
Show HN: First Claude Code client for Ollama local models

44 points | SerafimKorablev | 1 month ago | github.com

Just to clarify the background a bit. This project wasn’t planned as a big standalone release at first. On January 16, Ollama added support for an Anthropic-compatible API, and I was curious how far this could be pushed in practice. I decided to try plugging local Ollama models directly into a Claude Code-style workflow and see if it would actually work end to end.

Here is the release note from Ollama that made this possible: https://ollama.com/blog/claude
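To make the compatibility concrete, here is a minimal sketch of calling a local Ollama model through that Anthropic-style API using the official `anthropic` Python SDK. The base URL, the placeholder API key, and the assumption that Ollama accepts the standard messages endpoint at its default port are mine, not from the post:

```python
# Sketch: talk to a local Ollama model via its Anthropic-compatible API.
# Assumes Ollama is on its default port (11434) and that qwen3-coder:30b
# (the model mentioned in the post) has already been pulled locally.
import anthropic

client = anthropic.Anthropic(
    base_url="http://localhost:11434",  # local Ollama, not api.anthropic.com
    api_key="ollama",                   # placeholder; the local server ignores it
)

message = client.messages.create(
    model="qwen3-coder:30b",
    max_tokens=512,
    messages=[{"role": "user", "content": "Write a hello-world HTTP server in Go."}],
)
print(message.content[0].text)
```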

Technically, what I do is pretty straightforward:

- Detect which local models are available in Ollama.

- When internet access is unavailable, the client automatically switches to Ollama-backed local models instead of remote ones.

- From the user’s perspective, it is the same Claude Code flow, just backed by local inference.
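A rough sketch of that detect-and-fallback logic in Python follows. Ollama's `/api/tags` endpoint and the Claude Code environment variables are mentioned elsewhere in this thread; the reachability check, the function names, and the `ANTHROPIC_MODEL` override are my own illustration, not the project's actual code:

```python
# Illustrative sketch of the detection-and-fallback flow described above.
import json
import socket
import urllib.request

OLLAMA_URL = "http://localhost:11434"

def list_local_models() -> list[str]:
    """Ask the local Ollama daemon which models are installed."""
    with urllib.request.urlopen(f"{OLLAMA_URL}/api/tags", timeout=2) as resp:
        data = json.load(resp)
    return [m["name"] for m in data.get("models", [])]

def internet_available(host: str = "api.anthropic.com", port: int = 443) -> bool:
    """Cheap reachability check for the remote Anthropic endpoint."""
    try:
        socket.create_connection((host, port), timeout=2).close()
        return True
    except OSError:
        return False

def choose_backend() -> dict:
    """Return the env overrides Claude Code needs for the chosen backend."""
    if internet_available():
        return {}  # default remote Anthropic flow
    models = list_local_models()
    if not models:
        raise RuntimeError("no local Ollama models installed")
    preferred = "qwen3-coder:30b" if "qwen3-coder:30b" in models else models[0]
    return {
        "ANTHROPIC_BASE_URL": OLLAMA_URL,   # Ollama's Anthropic-compatible API
        "ANTHROPIC_AUTH_TOKEN": "ollama",   # placeholder token for local use
        "ANTHROPIC_MODEL": preferred,       # assumption: override the default model
    }

if __name__ == "__main__":
    print(choose_backend())
```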

In practice, the best-performing model so far has been qwen3-coder:30b. I also tested glm-4.7-flash, which was released very recently, but it struggles with reliably following tool-calling instructions, so it is not usable for this workflow yet.

27 comments


oceanplexian|1 month ago

The Anthropic API was already supported by llama.cpp (the project Ollama ripped off and typically trails in features by 3-6 months), and it works perfectly fine with Claude Code by setting a single environment variable.

davely|1 month ago

Point of clarification: llama.cpp is MIT-licensed. Using it downstream (commercially or otherwise) is exactly what that license allows, so calling it a rip-off is misleading.

xd1936|1 month ago

And they reference that announcement and related information in the second line.

d4rkp4ttern|1 month ago

As others said, this was possible for months already with llama.cpp's support for the Anthropic messages API. You just need to set ANTHROPIC_BASE_URL. The specific llama-server settings/flags were a pain to figure out and required some hunting, so I collected them in this guide to using CC with local models:

https://github.com/pchalasani/claude-code-tools/blob/main/do...

One tricky thing that took me a whole day to figure out: using Claude Code in this setup was causing total network failures due to telemetry pings, so I had to set CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1.
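For anyone replicating this, a minimal sketch of that environment setup; port 8080 is llama-server's usual default, but adjust it to match your own flags:

```python
# Sketch: launch Claude Code against a local llama-server with nonessential
# telemetry traffic disabled, as described above.
import os
import subprocess

env = os.environ.copy()
env["ANTHROPIC_BASE_URL"] = "http://localhost:8080"       # llama-server (assumed default port)
env["ANTHROPIC_AUTH_TOKEN"] = "none"                       # placeholder; the local server ignores it
env["CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC"] = "1"      # avoid the hanging telemetry pings

subprocess.run(["claude"], env=env)
```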

eli|1 month ago

There are already various proxies to translate between OpenAI-style models (local or otherwise) and an Anthropic endpoint that Claude Code can talk to. Is the advantage here just one less piece of infrastructure to worry about?
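For anyone unfamiliar with what those proxies do, a stripped-down sketch of the translation step (no streaming, no tool calls, which is exactly where they tend to break). The endpoint paths follow the public Anthropic and OpenAI conventions; everything else here is illustrative, not any particular project's code:

```python
# Minimal Anthropic-messages -> OpenAI-chat translation proxy (illustrative only).
# Handles plain text turns; streaming and tool calls are omitted.
from flask import Flask, request, jsonify
import requests

OPENAI_BASE = "http://localhost:11434/v1"  # any OpenAI-compatible server
app = Flask(__name__)

@app.post("/v1/messages")
def messages():
    body = request.get_json()
    oa_messages = []
    if isinstance(body.get("system"), str):   # assumes a plain-string system prompt
        oa_messages.append({"role": "system", "content": body["system"]})
    for m in body.get("messages", []):
        content = m["content"]
        if isinstance(content, list):          # Anthropic allows a list of content blocks
            content = "".join(b.get("text", "") for b in content)
        oa_messages.append({"role": m["role"], "content": content})

    oa_resp = requests.post(
        f"{OPENAI_BASE}/chat/completions",
        json={
            "model": body["model"],
            "messages": oa_messages,
            "max_tokens": body.get("max_tokens", 1024),
        },
        timeout=600,
    ).json()

    text = oa_resp["choices"][0]["message"]["content"]
    return jsonify({
        "id": oa_resp.get("id", "msg_local"),
        "type": "message",
        "role": "assistant",
        "content": [{"type": "text", "text": text}],
        "model": body["model"],
        "stop_reason": "end_turn",
        "usage": {
            "input_tokens": oa_resp.get("usage", {}).get("prompt_tokens", 0),
            "output_tokens": oa_resp.get("usage", {}).get("completion_tokens", 0),
        },
    })

if __name__ == "__main__":
    app.run(port=4000)
```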

g4cg54g54|1 month ago

Sidetracking here, but does anyone have one that _actually_ works?

In particular I'd like to call Claude models (in OpenAI schema, hosted by a reseller) through some proxy that presents the Anthropic format to my Claude Code, but nothing seems to fully line things up (double-translated tool names, for example).

The reseller is abacus.ai. I've tried BerriAI/litellm, musistudio/claude-code-router, ziozzang/claude2openai-proxy, 1rgs/claude-code-proxy, and fuergaosi233/claude-code-proxy.

dsrtslnd23|1 month ago

What hardware are you running the 30b model on? I guess it needs at least 24GB VRAM for decent inference speeds.

derp-mcgee|1 month ago

I'm running qwen3-coder:30b-a3b-q8_0 @ 32k context. It comes out to 36GB and I'm splitting it between a 3090 24GB and a 4060 Ti 16GB (Ollama put 20GB on the 3090 and 13.5GB on the 4060 Ti); runs great tbh. Ollama is running on an Ubuntu server and I'm running Claude Code from my Windows desktop PC.

thtmnisamnstr|1 month ago

The general rule is that you need at least as much VRAM as the model file size. 30B models at typical quantization are usually around 19GB, so most likely a GPU with 24GB of VRAM.
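A rough back-of-the-envelope version of that rule, since the numbers depend on quantization and context length; the constants below are ballpark approximations, not measured values:

```python
# Approximate VRAM estimate: quantized weights plus KV cache.
def estimate_vram_gb(params_b: float, bits_per_weight: float,
                     context_tokens: int = 32_768,
                     kv_bytes_per_token: float = 100_000) -> float:
    weights_gb = params_b * 1e9 * (bits_per_weight / 8) / 1e9
    kv_cache_gb = context_tokens * kv_bytes_per_token / 1e9
    return weights_gb + kv_cache_gb

# ~30B model at ~4.5 bits/weight: roughly 20GB, hence the 24GB-card suggestion.
print(round(estimate_vram_gb(30, 4.5), 1))
# Same model at ~8.5 bits/weight: mid-30s of GB, in line with the q8_0 split above.
print(round(estimate_vram_gb(30, 8.5), 1))
```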

ryandrake|1 month ago

I'd like to know this, too. I'm just getting my feet wet with Ollama and local models using only the CPU, and it's obviously terribly slow (even with 24 cores and 128GB of DRAM). It's hard to gauge how much GPU money I'd need to plonk down to get acceptable performance for coding workflows.

horacemorace|1 month ago

I was trying to get Claude Code to work with llama.cpp but could never get anything functional. It always insisted on a phone-home login for first-time setup. In Cline I'm getting better results with glm-4.7-flash than with qwen3-coder:30b.

g4cg54g54|1 month ago

~/.claude.json with {"hasCompletedOnboarding":true} is the key, then ANTHROPIC_BASE_URL and ANTHROPIC_AUTH_TOKEN work as expected
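Spelled out as a small script for anyone hitting the same wall; the file path and flag are exactly as described above, and merging into an existing config rather than overwriting it is my own precaution:

```python
# Sketch: pre-mark Claude Code onboarding as complete so it skips the
# phone-home login on first run.
import json
from pathlib import Path

cfg_path = Path.home() / ".claude.json"
cfg = json.loads(cfg_path.read_text()) if cfg_path.exists() else {}
cfg["hasCompletedOnboarding"] = True   # the key flag from the comment above
cfg_path.write_text(json.dumps(cfg))
# After this, ANTHROPIC_BASE_URL and ANTHROPIC_AUTH_TOKEN work as expected.
```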

d4rkp4ttern|1 month ago

Curious what llama-server flags you used. On my M1 Max 64GB MacBook I tried it in Claude Code (which has a ~25K-token system message) and I get 3 tps.

But with Qwen3-30B-A3B I get 20 tps in CC.

dosinga|1 month ago

This is cool. Not sure it is the first Claude Code-style coding agent that runs against Ollama models, though. Goose, opencode, and others have been able to do that for a while now, no?

d0100|1 month ago

Does this UI work with Open Code?