
Show HN: Continuous Claude – run Claude Code in a loop

170 points | anandchowdhary | 3 months ago | github.com

Continuous Claude is a CLI wrapper I made that runs Claude Code in an iterative loop with persistent context, automatically driving a PR-based workflow. Each iteration creates a branch, applies a focused code change, generates a commit, opens a PR via GitHub's CLI, waits for required checks and reviews, merges if green, and records state into a shared notes file.

This avoids the typical stateless one-shot pattern of current coding agents and enables multi-step changes without losing intermediate reasoning, test failures, or partial progress.

The tool is useful for tasks that require many small, serial modifications: increasing test coverage, large refactors, dependency upgrades guided by release notes, or framework migrations.
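
Roughly, each iteration boils down to something like this (a simplified sketch using git and the gh CLI; the real tool handles more edge cases and its exact commands, branch names, and prompts differ):

    # illustrative sketch of one iteration, not the tool's actual source
    git checkout -b "continuous-claude/iteration-$i"
    claude -p "$(cat NOTES.md) Next: one focused change toward $GOAL" \
      --dangerously-skip-permissions
    git add -A && git commit -m "iteration $i: focused change"
    git push -u origin "continuous-claude/iteration-$i"
    gh pr create --fill              # open a PR via GitHub's CLI
    gh pr checks --watch             # wait for required checks
    gh pr merge --squash --auto      # merge if green
    echo "iteration $i: summary of what changed" >> NOTES.md   # persistent context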

Blog post about this: https://anandchowdhary.com/blog/2025/running-claude-code-in-...

62 comments


apapalns|3 months ago

> codebase with hundreds of thousands of lines of code and go from 0% to 80%+ coverage in the next few weeks

I had a coworker do this with Windsurf + manual driving a while back and it was an absolute mess. Awful tests that were unmaintainable and next to useless (too much mocking, testing that the code “works the way it was written”, etc.). Writing a useful test suite is one of the most important parts of a codebase and requires careful, deliberate thought. Without a deep understanding of the business logic (which takes time and is often lost after the initial devs move on) you’re not gonna get great tests.

To be fair to AI, we hired a “consultant” that also got us this same level of testing so it’s not like there is a high bar out there. It’s just not the kind of problem you can solve in 2 weeks.

simonw|3 months ago

I find coding agents can produce very high quality tests if and only if you give them detailed guidance and good starting examples.

Ask a coding agent to build tests for a project that has none and you're likely to get all sorts of messy mocks and tests that exercise internals when really you want them to exercise the top level public API of the project.

Give them just a few starting examples that demonstrate how to create a good testable environment without mocking and test the higher level APIs and they are much less likely to make a catastrophic mess.

You're still going to have to keep an eye on what they're doing and carefully review their work though!

LASR|3 months ago

There is no free lunch. The amount of prompt writing to give the LLM enough context about your codebase etc is comparable to writing the tests yourself.

Code assistance tools might speed up your workflow by maybe 50% or even 100%, but it's not the geometric scaling that is commonly touted as the benefit of autonomous agentic AI.

And this is not a model capability issue that goes away with newer generations; it's a human input problem.

id00|3 months ago

I agree. It is very easy to fall into the trap of "I'll let AI write all the tests" and then find yourself with an unmaintainable mess, where the only way to fix a broken test within a reasonable time is to blindly let the AI do it. Which exposes you to a similar level of risk as running any unchecked AI code: you just can't trust that it works correctly.

colechristensen|3 months ago

With recent experience I'm thinking the correct solution is a separate agent prompted to be exclusively a test critic, given a growing list of bad testing patterns to avoid; agent 2 gives feedback to agent 1. Separating agents into unique jobs.

An agent does a good job fixing its own bad ideas when it can run tests, but the biggest blocker I've been having is the agent writing bad tests and getting stuck, or claiming success by lobotomizing a test. I got pretty far with myself being the test critic, and that being mostly the only input the agent got after the initial prompt. I'm just betting it could be done with a second agent.
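
Something like this is what I have in mind (rough sketch; the prompt wording and file names like BAD_TEST_PATTERNS.md are just placeholders):

    # agent 1: writes or fixes tests
    claude -p "Improve test coverage for src/, following the style of tests/examples/"
    # agent 2: critic only, never edits files, judged against a list of known bad patterns
    claude -p "Review the tests changed in the last commit against BAD_TEST_PATTERNS.md. List violations; do not modify anything." > critique.txt
    # feed the critique back to agent 1 on the next pass
    claude -p "Address this test review feedback: $(cat critique.txt)"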

andai|3 months ago

I had a funny experience with Claude (Web) the other day.

Uploaded a Prolog interpreter in Python and asked for a JS version. It surprised me by not just giving me a code block, but actually running a bunch of commands in its little VM, setting up an npm project; it even wrote a test suite and ran it to make sure all the tests pass!

I was very impressed, then I opened the test script and saw like 15 lines of code, which ran some random functions, did nothing to test their correctness, and just printed "Test passed!" regardless of the result.

PunchyHamster|3 months ago

Cleanroom design of "this is the function's interface, it does this and that, write tests for that function to pass" can generally get you pretty decent results.

But "throw vague prompt at AI direction" does about as well as doing same thing with an intern.

gloosx|3 months ago

Ah yes, classic "increase test coverage for the sake of increasing test coverage".

Aligns with vibe-coding values well: number go up – exec happy.

cpursley|3 months ago

Which language? I've found Claude very good at Elixir test coverage (surprisingly) but a dumpster fire with any sort of JS/TS testing.

kami23|3 months ago

I've dubbed my loop of this 'sicko mode' at work, as I've become a bit obsessed with automating every little thing in my flow so I can focus on just features and bugs. It feels like a game to me and I enjoy it a lot.

lizardking|3 months ago

It's oddly satisfying to watch your tooling improve itself.

i4k|3 months ago

I was expecting to see links to a bunch of successful open-source examples: self-managed projects that are continuously adding code and getting better.

99.9999% of AI software is vaporware.

DeathArrow|3 months ago

>run Claude code in a loop

And watch your bank account go brrr!

lizardking|3 months ago

If you have a Max plan, you just watch your usage get throttled.

namanyayg|3 months ago

Exactly what I needed! I might use it for test coverage on an ancient project I need to improve...

jes5199|3 months ago

Can it read code review comments? I've been finding that having Claude write code but letting Codex review PRs is a productive workflow; Claude Code is capable of reading the feedback left in comments and is pretty good at following the advice.

stpedgwdgfhgdd|3 months ago

I’m letting Claude Code review the code as part of a GitLab CI job. It adds inline comments (using curl and the HTTP API, which was a nightmare to get right, as glab does not support this).

CC can also read the inline comments and create fixes. Now I'm thinking of adding an extra CI job that will address the review comments in a separate MR.
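
For anyone attempting the same thing, the inline-comment call is roughly this (GitLab's merge request discussions endpoint; the SHAs, path, and line number have to be pulled from the MR diff, which is the fiddly part, and exact parameter handling may need adjusting):

    # sketch: post one inline comment on a specific diff line of an MR
    curl --request POST \
      --header "PRIVATE-TOKEN: $GITLAB_TOKEN" \
      --data-urlencode "body=Consider extracting this into a helper function." \
      --data "position[position_type]=text" \
      --data "position[base_sha]=$BASE_SHA" \
      --data "position[start_sha]=$START_SHA" \
      --data "position[head_sha]=$HEAD_SHA" \
      --data "position[new_path]=src/example.py" \
      --data "position[new_line]=42" \
      "https://gitlab.example.com/api/v4/projects/$CI_PROJECT_ID/merge_requests/$CI_MERGE_REQUEST_IID/discussions"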

jerezzprime|3 months ago

Have you tried GitHub Copilot? I've been trying it out directly in my PRs like you suggest. Works pretty well sometimes.

cog-flex|3 months ago

Does this exist for Codex?

tinodb|3 months ago

You can do this with a single while loop in bash. No need for this project. Search for “ralph wiggum ai coding” and you'll find a couple of guys who share plenty of examples and nerd out about it.
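
The whole pattern is basically this (assuming a PROMPT.md describing the task and Claude Code's non-interactive flags):

    # naive "run it until you stop it" loop
    while :; do
      claude -p "$(cat PROMPT.md)" --dangerously-skip-permissions
    done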

leobg|3 months ago

Missed opportunity to call it Claude Incontinent (CLI).

decide1000|3 months ago

How does it handle questions asked by Claude?

anandchowdhary|3 months ago

It sends a flag that dangerously allows Claude to just do whatever it wants and only give us the final answer. It doesn't do the back-and-forth or ask questions.