top | item 46922779

anupamchugh | 23 days ago

The ralph loop is a great shape — "read, identify, task, execute" is how autonomous refactoring should work.
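The read-identify-task-execute shape can be sketched as a toy loop. Everything here is illustrative, not from any real agent framework: the "codebase" is a dict of filename to text, and each iteration rereads everything fresh and fixes exactly one TODO marker.

```python
# Toy sketch of a "ralph"-style loop: read, identify, task, execute.
# One small, self-contained change per iteration, fresh context each time.

def identify_next_change(codebase):
    """Pick exactly one pending change: the first file still containing TODO."""
    for name, text in sorted(codebase.items()):
        if "TODO" in text:
            return name
    return None  # nothing left to do

def execute(task, codebase):
    """Apply the single change chosen for this iteration."""
    codebase[task] = codebase[task].replace("TODO", "DONE", 1)

def ralph_loop(codebase, max_iterations=250):
    done = []
    for _ in range(max_iterations):
        # Fresh context every iteration: reread the whole codebase.
        task = identify_next_change(dict(codebase))
        if task is None:
            break
        execute(task, codebase)
        done.append(task)
    return done
```

The key property is that state lives in the codebase, not in the agent's memory: each pass starts from what actually exists on disk.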

The part I'd push on: what happens on loop N+1? 250 services refactored means 250 places where the spec the agent built against might have already changed, cross-references got broken across context windows, and comments now point at functions that were renamed three loops ago.

I've been working on this problem from the other end. The generation side is largely solved — agents can build. The unsolved part is drift: the slow, silent divergence between what you intended and what actually exists. Spec drift, behavioral drift, comment drift. If you don't measure it, you don't see it until something breaks in production.

The intelligence buying framing is right, but I think the real cost isn't the tokens — it's the maintenance surface area those tokens create. Every autonomous refactor is a bet that the output stays aligned with intent over time. Without something watching for divergence, you're buying intelligence today and technical debt tomorrow.
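One of the drift flavors mentioned above, comment drift, is simple enough to sketch a detector for. This is a hedged, minimal example, not a real tool: it flags function names referenced as `name()` in comments that no longer exist as definitions. A real checker would also resolve imports and track renames across commits.

```python
# Minimal comment-drift check: comments that name functions which no
# longer exist (e.g. after a rename three loops ago). Illustrative only.
import ast
import re

def stale_comment_refs(source: str):
    """Return function names referenced in comments but not defined."""
    defined = {node.name for node in ast.walk(ast.parse(source))
               if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))}
    referenced = set()
    for line in source.splitlines():
        if "#" in line:
            comment = line.split("#", 1)[1]
            # Treat "name()" inside a comment as a function reference.
            referenced.update(re.findall(r"(\w+)\s*\(\)", comment))
    return sorted(referenced - defined)
```

Running this on every file after each loop iteration is one cheap way to make drift visible instead of waiting for production to reveal it.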

fvdessen|23 days ago

That's the point of the loop (the prompt is in another comment): start with a fresh context at every step, read the whole code base, and do one thing at a time.

Two important parts have been left out of the article. The first is 1) service code size: our services are small enough to fit in a context window and still leave room for implementing the change. If that's not the case, you need to scope it down from 'read the whole service'.

The other is 2) that our services interact with HTTP APIs specified as OpenAPI YAML specs, and the refactoring hopefully doesn't alter their behaviour or specifications. If it were internal APIs or libraries, where the specs are part of the code and could be touched by the refactoring, I would be less at ease with this kind of approach.
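The guardrail implied here can be made explicit: after each refactor, diff the regenerated spec against the committed one and fail if the external contract changed. A hedged, stdlib-only sketch, assuming the specs are serialized as JSON (YAML specs would need a YAML parser) and that `paths` and `components` capture the contract:

```python
# Fail the loop iteration if the service's OpenAPI contract changed.
# Which sections count as "the contract" is an assumption here.
import json

def spec_unchanged(committed: str, regenerated: str) -> bool:
    """Compare only the contract-relevant sections of two OpenAPI documents."""
    a, b = json.loads(committed), json.loads(regenerated)
    keys = ("paths", "components")  # ignore info/version metadata churn
    return all(a.get(k) == b.get(k) for k in keys)
```

Wired into CI, this turns "hopefully doesn't alter their behaviour" into a hard gate on at least the declared interface.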

The services also have close to 100% test coverage, and this is still essential, as the models still make mistakes that wouldn't be caught without it.

burnerToBetOut|22 days ago

    > …our services interact with http apis…
    > … 
    > …If it was internal apis or libraries…
That reminds me that I wanted to ask you: how good is your agent at complying with your system's architectural patterns?

Given my admittedly limited experience with coding agents, I'd expect a fully autonomous agent to have a tendency to do naïve junior-dev stuff.

Like, for example, write code that makes direct calls to your data access layer (i.e., the repository) from your controllers.

Or bypass the façade layer in favor of direct calls from your business services to external services.

FWIW: Those are Java/Spring Boot idioms. I'd have to research whether or not there are parallels in microservices implemented in Go.
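The layering concerns above are exactly what architecture tests encode; in the Java world, ArchUnit does this. A minimal, illustrative Python equivalent, where the layer names and the single rule (controllers may not import repositories directly) are assumptions modeled on the Spring-style layering described:

```python
# Illustrative architecture guardrail: flag direct imports from a layer
# that should go through an intermediate layer instead.
import ast

FORBIDDEN = {"controllers": {"repositories"}}  # layer -> layers it may not touch

def layering_violations(module_layer: str, source: str):
    """Return imported modules that break the layering rule."""
    banned = FORBIDDEN.get(module_layer, set())
    bad = []
    for node in ast.walk(ast.parse(source)):
        names = []
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module:
            names = [node.module]
        bad += [n for n in names if n.split(".")[0] in banned]
    return bad
```

Run as a test, this catches the controller-calls-repository shortcut regardless of whether a human or an agent wrote it.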

burnerToBetOut|23 days ago

    > …If you don't measure it, you don't see 
    > it until something breaks in production…
    > …
    > …the slow, silent divergence between
    > what you intended and what actually exists…
What's your take on the absence of any mention of tests in the OP's loop steps?

fvdessen|23 days ago

Opus 4.6 is smart enough to run the tests without being told to, which is why it isn't in the prompt