TDD does not require you to know everything you're building up front. Tests can come out of experimentation, validating the final build. They can also be driven by autonomous, directed planning.
I'm currently, in fact, working on a system where the LLM semi-independently builds up an understanding of a project and its goals through exploration, then creates small, targeted improvement plans, including acceptance criteria that feed into building the test suites the build will finally be measured against.
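The explore -> plan -> tests -> measure cycle described above can be sketched roughly like this. This is a hypothetical outline, not the actual system: the LLM calls are stubbed out, and every name here (ImprovementPlan, derive_tests, etc.) is illustrative. The key structural point is that acceptance criteria are authored at planning time and the build is judged only against the tests derived from them.

```python
from dataclasses import dataclass, field

@dataclass
class ImprovementPlan:
    goal: str
    acceptance_criteria: list            # plain-language criteria the build must meet
    tests: list = field(default_factory=list)  # executable tests derived from criteria

def explore(project_files):
    """Stub: the LLM reads the repo and summarizes what it thinks the goals are."""
    return f"understanding built from {len(project_files)} files"

def draft_plan(understanding):
    """Stub: the LLM proposes one small, targeted improvement with criteria."""
    return ImprovementPlan(
        goal="example: stop the parser crashing on malformed input",
        acceptance_criteria=["parser returns an error value instead of raising"],
    )

def derive_tests(plan):
    """Each acceptance criterion becomes one or more executable tests."""
    plan.tests = [f"test: {c}" for c in plan.acceptance_criteria]
    return plan

def measure(plan, build_passes):
    """The build is measured only against the tests derived from the plan."""
    return all(build_passes(t) for t in plan.tests)

plan = derive_tests(draft_plan(explore(["main.rb", "parser.rb"])))
result = measure(plan, build_passes=lambda t: True)  # pretend the build passed
```

In a real system each stub is a model call with project context attached; the scaffolding's job is just to keep the criteria fixed between planning and measurement so the build can't "grade its own homework."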
It still needs direction. If you have a large spec or a judge/fitness function - as you would for a compiler for an existing language - you can achieve a lot from that alone and may not need much additional direction. But even for far more exploratory projects, you can have the LLM surface its perceived goals and its plans to meet them, "teach it" along the way by giving it pointers on how to revise a given goal or plan, and have e.g. implementation successes and failures feed into future plans.
My current system has "learned" [1] quite quickly on fairly complex test projects, and I'm in fact right now testing it on a hobby compiler project. The first cycles are frustrating (and an area I'm refining), because it's dumped into a project whose real motivations it doesn't know, and it will start making code changes you know are bad - and letting go of obsessing over that is hard. But ultimately, using that as input to a feedback cycle where you add to its goals (e.g. making clear that one goal is code that meets your specific standards) is more useful than managing it in detail yourself.
I'm very close to putting this improvement agent in a cron job for a project I rely on for day-to-day use (yes, I'll make sure I can roll back), because it now very consistently implements improvements either entirely unilaterally or based on minor hints. (It has access to some files on my desktop, including a "journal" of sorts, and if I put in a one-liner about an idea or frustration, I'll often come back to find a 300+ line implementation plan for a change to fix it, or to lay the foundation for fixing it.)
[1] "Learned" in this instance is in quotes for a reason. I'm not fine-tuning models - I have the agent do a retro of its own plan executions and update documents with "lessons learned" that get fed into the next planning stage.
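The retro loop from the footnote amounts to very simple plumbing. A minimal sketch, assuming a JSON file as the "lessons learned" document (the file name, record fields, and helper names are all my invention, not from the actual system):

```python
import json
import pathlib
import tempfile

# Hypothetical store for retro notes; the real system uses documents the
# agent itself maintains. A fresh temp directory keeps this sketch self-contained.
LESSONS_FILE = pathlib.Path(tempfile.mkdtemp()) / "lessons.json"

def record_retro(plan_id, outcome, lesson):
    """After a plan execution, the agent appends what it learned."""
    lessons = json.loads(LESSONS_FILE.read_text()) if LESSONS_FILE.exists() else []
    lessons.append({"plan": plan_id, "outcome": outcome, "lesson": lesson})
    LESSONS_FILE.write_text(json.dumps(lessons))

def planning_context():
    """Lessons are injected as plain text into the next planning prompt."""
    if not LESSONS_FILE.exists():
        return ""
    lessons = json.loads(LESSONS_FILE.read_text())
    return "\n".join(
        f"- {entry['lesson']} (plan {entry['plan']}, {entry['outcome']})"
        for entry in lessons
    )

record_retro("plan-001", "failed", "run the full test suite before declaring success")
context = planning_context()
```

No model weights change; the "learning" is entirely in-context, which also means it's inspectable and editable by hand between cycles.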
vidarh|19 days ago