top | item 47153979


eesmith | 4 days ago

I fully agree with the assessment that it did "reasonably well".

It is not, however, something equivalent to the product of a disciplined TDD practitioner. Not even close.

You write that test-first development helps protect against two risks of code agents, but what does that mean for your specific example?

How is the final product better than the test-after prompt "Build a Python function to extract headers from a markdown string, then write a complete and robust test suite."?
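For concreteness, the kind of function that prompt describes might look something like this sketch (hypothetical, not from the article, and deliberately handling one of the subtleties a naive version misses, namely `#` lines inside fenced code blocks):

```python
import re

def extract_headers(markdown: str) -> list[tuple[int, str]]:
    """Return (level, text) pairs for ATX headers in a markdown string."""
    headers = []
    in_fence = False
    for line in markdown.splitlines():
        stripped = line.strip()
        # Toggle fence state on ``` or ~~~ so that '# comment' lines
        # inside code blocks are not mistaken for headers.
        if stripped.startswith("```") or stripped.startswith("~~~"):
            in_fence = not in_fence
            continue
        if in_fence:
            continue
        # CommonMark requires 1-6 '#' followed by whitespace;
        # an optional trailing run of '#' is stripped.
        m = re.match(r"(#{1,6})\s+(.*?)\s*#*\s*$", stripped)
        if m:
            headers.append((len(m.group(1)), m.group(2)))
    return headers
```

Whether a coding agent reaches this level of care under a test-first versus a test-after prompt is exactly the comparison being asked for.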

Otherwise, how do you know it's a "fantastic fit for coding agents" or that it gets "better results out of a coding agent"?


simonw | 4 days ago

I know TDD provides better results for coding agents from 6+ months of experience working this way, plus confirmation from conversations with other practitioners. TDD is the key methodology used by the popular superpowers set of Claude skills by Jesse Vincent, for example.

I'm not going to be trying to irrefutably prove everything I write about in the Agentic Engineering Patterns book - that would require a credible research team and peer-reviewed papers, and that's not a level of effort I'm willing to put into this.

eesmith | 4 days ago

By your response, I think you've flipped the bozo bit on me. I will try again.

I'm most certainly not asking for irrefutable proof. I'm asking for a concrete example of how you know, in a way that would inform me and others in your readership:

1) how do the results from a TDD prompt compare to a good quality test-last prompt?

2) following the TDD approach, what are the steps to get from the initial solution, with errors and untested code, to one which passes human code review?

There's a long history of how Postel's Robustness principle combined with the difficulty of following a spec closely results in a fractured and incompatible ecosystem. We have enough deliberate Markdown variants without needing to introduce a new one by happenstance. This informs my belief that something claiming to parse Markdown requires extra attention to the details, beyond what a one-off toy example would need. That's precisely why I think this is a good example problem.
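To illustrate why the details matter here: a naive "does the line start with `#`" check, which is my guess at what a casual one-off implementation would do, disagrees with the CommonMark spec on several common inputs:

```python
def naive_is_header(line: str) -> bool:
    # The obvious one-liner a toy example might settle for.
    return line.startswith("#")

# line -> whether CommonMark treats it as an ATX header
cases = {
    "#No space after hash": False,  # spec requires whitespace after the '#' run
    "####### Seven hashes": False,  # more than six '#' is plain text
    "## Trailing hashes ##": True,  # closing '#' run is allowed and stripped
}

for line, spec_says in cases.items():
    print(f"{line!r}: naive={naive_is_header(line)}, spec={spec_says}")
```

Each of these is a line where the naive check quietly accepts or mislabels input, which is how accidental new Markdown dialects get born.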

I'm not tracking what's going on with agentic programming. I don't know who Jesse Vincent is or how his Claude skills are relevant. Is the target audience for your book those who know what those mean, or developers like me who don't?

What I do know very well is what robust tests look like, and what TDD is supposed to look like. I didn't see it in your example, and would very much like to see a full example of a non-trivial problem like this one worked out, and compared to a non-TDD agentic approach.

That level of analysis is missing from almost every TDD example, which tends to use a toy problem to walk through the mechanical details of the red-green step, with little attention to -- or need for -- the refactor part, which is the hardest part of TDD.

I'll also note that I seem to be the only one here who commented about the generated code quality and fitness to task. I mourn that so few care about those details.