

eesmith | 4 days ago

Will you be updating the text at https://simonwillison.net/guides/agentic-engineering-pattern...?

As it currently says:

> A significant risk with coding agents is that they might write code that doesn't work, or build code that is unnecessary and never gets used, or both.

> Test-first development helps protect against both of these common mistakes, and also ensures a robust automated test suite that protects against future regressions.

while the ChatGPT-generated code contains bugs and unnecessary code that never gets used, and the ChatGPT-generated test suite is not robust.

(As an example of unnecessary code that never gets used, _FENCE_RE contains "(?P<info>.*)$", but neither the group name nor the group itself is used. The pattern is unneeded: all of the tests pass without it.)

Your writings are widely read and influential. I think it's important that you let readers know that the results of your experiment are not actually a complete example of Red/Green TDD being a "fantastic fit" for coding agents, and that you highlight their limitations.


simonw | 4 days ago

I'll be replacing the examples with ones that better illustrate the technique. I dashed those off in a hurry using the wrong tools (I used ChatGPT and Claude directly, not the coding agent harnesses Claude Code and Codex), and that was a mistake.

eesmith | 4 days ago

You didn't think they were the wrong tools when you wrote it. You said "this example is simple enough that both Claude and ChatGPT can implement it using their default code environments".

From what I gather, a lot of people are using these code assistance tools because they too are in a hurry, under pressure from management forcing them to go faster with AI, and with limited ability to push back.

You have significantly more experience than most of your readership. Will you be providing guidelines about which tools to avoid for which problems, based on your experience?

Will you use this, or something similar, as an example of the negative consequences of being in a hurry, ideally leading to a worked-out example of how one might better audit or inspect tool-generated code, and of the effort involved?

That would be invaluable for people dealing with overly-optimistic management pressure.

My personal belief is that one reason for TDD's success is that it gives programmers a way to respond to the ill-advised pressure, found in some test-after shops, to skimp on testing.

That benefit disappears if managers believe that simply instructing an agentic code generator to "use Red/Green TDD" ensures a robust automated test suite.

My apologies if you have already done this; I have not followed your work. My interest in this thread stems from my views of TDD as a development approach, and from the difficulty of generating a test suite that is robust, minimal, understandable, and maintainable.