top | item 47153090


eesmith | 6 days ago

You didn't think they were the wrong tools when you wrote it. You said "this example is simple enough that both Claude and ChatGPT can implement it using their default code environments".

From what I gather, a lot of people are using these code assistance tools because they too are in a hurry, under pressure from management forcing them to go faster with AI, and with limited ability to push back.

You have significantly more experience than most of your readership. Will you be providing guidelines about which tools to avoid for which problems, based on your experience?

Will you use this or something similar as an example of the negative consequences of being in a hurry, hopefully leading to a worked-out example of how one might better audit or inspect tool-generated code, and the effort involved?

That would be invaluable for people dealing with overly-optimistic management pressure.

My personal belief is that one of the reasons for TDD's success is as a way for programmers to respond to ill-advised pressure to skimp on testing found in some test-after shops.

That disappears if managers believe instructing an agentic code generator to "use Red/Green TDD" easily ensures a robust automated test suite.

My apologies if you have already done this. I have not followed your work. My interest in this thread is from my views of TDD as a development approach, and the difficulty in generating a test suite which is robust, minimal, understandable, and maintainable.


simonw | 6 days ago

I stand by what I originally wrote: the example was simple enough for ChatGPT and Claude to implement reasonably well.

They didn't implement it well enough for people not to pick them apart though, which is a distraction from the concept I'm trying to demonstrate.

This is honestly the biggest challenge in writing about this stuff, especially if you're doing it in public. Any example is an opportunity for people to find flaws which they might use to undermine the larger point I'm trying to communicate.

I have a visible changelog on each chapter now so people can follow how I evolve them over time. I'll try to find the right balance in terms of illustrative examples. My first attempt at linking directly to the first working transcripts I got clearly isn't it.

eesmith | 6 days ago

I fully agree with the assessment that it was implemented "reasonably well".

It is not, however, something equivalent to the product of a disciplined TDD practitioner. Not even close.

You write that test-first development helps protect against two risks of code agents, but what does that mean for your specific example?

How is the final product better than what you would get from the test-after prompt "Build a Python function to extract headers from a markdown string, then write a complete and robust test suite."?

Otherwise, how do you know it's a "fantastic fit for coding agents" or that it gets "better results out of a coding agent"?
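
To make the comparison concrete, here is a minimal sketch of the kind of function under discussion, with a few of the checks that either a test-first or a test-after workflow would need to cover. The name `extract_headers`, the `(level, text)` return shape, and the decision to handle only ATX-style (`#`) headers are all assumptions for illustration; none of them come from the article or its transcripts.

```python
import re

# Hypothetical helper: name and return shape are assumptions, not the article's code.
# Handles only ATX-style headers ("# Title"). Setext headers and headers inside
# fenced code blocks are deliberately not addressed in this sketch.
_HEADER = re.compile(r"^(#{1,6})[ \t]+(.*?)[ \t]*$")


def extract_headers(markdown: str) -> list[tuple[int, str]]:
    """Return (level, text) for each ATX header line in the string."""
    headers = []
    for line in markdown.splitlines():
        match = _HEADER.match(line)
        if match:
            headers.append((len(match.group(1)), match.group(2)))
    return headers


# Checks a disciplined practitioner might write one at a time, each failing first.
assert extract_headers("# Title\n\nBody\n\n## Section") == [(1, "Title"), (2, "Section")]
assert extract_headers("no headers here") == []
assert extract_headers("#hashtag") == []        # no space after '#': not a header
assert extract_headers("####### seven") == []   # more than six '#': not a header
```

The open question is whether the edge cases in the last two assertions are more likely to be found by writing the tests first than by asking for a "complete and robust" suite afterwards.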