(no title)
apapalns | 3 months ago
I had a coworker do this with windsurf + manual driving awhile back and it was an absolute mess. Awful tests that were unmaintainable and next to useless (too much mocking, testing that the code “works the way it was written”, etc.). Writing a useful test suite is one of the most important parts of a codebase and requires careful deliberate thought. Without deep understanding of business logic (which takes time and is often lost after the initial devs move on) you’re not gonna get great tests.
To be fair to AI, we hired a “consultant” that also got us this same level of testing so it’s not like there is a high bar out there. It’s just not the kind of problem you can solve in 2 weeks.
simonw|3 months ago
Ask a coding agent to build tests for a project that has none and you're likely to get all sorts of messy mocks and tests that exercise internals when really you want them to exercise the top level public API of the project.
Give them just a few starting examples that demonstrate how to create a good testable environment without mocking and test the higher level APIs and they are much less likely to make a catastrophic mess.
You're still going to have to keep an eye on what they're doing and carefully review their work though!
cortesoft|3 months ago
I find this to be true for all AI coding, period. When I have the problem fully solved in my head, and I write the instructions to explicitly and fully describe my solution, the code that is generated works remarkably well. If I am not sure how it should work and give more vague instructions, things don't work so well.
Vinnl|3 months ago
omgbear|3 months ago
Or would return early from playwright tests when the desired targets couldn't be found instead of failing.
But I agree that with some guidance and a better CLAUDE.md, can work well!
throwup238|3 months ago
btown|3 months ago
andai|3 months ago
krschacht|3 months ago
anandchowdhary|3 months ago
typpilol|3 months ago
LASR|3 months ago
Code assistance tools might speed up your workflow by maybe 50% or even 100%, but it's not the geometric scaling that is commonly touted as the benefits of autonomous agentic AI.
And this is not a model capability issue that goes away with newer generations. But it's a human input problem.
anandchowdhary|3 months ago
For example, you can spend a few hours writing a really good set of initial tests that cover 10% of your codebase, and another few hours with an AGENTS.md that gives the LLM enough context about the rest of the codebase. But after that, there's a free* lunch because the agent can write all the other tests for you using that initial set and the context.
This also works with "here's how I created the Slack API integration, please create the Teams integration now" because it has enough to learn from, so that's free* too. This kind of pattern recognition means that prompting is O(1) but the model can do O(n) from that (I know, terrible analogy).
*Also literally becomes free as the cost of tokens approaches zero
nl|3 months ago
I recently had a bunch of Claude credits so got it to write a language implementation for me. It probably took 4 hours of my time, but judging by other implementations online I'd say the average implementation time is hundreds of hours.
The fact that the model knew the language and there are existing tests I could use is a radical difference.
id00|3 months ago
piker|3 months ago
colechristensen|3 months ago
An agent does a good job fixing it's own bad ideas when it can run tests, but the biggest blocker I've been having is the agent writing bad tests and getting stuck or claiming success by lobotomizing a test. I got pretty far with myself being the test critic and that being mostly the only input the agent got after the initial prompt. I'm just betting it could be done with a second agent.
andai|3 months ago
Uploaded a Prolog interpreter in Python and asked for a JS version. It surprised my by not just giving me a code block, but actually running a bunch of commands in its little VM, setting up a npm project, it even wrote a test suite and ran it to make sure all the tests pass!
I was very impressed, then I opened the tests script and saw like 15 lines of code, which ran some random functions, did nothing to test their correctness, and just printed "Test passed!" regardless of the result.
PunchyHamster|3 months ago
But "throw vague prompt at AI direction" does about as well as doing same thing with an intern.
gloosx|3 months ago
Aligns with vibe-coding values well: number go up – exec happy.
cpursley|3 months ago