top | item 46832267

(no title)

akst | 1 month ago

ATM I feel like LLM writing tests can be a bit dangerous at times, there are cases where it's fine there are cases where it's not. I don't really think I could articulate a systemised basis for identifying either case, but I know it when I see it I guess.

Like the the other day, I gave it a bunch of use cases to write tests for, the use cases were correct the code was not, it saw one of the tests broken so it sought to rewrite the test. You risking suboptimal results when an agent is dictating its own success criteria.

At one point I did try and use seperate Claude instances to write tests, then I'd get the other instance to write the implementation unaware of the tests. But it's a bit to much setup.

discuss

icedchai|29 days ago

I work with individuals who attempt to use LLMs to write tests. More than once, it's added nonsensical, useless test cases. Admittedly, humans do this, too, to a lesser extent.

Additionally, if their code has broken existing tests, it "fixes" them by not fixing the code under test, but changing the tests... (assert status == 200 becomes 500 and deleting code.)

Tests "pass." PR is opened. Reviewers wade through slop...

sigotirandolas|29 days ago

The most annoying thing is that even after cleaning up all the nonsense, the tests still contain all sort of fanfare and it’s essentially impossible to get the submitter to trim them because it’s death by a thousand cuts (and you better not say "do it as if you didn’t use AI" in the current climate..)