There are some patterns you can use that help a bit with this problem. The lowest-hanging fruit is to tell the LLM that its tests should go through public interfaces wherever possible. Next after that is to add a workflow step along the lines of: "check whether the not-yet-committed tests use any non-public interfaces in places where a public interface exposes the same functionality - if so, rewrite those tests to use only publicly exposed interfaces." You could likely also add linter rules, though sometimes you genuinely need to test things like error conditions that can't reasonably be exercised through public interfaces alone.
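As a rough sketch of what such a linter rule could look like: the snippet below flags attribute accesses on names with a leading underscore inside test source, following the Python convention that a single leading underscore marks an internal API. The function name and the warning format are made up for illustration; a real rule would also need an allowlist for the legitimate exceptions mentioned above.

```python
# Hypothetical lint check: flag tests that reach into non-public
# interfaces (attributes whose names start with a single underscore).
# Assumes the Python underscore convention; dunders are excluded.
import ast

def find_private_access(test_source: str) -> list[str]:
    """Return one warning per access to a _private attribute."""
    warnings = []
    tree = ast.parse(test_source)
    for node in ast.walk(tree):
        # Matches obj._internal_method(...) and obj._field reads/writes.
        if (isinstance(node, ast.Attribute)
                and node.attr.startswith("_")
                and not node.attr.startswith("__")):
            warnings.append(
                f"line {node.lineno}: test touches non-public name "
                f"'{node.attr}' - prefer the public interface"
            )
    return warnings

# Example: a test that bypasses the public API to set up state.
test_code = """
def test_cache_eviction(cache):
    cache._evict_oldest()          # private - public API exists
    assert cache.get("key") is None
"""
for warning in find_private_access(test_code):
    print(warning)
```

Something like this is cheap to bolt onto CI, and the warning text doubles as a prompt you can feed back into the "rewrite to use public interfaces" step of the workflow.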
gspencley|6 months ago
That problem statement is:
- Not all tests add value
- Some tests can even create disvalue (e.g. slow to run, increasing CI bills for the business without actually testing anything important)
- Few developers understand what good automated testing looks like
- Developers are incentivized to write tests just to satisfy code coverage metrics
- Therefore writing tests is a chore and an afterthought
- So they reach for an LLM because it solves what they perceive as a problem
- The tests run and pass, and they are completely oblivious to the anti-patterns just introduced and the problems those will create over time
- The LLMs are generating hundreds, if not thousands, of these problems
So yeah, the problem is 100% the developers who don't understand how to evaluate the output of a tool that they are using.
But unlike functional code, these tests are - in many cases - arguably creating disvalue for the business. At least the functional code is a) more likely to be reviewed, with code quality problems addressed, and b) even when it isn't, still providing features for the end user and thus adding some value.