top | item 47059050

t43562 | 12 days ago

Code may have to compile, but that's a low bar, and since the AI is writing the tests, it's obvious that they're going to pass.

In areas where there are fewer easy ways to judge output, there is correspondingly more value in getting "good" people. An AI that can produce readable reports isn't "good": what matters is the quality of the work and the insight put into it, which can only be ensured by looking at the worker's reputation and past history.

naasking | 12 days ago

> since the AI is writing the tests it's obvious that they're going to pass

That's not obvious at all if the AI writing the tests is different from the AI writing the code being tested. Put into an adversarial, critical mode, the same model produces very different results.

t43562 | 12 days ago

IMO the reason neither of them can write entirely trustworthy tests is that they lack domain knowledge: they write the tests based on what the code does, plus whatever they extract from the prompts, rather than on some abstract understanding of what the code should do given its context, e.g. that it's being used in a nuclear power station, or for promoting cat videos, or in a hospital.

Obviously this is only partially true, but it's true enough.

It takes humans quite a long time to learn the external context that lets them write good tests, IMO, and we have trouble feeding enough context into AIs to give them equal ability. We're often talking about companies where nobody bothers to write down more than 1/20th of what is needed to be an effective developer. So you join some place, and five years later, after hundreds of meetings, conversations, and customer complaints, you might be lucky to know 80% of the context in your limited area.
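A minimal sketch of the distinction being made here, with a deliberately toy hospital-flavored example (all names and numbers are invented for illustration, not taken from the thread):

```python
# Hypothetical dosing helper for a hospital system.
def weight_based_dose(weight_kg: float, mg_per_kg: float, max_mg: float) -> float:
    """Return a dose in mg, capped at the safe maximum."""
    return min(weight_kg * mg_per_kg, max_mg)

# Implementation-mirroring test: restates what the code already does,
# so it would still pass even if the cap were wrong or missing.
def test_mirrors_implementation():
    assert weight_based_dose(70, 10, 500) == min(70 * 10, 500)

# Domain-driven test: encodes the external rule a clinician would state
# ("never exceed the safe maximum, never go negative"), independent of
# how the function happens to be written.
def test_never_exceeds_safe_maximum():
    for w in (0.5, 3, 70, 250):  # neonate through very heavy patient
        dose = weight_based_dose(w, 10, 500)
        assert 0 <= dose <= 500

test_mirrors_implementation()
test_never_exceeds_safe_maximum()
```

The second test is the kind that requires the external context the comment describes; a model that only sees the code and the prompt tends to produce the first kind.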

disiplus | 12 days ago

Even if it's a different session, it can be enough. That said, I've had times where it rewrote tests "because my implementation was now different, so the tests needed to be updated", so you have to prompt for even that and tell it not to touch the tests.

nthj | 12 days ago

We’ve had the sycophant problem for as long as people have held power over other people, and the answer has always been “put 3-5 workers in a room and make them compete for the illusion of favor.”

I have been doing this with coding agents across LLM providers for a while now, with very successful results. Grok seems particularly happy to point out where Anthropic's model is cutting corners, but I get great insights from o3 and Gemini too.