t43562|12 days ago
In all areas where there are fewer easy ways to judge output, there is correspondingly more value in getting "good" people. An AI that can produce readable reports isn't "good" - what matters is the quality of the work and the insight put into it, which can only be ensured by looking at the worker's reputation and past history.
naasking|12 days ago
That's not obvious at all if the AI writing the tests is different from the AI writing the code being tested. Put into an adversarial, critical mode, the same model produces very different output.
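Concretely, the pattern looks something like this - a minimal sketch assuming the OpenAI and Anthropic Python SDKs with API keys in the environment; the model names, prompts, and task are placeholders, not anything specific to the parent comment:

    # One model writes the code; a different model is prompted
    # adversarially to write tests that try to break it.
    from openai import OpenAI
    import anthropic

    coder = OpenAI()                 # reads OPENAI_API_KEY from the environment
    tester = anthropic.Anthropic()   # reads ANTHROPIC_API_KEY from the environment

    # Hypothetical task, for illustration only.
    task = "Write a Python function slugify(title) that converts a title to a URL slug."

    code = coder.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "user", "content": task}],
    ).choices[0].message.content

    # Ask the second provider to act as a hostile reviewer of the first's output.
    critique = tester.messages.create(
        model="claude-3-5-sonnet-latest",  # placeholder model name
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": "You are an adversarial reviewer. Write pytest tests "
                       "designed to break this code, and list the corners "
                       "you think it cuts:\n\n" + code,
        }],
    ).content[0].text

    print(critique)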
t43562|12 days ago
Obviously this is only partially true, but it's true enough.

It takes humans quite a long time to learn the external context that lets them write good tests, IMO, and we have trouble feeding enough context into an AI to give it equal ability. We're often talking about companies where nobody bothers to write down more than a twentieth of what an effective developer needs to know. You join somewhere, and five years later, after hundreds of meetings, conversations with colleagues, and customer complaints, you might be lucky to know 80% of the context in your own limited area.
nthj|12 days ago
I have been doing this with coding agents across LLM providers for a while now, with very successful results. Grok seems particularly happy to tell Anthropic where it’s cutting corners, but I get great insights from O3 and Gemini too.
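The loop can be as simple as piping one agent's diff to a second provider for review. A minimal sketch, assuming Grok's OpenAI-compatible endpoint with an XAI_API_KEY in the environment; the model name and file path are placeholders:

    import os
    from openai import OpenAI

    # Grok exposes an OpenAI-compatible API, so the same SDK works
    # as a second, independent reviewer.
    reviewer = OpenAI(
        base_url="https://api.x.ai/v1",
        api_key=os.environ["XAI_API_KEY"],
    )

    # Hypothetical file holding the patch produced by the first coding agent.
    diff = open("agent_output.diff").read()

    review = reviewer.chat.completions.create(
        model="grok-2-latest",  # placeholder model name
        messages=[{
            "role": "user",
            "content": "Review this patch from another coding agent. Point "
                       "out cut corners, missing tests, and risky changes:\n\n" + diff,
        }],
    )
    print(review.choices[0].message.content)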