davidbau | 2 months ago
The distinction is between two ways of deploying human thinking. In the first, you are the test oracle: you think about every test yourself, and repeat that effort every five minutes, or every thirty. In the second, you design the evaluation infrastructure: what to measure, what remains untested, which hypotheses to prioritize. Both require judgment, but the first scatters your attention while the second concentrates it. I disagree that an LLM cannot be used to write tests, but as with any battery of tests, you cannot trust it blindly. You need to think about how to test the tests.
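One concrete way to "test the tests" is mutation testing: deliberately inject a bug and check that the suite fails on it. A minimal sketch (all function names and test cases here are illustrative, not from any particular library):

```python
def clamp(x, lo, hi):
    """Code under test: restrict x to the range [lo, hi]."""
    return max(lo, min(x, hi))

def clamp_mutant(x, lo, hi):
    """Mutant: min/max swapped -- a bug a good suite should catch."""
    return min(lo, max(x, hi))

def run_suite(fn):
    """A (possibly LLM-written) test suite; True iff all cases pass."""
    cases = [((5, 0, 10), 5), ((-3, 0, 10), 0), ((42, 0, 10), 10)]
    return all(fn(*args) == expected for args, expected in cases)

# The real implementation passes; the mutant must fail.
# If the mutant also passed, the suite would be too weak to trust.
assert run_suite(clamp) is True
assert run_suite(clamp_mutant) is False
```

The same logic scales up: tools like mutmut or Stryker generate mutants automatically, and the fraction of mutants killed is a measure of how much the suite can be trusted.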
As for product risk: I do not know what is hiding in 13,000 lines I haven't read; there are certainly bugs to be found. But that is just as true when you manage a big team. The solution has never been to read every line your collaborators write. You need to invest in systems, technical and human, that give you confidence without requiring you to personally verify everything. The question is how to build systems that are good enough.