bfioca | 6 months ago
I think a more accurate take here might be "it's a tool that I don't trust enough to use without checking," or at the very least, "it's a useless tool for my purposes." I understand your point, but I got a little hung up on the line above because it's so far out of line with my own experience using it to save enormous amounts of time.
libraryofbabel | 6 months ago
Coding agents have now gotten pretty good at checking themselves against reality, at least for things where they can run unit tests or a compiler to surface errors. That would catch the error in TFA. Of course there is still more checking to do down the line, in code reviews, etc., but that goes for humans too. (This is not to say that humans and LLMs should be treated the same here, but then I don't treat an intern's code and a staff engineer's code the same either.) It's a complex issue that we can't really collapse into "LLMs are useless because they sometimes get things wrong."
AllegedAlec | 6 months ago
YMMV. I've seen Claude go completely batshit insane, claiming that all the tests passed. Then I run them and see 50+ failures. I copy the output, tell him to fix it, and he launches into his sycophantic apologia before spinning his wheels, doing nothing, and announcing that all tests are back to green.
vidarh | 6 months ago
A whole lot of my schooling involved listening to teachers repeat over and over that we should check our work, because we can't even trust ourselves.
(heck, I had to double-check and fix typos in this comment)