bfioca | 6 months ago

>...it’s a useless tool. I don’t like collaborating with chronic liars who aren’t able to openly point out knowledge gaps...

I think a more correct take here might be "it's a tool that I don't trust enough to use without checking," or at the very least, "it's a useless tool for my purposes." I understand your point, but I got a little hung up on the above line because it's very far out of alignment with my own experience of using it to save enormous amounts of time.

libraryofbabel|6 months ago

And as others have pointed out, this issue of “how much should I check” is really just a subset of an old general problem in trust and knowledge (“epistemology” or what have you) that people have recognized since at least the scientific revolution. The Royal Society’s motto on its founding in the 1660s was “take no man’s word for it.”

Coding agents have now got pretty good at checking themselves against reality, at least for things where they can run unit tests or a compiler to surface errors. That would catch the error in TFA. Of course there is still more checking to do down the line, in code reviews etc, but that goes for humans too. (This is not to say that humans and LLMs should be treated the same here, but nor do I treat an intern’s code and a staff engineer’s code the same.) It’s a complex issue that we can’t really collapse into “LLMs are useless because they get things wrong sometimes.”

AllegedAlec|6 months ago

> Coding agents have now got pretty good at checking themselves against reality, at least for things where they can run unit tests or a compiler to surface errors.

YMMV. I've seen Claude go completely batshit insane saying that tests all passed. Then I run them and I see 50+ failures. I copy the output, tell him to fix it, and he goes into his sycophantic apologia before spinning his wheels, doing nothing, and saying all tests are back to green.
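
One way to guard against this kind of mismatch is to never take the "all green" report at face value and instead re-run the test command yourself, trusting only the process exit code. The following is a minimal sketch under stated assumptions, not anyone's actual workflow: `verify_claim` and `Verdict` are hypothetical names, and it assumes the test runner follows the usual convention of exiting with 0 only when everything passed (as pytest does).

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class Verdict:
    claimed_pass: bool  # what the agent reported
    actual_pass: bool   # what the exit code of the real run says
    output_tail: str    # last lines of output, handy for pasting back

    @property
    def trustworthy(self) -> bool:
        # The claim holds only if it matches the observed result.
        return self.claimed_pass == self.actual_pass


def verify_claim(command: list[str], claimed_pass: bool, tail_lines: int = 20) -> Verdict:
    """Run the test command ourselves and compare against the claim.

    Hypothetical helper: assumes exit code 0 means "all tests passed".
    """
    result = subprocess.run(command, capture_output=True, text=True)
    lines = (result.stdout + result.stderr).splitlines()
    return Verdict(
        claimed_pass=claimed_pass,
        actual_pass=(result.returncode == 0),
        output_tail="\n".join(lines[-tail_lines:]),
    )


if __name__ == "__main__":
    # Stand-in for a real test command such as ["pytest", "-q"].
    fake_failing_run = [sys.executable, "-c", "import sys; print('2 failed'); sys.exit(1)"]
    verdict = verify_claim(fake_failing_run, claimed_pass=True)
    print("claim matches reality:", verdict.trustworthy)
    print(verdict.output_tail)
```

The point of the design is that the check costs one subprocess call and does not depend on anything the agent says about its own run.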

lazide|6 months ago

It’s a tool that fundamentally can’t be used reliably without double-checking everything it produces. That is rather different from how you’re presenting it.

vidarh|6 months ago

We double check human work too in all kinds of contexts.

A whole lot of my schooling involved listening to teachers repeating over and over to us how we should check our work, because we can't even trust ourselves.

(heck, I had to double-check and fix typos in this comment)

mhh__|6 months ago

Checking is usually faster than writing from scratch, so this is still +EV.

tmnvdb|6 months ago

So, similar to Wikipedia.