Draiken | 1 month ago
You can't trust it to do a good job on every report, so you'll have to manually review the LLM's reports anyway, or hope that no real issues were rejected as false negatives and no fake ones accepted as false positives.
This is what I've seen most LLM proponents do: they gloss over the issues and tell everyone it's all fine. Who cares about the details? They don't review the gigantic pile of slop code/answers/results they generate. They skim it and say YOLO. It worked for my narrow set of anecdotal tests, so it must work for everything!
IIRC DOGE did something like this to analyze which government jobs were needed and then fired people based on the results. Guess how well that turned out?
This is a very similar scenario: making a judgement call based on a small set of data. LLMs absolutely suck at it. And I'm not even going to get into the issue of liability, which is another can of worms.
colechristensen | 1 month ago
I'm not talking about completely replacing humans; the goal of this exercise was to demonstrate how to use an LLM to filter out garbage. Low-quality, semi-anonymous reports don't deserve a whole lot of accuracy, and being conservative and rejecting most reports, even when you throw out some legitimate ones, is fine.
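To be concrete, I mean something like the sketch below. This is my approximation, not the actual code from the exercise; the openai client, the model choice, and the prompt wording are all assumptions:

    # Rough sketch of a conservative LLM spam filter (assumptions: the
    # openai Python client, OPENAI_API_KEY set in the environment, and an
    # arbitrary model choice -- none of this is the original exercise's code).
    from openai import OpenAI

    client = OpenAI()

    def looks_legitimate(report: str) -> bool:
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # hypothetical model choice
            messages=[
                {"role": "system",
                 "content": "You triage incoming bug reports. Reply with the "
                            "single word VALID only if the report clearly "
                            "describes a concrete, reproducible issue; "
                            "otherwise reply SPAM."},
                {"role": "user", "content": report},
            ],
        )
        # Conservative default: anything that isn't an unambiguous VALID
        # is dropped, accepting that some legitimate reports get thrown
        # out along with the garbage.
        return resp.choices[0].message.content.strip().upper() == "VALID"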
It seems like, regardless of the evidence presented, your prejudices will lead you to the same conclusions, so what's the point of discussing anything? I looked for, found, and shared evidence; you're sharing your opinion.
>IIRC DOGE did something like this to analyze which government jobs were needed and then fired people based on the results. Guess how well that turned out?
I'm talking about filtering spammy communication channels, which requires nothing like the care involved in making employment decisions.
Your comment is plainly just bad faith and prejudice.
Draiken | 1 month ago
I assumed you knew how LLMs work. They are random by nature, not "because I'm guessing". There's a reason that if you give an LLM the exact same prompt hundreds of times, you'll get hundreds of different answers.
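To make that concrete, here's a toy sketch of the sampling step (hypothetical toy model, not any real LLM's code): with temperature > 0, the next token is drawn at random from a probability distribution, which is why identical prompts produce different outputs:

    import math
    import random

    # Toy illustration of temperature sampling (not a real LLM API):
    # the next token is *sampled* from a softmax over logits, so the
    # same input can give a different output on every run.
    def sample_next_token(logits: dict[str, float], temperature: float = 0.9) -> str:
        scaled = {tok: l / temperature for tok, l in logits.items()}
        m = max(scaled.values())  # subtract max for numerical stability
        exps = {tok: math.exp(v - m) for tok, v in scaled.items()}
        total = sum(exps.values())
        # Weighted random draw -- this is where the nondeterminism comes from.
        r = random.random()
        cumulative = 0.0
        for tok, e in exps.items():
            cumulative += e / total
            if r < cumulative:
                return tok
        return tok  # fallback for floating-point rounding

    # The same "prompt" (same logits) three times: the verdict can change
    # with zero change in input.
    logits = {"valid": 1.2, "invalid": 1.0, "needs-review": 0.8}
    for _ in range(3):
        print(sample_next_token(logits))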
>I looked for, found, and shared evidence
Anecdotal evidence. Studies have shown how unreliable LLMs are, precisely because they are not deterministic. Again, that's a fact, not an opinion.
>I'm talking about filtering spammy communication channels
So if we make tons of mistakes there, who cares, right?
I only used this as an example because it's one of the few very public uses of LLMs to make judgement calls where the output was accepted as true and people faced the consequences.
I'm sure there are plenty more people getting screwed over by similar mistakes, but folks generally aren't stupid enough to admit it publicly. Maybe the huge Salesforce mistake qualifies too? Incidentally, it also involved people's jobs.
Regardless, the point stands: they are unreliable.
Want to trust LLMs blindly for your weekend project? Great! The only potential victim of their mistakes is you. For anything serious, like a huge open-source project? That's irresponsible.