top | item 46702121

(no title)

novalis78 | 1 month ago

Just use an LLM to weed them out. What’s so hard about that?

discuss

Because LLMs are bad at reviewing code for the same reasons they are bad at making it? They get tricked by fancy clean syntax and take long descriptions / comments for granted without considering the greater context.

colechristensen|1 month ago

I don't know, I prompted Opus 4.5 "Tell me the reasons why this report is stupid" on one of the example slop reports and it returned a list of pretty good answers.[1]

Give it a presumption of guilt and tell it to make a list, and an LLM can do a pretty good job of judging crap. You could very easily rig up a system to give this "why is it stupid" report and then grade the reports and only let humans see the ones that get better than a B+.

If you give them the right structure I've found LLMs to be much better at judging things than creating them.

Opus' judgement in the end:

"This is a textbook example of someone running a sanitizer, seeing output, and filing a report without understanding what they found."

1. https://claude.ai/share/8c96f19a-cf9b-4537-b663-b1cb771bfe3f

bootsmann|1 month ago

If AI can't be trusted to write bug reports, why should it be trusted to review them?

f311a|1 month ago

How would it work if LLMs provide incorrect reports in the first place? Have a look at the actual HackerOne reports and their comments.

The problem is the complete stupidity of people. They use LLMs to convince the author of the curl that he is not correct about saying that the report is hallucinated. Instead of generating ten LLM comments and doubling down on their incorrect report, they could use a bit of brain power to actually validate the report. It does not even require a lot of skills, you have to manually tests it.

fc417fc802|1 month ago

Let the reporter duke it out with the project's gatekeeping LLM. If it keeps going on for long enough a human can quickly skim the exchange. It should be immediately obvious if the reporter is making sensible rebuttals or just throwing more slop at the wall.

I think fighting fire with fire is likely the correct answer here.

eqvinox|1 month ago

At this point it's impossible to tell if this is sarcasm or not.

Brave new world we got there.

vee-kay|1 month ago

Set a thief to catch a thief.