Because LLMs are bad at reviewing code for the same reasons they are bad at writing it? They get tricked by fancy clean syntax and take long descriptions and comments at face value without considering the greater context.
I don't know, I prompted Opus 4.5 with "Tell me the reasons why this report is stupid" on one of the example slop reports and it returned a list of pretty good answers.[1]
Give it a presumption of guilt and tell it to make a list, and an LLM can do a pretty good job of judging crap. You could very easily rig up a system to give this "why is it stupid" report and then grade the reports and only let humans see the ones that get better than a B+.
If you give them the right structure I've found LLMs to be much better at judging things than creating them.
Opus' judgement in the end:
"This is a textbook example of someone running a sanitizer, seeing output, and filing a report without understanding what they found."
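The triage pipeline described above could be sketched roughly like this. Everything here is illustrative: `judge` stands in for whatever model API you'd actually call, and the grade scale and threshold are assumptions, not a real system.

```python
# Sketch of the "grade the slop reports, only show humans the good ones"
# pipeline. The model call is injected as a plain callable so the
# filtering logic is testable without a real API; `judge` is a placeholder.

from dataclasses import dataclass
from typing import Callable

# Ordered worst-to-best so grades can be compared by index.
GRADES = ["F", "D", "C", "C+", "B-", "B", "B+", "A-", "A", "A+"]

@dataclass
class Triage:
    critique: str  # the model's "why is this stupid" write-up
    grade: str     # the model's letter grade for the report

def triage_report(report: str, judge: Callable[[str], Triage],
                  threshold: str = "B+") -> bool:
    """Return True only if the report grades strictly better than threshold."""
    result = judge(report)
    return GRADES.index(result.grade) > GRADES.index(threshold)
```

With a stubbed judge, `triage_report("...", lambda r: Triage("sanitizer output misread", "C"))` returns False, so that report never reaches a human, while a report graded "A-" would pass.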
"Tell me the reasons why this report is stupid" is a loaded prompt. The tool will generate whatever output pattern-matches it, hallucinating reasons if it has to. You can get wildly different output if you instead prompt it "Tell me the reasons why this report is great".
It's the same as if you searched the web for a specific conclusion. You will get matches for it regardless of how insane it is, leading you to believe it is correct. LLMs take this to another level, since they can generate patterns not previously found in their training data, and the output seems credible on the surface.
Trusting the output of an LLM to determine the veracity of a piece of text is a bafflingly bad idea.
1. https://claude.ai/share/8c96f19a-cf9b-4537-b663-b1cb771bfe3f