aktau | 4 days ago
> So we run dozens of parallel CLI agents that can review the code in excruciating detail. This has completely replaced human code review for anything that isn't functional correctness but is near the same order of magnitude of price. Much better than humans and beats every commercial tool.
Sure, you could make multiple LLM invocations (different temperature, different prompts, ...). But how does one separate the good comments from the bad comments? Another meta-LLM? [1] Do you know of anyone who summarizes the approach?
[1]: I suppose you could shard that out for as much compute as you want to spend, with one LLM invocation judging/collating the results of (say) 10 child reviewers.
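The fan-out/judge idea in the footnote can be sketched roughly as below. This is a toy illustration, not anyone's actual pipeline: `call_llm` is a hypothetical stub standing in for a real chat-completion API call, and the temperatures and merge criterion are arbitrary assumptions.

```python
import json
from concurrent.futures import ThreadPoolExecutor

def call_llm(prompt, temperature):
    # Hypothetical stub standing in for a real chat-completion API call.
    if "independent code reviews" in prompt:
        return "collated review"
    return f"child review at T={temperature:.1f}"

def fan_out_review(diff, n_reviewers=10):
    # Fan out: each child reviewer sees the same diff at a different temperature.
    with ThreadPoolExecutor(max_workers=n_reviewers) as pool:
        reviews = list(pool.map(
            lambda t: call_llm(f"Review this diff:\n{diff}", temperature=t),
            [0.1 * i for i in range(n_reviewers)],
        ))
    # Collate: one judge invocation merges the child reviews, keeping only
    # comments that more than one reviewer independently raised.
    judge_prompt = (
        f"Here are {n_reviewers} independent code reviews of the same diff. "
        "Keep only comments that at least two reviewers agree on.\n"
        + json.dumps(reviews)
    )
    return call_llm(judge_prompt, temperature=0.0)
```

The judge call is the "meta-LLM" from the question: it spends one extra invocation to deduplicate and filter, so noisy single-reviewer comments get dropped while repeated findings survive.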
ivansavz | 3 days ago
One thing that works very well for me (in a different context) is to ask the model to return two lists:
- Things that I must absolutely fix (bugs, typos, logic mistakes, etc.)
- Lesser fixes and other stylistic improvements
Then I look only at the first list.
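One way to make the two-list trick mechanical is to have the model prefix each comment with its list name and split the output afterward. A minimal sketch, with the prompt wording and label names being my own assumptions:

```python
MUST_FIX = "MUST FIX"
NICE_TO_HAVE = "NICE TO HAVE"

# Hypothetical prompt asking the reviewer to tag every comment with its list.
REVIEW_PROMPT = f"""Review the diff below. Return two lists of comments:
{MUST_FIX}: bugs, typos, logic mistakes.
{NICE_TO_HAVE}: lesser fixes and stylistic improvements.
Prefix every comment line with its list name and a colon."""

def split_review(raw_review):
    """Split a reviewer's flat output into (must_fix, nice_to_have) lists."""
    must, nice = [], []
    for line in raw_review.splitlines():
        line = line.strip()
        if line.startswith(MUST_FIX + ":"):
            must.append(line[len(MUST_FIX) + 1:].strip())
        elif line.startswith(NICE_TO_HAVE + ":"):
            nice.append(line[len(NICE_TO_HAVE) + 1:].strip())
    return must, nice
```

Then "look only at the first list" is just ignoring the second return value.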
jjmarr | 3 days ago
> But how does one separate the good comments from the bad comments?

If the AI took a valid interpretation of the coding guidelines, it is a legitimate comment. If the AI is being overly pedantic, it is a documentation bug and we change the rules.

Otherwise, some people feel review is too harsh, other people feel it is not harsh enough. AI does not fix inconsistent expectations.