With not much more effort you can get a much better review by additionally concatenating the touched files and sending them as context along with the diff. It was the work of about five minutes to make the scaffolding of a very basic bot that does this, and then somewhat more time iterating on the prompt. By the way, I find it's seriously worth sucking up the extra ~four minutes of delay and going up to GPT-5 high rather than using a dumber model; I suspect xhigh is worth the ~5x additional bump in runtime on top of high, but at that point you have to start rearchitecting your workflows around it and I haven't solved that problem yet.
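That scaffolding really can be tiny. A minimal sketch of the idea (not the commenter's actual bot; the prompt wording and the final "send it to a model" step are placeholders you would swap for your own API client or CLI):

    # Hypothetical sketch of the "diff plus touched files" review bot described
    # above. Assumes a git checkout of the PR branch; file names with spaces
    # are not handled.
    import subprocess

    def run(*args):
        return subprocess.run(args, capture_output=True, text=True, check=True).stdout

    def build_review_prompt(base_ref="origin/main"):
        diff = run("git", "diff", f"{base_ref}...HEAD")
        touched = run("git", "diff", "--name-only", f"{base_ref}...HEAD").split()
        # Concatenate the full contents of every touched file as extra context.
        context = []
        for path in touched:
            try:
                with open(path, encoding="utf-8") as f:
                    context.append(f"--- {path} ---\n{f.read()}")
            except (FileNotFoundError, UnicodeDecodeError):
                continue  # deleted or binary files: skip
        return (
            "Review this pull request. Flag bugs, risky edge cases and unclear code.\n\n"
            "Full contents of touched files:\n\n" + "\n\n".join(context)
            + "\n\nDiff under review:\n\n" + diff
        )

    if __name__ == "__main__":
        print(build_review_prompt())  # pipe into whichever model or CLI you prefer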
(That's if you don't want to go full Codex and have an agent play around with the PR. Personally I find that GPT-5.2 xhigh is incredibly good at analysing diffs-plus-context without tools.)
An alternative twist on this that I find works very well (and that I posted about a month ago: https://news.ycombinator.com/item?id=45959846): instead of concatenating and sending the touched files, check out the feature branch, and the prompt becomes "help me review this PR, diff attached, we are on the feature branch" with an AI that has access to the codebase (I like Cursor).
I've been using gemini-3-flash for the last few days and it is quite good; I'm not sure you need the biggest models anymore. I've only switched to pro once or twice in that time.
Do you do any preprocessing of diffs to replace significant whitespace with some token that is easier to spot? In my experience, some LLMs cannot tell unchanged context from the actual changes. That's especially annoying with -U99999 diffs as a shortcut to provide full file context.
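Not something described above, but to illustrate the kind of preprocessing this question is about, here is a rough sketch that tags each diff line explicitly and makes whitespace edits visible (the marker strings are arbitrary):

    # Hypothetical preprocessing sketch: rewrite a unified diff so changed lines
    # carry explicit tags and whitespace-only edits become visible to the model.
    import sys

    def make_whitespace_visible(s: str) -> str:
        # Tabs and trailing spaces are the usual invisible culprits.
        stripped = s.rstrip(" ")
        return stripped.replace("\t", "<TAB>") + "<TRAILING-SPACE>" * (len(s) - len(stripped))

    def annotate_diff(diff_text: str) -> str:
        out = []
        for line in diff_text.splitlines():
            if line.startswith(("+++", "---", "@@", "diff ", "index ")):
                out.append(line)                      # keep diff headers as-is
            elif line.startswith("+"):
                out.append("[ADDED]   " + make_whitespace_visible(line[1:]))
            elif line.startswith("-"):
                out.append("[REMOVED] " + make_whitespace_visible(line[1:]))
            elif line.startswith(" "):
                out.append("[CONTEXT] " + line[1:])   # unchanged context
            else:
                out.append(line)
        return "\n".join(out)

    if __name__ == "__main__":
        print(annotate_diff(sys.stdin.read()))        # e.g. git diff -U99999 | python annotate.py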
I still don't get the idea of AI code reviews.
A code review (at least in my opinion) is for your peers to check if the changes will have a positive or negative effect on the overall code + architecture.
I have yet to see an LLM being good at this.
Sure, they will leave comments about commonly made errors (your editor should already warn about those before you even commit), etc. But they won't flag that this weird-looking thing was done deliberately to make something a lot of customers wanted a reality.
Also, PRs are created to share knowledge. Questions and answers on them spread knowledge within the team. AI does not do that.
Sure, AI code reviews aren't a replacement for an architecture review on a larger team project.
But they're fantastic at spotting dumb mistakes or low-hanging fruit for improvements!
And having the AI spot those for you first means you don't waste your team's valuable reviewing time on the simple stuff that you could have caught early.
This question is surprising to me, because I consider AI code review the single most valuable aspect of AI-assisted software development today. It's ahead of line/next-edit tab completion, agentic task completion, etc.
AI code review does not replace human review. But AI reviewers will often notice little things that a human may miss. Sometimes the things they flag are false positives, but it's still worth checking in on them. If even one logical error or edge case gets caught by an AI reviewer that would've otherwise made it to production with just human review, it's a win.
Some AI reviewers will also factor in context of related files not visible in the diff. Humans can do this, but it's time consuming, and many don't.
AI reviews are also a great place to put "lint"-like rules that would be complicated to express in standard linting tools like ESLint.
We currently run 3-4 AI reviewers on our PRs. The biggest problem I run into is outdated knowledge. We've had AI reviewers leave comments based on limitations of DynamoDB or whatever that haven't been true for the last year or two. And of course it feels tedious when 3 bots all leave similar comments on the same line, but even that is useful as reinforcement of a signal.
If you have an architecture document, readmes for related services, relevant code from related services and such assembled for a LLM, it can do a pretty solid review even on microservices. It can catch parameter mismatches/edge cases, instrument logging end to end, do some reasonable flow modeling, etc. It can also point out when uncovered code is a risk, and do a sanity check on tests.
In order to be time efficient, human review should focus on the 'what' rather than the 'how' in most cases.
I work alone (in a medium sized company). No peers, no code review. AI code review is invaluable.
AI is a mixed bag. I'm the type of person who is compelled to have a deep understanding of the code they write. Writing my own code vs fixing AI-generated code is a wash timewise, and the AI generated code is so limited (assuming you pared down the uselessly elaborate code and fixed all the critical runtime bugs) as to restrict further iterations. And I'm talking about uploading an architectural blueprint with every function a documented but otherwise empty stub.
AI is a great bellwether. I bounce ideas off AI for a consensus. The closest equivalent is reading StackOverflow comments. I once offhandedly complained that Python had no equivalent of setattr at class scope (as __class__ is not defined until after __new__), but AI showed me how to provide a closure in __prepare__ over the class namespace, a hook introduced in Python 3.0 to which I had paid little attention. What a gem.
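For the curious, a minimal sketch of that trick; the set_attr helper name and the Config example are illustrative, not the original code:

    # The metaclass injects a set_attr helper that closes over the class
    # namespace, giving setattr-like behaviour inside the class body, before
    # the class object exists.
    class SetAttrMeta(type):
        @classmethod
        def __prepare__(mcls, name, bases, **kwargs):
            namespace = {}
            def set_attr(key, value):
                namespace[key] = value       # closure over the in-progress namespace
            namespace["set_attr"] = set_attr
            return namespace

        def __new__(mcls, name, bases, namespace, **kwargs):
            namespace.pop("set_attr", None)  # don't keep the helper on the class
            return super().__new__(mcls, name, bases, dict(namespace))

    class Config(metaclass=SetAttrMeta):
        for key in ("host", "port", "debug"):
            set_attr(key, None)              # works like setattr() at class scope

    print(Config.host, Config.port, Config.debug)   # -> None None None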
AI is great for learning. If you follow a textbook or blog or paper and don't understand, AI can clarify. But be careful with less structured learning - it is important to build a full mental model accounting for every possible outcome and explanation, otherwise you're susceptible to hallucinations. I remember my first derivative in which the end result could be obtained via two separate proofs, one of which would imply an incorrect calculus. You've got to play with it until you're satisfied your mental model accounts for all the facts.
Because AI facilitates learning so easily I feel the best skills for a future generation are those pertaining to memory and retention. Ya know, assuming we don't develop individualized and personalized AI that can model your next word and act as a personal memex.
AI is great as a search assistant. I have much better recall when rereading content. Thus, I prefer to ask AI to search for links to content I vaguely recall, rather than ask AI for its own summary or recollection.
Despite being terrible at writing decent code, AI provides fantastic code review. It catches everything from subtle high-level errors - even potential errors that haven't yet occurred - to API mismatches to syntax errors. I actually wish I was as fluent at namespaces and cgroups as AI, and I'm well versed.
AI is interesting at comments. It can be hard to provide germane comments while wading through the weeds. I feel the best comments are written a few days after the code is complete. AI provides that fresh perspective instantly. And if AI can't understand your code, you had better improve your comments.
I'm about to try AI for unit tests. I prefer property-based tests with Hypothesis. I took a quick glance at the generated code, and it seemed overly complicated, so I'm not optimistic.
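For contrast, a property-based test with Hypothesis can be very small; slugify here is a made-up target, purely to show the shape:

    # A small property-based test with Hypothesis; `slugify` is a made-up
    # example, just to show how compact these tests can be.
    from hypothesis import given, strategies as st

    def slugify(s: str) -> str:
        return "-".join(s.lower().split())

    @given(st.text())
    def test_slugify_is_idempotent_and_space_free(s):
        slug = slugify(s)
        assert " " not in slug
        assert slugify(slug) == slug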
Outside of software, AI is great for things you don't really care about. Yeah it might hallucinate, but my back-and-forth with AI is more about holding up a mirror to myself, revealing inner biases. Especially useful for interior decoration / remodeling.
Of course, take everything I said with a grain of salt as security at my company actively discourages AI. So, everything I said applies to free/cheap plans only. And I haven't tried skills yet.
As for PR reviews, assuming you've got linting and static analysis out of the way, you'd need to write a sufficiently reasonable prompt to truly catch problems and surface reviews that match your standards rather than generic AI comments.
My company uses some automatic AI PR review bots, and they annoy me more than they help. Lots of useless comments
I would just put a PR_REVIEW.md file in the repo and have a CI agent run it against the diff/repo and decide pass or reject. That file holds the rules the code must be evaluated against. It could be project-level policy; you put in the constraints you cannot check by code testing. Of course, any constraint that can be a code test is better off as a code test.
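A rough sketch of what that CI gate could look like; PR_REVIEW.md, the ask_model placeholder, and the PASS/REJECT convention are assumptions, not an existing tool:

    # Hypothetical CI gate: feed the repo's PR_REVIEW.md rules plus the PR diff
    # to a model and fail the build if it answers REJECT. ask_model is a
    # placeholder for whatever LLM API or CLI you actually use.
    import pathlib
    import subprocess
    import sys

    def ask_model(prompt: str) -> str:
        raise NotImplementedError("call your LLM provider here")

    def main() -> int:
        rules = pathlib.Path("PR_REVIEW.md").read_text(encoding="utf-8")
        diff = subprocess.run(
            ["git", "diff", "origin/main...HEAD"],
            capture_output=True, text=True, check=True,
        ).stdout
        verdict = ask_model(
            "Evaluate this diff against the following review rules. "
            "Answer PASS or REJECT on the first line, then explain.\n\n"
            f"Rules:\n{rules}\n\nDiff:\n{diff}"
        )
        print(verdict)
        return 0 if verdict.strip().upper().startswith("PASS") else 1

    if __name__ == "__main__":
        sys.exit(main())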
My experience is you can trust any code that is well tested, human or AI generated. And you cannot trust any code that is not well tested (what I call "vibe tested"). But some constraints need to be in natural language, and for that you need a LLM to review the PRs. This combination of code tests and LLM review should be able to ensure reliable AI coding. If it does not, iterate on your PR rules and on tests.
`gh pr diff num` is an alternative if you have the repo checked out. One can then pipe the output to one's favorite llm CLI and create a shell alias with a default review prompt.
> My company uses some automatic AI PR review bots, and they annoy me more than they help. Lots of useless comments
One way to make them more useful is to ask them to list the top N problems found in the change set.
Hum? I just tell Claude to review PR #123 and it uses 'gh' to do everything, including responding to human comments! Feedback from colleagues has been awesome.
Not my experience. Most Claude reviews are horrible, and if I catch you replying with Claude (any AI really) under your own name you are gonna get two earfuls. Don't get me wrong, if you have an AI bot that I can have a convo with on the PR, sure. But you passing their stuff off as you: do that twice and you're dead to me.
Now, I use it as well to review, just like you mention it pulls it via gh, has all the source to reference and then tells me what it thinks. But it can't be left alone.
Similarly people have been trying to pass root cause analyses off as true and they sound confident but have holes like a good Swiss cheese.
Good thing I work on an old C++ code base where it's impossible for AI to go through the millions of lines that all interact horribly in unpredictable ways.
I recently started using LLMs to review my code before asking for a more formal review from colleagues. It's actually been surprisingly useful - why waste my colleagues' time with small obvious things? But it's also sometimes gone much further than that, with deeper review points. Even when I don't agree with them it's great having that little bit more food for thought - if anything it helps seed the review.
While this approach is useful, I think the diff alone is too little context to catch a lot of bugs.
I use https://www.coderabbit.ai/ and it tends to be aware of files that aren't in the diff, and it can definitely see the rest of the file you are editing (not just the lines in the diff).
I have been using Codex as a code review step and it has been magnificent, truly. I don’t like how it writes code, but as a second line of defence I’m getting better code reviews out of it than I’ve ever had from a human.
Here are the commits, the tasks were not trivial
https://github.com/hofstadter-io/hof/commits/_next/
Social posts and pretty pictures as I work on my custom copilot replacement
https://bsky.app/profile/verdverm.com
You can also append ".patch" and get a more useful output
We are sooo gonna get replaced soon...
Colleague's feedback:
Claude> Address comments on PR #123
The objective of this initial review is to catch the low-hanging fruit that your colleagues would otherwise waste cycles on.
LLMs can catch syntax and basic semantics. Peers can spend time on more interesting things like design and relevant biz context.
I didn't see this mentioned, but we've been running bugbot for a while now and it's very good. It catches so many subtle bugs.
This also works if you have the GitHub CLI installed. I would set up an AGENTS.md or SKILL.md to instruct the agent on how to use gh too.