Ask HN: Are diffs still useful for AI-assisted code changes?
7 points| nuky | 1 month ago
Lately I’ve been feeling frustrated during reviews when an AI generates a large number of changes. Even if the diff is "small", it can be very hard to understand what actually changed in behavior or structure.
I started experimenting with a different approach: comparing two snapshots of the code (baseline and current) instead of raw line diffs. Each snapshot captures a rough API shape and a behavior signal derived from the AST. The goal isn’t deep semantic analysis, but something fast that can signal whether anything meaningful actually changed.
It’s intentionally shallow and non-judgmental — just signals, not verdicts.
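A minimal sketch of what such a snapshot comparison could look like, assuming Python sources and only top-level functions. The names `snapshot` and `compare` and the signal labels are hypothetical, chosen for illustration; this is not the actual tool. The "API shape" here is just the argument list, and the "behavior signal" is a hash of the AST dump of the body, so formatting and comment changes do not register but docstring edits and renamed local variables do.

```python
import ast
import hashlib


def snapshot(source: str) -> dict:
    """Map each top-level function name to (signature, body fingerprint)."""
    result = {}
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # API shape: the argument list only (names, defaults, annotations).
            signature = ast.unparse(node.args)
            # Behavior signal: hash of the body's AST dump. ast.dump omits line
            # and column info by default, so pure reformatting hashes the same.
            body = "".join(ast.dump(stmt) for stmt in node.body)
            digest = hashlib.sha256(body.encode()).hexdigest()[:12]
            result[node.name] = (signature, digest)
    return result


def compare(baseline: str, current: str) -> dict:
    """Return one coarse signal per function name; signals, not verdicts."""
    old, new = snapshot(baseline), snapshot(current)
    signals = {}
    for name in sorted(old.keys() | new.keys()):
        if name not in new:
            signals[name] = "removed"
        elif name not in old:
            signals[name] = "added"
        elif old[name][0] != new[name][0]:
            signals[name] = "api_changed"
        elif old[name][1] != new[name][1]:
            signals[name] = "body_changed"
        else:
            signals[name] = "unchanged"
    return signals
```

Calling `compare(baseline_source, current_source)` on two versions of a file gives a dictionary that can be skimmed before opening the line diff. It requires Python 3.9+ for `ast.unparse`.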
At the same time, I see more and more LLM-based tools helping with PR reviews. Having probabilistic changes reviewed by probabilistic tools feels a bit dangerous to me.
Curious how others here think about this:
– Do diffs still work well for AI-generated changes?
– How do you review large AI-assisted refactors today?
ccoreilly|1 month ago
The truth is we’re all still experimenting and shovels of all sizes and forms are being built.
nuky|1 month ago
What I keep running into is the step before reading tests or code: when a change is large or mechanical, I’m mostly trying to answer "did behavior or API actually change, or is this mostly reshaping?" so I know how deep to go.
Agree we’re all still experimenting here.
unknown|1 month ago
[deleted]
csomar|1 month ago
nuky|1 month ago
What I'm exploring is more about what we do with that structure once someone or something starts generating thousands of changed lines: how to compress change into signals we can actually reason about.
Thank you for sharing. I'm actually trying your tool right now - it looks really interesting. Happy to exchange thoughts.
veunes|1 month ago
Diffs are still necessary, but they should act as a filter. If a diff is too complex for a human to parse in 5 minutes, it’s bad code, even if it runs. We need to force AI to write "atomically" and clearly; otherwise we're building legacy code that's unmaintainable without that same AI.
nuky|1 month ago
DiabloD3|1 month ago
It's common to swap git's diff for tools like difftastic, so formatting slop doesn't trigger false diff lines.
You're probably better off, FWIW, just avoiding LLMs. LLMs cannot produce working code, and they're the wrong tool for this. They're just predicting tokens around other tokens, they do not ascribe meaning to them, just statistical likelihood.
LLM weights themselves would be far more useful if we used them to indicate the statistical likelihood (i.e., perplexity) of the code that has been written; i.e., strange-looking code is likely to be buggy, but nobody has written this tool yet.
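For reference, the scoring half of that idea is small once per-token log-probabilities exist. The sketch below assumes they have already been obtained from some model, which is not shown; `flag_surprising_lines`, its input layout (line number mapped to a list of natural-log token probabilities) and the threshold are hypothetical. Perplexity is the exponential of the mean negative log-likelihood.

```python
import math


def perplexity(token_logprobs: list[float]) -> float:
    """exp of the mean negative log-likelihood (natural-log probabilities)."""
    if not token_logprobs:
        raise ValueError("need at least one token")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def flag_surprising_lines(
    line_logprobs: dict[int, list[float]], threshold: float
) -> list[int]:
    """Return the line numbers whose per-line perplexity exceeds the threshold."""
    return sorted(
        line
        for line, logprobs in line_logprobs.items()
        if logprobs and perplexity(logprobs) > threshold
    )
```

Whether high perplexity actually correlates with bugs is the commenter's hypothesis, not something this snippet establishes; it only ranks lines by how unexpected the model found them.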
nuky|1 month ago
nuky|1 month ago
My question is slightly orthogonal though: even with a cleaner diff, I still find it hard to quickly tell whether public API or behavior changed, or whether logic just moved around.
Not really about LLMs as reviewers — more about whether there are useful deterministic signals above line-level diff.
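One deterministic signal of that kind, sketched under assumptions: fingerprint every function by its arguments and body (but not its name or file), then compare the fingerprints of two snapshots. A fingerprint that shows up at a different location suggests the logic only moved or was renamed; one that disappears suggests it changed or was deleted. The names `fingerprints` and `classify`, and the `{path: source}` snapshot layout, are hypothetical.

```python
import ast
import hashlib


def fingerprints(files: dict[str, str]) -> dict[str, set[str]]:
    """Map body fingerprint -> set of 'path:name' locations for every function."""
    found: dict[str, set[str]] = {}
    for path, source in files.items():
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                # Hash arguments plus body; the function's own name and file
                # are left out so a pure move or rename keeps the same digest.
                dumped = ast.dump(node.args) + "".join(ast.dump(s) for s in node.body)
                digest = hashlib.sha256(dumped.encode()).hexdigest()[:12]
                found.setdefault(digest, set()).add(f"{path}:{node.name}")
    return found


def classify(baseline: dict[str, str], current: dict[str, str]) -> dict:
    """Split functions into moved/renamed, gone, and new between two snapshots."""
    old, new = fingerprints(baseline), fingerprints(current)
    report = {"moved": [], "gone": [], "new": []}
    for digest, places in old.items():
        if digest not in new:
            report["gone"].extend(sorted(places))
        elif new[digest] != places:
            report["moved"].append((sorted(places), sorted(new[digest])))
    for digest, places in new.items():
        if digest not in old:
            report["new"].extend(sorted(places))
    return report
```

A location that appears in both `gone` and `new` is the interesting case: same name, different logic. Limitations worth keeping in mind: recursive functions embed their own name in the body, so renaming them changes the digest, and trivially identical bodies share one fingerprint.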
nuky|1 month ago
I ran into this problem while reviewing AI-generated refactors and started thinking about whether we’re still reviewing the right things. Mostly curious how others approach this.
uhfraid|1 month ago
just like any other patch, by reading it
veunes|1 month ago
nuky|1 month ago