This idea that you can get good results from a bad process as long as you have good quality control seems… dubious, to say the least. “Sure, it’ll produce endless broken nonsense, but as long as someone is checking, it’s fine.” This, generally, doesn’t really work. You see people _try_ it in industry a bit; have a process which produces a high rate of failures, catch them in QA, rework (the US car industry used to be notorious for this). I don’t know of any case where it has really worked out.
Imagine that your boss came to you, the tech lead of a small team, and said “okay, instead of having five competent people, your team will now have 25 complete idiots. We expect that their random flailing will sometimes produce stuff that kinda works, and it will be your job to review it all.” Now, you would, of course, think that your boss had gone crazy. No-one would expect this to produce good results. But somehow, stick ‘AI’ on this scenario, and a lot of people start to think “hey, maybe that could work.”
Reviewing code from less experienced or unmotivated people is also very taxing, both in a cognitive and emotional sense. It will never approach a really good level of quality because you just give up after 4 rounds of reviews on the same feature.
Right, this is the exact opposite of the best practices that W. Edwards Deming helped develop in Japan, then brought to the west.
Quality needs to come from the process, not the people.
Choosing to use a process known to be flawed, then hoping that people will catch the mistakes, doesn't seem like a great idea if the goal is quality.
The trouble is that LLMs can be used in many ways, but only some of those ways play to their strengths. Management have fantasies of using AI for everything, having either failed to understand what it is good for, or failed to learn the lessons of Japan/Deming.
What happens is that you develop a feeling of having acquired a meta skill. It's tempting to believe the scope of what you can solve has expanded when you assess yourself as "good" with AI.
It's the same with any "general" tech. I've seen it since genetic algorithms were all the rage. Everyone reaches for the most general tool, then assumes everything that tool might be used for is now a problem or domain they are an expert in, with zero context in that domain. AI is this times 100, plus one layer more meta, as you can optimize over approaches with zero context.
Yep. All the process in the world won’t teach you to make a system that works.
The pattern I see over and over is a team aimlessly plodding along through tickets in sprints until an engineer who knows how to solve the problem gets it on track personally.
2. Quality control is key to good processes as well. Code review is literally a best practice in the software industry. Especially in BigTech and high-performing organizations. That is, even for humans, including those that could be considered the cream of the industry, code review is a standard step of the delivery process.
3. People have posted their GitHub profiles and projects (including on this very forum) to show how AI is working out for them. Browse through some of them and see how much "endless broken nonsense" you find. And if that seems unscientific, well go back to point 1.
I have a play project which hits these constraints a lot.
I have been messing around with getting AI to implement novel (to me) data structures from papers. They're not rocket science or anything, but there's a lot of detail. Often I do not understand the complex edge cases in the algorithms myself, so I can't even "review my way out of it". I'm also working in Go, which is usually not a very good fit for implementing these things because it doesn't have sum types; the lack of sum types often adds so much interface{} bloat that it would render the data structure pointless. I'm working around it with codegen for now.
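For context on the sum-type complaint: the usual Go workaround is a "sealed" interface, which approximates a closed sum type at the cost of exactly the kind of boilerplate described above. A minimal sketch (all names illustrative):

```go
package main

import "fmt"

// node is "sealed": the unexported marker method means only types in
// this package can implement it, approximating a closed sum type.
type node interface{ isNode() }

type leaf struct{ key int }

type branch struct{ left, right node }

func (leaf) isNode()   {}
func (branch) isNode() {}

// size switches over the variants. Unlike a real sum type, the
// compiler does not check exhaustiveness, hence the defensive default.
func size(n node) int {
	switch v := n.(type) {
	case leaf:
		return 1
	case branch:
		return size(v.left) + size(v.right)
	default:
		panic("unhandled node variant")
	}
}

func main() {
	t := branch{left: leaf{1}, right: branch{left: leaf{2}, right: leaf{3}}}
	fmt.Println(size(t)) // 3
}
```

Without the sealing trick, every variant travels as a bare interface{}, which is the bloat being complained about.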
What I've had to do is demote "human review" a bit; it's a critical control, but it's expensive. Instead, I think more holistically about which "guard rails" to put where and what the acceptance criteria should be. This means that when I'm reviewing the code I am reasonably confident it's functionally correct, leaving me to focus on whether I like how that is being achieved. This won't work for every domain, but if it's possible to automate controls, it feels like this is the way to go wherever possible.
The "principled" way to do this would be to use provers etc., but being more of an engineer I have resorted to ruthless guard rails: bench tests that automatically fail if the runtime doesn't meet requirements (e.g. is O(n) instead of O(log n)) or overall memory efficiency is too low, and enforcing 100% code coverage from both unit tests AND fuzzing. Sometimes the CLI agent is running for hours chasing indexes or weird bugs; the two main tasks are preventing it from giving up, and stopping it from "punting" (wait, this isn't working, let me first create a 100% correct O(n) version...) or cheating. Also reminding it to check AGAIN for slice-sharing bugs, which crop up a surprising percentage of the time.
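One way to keep such a complexity guard rail deterministic is to count operations rather than measure wall-clock time, then assert the count stays within a logarithmic budget. A rough sketch of the idea, with a plain binary search standing in for the real data structure (all names are mine, not from the project):

```go
package main

import (
	"fmt"
	"math"
)

// lookup stands in for the data-structure operation under guard. It
// counts comparisons so the check is deterministic, unlike a
// wall-clock benchmark.
func lookup(sorted []int, target int) (found bool, steps int) {
	lo, hi := 0, len(sorted)
	for lo < hi {
		steps++
		mid := (lo + hi) / 2
		switch {
		case sorted[mid] == target:
			return true, steps
		case sorted[mid] < target:
			lo = mid + 1
		default:
			hi = mid
		}
	}
	return false, steps
}

// checkLogarithmic fails if the step count at size n exceeds a small
// multiple of log2(n), catching e.g. an agent quietly "punting" to a
// linear scan.
func checkLogarithmic(n int) error {
	xs := make([]int, n)
	for i := range xs {
		xs[i] = i
	}
	_, steps := lookup(xs, n-1) // probe an element near the end
	if limit := 2 * int(math.Log2(float64(n))); steps > limit {
		return fmt.Errorf("n=%d: %d steps exceeds O(log n) budget %d", n, steps, limit)
	}
	return nil
}

func main() {
	for _, n := range []int{1 << 10, 1 << 16, 1 << 20} {
		if err := checkLogarithmic(n); err != nil {
			panic(err)
		}
	}
	fmt.Println("guard rail passed")
}
```

In a real project the same check would live in a test or benchmark so the agent trips it automatically.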
The other "interesting" part of my workflow right now is that I have to manually shuffle a lot between "deep research" (which goes and reads all the papers and blogs about the data structure) and the cli agent which finds the practical bugs etc but often doesn't have the "firepower" to recognise when it's stuck in a local maximum or going around in circles. Have been thinking about an MCP that lets the cli agent call out to "deep research" when it gets really stuck.
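The slice-sharing bugs mentioned above are easy to reproduce, because `append` reuses the backing array whenever there is spare capacity. A minimal demonstration:

```go
package main

import "fmt"

func main() {
	// a has spare capacity, so append does not reallocate: b silently
	// shares a's backing array.
	a := make([]int, 3, 4)
	b := append(a, 99)
	b[0] = -1
	fmt.Println(a[0]) // -1: writing through b clobbered a

	// The usual guard rail is to force a copy when handing out slices.
	c := append([]int(nil), a...) // independent backing array
	c[0] = 42
	fmt.Println(a[0]) // still -1, unaffected by c
}
```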
The issue with the hypothetical is that if you give a team lead 25 competent people, they'd also get bad results. Or at least, the "team lead" isn't really leading their team on technical matters, apart from fighting off the odd attempt to migrate to MongoDB and hoping that their people are doing the right thing. The sweet spot for teams is 3-6 people, and someone more interested in empire building than technical excellence can handle maybe around 9 people and still do a competent job. It doesn't depend much on the quality of the people.
The way team leads seem to get used is that people who are good at code get a little more productive as more people are told to report to them. What is happening now is that the senior-level engineers all automatically get the same option: a team of 1-2 mid-level engineers on the cheap thanks to AI, which is entirely manageable. And anyone less capable gets a small team, a rubber duck or a mentor, depending on where they fall vs LLM use.
Of course, the real question is what will happen as the AIs get into the territory traditionally associated with 130+ IQ ranges and the engineers start to sort out how to give them a bit more object persistence.
AI-generated code can be useful in the early stages of a project, but it raises concerns in mature ones. Recently, a 280kloc+ Postgres parser was merged into Multigres (https://github.com/multigres/multigres/pull/109) with no public code review. In open source, this is worrying. Many people rely on these projects for learning and reference. Without proper review, AI-generated code weakens their value as teaching tools, and more importantly the trust in pulling as dependencies. Code review isn’t just about bugs, it’s how contributors learn, understand design choices, and build shared knowledge. The issue isn’t speed of building software (although corporations may seem to disagree), but how knowledge is passed on.
I oversaw this work, and I'm open to feedback on how things can be improved. There are some factors that make this particular situation different:
- This was an LLM-assisted translation of the C parser from Postgres, not something built from the ground up.
- For work of this magnitude, you cannot review line by line. The only thing we could do was establish a process to ensure correctness.
- We did control the process carefully. It was a daily toil. This is why it took two months.
- We've ported most of the tests from Postgres. Enough to be confident that it works correctly.
- Also, we are in the early stages of Multigres. We intend to do more bulk copies and bulk translations like this from other projects, especially Vitess. We'll incorporate any possible improvements here.
The author is working on a blog post explaining the entire process and its pitfalls. Please be on the lookout.
I was personally amazed at how much we could achieve using an LLM. Of course, this wouldn't have been possible without a certain level of skill. This person exceeds all expectations listed here: https://github.com/multigres/multigres/discussions/78.
I am falling into a pattern of treating AI coding like a drunk mid-level dev: "I saw those few paragraphs of notes you wrote up on a napkin, and stayed up late Saturday night while drinking and spat out this implementation. you like?"
So I can say to myself, "No, do not like. But the overall gist at least started in the right direction, so I can revise it from here and still be faster than had I done it myself on Monday morning."
> It’s very time consuming and 80% of the time I end up wondering if it would’ve been quicker to just do it all by myself right from the start.
Yes, this. Every time I read these sort of step by step guides to getting the best results with coding agents it all just sounds like boatloads of work that erase the efficiency margins that AI is supposed to bring in the first place. And anecdotally, I've found that to be true in practice as well.
Not to say that AI isn't useful. But I think knowing when and where AI will be useful is a skill in and of itself.
I think I’m working at lower levels, but usually my flow is:
- I start to build or refactor the code structure myself, creating the basic interfaces, or skip to the next step when they already exist. I'll use LLMs as autocomplete here.
- I write down the requirements and tell it which files are the entry point for the changes.
- I do not tell the agent my final objective, only one step that gets me closer to it, and one at a time.
- I watch carefully and interrupt the agent as soon as I see something going wrong. At this point I either start over if my requirement assumptions were wrong or just correct the course of action of the agent if it was wrong.
Most of the issues I had in the past were from when I wrote down a broad objective that required too many steps at the beginning. Agents cannot correctly judge when they have finished something.
I have a similar, though not as detailed, process. I do the same as you up to the PRD, then give it the PRD and tell it the high level architecture, and ask it to implement components how I want them.
It's still time-consuming, and it probably would be faster for me to do it myself, but I can't be bothered manually writing lines of code any more. Maybe I should switch to writing code with the LLM function by function, though.
Yeah, it sounds like it would have been far quicker to use the AI to give you a general overview of approaches/libraries/language features/etc., and then do the work yourself.
> If you’re a nitpicky code reviewer, I think you will struggle to use AI tooling effectively. [...] Likewise, if you’re a rubber-stamp code reviewer, you’re probably going to put too much trust in the AI tooling.
So in other words, if you are good at code review you are also good enough at writing code that you will be better off writing it yourself for projects you will be responsible for maintaining long term. This is true for almost all of them if you work at a sane place or actually care about your personal projects. Writing code for you is not a chore and you can write it as fluently and quickly as anything else.
Your time "using AI" is much better spent filling in the blanks when you're unfamiliar with a certain tool or need to discover a new one. In short, you just need a few google searches a day... just like it ever was.
I will admit that modern LLMs have made life easier here. AI summaries on search engines have indeed improved to the point where I almost always get my answer and I no longer get hung up meat-parsing poorly written docs or get nerd-sniped pondering irrelevant information.
Code review is part of the job, but one of the least enjoyable parts. Developers like _writing_, and that gives the most job satisfaction. AI tools are helpful, but they inherently increase the amount of code we have to review, and with more scrutiny than my colleagues' code, because of how unpredictable, yet convincing, they can be. Why did we create tools that do the fun part and increase the non-fun part? Where are the "code-review" agents at?
Maybe I'm weird but I don't actually enjoy the act of _writing_ code. I enjoy problem solving and creating something. I enjoy decomposing systems and putting them back together in a better state, but actually manually typing out code isn't something I enjoy.
When I use an LLM to code I feel like I can go from idea to something I can work with in much less time than I would have normally.
Our codebase is more type-safe, better documented, and it's much easier to refactor messy code into the intended architecture.
Maybe I just have lower expectations of what these things can do but I don't expect it to problem solve. I expect it to be decent at gathering relevant context for me, at taking existing patterns and re-applying them to a different situation, and at letting me talk shit to it while I figure out what actually needs to be done.
I especially expect it to allow me to be lazy and not have to manually type out all of that code across different files when it can just generate it all in a few seconds and I can review each change as it happens.
> Code review is part of the job, but one of the least enjoyable parts. Developers like _writing_ and that gives the most job satisfaction.
At least for me, what gives the most satisfaction (even though this kind of satisfaction happens very rarely) is when I discover some very elegant structure behind whatever has to be implemented, something that changes the whole way you think about programming (or often even about life) for decades.
> Developers like _writing_ and that gives the most job satisfaction.
Is it possible that this is just the majority, and there are plenty of folks who dislike actually starting from nothing and the endless iteration to make something that works, as opposed to having some sort of baseline, good or bad, to improve upon?
I’ve seen plenty of people that are okay with picking up a codebase someone else wrote and working with the patterns and architecture in there BUT when it comes to them either needing to create new mechanisms in it or create an entirely new project/repo it’s like they hit a wall - part of it probably being friction, part not being familiar with it, as well as other reasons.
> Why did we create tools that do the fun part and increase the non-fun part? Where are the "code-review" agents at?
Presumably because that's where the most perceived productivity gain is. As for code review, there's CodeRabbit, I think GitLab has their thing (Duo), and more options are popping up. Conceptually, there's nothing preventing you from feeding a Git diff into RooCode and letting it review stuff, alongside reading whatever surrounding files it needs.
> Developers like _writing_ and that gives the most job satisfaction.
Not me. I enjoy figuring out the requirements, the high-level design, and the clever approach that will yield high performance, or reuse of existing libraries, or whatever it is that will make it an elegant solution.
Once I've figured all that out, the actual process of writing code is a total slog. Tracking variables, remembering syntax, trying to think through every edge case, avoiding off-by-one errors. I've gone from being an architect (fun) to slapping bricks together with mortar (boring).
I'm infinitely happier if all that can be done for me, everything is broken out into testable units, the code looks plausibly correct, and the unit tests for each function cover all cases and are demonstrably correct.
Because the goal of "AI" is not to have fun, it's to solve problems and increase productivity. I have fun programming too, but you have to realize the world isn't optimizing to make things more fun.
Asking AI to stay true to my requested parameters is hard; THEY ALL DRIFT AWAY, RANDOMLY.
When working on nftables syntax highlighters, I have 230 tokens, 2,500 states, and 50,000+ state transitions.
Some firm guidelines given to AI agents are:
1. Fully-deterministic LL(1) full syntax tree.
2. No use of Vim 'syntax keyword' statement
3. Use long group names in snake_case whose naming starts with 'nft_' prefix (avoids collision with other Vim namespaces)
4. For parts of the group names, use only nftables/src/parser_bison.y semantic action and token names as-is.
5. For each traversal down the syntax tree, append that non-terminal node name from parser_bison.y to its group names before using it.
With those 5 "simple" user-requested requirements, all AI agents drift away from at least one of the rules at seemingly random intervals.
At the moment, it is dubious to even trust the bit-length of each packet field.
Never mind their inability to construct a simple Vimscript.
I use AI agents mainly as documentation.
On the bright side, they are getting good at breaking down 'rule', 'chain_block stmt', and 'map_stmt_expr' (that '.' period we see chaining header expressions together); just use the quoted words and paste in one of your nft rule statements.
Code review isn't the same as design review, nor are these the only type of things (coding and design) that someone may be trying to use AI for.
If you are going to use AI, and catch its mistakes, then you need to have expertise in whatever it is you are using the AI for. Even if we limit the discussion just to coding, being a good code reviewer isn't enough - you'd need skill at whatever you are asking the AI to do. One of the valuable things AI can do is help you code using languages and frameworks you are not familiar with, which then of course means you are not going to be competent to review the output, other than in the most generic fashion.
A bit off topic, but it's weird to me to see the term "coding" make a comeback in this AI/LLM era. I guess it is useful as a way to describe what AI is good at - coding vs. more general software development - but how many companies nowadays hire coders as opposed to software developers (I know it used to be a thing with some big companies like IBM)? Rather than compartmentalized roles, the direction nowadays seems to be expecting developers to do everything from business analysis and helping develop requirements, to architecture/design, then full-stack development, and subsequent production support.
> Using AI agents correctly is a process of reviewing code. [...]
> Why is that? Large language models are good at producing a lot of code, but they don’t yet have the depth of judgement of a competent software engineer. Left unsupervised, they will spend a lot of time committing to bad design decisions.
Obviously you want to make course corrections sooner than later. Same as I would do with less experienced devs, talk through the high level operations, then the design/composition. Reviewing a large volume of unguided code is like waiting for 100k tokens to be written only to correct the premise in the first 100 and start over.
I love doing code review for colleagues since I know that it bolsters our shared knowledge, experience and standards. Code review for an external, stubborn, uncooperative AI? No thanks, that sounds like burnout.
No. The failure conditions of "AI agents" are not even close to classical human mistakes (the only ones where code review has anything more than an infinitesimal chance of catching something). There is absolutely no skill transfer, and it is a poor excuse anyway, since review was never going to catch everything.
I think that I review code much differently than the author. When I'm reviewing code, my assumption is that the person writing it has already verified that it works. I am primarily looking for readability and code smells.
In an ideal world I'd probably be looking more at the actual logic of the code. However, everywhere I've worked it's a full-time job just desperately trying to fight ballooning complexity from people who prioritize quick turnaround over quality code.
Code review can be almost as much effort as writing the code, especially when the code is not up to the expectations of the reviewer. This is fine, because you want two people (the original author and the reviewer) on the code.
When reviewing AI code, not only does the effort needed by the reviewer increase, you also lose the second person (the author) looking at the code, because AI can't do that. It can produce code, but not reason about or reflect on it like humans can.
- If I had to iterate as much with a Jr dev as CC on not highly difficult stuff ("of course, I'll just do X!" then X doesn't work, then "of course, the answer is Y!" then Y doesn't work, etc.) I probably would have fired them by now or just say "never mind, I'll do it myself" .
- On the other hand, a Jr dev will (hopefully) learn as they go and get better each time, so a month from now they're not making the same mistakes. An LLM can't learn, so until there's a new model it keeps making the same mistakes (yes, within a session it can learn, if the session doesn't get too long, but not across sessions). Also, the Jr dev can test their solution (which may require more than just running unit tests) and iterate on it, so that they only come to me when it works and/or they're stuck. Just yesterday, on a rather simple matter, I wasted so much time telling the LLM "that didn't work, try again".
This isn't some triviality you can throw aside as unimportant, it is the shape that the code has today, and limits and controls what it will have tomorrow.
It's how you make things intuitive, and it is equally how you ensure people follow a correct flow and don't trap themselves into a security bug.
I really disagree with this too, especially given the article's next line:
> ...You’ll be forever tweaking individual lines of code, asking for a .reduce instead of a .map.filter, bikeshedding function names, and so on. At the same time, you’ll miss the opportunity to guide the AI away from architectural dead ends.
I think a good review will often do both, and understand that code happens at the line level and also the structural level. It implies a philosophy of coding that I have seen be incredibly destructive firsthand — committing a bunch of shit that no one on a team understands and no one knows how to reuse.
Agreed. A program is made of names, these names are of the utmost importance. For understanding, and also for searchability.
I do a lot of code reviews, and one of the main things I ask for, after bug fixes, is renaming things so readers can understand them unambiguously at first read, and to match the various conventions we use throughout the codebase.
Ex: a new dev wrote "updateFoo()" for a method converting a domain thing "foo" from its type in layer "a" to its type in layer "b", so I asked him to use "convertFoo_aToB()" instead.
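A tiny sketch of the convention being asked for, in Go; the layer and type names here are illustrative, not from the reviewed codebase:

```go
package main

import "fmt"

// Stand-ins for the same domain thing "foo" in two layers.
type storageFoo struct {
	ID   int
	Name string
}

type apiFoo struct {
	ID          int
	DisplayName string
}

// convertFooStorageToAPI names both the thing and the direction of
// the conversion; a name like updateFoo would wrongly suggest mutation.
func convertFooStorageToAPI(in storageFoo) apiFoo {
	return apiFoo{ID: in.ID, DisplayName: in.Name}
}

func main() {
	out := convertFooStorageToAPI(storageFoo{ID: 7, Name: "widget"})
	fmt.Println(out.DisplayName) // widget
}
```

The name also makes call sites greppable by layer, which helps the searchability point above.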
This blog gets posted often but the content is usually lousy. Lots of specious assertions about the nature of software development that really give off a "I totally have this figured out" vibe. I can't help but feel that anyone who feels so about this young industry that changes so rapidly and is so badly performed at so many places, is yet to summit Mt. Stupid.
I think I'd actually have a use for an AI that could receive my empty public APIs (such as a C++ header file) as an input and produce a first rough implementation. Maybe this exists already, I don't know because I haven't done any serious vibe coding.
I think I'm good at code review, but we've all seen parts of the codebase written by one teammate with specific domain knowledge, where your options are to approve something you don't fully understand or to learn the background necessary to understand it.
In my experience, not having to learn the background is the biggest time saver provided by LLM coding (e.g. not having to read through API docs or confirm details of a file format or understand some algorithm). So in a way I feel like there is a fundamental tension.
I am good at code review, sure, but I don't like doing it. It's about as strong an engineering technique as coding at a whiteboard. I know I'm at a tiny fraction of my potential without debugging tools, and for that reason code review on GitHub is usually a waste of my time. I'll just write code, thanks, and I'll move the needle on quality by developing. As a reviewer I'll scan for smells, but I assume that you too would be most effective if I let you make and clean up your own messes, so long as they aren't egregious.
> In my view, the best code review is structural. It brings in context from parts of the codebase that the diff didn’t mention.
That may be true for AI code.
But it would be pretty terrible for human-written code to bring this up after the code is written, wasting hours/days of effort for lack of a little up-front communication on design.
AI makes routine code generation cheap -- only seconds/minutes and cents are being wasted -- but you essentially still need that design session.
I have received a few LLM-produced PRs from peers on adjacent teams, in good faith but not familiar with the project, and they increasingly infuriate me. They were all garbage, but there's a great asymmetry: it costs my peers nothing to generate them, and it costs me precious time to refute them. And what can I really do? Say "it's irreparable garbage, because the syntax might be right but it's conceptually nonsense"? That's not the most constructive take.
swaptr | 6 months ago:
Edit: Reference to the time it took to open the PR: https://www.linkedin.com/posts/sougou_the-largest-multigres-...
[+] [-] sougou|6 months ago|reply
This was an LLM assisted translation of the C parser from Postgres, not something from the ground up.
For work of this magnitude, you cannot review line by line. The only thing we could do was to establish a process to ensure correctness.
We did control the process carefully. It was a daily toil. This is why it took two months.
We've ported most of the tests from Postgres. Enough to be confident that it works correctly.
Also, we are in the early stages for Multigres. We intend to do more bulk copies and bulk translations like this from other projects, especially Vitess. We'll incorporate any possible improvements here.
The author is working on a blog post explaining the entire process and its pitfalls. Please be on the lookout.
I was personally amazed at how much we could achieve using an LLM. Of course, this wouldn't have been possible without a certain level of skill. This person exceeds all expectations listed here: https://github.com/multigres/multigres/discussions/78.
[+] [-] brap|6 months ago|reply
1. Give it requirements
2. Tell it to ask me clarifying questions
3. When no more questions, ask it to explain the requirements back to me in a formal PRD
4. I criticize it
5. Tell it to come up with 2 alternative high level designs
6. I pick one and criticize it
7. Tell it to come up with 2 alternative detailed TODO lists
8. I pick one and criticize it
9. Tell it to come up with 2 alternative implementations of one of the TODOs
10. I pick one and criticize it
11. Back to 9
I usually “snapshot” outputs along the way and return to them to reduce useless context.
This is what produces the most decent results for me, which aren’t spectacular but at the very least can be a baseline for my own implementation.
It’s very time consuming and 80% of the time I end up wondering if it would’ve been quicker to just do it all by myself right from the start.
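The loop above can be sketched as code. This is a minimal illustration of the structure, with a stubbed `ask` standing in for whatever chat model is used; the "snapshot" trick keeps only the latest accepted output as the baseline so later steps don't drag the whole transcript along as context.

```python
# Sketch of the criticize-and-branch workflow. `ask` is a stand-in for
# a real model call, stubbed here so the flow is runnable.
def ask(history: list[str], prompt: str) -> str:
    # Stand-in for a real chat-model call.
    return f"response to: {prompt}"

def step(history: list[str], prompt: str) -> tuple[list[str], str]:
    answer = ask(history, prompt)
    return history + [prompt, answer], answer

def snapshot(history: list[str], keep_last: int = 2) -> list[str]:
    """Drop older turns, keeping only the latest exchange as the new
    baseline -- the 'snapshot outputs to reduce useless context' trick."""
    return history[-keep_last:]

history: list[str] = []
history, prd = step(history, "Restate the requirements as a formal PRD")
history, design = step(snapshot(history), "Propose 2 alternative high-level designs")
history, todos = step(snapshot(history), "Propose 2 alternative TODO lists for design A")
```

Each "I criticize it" step is just another `step()` call against the snapshotted history before moving on.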
[+] [-] codingdave|6 months ago|reply
I am falling into a pattern of treating AI coding like a drunk mid-level dev: "I saw those few paragraphs of notes you wrote up on a napkin, and stayed up late Saturday night while drinking and spat out this implementation. you like?"
So I can say to myself, "No, do not like. But the overall gist at least started in the right direction, so I can revise it from here and still be faster than had I done it myself on Monday morning."
[+] [-] rco8786|6 months ago|reply
Yes, this. Every time I read these sorts of step-by-step guides to getting the best results with coding agents, it all just sounds like boatloads of work that erase the efficiency margins that AI is supposed to bring in the first place. And anecdotally, I've found that to be true in practice as well.
Not to say that AI isn't useful. But I think knowing when and where AI will be useful is a skill in and of itself.
[+] [-] jwrallie|6 months ago|reply
- I start to build or refactor the code structure by myself, creating the basic interfaces, or skip to the next step when they already exist. I’ll use LLMs as autocomplete here.
- I write down the requirements and tell which files are the entry point for the changes.
- I do not tell the agent my final objective, only one step that gets me closer to it, and one at a time.
- I watch carefully and interrupt the agent as soon as I see something going wrong. At this point I either start over if my requirement assumptions were wrong or just correct the course of action of the agent if it was wrong.
Most of the issues I had in the past came from writing down a broad objective that required too many steps at the beginning. Agents cannot judge correctly when they have finished something.
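Since the agent can't judge completion, an external check has to. A minimal sketch of that "one step at a time" loop, with `run_agent` and `tests_pass` as stand-ins for a real agent call and a real test run (not any actual agent API):

```python
# Sketch: feed the agent one step at a time; an external gate (e.g. the
# test suite), not the agent, decides whether a step is finished.
def drive(steps, run_agent, tests_pass, max_retries: int = 2):
    """Advance only on green tests; bail out for a human after retries."""
    for step in steps:
        for attempt in range(max_retries + 1):
            run_agent(step)
            if tests_pass():
                break  # this step is done, move to the next
        else:
            return f"stuck on: {step}"  # human takes over
    return "done"
```

The point is the shape: the final objective never appears in any single step, and "done" is decided outside the model.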
[+] [-] stavros|6 months ago|reply
It's still time-consuming, and it probably would be faster for me to do it myself, but I can't be bothered manually writing lines of code any more. I maybe should switch to writing code with the LLM function by function, though.
[+] [-] sublinear|6 months ago|reply
So in other words, if you are good at code review, you are also good enough at writing code that you will be better off writing it yourself for projects you will be responsible for maintaining long term. This is true for almost all such projects if you work at a sane place or actually care about your personal projects. Writing code for you is not a chore, and you can write it as fluently and quickly as anything else.
Your time "using AI" is much better spent filling in the blanks when you're unfamiliar with a certain tool or need to discover a new one. In short, you just need a few google searches a day... just like it ever was.
I will admit that modern LLMs have made life easier here. AI summaries on search engines have indeed improved to the point where I almost always get my answer and I no longer get hung up meat-parsing poorly written docs or get nerd-sniped pondering irrelevant information.
[+] [-] jmcodes|6 months ago|reply
When I use an LLM to code I feel like I can go from idea to something I can work with in much less time than I would have normally.
Our codebase is more type-safe, better documented, and it's much easier to refactor messy code into the intended architecture.
Maybe I just have lower expectations of what these things can do but I don't expect it to problem solve. I expect it to be decent at gathering relevant context for me, at taking existing patterns and re-applying them to a different situation, and at letting me talk shit to it while I figure out what actually needs to be done.
I especially expect it to allow me to be lazy and not have to manually type out all of that code across different files when it can just generate them in a few seconds and I can review each change as it happens.
[+] [-] simonw|6 months ago|reply
OpenAI's Codex Cloud just added a new feature for code review, and their new GPT-5-Codex model has been specifically trained for code review: https://openai.com/index/introducing-upgrades-to-codex/
Gemini and Claude both have code review features that work via GitHub Actions: https://developers.google.com/gemini-code-assist/docs/review... and https://docs.claude.com/en/docs/claude-code/github-actions
GitHub have their own version of this pattern too: https://github.blog/changelog/2025-04-04-copilot-code-review...
There are also a whole lot of dedicated code review startups like https://coderabbit.ai/ and https://www.greptile.com/ and https://www.qodo.ai/products/qodo-merge/
[+] [-] aleph_minus_one|6 months ago|reply
At least for me, what gives the most satisfaction (even though this kind of satisfaction happens very rarely) is when I discover some very elegant structure behind whatever has to be implemented, one that changes the whole way you think about programming (or often even about life) for decades.
[+] [-] mercutio2|6 months ago|reply
Senior developers love removing code.
Code review is probably my favorite part of the job, when there isn’t a deadline bearing down on me for my own tasks.
So I don’t really agree with your framing. Code reviews are very fun.
[+] [-] KronisLV|6 months ago|reply
Is it possible that this is just the majority, and there are plenty of folks who dislike starting from nothing and the endless iteration to make something that works, as opposed to having some sort of good/bad baseline to just improve upon?
I’ve seen plenty of people that are okay with picking up a codebase someone else wrote and working with the patterns and architecture in there BUT when it comes to them either needing to create new mechanisms in it or create an entirely new project/repo it’s like they hit a wall - part of it probably being friction, part not being familiar with it, as well as other reasons.
> Why did we create tools that do the fun part and increase the non-fun part? Where are the "code-review" agents at?
Presumably because that’s where the most perceived productivity gain is in. As for code review, there’s CodeRabbit, I think GitLab has their thing (Duo) and more options are popping up. Conceptually, there’s nothing preventing you from feeding a Git diff into RooCode and letting it review stuff, alongside reading whatever surrounding files it needs.
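The "feed a Git diff to a model" pattern is simple enough to sketch. This is an illustration, not RooCode's or any vendor's actual interface: the only real command used is `git diff`, and the model call is left abstract.

```python
# Sketch: collect a diff and wrap it in review instructions for a model.
# The model call itself is left out; only `git diff` is a real command.
import subprocess

REVIEW_PROMPT = (
    "Review the following diff. Flag bugs, security issues, and "
    "places where the change fights the surrounding architecture.\n\n"
)

def get_diff(base: str = "main") -> str:
    """Diff the working tree against a base branch."""
    return subprocess.run(
        ["git", "diff", base],
        capture_output=True, text=True, check=True,
    ).stdout

def build_review_request(diff: str) -> str:
    return REVIEW_PROMPT + diff
```

From there, `build_review_request(get_diff())` is what you hand to whichever reviewing agent you use, along with any surrounding files it asks for.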
[+] [-] crazygringo|6 months ago|reply
Not me. I enjoy figuring out the requirements, the high-level design, and the clever approach that will yield high performance, or reuse of existing libraries, or whatever it is that will make it an elegant solution.
Once I've figured all that out, the actual process of writing code is a total slog. Tracking variables, remembering syntax, trying to think through every edge case, avoiding off-by-one errors. I've gone from being an architect (fun) to slapping bricks together with mortar (boring).
I'm infinitely happier if all that can be done for me, everything is broken out into testable units, the code looks plausibly correct, and the unit tests for each function cover all cases and are demonstrably correct.
[+] [-] egberts1|6 months ago|reply
When working on nftables syntax highlighters, I have 230 tokens, 2,500 states, and 50,000+ state transitions.
Some firm guidelines given to AI agents are:
1. Fully-deterministic LL(1) full syntax tree.
2. No use of Vim 'syntax keyword' statement
3. Use long group names in snake_case whose naming starts with 'nft_' prefix (avoids collision with other Vim namespaces)
4. For parts of the group names, use only nftables/src/parser_bison.y semantic action and token names as-is.
5. For each traversal down the syntax tree, append that non-terminal node name from parser_bison.y to its group names before using it.
With those 5 "simple" user-requested requirements, all AI agents drift away from at least one of the rules at seemingly random intervals.
At the moment, it is dubious to even trust the bit-length of each packet field.
Never mind their inability to construct a simple Vimscript.
I use AI agents mainly as documentation.
On the bright side, they are getting good at breaking down 'rule', 'chain_block stmt', and 'map_stmt_expr' (that '.' period we see when chaining header expressions together); just use the quoted words and paste in one of your nft rule statements.
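Rules 3-5 above boil down to a deterministic naming scheme, which is easy to state as code. This is an illustration of the convention, not egberts1's actual tooling: group names get an `nft_` prefix and accumulate the `parser_bison.y` non-terminal names on the way down the syntax tree.

```python
# Illustrative: build a Vim syntax group name from a descent path of
# non-terminal names taken from nftables' parser_bison.y.
def group_name(path: list[str]) -> str:
    """'nft_' prefix (rule 3) + path of bison non-terminals (rules 4-5),
    joined in snake_case, so names never collide with other Vim groups."""
    return "nft_" + "_".join(path)
```

A rule this mechanical is exactly the kind an agent should never miss, which is what makes the random drift so telling.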
[+] [-] HarHarVeryFunny|6 months ago|reply
Code review isn't the same as design review, nor are these the only type of things (coding and design) that someone may be trying to use AI for.
If you are going to use AI, and catch its mistakes, then you need to have expertise in whatever it is you are using the AI for. Even if we limit the discussion just to coding, then being a good code reviewer isn't enough - you'd need to have skill at whatever you are asking the AI to do. One of the valuable things AI can do is help you code using languages and frameworks you are not familiar with, which then of course means you are not going to be competent to review the output, other than in the most generic fashion.
A bit off topic, but it's weird to me to see the term "coding" make a comeback in this AI/LLM era. I guess it is useful as a way to describe what AI is good at - coding, as opposed to software development more generally - but how many companies nowadays hire coders as opposed to software developers (I know it used to be a thing with some big companies like IBM)? Rather than compartmentalized roles, the direction nowadays seems to be expecting developers to do everything from business analysis and helping develop requirements, to architecture/design, full-stack development, and subsequent production support.
[+] [-] scuff3d|6 months ago|reply
1. Stood up and managed my own Kubernetes clusters for my team
2. Docker, just so so much Docker
3. Developed CI/CD pipelines
4. Done more integration and integration testing than I care to think about
5. Written god knows how many requirements and produced an endless stream of diagrams and graphs for systems engineering teams
6. Done a bunch of random IT crap because our infrastructure team can't be bothered
7. Wrote some code once in a while
[+] [-] karmakaze|6 months ago|reply
> Using AI agents correctly is a process of reviewing code. [...]
> Why is that? Large language models are good at producing a lot of code, but they don’t yet have the depth of judgement of a competent software engineer. Left unsupervised, they will spend a lot of time committing to bad design decisions.
Obviously you want to make course corrections sooner than later. Same as I would do with less experienced devs, talk through the high level operations, then the design/composition. Reviewing a large volume of unguided code is like waiting for 100k tokens to be written only to correct the premise in the first 100 and start over.
[+] [-] harimau777|6 months ago|reply
In an ideal world I'd probably be looking more at the actual logic of the code. However, everywhere I've worked it's a full time job just desperately trying to fight ballooning complexity from people who prioritize quick turnaround over quality code.
[+] [-] em-bee|6 months ago|reply
When reviewing AI code, not only does the effort needed by the reviewer increase, you also lose the second person (the author) looking at the code, because AI can't fill that role. It can produce code but not reason about or reflect on it like humans can.
[+] [-] insane_dreamer|6 months ago|reply
- If I had to iterate as much with a Jr dev as with CC on not especially difficult stuff ("of course, I'll just do X!" then X doesn't work, then "of course, the answer is Y!" then Y doesn't work, etc.), I probably would have fired them by now or just said "never mind, I'll do it myself".
- On the other hand a Jr dev will (hopefully) learn as they go, get better each time, so a month from now they're not making the same mistakes. An LLM can't learn so until there's a new model they keep making the same mistakes (yes, within a session they can learn -- if the session doesn't get too long -- but not across sessions). Also, the Jr dev can test their solution (which may require more than just running unit tests) and iterate on it so that they only come to me when it works and/or they're stuck. Just yesterday, on a rather simple matter, I wasted so much time telling the LLM "that didn't work, try again".
[+] [-] shakna|6 months ago|reply
... Function names compose much of the API.
The API is the structure of the codebase.
This isn't some triviality you can throw aside as unimportant, it is the shape that the code has today, and limits and controls what it will have tomorrow.
It's how you make things intuitive, and it is equally how you ensure people follow a correct flow and don't trap themselves into a security bug.
[+] [-] AirMax98|6 months ago|reply
> ...You’ll be forever tweaking individual lines of code, asking for a .reduce instead of a .map.filter, bikeshedding function names, and so on. At the same time, you’ll miss the opportunity to guide the AI away from architectural dead ends.
I think a good review will often do both, and understand that code happens at the line level and also the structural level. It implies a philosophy of coding that I have seen be incredibly destructive firsthand — committing a bunch of shit that no one on a team understands and no one knows how to reuse.
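The `.reduce` vs `.map.filter` nitpick in the quote has a direct Python analogue (my transposition, not the article's example): three equivalent ways to sum the squares of the even numbers. A line-level review argues about which spelling to use; a structural review asks whether this computation belongs here at all.

```python
# Three equivalent spellings of the same line-level computation:
# sum of squares of the even numbers in xs.
from functools import reduce

xs = [1, 2, 3, 4, 5]

via_reduce = reduce(lambda acc, x: acc + (x * x if x % 2 == 0 else 0), xs, 0)
via_map_filter = sum(map(lambda x: x * x, filter(lambda x: x % 2 == 0, xs)))
via_comprehension = sum(x * x for x in xs if x % 2 == 0)
```

All three produce the same value; a review that only ever operates at this level never catches the architectural dead ends.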
[+] [-] jffhn|6 months ago|reply
I do a lot of code reviews, and one of the main things I ask for, after bug fixes, is renaming things for readers to understand at first read unambiguously and to match the various conventions we use throughout the codebase.
Ex: new dev wrote "updateFoo()" for a method converting a domain thing "foo" from its type in layer "a" to its type in layer "b", so I asked him to use "convertFoo_aToB()" instead.
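A sketch of that rename in Python (types and names are illustrative, mirroring the comment's `updateFoo` -> `convertFoo_aToB`): the vague name hides a layer conversion, while the convention-following name states exactly what crosses which boundary.

```python
# Illustrative only: "foo" as two layers represent it, and a conversion
# whose name says source and destination instead of a vague "update".
from dataclasses import dataclass

@dataclass
class FooA:          # "foo" as layer "a" represents it
    raw_id: str

@dataclass
class FooB:          # "foo" as layer "b" represents it
    id: int

def convert_foo_a_to_b(foo: FooA) -> FooB:
    """Convert, not 'update': nothing is mutated; a new value is built."""
    return FooB(id=int(foo.raw_id))
```

The naming carries the review's point: a reader can now predict the function's behavior from its signature alone.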
[+] [-] tdeck|6 months ago|reply
In my experience, not having to learn the background is the biggest time saver provided by LLM coding (e.g. not having to read through API docs or confirm details of a file format or understand some algorithm). So in a way I feel like there is a fundamental tension.
[+] [-] habibur|6 months ago|reply
- Rewrite it yourself?
- Tell AI to generate it again? — will lead to worse code than the first.
- Write the long prompt (like 6 pages) even longer and hope it works this time?
[+] [-] jmull|6 months ago|reply
That may be true for AI code.
But it would be pretty terrible for human-written code to bring this up after the code is written, wasting hours/days effort for lack of a little up-front communication on design.
AI makes routine code generation cheap -- only seconds/minutes and cents are being wasted -- but you essentially still need that design session.
[+] [-] praptak|6 months ago|reply