top | item 46142575

(no title)

deltaburnt | 2 months ago

I don’t think I’ve ever seen someone seriously argue that personal throwaway projects need thorough code reviews of their vibe code. The problem comes in when I’m maintaining a 20 year old code base used by anywhere from 1M to 1B users.

In other words you can’t vibe code in an environment where evaluating “does this code work” is an existential question. This is the case where 7k LOC/day becomes terrifying.

Until we get much better at automatically proving correctness of programs we will need review.

discuss

adriand|2 months ago

My point about my experience with this plugin isn’t that it’s a throwaway or meaningless project. My point is that it might be enough in some cases to verify output without verifying code. Another example: I had to import tens of thousands of records of relational data. I got AI to write the code for the import. All I verified was that the data was imported correctly. I didn’t even look at the code.

deltaburnt|2 months ago

In this context I meant throwaway as "low stakes" not "meaningless". Again, evaluating the output of a database import like that could be existensial for your company given the context. Not to mention there's many cases where evaluating the output isn't feasible for a human.

nilkn|2 months ago

Human code review does not prove correctness. Almost every software service out there contains bugs. Humans have struggled for decades to reliably produce correct software at scale and speed. Overall, humans have a pretty terrible track record of producing bug-free correct code no matter how much they double-check and review their code along the way.

cyphar|2 months ago

So the solution is to stop doing code reviews and just YOLO-merge everything? After all, everything is fucked already, how much worse could it get?

For the record, there are examples where human code review and design guidelines can lead to very low-bug code. NASA published their internal guidelines for producing safety-critical code[1]. The problem is that the development cost of software when using such processes is too high for most companies, and most companies don't actually produce safety-critical software.

My experience with the vast majority of LLM code submitted to projects I maintain is that it has subtle bugs that I managed to find through fairly cursory human review. The copilot code review feature on GitHub also tends to miss actual bugs and report nonexistent bugs, making it worse than useless. So in my view, the death of the benefits of human code review have been wildly exaggerated.

[1]: https://en.wikipedia.org/wiki/The_Power_of_10:_Rules_for_Dev...