czhu12|1 month ago
Some would argue there’s no point reviewing the code, just test the implementation and if it works, it works.
I still am kind of nervous doing this in critical projects.
Does anyone just YOLO code for projects that aren't meant to be one-off, but are fully intended to be supported for a long time? What are the learnings after 3-6 months of supporting it in production?
serial_dev|1 month ago
I do use them, though. They help me search, understand, narrow down, and ideate; it's still a better Google, and the experience is getting better every quarter. But people letting tens or hundreds of agents just rip... I can't imagine doing it.
For personal throwaway projects that you do because you want the end output (as opposed to learning or caring), sure, do it: verify it roughly works and be done with it.
pron|1 month ago
To me, someone who can code means someone who (unless they're in a detectable state of drunkenness, fatigue, illness, or distraction) will successfully complete a coding task commensurate with some level of experience or, at the very least, explain why exactly the task is proving difficult. While I've seen coding agents do things that truly amaze me, they also make mistakes that no one who "can code" ever makes. If you can't trust an LLM to complete a task anyone who can code will either complete or explain their failure, then it can't code, even if it can (in the sense of "a flipped coin can come up heads") sometimes emit impressive code.
Sateeshm|1 month ago
I've also seen "I see the issue now" so many times, because it had missed or misunderstood something very simple.
KaiserPro|1 month ago
I mean you'd think. But it depends on the motivations.
At Meta, we had league tables for reviewing code. Even then, people only really looked at it if a) they were a nitpicking shit, b) they didn't like you and wanted to piss on your chips, or c) it was another team trying to fix our shit.
With the internal Claude rollout and the drive to vibe code all the things, I'm not sure that situation has got any better. Fortunately it's not my problem anymore.
prmoustache|1 month ago
People will ask an LLM to review some slop made by an LLM, and they will be absolutely right!
There is no limit to laziness.
zmmmmm|1 month ago
Usually about 50% of my understanding of the domain comes from the process of building the code. I can see a scenario where large scale automated code works for a while but then quickly becomes unsupportable because the domain expertise isn't there to drive it. People are currently working off their pre-existing domain knowledge which is what allows them to rapidly and accurately express in a few sentences what an AI should do and then give decisive feedback to it.
The best counter-argument is that AIs can explain the existing code and domain almost as well as they can write it to begin with, so there is a reasonable prospect that the whole system can sustain itself. But there's no arguing to me that this isn't a huge experiment. Any company producing enormous amounts of code that nobody understands is well out over their skis, and could easily find itself a year or two down the track with huge issues.
gen220|1 month ago
My "first pass" of review is usually me reading the PR stack in graphite. I might iterate on the stack a few times with CC before publishing it for review. I have agents generate much of my code, but this workflow has allowed me to retain ownership/understanding of the systems I'm shipping.
squirrellous|1 month ago
Results will vary depending on how automatically checkable a problem is, but I expect a lot of problems are amenable to some variation of this.
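A minimal sketch of what "automatically checkable" can look like in practice: accept an implementation only if it passes a batch of property checks, with no human review of the code itself. The `generated_sort` stand-in, trial count, and value ranges here are hypothetical, just to illustrate the shape of the gate:

```python
import random
from collections import Counter

def generated_sort(xs):
    # Stand-in for agent-generated code; a real pipeline would load
    # whatever the agent produced.
    return sorted(xs)

def is_acceptable(impl, trials=100):
    """Accept impl only if, on every random trial, the output is a
    permutation of the input and is in nondecreasing order."""
    for _ in range(trials):
        xs = [random.randint(-50, 50) for _ in range(random.randint(0, 20))]
        out = impl(xs)
        if Counter(out) != Counter(xs):                    # same elements
            return False
        if any(a > b for a, b in zip(out, out[1:])):       # ordered
            return False
    return True

print(is_acceptable(generated_sort))  # True: implementation passes the gate
```

The same pattern scales down poorly for problems whose correctness is hard to state as a property, which is exactly the "how automatically checkable" caveat above.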
AstroBen|1 month ago
To me it feels like building your project on sand. Not a good idea unless it's a sandcastle
yencabulator|1 month ago
I find Claude Code to be very steerable. Ask it to make small atomic commits and it will.
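As an illustration of the commit style being asked for here (file names and messages are hypothetical), "small atomic commits" means each commit is one self-contained, reviewable change:

```shell
set -e
dir=$(mktemp -d)
cd "$dir"
git init -q
git config user.email dev@example.com
git config user.name dev

# Commit 1: one logical change -- the new module on its own
echo 'def parse(): ...' > parser.py
git add parser.py
git commit -q -m "Add parser skeleton"

# Commit 2: a separate, independently reviewable follow-up
echo 'def test_parse(): ...' > test_parser.py
git add test_parser.py
git commit -q -m "Add parser tests"

git rev-list --count HEAD  # prints 2: one commit per logical change
```

Each commit can then be reviewed, reverted, or bisected on its own, which is what makes the agent's output tractable to review.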
zmmmmm|1 month ago
It means there is no value in producing more code, only value in producing better, clearer, safer code that can be reasoned about by humans. Which in turn makes me very sceptical about agents, other than as a useful parallelisation mechanism akin to multiple developers working on separate features. But in terms of ramping up the level of automation, it's frankly kind of boring to me, because if anything it makes the review part harder, which actually slows us down.