top | item 46918279


rapind | 24 days ago

True, and that's usually what I'm doing now, but to be honest I'm also giving all of its code at least a cursory glance.

Some of the things it occasionally does:

- Ignores conventions (even when emphasized in the CLAUDE.md)

- Decides to just not implement tests if it spins out on them too much (it tells you, but only as it happens, and that scrolls by pretty quickly)

- Writes badly performing code (N+1 queries, for example)

- Does more than you asked (in a bad way, changing UIs or adding cruft)

- Makes generally bad assumptions
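For readers unfamiliar with the N+1 pattern mentioned above, here is a minimal sketch using sqlite3; the schema and names are illustrative, not from the thread:

```python
# Minimal illustration of the N+1 query anti-pattern vs. a single JOIN.
# Hypothetical schema (authors/posts) for demonstration only.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO posts VALUES (1, 1, 'a'), (2, 1, 'b'), (3, 2, 'c');
""")

def posts_n_plus_one(conn):
    # N+1: one query for the authors, then one extra query PER author.
    result = {}
    for author_id, name in conn.execute("SELECT id, name FROM authors"):
        titles = [t for (t,) in conn.execute(
            "SELECT title FROM posts WHERE author_id = ?", (author_id,))]
        result[name] = titles
    return result

def posts_joined(conn):
    # Fix: a single JOIN fetches the same data in one round trip.
    result = {}
    for name, title in conn.execute(
            "SELECT a.name, p.title FROM authors a "
            "JOIN posts p ON p.author_id = a.id"):
        result.setdefault(name, []).append(title)
    return result

assert posts_n_plus_one(conn) == posts_joined(conn)
```

Both functions return the same mapping; the difference only shows up as query count (and latency) once the authors table grows, which is exactly why it's easy for generated code to slip this past a quick review.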

I'm not trying to be overly negative, but in my experience to date, you still need to babysit it. I'm interested, though, in the idea of using multiple models to perform independent reviews, to at least flag spots that could use human intervention / review.



vidarh | 23 days ago

Sure, but none of those things requires you to watch it work. They're all easy to pick up on when reviewing a finished change, which ideally should come after its instructions have had it run linters, run sub-agents that verify it has added tests, and run sub-agents doing a code review.

I don't want to waste my time reviewing a change the model can still significantly improve all by itself. My time costs far more than the model's.