top | item 46995917


_dwt | 18 days ago

Sorry for coming off combative - I'm mostly fatigued from "criti-hype" pieces we've been deluged with the last week. For what it's worth I think you're right about the inevitability but I also think it's worth pushing a bit against the pre-emptive shaping of the Overton window. I appreciate the comment.

I don't know how to encourage the kind of review that AI code generation seems to require. Historically we've been able to rely on the fact that (bluntly) programming is "g-loaded": smart programmers probably wrote better code, with clearer comments, formatted better, and documented better. Now, results that look great are a prompt away in each category, which breaks some subconscious indicators reviewers pick up on.

I also think that there is probably a sweet spot for automation that does one or two simple things and fails noisily outside the confidence zone (aviation metaphor: an autopilot that holds heading and barometric altitude and beeps loudly and shakes the stick when it can't maintain those conditions), and a sweet spot for "perfect" automation (aviation metaphor: uh, a drone that autonomously flies from point A to point B using GPS, radar, LIDAR, etc...?). In between I'm afraid there be dragons.
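The first sweet spot — simple automation that fails noisily outside its confidence zone — can be sketched in a few lines. This is a toy illustration, not anyone's real autopilot code; all names and thresholds here are hypothetical:

```python
# Hypothetical sketch: automation that does two simple things and
# refuses to operate silently outside its envelope.

class EnvelopeExceeded(Exception):
    """Raised loudly instead of degrading gracefully."""

def hold_heading_and_altitude(heading_error, altitude_error,
                              max_heading_error=5.0,
                              max_altitude_error=100.0):
    """Return simple proportional corrections, or disengage noisily."""
    if abs(heading_error) > max_heading_error or \
       abs(altitude_error) > max_altitude_error:
        # Outside the envelope: "beep and shake the stick" -- hand
        # control back to the human rather than guessing.
        raise EnvelopeExceeded("disengaging: cannot maintain conditions")
    # Inside the envelope: apply a trivial proportional correction.
    return (-0.5 * heading_error, -0.1 * altitude_error)

# Small deviations are handled...
print(hold_heading_and_altitude(2.0, 40.0))  # (-1.0, -4.0)
# ...large deviations disengage loudly instead of drifting.
try:
    hold_heading_and_altitude(30.0, 40.0)
except EnvelopeExceeded as e:
    print("ALERT:", e)
```

The point of the sketch is the failure mode: the dangerous middle ground is automation that keeps producing plausible-looking output after it has left the regime it was designed for.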

allanmacgregor | 17 days ago

@_dwt Don't worry, you didn't. I appreciate good discussion and criticism. The publication is new and I'm still trying to calibrate my voice and style for it.

>I don't know how to encourage the kind of review that AI code generation seems to require. Historically we've been able to rely on the fact that (bluntly) programming is "g-loaded": smart programmers probably wrote better code, with clearer comments, formatted better, and documented better. Now, results that look great are a prompt away in each category, which breaks some subconscious indicators reviewers pick up on.

I don't think anyone knows for sure; we're all in the same boat trying to figure out how best to work with AI, and the pace of change makes it incredibly difficult to keep up or try things out. I'm trying a bunch of stuff at the same time:

- https://structpr.dev/ - an attempt to rethink how we approach reading PRs and organizing review (dog-fooding it right now, so it's mostly alpha)

- I have an article scheduled for next week about StrongDM's software factory; there are some interesting ideas there, like test holdouts

- Some experiments in the Elixir stack for code generation and verification that go beyond "it looks great". AI can definitely create code that _looks_ great, but there is plenty of research showing that a lot of AI-generated code and tests can carry a high degree of false confidence.
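The test-holdout idea mentioned above can be sketched roughly: keep a subset of the test suite hidden from the code generator, and run only those held-out tests at verification time, so the generated code can't overfit to them. This is a toy illustration of the general concept, not StrongDM's actual pipeline; every name here is hypothetical:

```python
# Hypothetical sketch of a test holdout: the generator only ever sees
# the "visible" tests; the held-out tests act as an unbiased check.

import random

def split_holdout(tests, holdout_fraction=0.3, seed=42):
    """Partition tests into (visible, held_out) sets."""
    rng = random.Random(seed)
    shuffled = tests[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - holdout_fraction))
    return shuffled[:cut], shuffled[cut:]

def verify(candidate, held_out_tests):
    """Run only the tests the generator never saw."""
    return all(test(candidate) for test in held_out_tests)

# Toy example: "tests" are predicates over a generated function.
tests = [
    lambda f: f(0) == 0,
    lambda f: f(1) == 1,
    lambda f: f(2) == 4,
    lambda f: f(3) == 9,
    lambda f: f(-2) == 4,
]
visible, held_out = split_holdout(tests)
generated = lambda x: x * x  # stand-in for AI-generated code
print(verify(generated, held_out))  # True
```

The design choice mirrors the train/test split in machine learning: passing tests the generator has seen says little, while passing tests it has never seen is weak but genuine evidence of correctness.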