ej88 | 5 days ago

Really interesting updates to their 2025 experiment.

Repeat devs from the original experiment went from a 0-40% slowdown to a -10% to +40% speedup, and METR estimates this as a 'lower bound'

more devs are saying they don't even want to do 50% of their work without AI, even for $50/hr

30-50% of devs decided not to submit certain tasks if they couldn't use AI, so the tasks with the highest uplift go unmeasured

it also seems like there is a skill gap: repeat devs from the first study are more productive with AI tools than newly recruited devs with variable experience

overall it seems like devs' strong preference for using AI is actually hurting METR's ability to judge the speedup, because they refuse to do tasks without it. imo this is indirectly quite supportive of AI coding's productivity claims (toy illustration of the selection effect below)
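To make that selection effect concrete, here's a toy simulation (all numbers and the refusal model are invented, not METR's data): if devs refuse the no-AI condition exactly on the tasks where AI helps most, the measured average speedup understates the true one.

    import random

    random.seed(0)

    # Hypothetical per-task speedup factors from AI (invented):
    # 1.0 = no change, 2.0 = twice as fast with AI.
    true_speedups = [0.9, 1.0, 1.1, 1.3, 1.6, 2.0]

    # Crude assumption: the more AI would help on a task, the more
    # likely a dev is to refuse doing that task without AI.
    def submitted_without_ai(speedup):
        refusal_prob = min(1.0, max(0.0, speedup - 1.0))
        return random.random() > refusal_prob

    measured = []
    for _ in range(100_000):
        task = random.choice(true_speedups)
        if submitted_without_ai(task):
            measured.append(task)

    true_avg = sum(true_speedups) / len(true_speedups)
    measured_avg = sum(measured) / len(measured)
    print(f"true average speedup:     {true_avg:.2f}x")      # ~1.32x
    print(f"measured average speedup: {measured_avg:.2f}x")  # ~1.11x; high-uplift tasks drop out

under this (made-up) model the measured speedup is biased low, which is consistent with METR calling their number a lower bound.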

roxolotl | 5 days ago

The finding of the first study was that people cannot judge their own performance with these tools. So I don't think individuals being unwilling to work without them is indicative of productivity improvements. I think it's indicative of the tools being enjoyable to use.

logicprog | 5 days ago

It was claimed to find that, but I don't think it did. It compared developers' beliefs about their average speedup across tasks, measured by asking them once at the end, against the comparative speed measured per task and then averaged. Those are two different quantities, and all kinds of things could mess up developers' fuzzy recollection of the gestalt of several tasks (such as recency bias and question/study framing) that wouldn't affect an estimate taken right after each task; moreover, when the results were broken down by task type, the speedup/slowdown numbers actually matched developers' qualitative comments.
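To illustrate why those two quantities can diverge (invented numbers, just to show the recency-bias mechanism):

    # Hypothetical per-task speedup factors (>1 = faster with AI); invented data.
    task_speedups = [1.5, 1.3, 1.2, 0.6]  # the last task went badly

    # What METR measures: compare per task, then average.
    per_task_avg = sum(task_speedups) / len(task_speedups)

    # What a single end-of-study question might capture: a recollection
    # that overweights recent tasks (simple exponential recency weighting).
    n = len(task_speedups)
    weights = [0.5 ** (n - 1 - i) for i in range(n)]
    gestalt = sum(w * s for w, s in zip(weights, task_speedups)) / sum(weights)

    print(f"per-task average:              {per_task_avg:.2f}x")  # 1.15x, a speedup
    print(f"recency-weighted recollection: {gestalt:.2f}x")       # ~0.91x, feels like a slowdown

Same underlying data, opposite-sounding answers, purely from when and how you ask.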

judahmeek|4 days ago

There are some people participating in the study who will fire & forget instructions to Claude/Codex running in parallel worktrees, but would really struggle if they were required to work on their project without AI assistance.

So while some study participants probably are seeing an actual speedup because of the discipline with which they manage their codebase's structure & documentation, other study participants are actually getting worse at non-AI coding.

...and METR's study can't tell which is which, because it isn't using any sort of codebase quality metric for grounding.