top | item 47019255

nkmnz | 15 days ago

> A study from METR found that when developers used AI tools, they estimated that they were working 20% faster, yet in reality they worked 19% slower. That is nearly a 40% difference between perceived and actual times!

It’s not. It’s either 33% slower than perceived or perception overestimates speed by 50%. I don’t know how to trust the author if stuff like this is wrong.

jph00|15 days ago

> I don’t know how to trust the author if stuff like this is wrong.

She's not wrong.

A good way to do this calculation is with the log-ratio, a centered measure of proportional difference. It's symmetric, and widely used in economics and statistics for exactly this reason. I.e.:

ln(1.2/0.81) = ln(1.2) - ln(0.81) ≈ 0.393

That's nearly 40%, as the post says.
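
For anyone who wants to check the arithmetic, here is a minimal Python sketch of the log-ratio. It assumes, as the comment does, that "20% faster" and "19% slower" can be read as multipliers of 1.2 and 0.81. The helper name `log_ratio` is made up for illustration.

```python
import math

def log_ratio(perceived: float, actual: float) -> float:
    """Natural log of perceived/actual, a symmetric measure of proportional difference."""
    return math.log(perceived / actual)

# Assumed multipliers taken from the comment: 1.2 (perceived) and 0.81 (actual).
gap = log_ratio(1.2, 0.81)
print(round(gap, 3))  # prints 0.393

# Symmetry: swapping the arguments only flips the sign.
assert math.isclose(log_ratio(0.81, 1.2), -gap)
```

The symmetry check is the property being claimed: the size of the gap is the same whichever value is treated as the baseline.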

nkmnz|14 days ago

So if the numbers were “99% slower than without AI, but they thought they would be 99% faster”, you’d call that “they were 529% slower”, even though it doesn’t make sense to be more than 100% slower? And you’d not only expect everyone to understand that, but you really think it’s more likely a random person on the internet used a logarithmic scale than they just did bad math?

piker|15 days ago

I get caught up personally in this math as well. Is a charitable interpretation of the throwaway line that they were off by that many “percentage points”?

nkmnz|15 days ago

That would be correct, but also useless. It matters if 50pp are 50% vs. 100%, 75% vs. 125% or 100% vs. 150%.

regular_trash|15 days ago

Can you elaborate? This seems like a simple mistake if they are incorrect, I'm not sure where 33% or 50% come from here.

nkmnz|15 days ago

Their math is 120%-80%=40% while the correct math is (80-120)/120=-33% or (120-80)/80=+50%

It’s more obvious if you take more extreme numbers, say: they estimated to take 99% less time with AI, but it took 99% more time. The difference is not 198%, but (199-1)/1=19800%. Suddenly you’re off by two orders of magnitude.
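
The two baselines can be checked with a short Python sketch. The values 120 and 80 are the rounded figures used in this comment, and `relative_change` is a hypothetical helper name, not something from the study.

```python
def relative_change(new: float, base: float) -> float:
    """Change from base to new, expressed as a fraction of base."""
    return (new - base) / base

# Rounded figures from the comment above (assumed, not the study's exact numbers).
perceived, actual = 120.0, 80.0

print(f"{relative_change(actual, perceived):+.1%}")    # actual relative to perceived: -33.3%
print(f"{relative_change(perceived, actual):+.1%}")    # perceived relative to actual: +50.0%
print(f"{perceived - actual:.0f} percentage points")   # plain subtraction: 40 percentage points
```

The same pair of numbers gives three different figures depending on the baseline, which is the whole disagreement in this subthread.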

jph00|15 days ago

It's not a mistake. It's correct, and is an excellent way to present this information.

softwaredoug|15 days ago

Isn't the study a year old by now? Things have evolved very quickly in the last few months.

jascha_eng|15 days ago

Yes, and it was done with people using Cursor at the time, and it already had a few caveats back then about who was actually experienced with the tool, etc.

Still an interesting observation. It was also on brownfield open source projects, which imo explains a bit why people building new stuff have vastly different experiences than this.

legulere|15 days ago

The exact numbers certainly would be different today, but you would probably still see the effect that there’s an overestimation of productivity.

nkmnz|15 days ago

Yes. No agents, no deep research, no tools, and just Sonnet-3.5 and 3.7 - I’d love to see the same study today with Opus-4.6 and Codex-5.3