top | item 45848991

Strilanc | 3 months ago

Well, for example, consider this recent study that claimed developers using AI tools take 19% longer to finish tasks [1].

This was their methodology:

> we recruited 16 experienced developers from large open-source repositories (averaging 22k+ stars and 1M+ lines of code) that they’ve contributed to for multiple years. Developers provide lists of real issues (246 total) that would be valuable to the repository—bug fixes, features, and refactors that would normally be part of their regular work. Then, we randomly assign each issue to either allow or disallow use of AI while working on the issue.

Now consider the question of whether you expect this research to generalize. If you / your friends / your coworkers started using AI tools (or stopped using AI tools), do you expect the difference in productivity would also be 19%? Of course not! They didn't look at enough people or contexts to get two sig figs of precision on that average, nor enough to expect the conclusion to generalize. Plus the AI tools are constantly changing, so even if the study were nailing the average productivity change, it would be wrong a few months later. Plus the time period wasn't long enough for the people to build expertise, and "if I spend time getting good at this, will it be worth it" is probably the real question we want answered. The study is so weak that I don't even feel compelled to trust the sign of their result to be predictive. And I would be saying the same thing if it had reported 19% faster instead of 19% slower.
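
To put rough numbers on the precision point, here is a minimal sketch. It is not taken from the study: it assumes a hypothetical between-developer standard deviation of 30 percentage points (`ASSUMED_SD`, a made-up value for illustration), treats the 16 developers as independent samples, and uses a normal approximation, which if anything understates the uncertainty at such a small sample size. The helper names are also just for illustration.

```python
import math


def ci_half_width(sd, n, z=1.96):
    """Half-width of an approximate 95% confidence interval for a mean."""
    return z * sd / math.sqrt(n)


def n_needed(sd, half_width, z=1.96):
    """Smallest n whose approximate 95% CI half-width is <= half_width."""
    return math.ceil((z * sd / half_width) ** 2)


ASSUMED_SD = 0.30   # hypothetical spread between developers, NOT from the study
N_DEVELOPERS = 16   # from the quoted methodology

if __name__ == "__main__":
    hw = ci_half_width(ASSUMED_SD, N_DEVELOPERS)
    print(f"95% CI half-width with n={N_DEVELOPERS}: +/- {hw * 100:.1f} points")
    print(f"n needed for +/- 1 point: {n_needed(ASSUMED_SD, 0.01)}")
```

Under those assumptions the interval is roughly plus or minus 15 points around the average with 16 developers, and pinning the average down to within one point would take on the order of a few thousand developers. A different assumed spread changes the numbers but not the shape of the problem.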

I don't want to be too harsh on the study authors; I have a hard time imagining any way to do better given resource constraints and real-world practicalities... but that's kind of the whole problem with such studies. They're too small and too specific, and that's really hard to fix. Honestly, I think I'd trust five anecdotes at lunch more than most software studies (mainly because the anecdotes have the huge advantage of being from the same context I work in). Contrast that with medical studies, where I'd trust the studies over the anecdotes, because for all their flaws at least they actually put in the necessary resources.

To be pithy: maybe we upvote Carmack quotes more than software studies because Carmack quotes are informed by more written code than most software studies.

[1]: https://metr.org/blog/2025-07-10-early-2025-ai-experienced-o...


mmooss | 3 months ago

Taking issues like that into account is reading critically, which is great and essential. Dismissing ideas on that basis - often done on HN generally, even for large medical studies - is intellectually lazy, imho:

Life is full of flaws and uncertainty; that is the medium in which we swim and breathe and work. The solution is not to lie at the bottom until the ocean becomes pure H2O; the trick is to find value.

Mars008 | 3 months ago

> I don't want to be too harsh on the study authors

Well, I'll do it for you. There is a lot of attention-grabbing bull*it. For example, I've seen a study on LinkedIn claiming that 60% of Indians use AI daily in their jobs, and only 10% of Japanese do. You can guess who did it: very patriotic, but far from reality.