WingNews

kqr|6 months ago

Seems like it's on the line that's scaring people like AI 2027, isn't it? https://aisafety.no/img/articles/length-of-tasks-log.png

FergusArgyll|6 months ago

It's above the exponential line and right around the super-exponential line.

Davidzheng|6 months ago

I actually think there's a high chance that this curve becomes almost vertical at some point around a few hours. In the under-1-hour regime, scaling the task time scales the complexity that the agent must internalize. Past a few hours, human limitations mean we have to divide the work into subtasks/abstractions, each of which is bounded in the complexity that must be internalized. Then there's a separate category of skills that are needed, like abstraction, subgoal creation, and error correction. It's a flimsy argument, but I don't see the time tasks take humans as a very reliable metric at all.

qsort|6 months ago

Isn't that pretty much in line with what people were expecting? Is it surprising?

usaar333|6 months ago

No, this is below expectations on both Manifold and lesswrong (https://www.lesswrong.com/posts/FG54euEAesRkSZuJN/ryan_green...). Median was ~2.75 hours on both (which already represented a bearish slowdown).

Not massively off -- Manifold yesterday implied the odds of a result this low were ~35%. They were 30% before Claude Opus 4.1 came out, which updated expected agentic coding abilities downward.

dingnuts|6 months ago

It's not surprising to AI critics, but go back to 2022, open r/singularity, and then answer: what were "people" expecting? Which people?

SamA has been promising AGI next year for three years like Musk has been promising FSD next year for the last ten years.

IDK what "people" are expecting but with the amount of hype I'd have to guess they were expecting more than we've gotten so far.

The fact that "fast takeoff" is a term I recognize indicates that some people believed OpenAI when they said this technology (transformers) would lead to sci-fi style AI, and that is most certainly not happening.

umanwizard|6 months ago

What is METR?

ravendug|6 months ago

https://www.lesswrong.com/posts/deesrjitvXM4xYGZd/metr-measu...

tunesmith|6 months ago

The 2h 15m is the length of tasks the model can complete with 50% probability. So longer is better in that sense. Or at least, "more advanced" and potentially "more dangerous".
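To make "completes with 50% probability" concrete, here is a rough sketch of how such a horizon can be read off a fitted curve. It assumes, as I understand METR's write-up, a logistic fit of success against the log of task length; the exact procedure they use may differ. The task results and the names `fit_logistic` and `time_horizon` are made up for illustration and are not METR's data or code.

```python
import math

def fit_logistic(xs, ys, steps=5000, lr=0.1):
    """Fit p(success) = sigmoid(a*x + b) by gradient ascent on the mean log-likelihood."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a * x + b)))
            grad_a += (y - p) * x
            grad_b += (y - p)
        a += lr * grad_a / n
        b += lr * grad_b / n
    return a, b

def time_horizon(results, p=0.5):
    """Task length (minutes) at which the fitted success probability equals p."""
    logs = [math.log2(minutes) for minutes, _ in results]
    mean = sum(logs) / len(logs)
    xs = [x - mean for x in logs]          # center for stable gradient steps
    ys = [1.0 if ok else 0.0 for _, ok in results]
    a, b = fit_logistic(xs, ys)
    logit = math.log(p / (1.0 - p))
    # Solve a*(x - mean) + b = logit for x, then undo the log2.
    return 2 ** (mean + (logit - b) / a)

# Hypothetical (human task length in minutes, did the agent succeed) pairs.
results = [
    (1, True), (2, True), (4, True), (8, True), (15, True),
    (30, True), (30, False), (60, True), (60, False),
    (120, True), (120, False), (240, True), (240, False),
    (480, False), (960, False),
]

h50 = time_horizon(results, p=0.5)
h80 = time_horizon(results, p=0.8)
print(f"50% horizon: {h50:.0f} min, 80% horizon: {h80:.0f} min")
```

The 80% horizon comes out shorter than the 50% one, which is why a "2h 15m" headline number does not mean the model reliably finishes two-hour tasks.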

Leary|6 months ago

https://metr.github.io/autonomy-evals-guide/gpt-5-report/

wisemang|6 months ago

To maybe save others some time: METR is a group called Model Evaluation and Threat Research, who

> propose measuring AI performance in terms of the length of tasks AI agents can complete.

Not that hard to figure out, but the way people were referring to them made me think it stood for an actual metric.
