justlikereddit | 5 months ago

The exponential progress argument is also frequently misconstrued as

>"we will get there by monotonously doing more of what we did previously"

Take the article's metric of how long a model can work independently as an SWE. This is a rather new metric for measuring AI capabilities. It's also a good metric: it is directly measurable in a quantified way, unlike nebulous goal points such as "AGI/ASI".

It also doesn't necessarily predict any upheaval, which I think is another good trait in a metric. We know a model will be better when it hits 8 or 16 hours, but we can skip the hype and prophecies of civilizational transformation that are attached to terminology like "AGI/ASI".

Now the caveat is that a SWE-time metric is useful at the moment because it's on an intraday timescale. If we push this number to the point of comparing 48-hour vs. 54-hour SWE-time models, we can easily end up chasing abstractions that have little to no explanatory power about how good the AI really is, or about what constitutes a proper incremental improvement versus a benchmark number that may or may not be artificial.

The same can be said of math-olympiad scores and many of the existing AI benchmarks.

In the past there existed a concept of narrow AI. We could take task A, make a narrow AI become good at it. But we would expect a different application to be needed for task B.

Now we have generalist AI, and we make the generalist AI good at task A because that is the flavor-of-the-month metric. But maybe that doesn't translate into improvement on task B, which someone will get around to improving when that becomes the flavor of the month.

The conclusion? There's probably no single good metric to fixate on and say

"this is it, this graph is the one, watch it go exponential and bring forth God"

We will instead skip, hop and jump between task-or-category specific metrics that are deemed significant at the moment and arms-race style pump them up until their relevance fades.
