ml_hardware | 4 years ago

You may find this blog post useful for thinking about AI scaling: https://www.alignmentforum.org/posts/k2SNji3jXaLGhBeYP/extra...

For general tasks like language modeling, we are still seeing predictable improvements (on the next-token-prediction loss) with increasing compute. We will very likely be able to scale things up by 10,000x or so and continue to see increasing performance.
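The "predictable improvements" here are usually modeled as a power law in compute. As a minimal sketch (the constants below are purely illustrative, not fitted values from any paper):

```python
import math

def predicted_loss(compute, l_inf=1.69, a=1.0, alpha=0.05):
    # Hypothetical power-law fit: next-token loss decays as a small
    # power of compute toward an irreducible floor l_inf.
    # l_inf, a, and alpha are made-up illustrative constants.
    return l_inf + a * compute ** -alpha

# A 10,000x compute increase only shrinks the reducible part of the
# loss by a factor of 10000**-0.05 ≈ 0.63 under this toy fit —
# smooth, modest, and predictable on a log-log plot.
for c in (1e0, 1e4):
    print(f"compute={c:.0e}  loss={predicted_loss(c):.3f}")
```

The point is that the aggregate loss curve itself has no visible jumps; the surprises show up elsewhere.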

But what does this mean for end users? We are probably going to see sigmoid-like curves, where qualitative capabilities of these models (like being able to do math, or tell jokes, or tutor you in French, or provide therapy, or mediate international conflicts) suddenly get a *lot* better at some point on the scaling curve. We saw this for simple arithmetic in the GPT-3 paper: the small <1B param models were terrible at it, and then at the 100B scale the model could suddenly do arithmetic with 80%+ accuracy.
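That emergence pattern can be sketched as a logistic curve in log-compute. The threshold and sharpness below are hypothetical placeholders, just to show the shape:

```python
import math

def task_accuracy(log10_compute, threshold=11.0, sharpness=2.0):
    # Hypothetical logistic curve: task accuracy stays near zero until
    # the model crosses a scale threshold, then climbs quickly toward 1.
    # threshold and sharpness are illustrative, not measured values.
    return 1.0 / (1.0 + math.exp(-sharpness * (log10_compute - threshold)))

# Well below the threshold the task looks hopeless; just past it,
# accuracy shoots up — the same step-like pattern GPT-3 showed
# on arithmetic between the <1B and 100B scales.
for x in (8, 10, 11, 12, 13):
    print(f"log10(compute)={x}  accuracy={task_accuracy(x):.3f}")
```

Under this picture, the smooth loss curve and the abrupt capability jumps are two views of the same scaling process.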

Personally, I would not expect diminishing returns with increased scale; instead, there will be sudden leaps in ability that will be very economically valuable. That is why Meta and others are so interested in scaling up these models.
