I took the “end” to mean the part of the exponential where it shoots up toward infinity. Say the x-axis is time (over which we get more training data and more compute) and the y-axis is model ability. If we are near the beginning of the exponential, then to the untrained eye, adding data/compute looks like it improves model capability almost linearly. But once you cross a threshold, which is where he thinks the model will start to generalize, a small amount of additional data/compute will produce a massive increase in model ability.
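To make the "looks linear at first" point concrete, here is a rough worked sketch. It assumes ability grows as a pure exponential y(t) = a e^{kt} with constants a, k > 0. That functional form is my own illustrative assumption, and it does not model the generalization threshold itself. Near the start (kt small) the curve is approximately a straight line. The gain from one fixed step Δt of extra data/compute, however, is proportional to e^{kt}, so the same step buys more and more ability as t grows:

$$
y(t) = a e^{kt} \approx a(1 + kt) \quad \text{for } kt \ll 1,
\qquad
y(t + \Delta t) - y(t) = a e^{kt}\left(e^{k \Delta t} - 1\right).
$$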
tylervigen|15 days ago