Tier3r|1 year ago
From the amount of data each successive generation used (which grew by orders of magnitude each time) to the diminishing returns in performance, it's quite clear the steam is running out on shoving more data into it. If one plots data against performance, the curve is horribly logarithmic. From another perspective, the ability of LLMs to transfer learning actually decreases the larger they and their data sets get. This fits with how humans have to specialise in topics, because the mental models of one field are very difficult to transfer to another.
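A toy sketch of the claim above: if performance grows roughly logarithmically with training data, every 10x increase in data buys only a constant additive gain. The coefficients and token counts here are made up purely for illustration, not real benchmark numbers.

```python
import math

def perf(data_tokens, a=5.0, b=10.0):
    """Toy model (hypothetical constants): score = a * log10(data) + b."""
    return a * math.log10(data_tokens) + b

# Each 10x jump in data adds the same fixed amount (a) to the score.
for tokens in (1e9, 1e10, 1e11, 1e12):
    print(f"{tokens:.0e} tokens -> score {perf(tokens):.1f}")
```

Under this model, going from 1e9 to 1e12 tokens (a 1000x increase in input) only triples the gain of a single 10x step, which is the "horribly logarithmic" shape being described.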
Tier3r|1 year ago
1) Creating new "harnesses" for models that connect them to various systems, APIs, frameworks, etc. While this sounds "trivial", a lot of gains can come from it. Similar to how the voice version of ChatGPT was (apparently) amazing: all you really had to do was add a speech-to-text layer on one side and a text-to-speech layer on the other.
2) Increasing specialisation of models. I predict that over time, end-user AI companies (e.g. those that just use models rather than develop them) will use more and more specialised models. The current, almost monolithic, setup where every service from text summarisation to homework help is plugged into the same model will slowly change.
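Both ideas above can be sketched in a few lines. Everything here is hypothetical (the model names, the registry, and the callables are placeholders, not any real product's API): the harness is just two extra layers wrapped around the model, and the router is a lookup from task type to a specialised model with a generalist fallback.

```python
from typing import Callable

def route(task_type: str) -> str:
    """Toy router: map a task category to a (hypothetical) specialised model name."""
    registry = {
        "summarise": "summary-model-v1",
        "homework": "tutor-model-v1",
    }
    # Anything unrecognised falls back to the monolithic general model.
    return registry.get(task_type, "general-model-v1")

def voice_chat(audio: bytes,
               stt: Callable[[bytes], str],
               llm: Callable[[str], str],
               tts: Callable[[str], bytes]) -> bytes:
    """Toy harness: 'voice mode' is speech-to-text -> model -> text-to-speech."""
    return tts(llm(stt(audio)))
```

The point of the sketch is that neither piece touches the model itself; all of the new capability lives in the plumbing around it.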
kingkongjaffa|1 year ago
We haven't seen wholesale specialised models yet because creating foundation models is expensive and difficult, and the current highest-ROI move is to make a general model.
unknown|1 year ago
[deleted]
cma|1 year ago
In what measure, loss? Loss can't go below zero plus the inherent entropy of the text (with overfitting it could get nearer to zero, but not all the way if it's next-token prediction and multiple continuations share the same prefix).
With respect to hallucinations, 4 got incredibly better over 3.
Tier3r|1 year ago
The inputs (data, compute, and parameters) going into training these models have grown by orders of magnitude between each generation. There's a lot of fuzziness about how much better each generation has gotten, but clearly 4 is not orders of magnitude better than 3 by any reasonable definition. This mental model isn't useful for saying how good each generation is, but it is quite useful for seeing the trend and making long-term predictions.