Can someone explain how so many platforms (ChatGPT, Gemini, Claude, etc.) all sprang up so quickly? How did the engineering teams immediately know how to build this kind of technology with LLMs, deep neural networks, and so on?
By 2020/2021 with the release of GPT-3, the trajectory of a lot of the most obvious product directions had already become clear. It was mainly a matter of models becoming capable enough to unlock them.
It's not much different from other ML; it's essentially the same thing on a bigger and more expensive scale. So once someone figured out the rough recipe (NN architecture, a ludicrous scale of weights and data, reinforcement-learning tuning), it's not hard for other experts in the field to replicate, so long as they have the resources. DeepSeek was pretty much a side project, for example.
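The pretraining half of that "rough recipe" boils down to learning next-token statistics from data. A toy caricature of that idea, using a character-bigram counting model rather than a transformer (a real LLM swaps in a neural architecture and scales data and parameters by many orders of magnitude, but the objective is the same kind of next-token prediction):

```python
from collections import Counter, defaultdict

# Toy caricature of the pretraining step: learn next-token statistics
# from data. This bigram counter is obviously not a transformer; it is
# just the smallest possible next-character predictor.

def train_bigram(text: str) -> dict:
    counts = defaultdict(Counter)
    for a, b in zip(text, text[1:]):
        counts[a][b] += 1
    return counts

def predict_next(model: dict, ch: str) -> str:
    # Greedy prediction: the most frequent character seen after `ch`.
    return model[ch].most_common(1)[0][0]

model = train_bigram("banana")
assert predict_next(model, "a") == "n"  # 'a' is most often followed by 'n'
```

The "reinforcement-learning tuning" stage then adjusts a trained model toward preferred outputs, but the expensive, load-bearing part is the scale of this prediction step.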
Was it really that quick? GPT-3 is where I would put the start of this, and that was in 2020; they had to work on the technology for quite a while before it got like this. Everyone else has been able to follow their progress and see what works.
I imagine it wasn't as immediate as it might look from the outside. If they were all working independently on similar ideas for a while, one of them launching its product might have caused the others to scramble to get theirs out as well to avoid missing the train.
All of the products you mention already had research teams (which, in the case of ChatGPT and Claude, actually predated most of their engineers). So knowing how to build small language models was always in their wheelhouse. Scaling up to larger LLMs required a few algorithmic advancements, but for the most part it was a question of sourcing more data and more compute. The remarkable part of transformers is their scaling laws, which let us achieve much better models without having to invent new architectures.
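Those scaling laws are empirical power laws relating pretraining loss to model size, data, and compute. A minimal sketch of the parameter-count term, assuming the rough Kaplan-style form L(N) = (N_c / N)^α; the constants below are approximately the published fit, but treat them as illustrative rather than authoritative:

```python
# Toy illustration of an LLM scaling law: loss falls as a smooth power
# law in parameter count N. Constants are roughly the Kaplan et al.
# (2020) fit and are used here for illustration only.

def predicted_loss(n_params: float, n_c: float = 8.8e13, alpha: float = 0.076) -> float:
    """L(N) = (N_c / N) ** alpha."""
    return (n_c / n_params) ** alpha

# The curve is smooth and predictable, which is why labs could commit
# large budgets to scaling with some confidence in the payoff.
print(predicted_loss(1e9))    # ~1B parameters
print(predicted_loss(175e9))  # ~175B parameters (GPT-3 scale)
```

The practical consequence is the one the comment describes: make N (and data) bigger and the loss reliably drops, no new architecture required.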
Intersection of plentiful cloud compute and existing language models. As I understand it, right now it's really just throwing compute at existing LMs to learn from gigantic datasets.
dwohnitmok|4 months ago
E.g. here's a forecast of 2021 to 2026 from 2021, over a year before ChatGPT was released. It hits a lot of the product beats we've come to see as we move into late 2025.
https://www.lesswrong.com/posts/6Xgy6CAf2jqHhynHL/what-2026-...
(The author of this is one of the authors of AI 2027: https://ai-2027.com/)
Or e.g. AI agents (this is a doc from about six months before ChatGPT was released: https://www.lesswrong.com/posts/kpPnReyBC54KESiSn/optimality...)
bryanlarsen|4 months ago
That was 2017. And of course Google & UofT were working on it for many years before the paper was published.
dantyti|4 months ago
Edit: mixed up my dates, claiming DALL-E came out before GPT-3
saghm|4 months ago
I think it's also worth pointing out that the polish on these products was not actually there on day one. I remember the first week or so after ChatGPT's initial launch being full of stories and screenshots of people fairly easily getting around the intended limitations with silly methods: asking it to write a play whose dialogue covers the topic it refused to discuss directly, or asking it for examples of the kinds of things it's not allowed to say in response to certain questions. My point isn't that a lot of technical knowledge didn't go into the initial launch, but that it's a bit of an oversimplification to view things as a binary where people didn't know how to do it before, but then they did.
iamflimflam1|4 months ago
Deep learning has now been around for a long time. Running these models is well understood.
Obviously, running them at scale for multiple users is more difficult.
The actual front ends are not complicated, as evidenced by the number of open-source equivalents.