Can someone explain how so many platforms (ChatGPT, Gemini, Claude, etc.) all sprang up so quickly? How did the engineering teams immediately know how to build this kind of technology with LLMs, deep neural networks, and so on?
By 2020/2021 with the release of GPT-3, the trajectory of a lot of the most obvious product directions had already become clear. It was mainly a matter of models becoming capable enough to unlock them.
It's not much different from other ML; it's essentially the same thing on a bigger and more expensive scale. So once someone figured out the rough recipe (NN architecture, a ludicrous scale of weights and data, reinforcement-learning tuning), it's not hard for other experts in the field to replicate, so long as they have the resources. DeepSeek was pretty much a side project, for example.
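The pretraining half of that "rough recipe" boils down to learning next-token statistics from data. A toy caricature of that idea, using a character-bigram counting model rather than a transformer (a real LLM swaps in a neural architecture and scales data and parameters by many orders of magnitude, but the objective is the same kind of next-token prediction):

```python
from collections import Counter, defaultdict

# Toy caricature of the pretraining step: learn next-token statistics
# from data. This bigram counter is obviously not a transformer; it is
# just the smallest possible next-character predictor.

def train_bigram(text: str) -> dict:
    counts = defaultdict(Counter)
    for a, b in zip(text, text[1:]):
        counts[a][b] += 1
    return counts

def predict_next(model: dict, ch: str) -> str:
    # Greedy prediction: the most frequent character seen after `ch`.
    return model[ch].most_common(1)[0][0]

model = train_bigram("banana")
assert predict_next(model, "a") == "n"  # 'a' is most often followed by 'n'
```

The "reinforcement-learning tuning" stage then adjusts a trained model toward preferred outputs, but the expensive, load-bearing part is the scale of this prediction step.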
Was it really that quick? GPT-3 is where I would put the start of this, and that was in 2020; they had to work on the technology for quite a while before it got like this. Everyone else has been able to follow their progress and see what works.
I imagine it wasn't as immediate as it might look from the outside. If they were all working independently on similar ideas for a while, one of them launching its product might have caused the others to scramble to get theirs out as well to avoid missing the train.
All of the products you mention already had research teams (which, in the case of ChatGPT and Claude, actually predated most of their engineers). So knowing how to build small language models was always in their wheelhouse. Scaling up to larger LLMs required a few algorithmic advancements, but for the most part it was a question of sourcing more data and more compute. The remarkable part of transformers is their scaling laws, which let us achieve much better models without having to invent new architectures.
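Those scaling laws are empirical power laws relating pretraining loss to model size, data, and compute. A minimal sketch of the parameter-count term, assuming the rough Kaplan-style form L(N) = (N_c / N)^α; the constants below are approximately the published fit, but treat them as illustrative rather than authoritative:

```python
# Toy illustration of an LLM scaling law: loss falls as a smooth power
# law in parameter count N. Constants are roughly the Kaplan et al.
# (2020) fit and are used here for illustration only.

def predicted_loss(n_params: float, n_c: float = 8.8e13, alpha: float = 0.076) -> float:
    """L(N) = (N_c / N) ** alpha."""
    return (n_c / n_params) ** alpha

# The curve is smooth and predictable, which is why labs could commit
# large budgets to scaling with some confidence in the payoff.
print(predicted_loss(1e9))    # ~1B parameters
print(predicted_loss(175e9))  # ~175B parameters (GPT-3 scale)
```

The practical consequence is the one the comment describes: make N (and data) bigger and the loss reliably drops, no new architecture required.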
Intersection of plentiful cloud compute and existing language models. As I understand it, right now it's really just throwing compute at existing LMs to learn from gigantic datasets.
dwohnitmok|4 months ago
E.g. here's a forecast of 2021 to 2026 from 2021, over a year before ChatGPT was released. It hits a lot of the product beats we've come to see as we move into late 2025.
https://www.lesswrong.com/posts/6Xgy6CAf2jqHhynHL/what-2026-...
(The author of this is one of the authors of AI 2027: https://ai-2027.com/)
Or e.g. AI agents (this is a doc from about six months before ChatGPT was released: https://www.lesswrong.com/posts/kpPnReyBC54KESiSn/optimality...)
bryanlarsen|4 months ago
That was 2017. And of course Google & UofT were working on it for many years before the paper was published.
dantyti|4 months ago
Edit: mixed up my dates, claiming DALL-E came out before GPT-3
saghm|4 months ago
I think it's also worth pointing out that the polish on these products was not actually there on day one. I remember the first week or so after ChatGPT's initial launch being full of stories and screenshots of people fairly easily getting around the intended limitations with silly methods: asking it to write a play whose dialogue covers the topic it refused to discuss directly, or asking it for examples of the kinds of things it's not allowed to say in response to certain questions. My point isn't that a lot of technical knowledge didn't go into the initial launch, but that it's a bit of an oversimplification to view things as a binary where people didn't know how to do it before, but then they did.
iamflimflam1|4 months ago
Deep learning has now been around for a long time. Running these models is well understood.
Obviously, running them at scale for multiple users is more difficult.
The actual front ends are not complicated, as evidenced by the number of open-source equivalents.