I think the OpenAI deal to lock up wafers was a wonderful coup. OpenAI is increasingly losing ground against the regularity[0] of the improvements coming from Anthropic, Google, and even the open-weights models. By creating a choke point at the hardware level, OpenAI can prevent the competition from increasing their reach for lack of hardware.

[0]: For me this is a really important part of working with Claude: the model improves over time but stays consistent. Its "personality", or whatever you want to call it, has been really stable over the past versions, which allows a very smooth transition from version N to N+1.
bakugo|2 months ago
How did we get here? What went so wrong?
makeitdouble|2 months ago
I'm assuming you wouldn't see it as fine if the corporation were profitable.
> How did we get here?
We've always been here. Not that it makes it right, but it's an issue that is neither simple to fix nor something most lawmakers are guaranteed to want to fix in the first place.
Nothing in the rules stops you from cornering most markets, and an international company with enough money can probably corner specific markets if it saw a matching ROI.
hodgehog11|2 months ago
What I can think of is that there may be a push toward training for exclusively search-based rewards so that the model isn't required to compress a large proportion of the internet into their weights. But this is likely to be much slower and come with initial performance costs that frontier model developers will not want to incur.
lysace|2 months ago
2024 production was (according to openai/chatgpt) 120 billion gigabytes. With 8 billion humans that's about 15 GB per person.
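The division behind that per-person figure, as a quick sanity check (the 120-billion-gigabyte production number is the comment's own claim, sourced from ChatGPT, not a verified figure):

```python
# Sanity check of the per-person figure above.
# The production number is the comment's claim, not a verified statistic.
total_production_gb = 120 * 10**9  # claimed 2024 production: 120 billion GB
people = 8 * 10**9                 # roughly 8 billion humans

per_person_gb = total_production_gb / people  # 15 GB per person
```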
GistNoesis|2 months ago
For training, their models need a certain amount of memory to store the parameters, and this memory is touched for every example of every iteration. Big models have 10^12 (>1T) parameters, and with typical values of 10^3 examples per batch and 10^6 iterations, they need ~10^21 memory accesses per run. And they want to do multiple runs.
DDR5 RAM bandwidth is 100 GB/s = 10^11 B/s; graphics RAM (HBM) is 1 TB/s = 10^12 B/s. By buying the wafers they get to choose which types of memory they get.
10^21 / 10^12 = 10^9 s = ~30 years of memory accesses just to update the model weights; you also need to add a factor of 10^1-10^3 to account for the memory accesses needed for the model computation.
But the good news is that it parallelizes extremely well. If you parallelize your 1T parameters 10^3 ways, your run time is brought down to 10^6 s = ~12 days. But then you need 10^3 * 10^12 = 10^15 bytes of RAM per run for the weight updates and up to 10^18 for the computation (your 120 billion gigabytes is ~10^20 bytes, so not so far off).
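The estimate above can be checked with a few lines of arithmetic. All figures are the comment's order-of-magnitude assumptions (1T parameters, 10^3 examples per batch, 10^6 iterations, ~1 TB/s HBM), not measured values:

```python
# Back-of-envelope check of the memory-traffic estimate above.
# All inputs are order-of-magnitude assumptions from the comment.
params = 10**12       # ~1T model parameters
batch = 10**3         # examples per batch
iterations = 10**6    # optimizer steps per run

# One memory touch per parameter, per example, per iteration.
accesses = params * batch * iterations        # ~10^21 per run

hbm_bandwidth = 10**12                        # ~1 TB/s HBM, in bytes/s
serial_seconds = accesses / hbm_bandwidth     # ~10^9 s on one memory system
serial_years = serial_seconds / (3600 * 24 * 365)   # ~30 years

parallelism = 10**3                           # shard 10^3 ways
parallel_seconds = serial_seconds / parallelism     # ~10^6 s
parallel_days = parallel_seconds / 86400            # ~12 days

# Aggregate RAM needed just to hold the replicated weights.
ram_for_weights = parallelism * params        # ~10^15 bytes
```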
Are all these memory accesses technically required? No, if you use other algorithms, but more compute and memory are better if money is not a problem.
Is it strategically good to deprive your competitors of access to memory? In a very short-sighted way, yes.
It's a textbook cornering of the computing market to prevent the emergence of local models, because customers won't be able to buy even the minimal RAM necessary to run the models locally, just for the inferencing part (not the training). Basically a war on people, where little Timmy won't be able to get a RAM stick to play computer games at Xmas.
Phelinofist|2 months ago
I already hate OpenAI, you don't have to convince me
nutjob2|2 months ago
With a bit of luck OpenAI collapses under its own weight sooner rather than later; otherwise we're screwed for several years.