top | item 35817624

sacred_numbers | 2 years ago

I would bet money against that. Replicating GPT-4 pre-training with current hardware would cost about $40-50M in compute. Compute will continue to decrease in cost, and algorithmic improvements may allow for more efficient training, but probably not by 3 orders of magnitude in a few years. I think there will be plenty of open source models that claim GPT-4 quality, and some of them will be close, but they will be models that used millions of dollars in compute to train (probably from some corporate benefactor, but possibly from crowdfunding). You will probably be able to fine-tune and run inference on fairly cheap hardware, but you can't cheat scale. It's going to take a major innovation to move away from the expensive base model paradigm.


p1esk|2 years ago

Replicating GPT-4 pre-training with current hardware would cost about 40-50m in compute.

Source? My educated guess is that it's somewhere between 10 and 100 times cheaper than that.

sacred_numbers|2 years ago

I did my own calculations based on plotting benchmark loss against models with known parameter counts and training data, as well as a quote from Sam Altman saying that GPT-4 would not use very many more parameters than GPT-3. Based on this, I estimated that GPT-4 probably used about 250B parameters, and since I had an estimate for the total compute I was able to estimate that the training data was about 15T tokens. 250B parameters times 15T tokens times 6 (https://medium.com/@dzmitrybahdanau/the-flops-calculus-of-la...) means the compute was about 2.25×10^25 FLOPs. I estimated that A100s cost about $1/hr and can process about 5.4×10^17 FLOPs per hour at 50% efficiency. Therefore, the compute cost would be (2.25×10^25)/(5.4×10^17) GPU-hours, or about $40 million.
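The arithmetic above can be sketched in a few lines. All inputs are the estimates from this comment (parameter count, token count, $1/hr A100 pricing, 50% utilization), not confirmed figures from OpenAI:

```python
# Back-of-the-envelope GPT-4 training cost from the commenter's estimates.
params = 250e9   # ~250B parameters (estimate)
tokens = 15e12   # ~15T training tokens (estimate)

# Standard approximation: training FLOPs ~= 6 * params * tokens
train_flops = 6 * params * tokens              # ~2.25e25 FLOPs

# A100: ~312 TFLOPS peak (BF16), assumed 50% utilization
flops_per_gpu_hour = 312e12 * 0.5 * 3600       # ~5.6e17 FLOPs per GPU-hour

gpu_hours = train_flops / flops_per_gpu_hour
cost_usd = gpu_hours * 1.0                     # assumed $1 per A100-hour

print(f"{train_flops:.3g} FLOPs, {gpu_hours:.3g} GPU-hours, ~${cost_usd/1e6:.0f}M")
```

With these inputs the script lands on roughly 4×10^7 GPU-hours, i.e. about $40M, matching the figure above; the answer scales linearly with any of the assumed inputs.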

Interestingly, my own calculations lined up pretty well with this calculation, although they approached the problem from a different direction (a leak by Morgan Stanley about how many GPUs OpenAI used to train GPT-4 as well as an estimate of how long it was trained): https://colab.research.google.com/drive/1O99z9b1I5O66bT78r9S...

Sam Altman has also stated that GPT-4 cost more than $100 million to train, and replication typically takes 2-4x less compute than the original run: https://www.wired.com/story/openai-ceo-sam-altman-the-age-of...

If you know of an organization that can replicate GPT-4 for $400k to $4m I would love to know so that I can invest in them.

RelativeDelta|2 years ago

Especially if you consider that as compute costs decrease, the capacity of at-scale players to collect and process ever-larger datasets grows accordingly.

If we extrapolate that relation, you eventually reach a point where the biggest player can collect and process the most information and produce an ever-evolving model that perpetuates its lead.

Better hope its creators have your best interests at heart.

polski-g|2 years ago

What happens when people can network their GPUs together (like SETI@home) and a group of thousands of consumers can train GPT-5000?