top | item 35819927

sacred_numbers | 2 years ago

I did my own calculations based on plotting loss on benchmarks compared to models with known parameters and training data, as well as using a quote from Sam Altman that said that GPT-4 would not use very many more parameters than GPT-3. Based on this, I estimated that GPT-4 probably used about 250B parameters, and since I had an estimate for the total compute I was able to estimate that the training data was about 15T tokens. 250B parameters times 15T tokens times 6 (https://medium.com/@dzmitrybahdanau/the-flops-calculus-of-la...) means the compute was about 2.25×10^25 FLOPs. I estimated that A100s cost about $1/hr and can process about 5.4×10^17 FLOPs at 50% efficiency per hour. Therefore, the compute cost would be (2.25×10^25)/(5.4×10^17) ≈ 4.2×10^7 A100-hours, or about $40 million at $1/hr.
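The arithmetic above can be sketched as a quick script. This is a back-of-the-envelope sketch only: the parameter count, token count, and per-GPU throughput are the guesses from this comment, not confirmed figures.

```python
# Back-of-the-envelope GPT-4 training-compute estimate.
# All inputs are guesses from the comment above, not confirmed figures.
params = 250e9             # estimated parameter count
tokens = 15e12             # estimated training tokens
flops_per_param_token = 6  # ~6 FLOPs per parameter per token (fwd + bwd)

total_flops = flops_per_param_token * params * tokens  # ~2.25e25 FLOPs

a100_flops_per_hour = 5.4e17  # ~150 TFLOPS sustained (50% utilization)
dollars_per_a100_hour = 1.0

gpu_hours = total_flops / a100_flops_per_hour
cost = gpu_hours * dollars_per_a100_hour
print(f"{total_flops:.3g} FLOPs over {gpu_hours:.3g} A100-hours -> ${cost / 1e6:.1f}M")
```

Run as-is this prints roughly $41.7M, which rounds to the ~$40 million figure above.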

Interestingly, my own calculations lined up pretty well with this calculation, although they approached the problem from a different direction (a leak by Morgan Stanley about how many GPUs OpenAI used to train GPT-4 as well as an estimate of how long it was trained): https://colab.research.google.com/drive/1O99z9b1I5O66bT78r9S...

Sam Altman has also stated that GPT-4 cost more than $100 million to train, and replication can require 2-4x less compute. https://www.wired.com/story/openai-ceo-sam-altman-the-age-of...

If you know of an organization that can replicate GPT-4 for $400k to $4m I would love to know so that I can invest in them.

p1esk | 2 years ago

Let's go through your guesstimates one by one:

1. We don't know the number of parameters: it could be 175B, 250B, or 400B. OK, let's stick with 250B.

2. Training data: GPT-3 was trained on 300B tokens. It already used most of the high-quality data available on the internet, but let's say they somehow managed to find and prepare three times as much high quality data for GPT-4. This means GPT-4 was trained on about 1T tokens.

3. 5.4e+17 FLOPs/hour means 150 TFLOPS, which is about half of the A100's BFLOAT16 theoretical peak; sounds reasonable.

4. $1/A100/hr is reasonable.

OK, since training compute scales linearly with token count, we need to divide your cost estimate by a factor of 15: the total cost to train GPT-4 comes out to around $2.7M.
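Plugging the revised token count into the same 6*N*D estimate gives (a sketch; the 1T-token figure is this reply's guess, and the throughput/price assumptions are unchanged from the parent comment):

```python
# Same 6 * N * D estimate, with the revised 1T-token assumption.
params = 250e9   # sticking with 250B parameters
tokens = 1e12    # revised guess: ~1T training tokens
total_flops = 6 * params * tokens  # 1.5e24 FLOPs
gpu_hours = total_flops / 5.4e17   # A100-hours at 50% utilization
cost = gpu_hours * 1.0             # at $1 per A100-hour
print(f"${cost / 1e6:.1f}M")
```

This prints $2.8M, in line with the ~$2.7M above (the small gap is rounding: $40M was itself rounded down from ~$41.7M).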

Regarding Altman's statement about "more than $100M to train GPT-4": I'm pretty sure he was talking about the total cost to develop GPT-4, which includes a lot of experimentation and exploration, many training runs, and many other administrative costs that are not relevant to a single training run reproducing the existing results. Just salaries alone: ~200 people worked on GPT-4 for, let's say, half a year at $400k/year: 0.5 * 400k * 200 = $40M.
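The salary arithmetic above, spelled out (all three inputs are this reply's guesses):

```python
# Rough salary-only development overhead; headcount, salary, and duration
# are all guesses from the comment above.
headcount = 200
salary_per_year = 400_000  # dollars
years = 0.5
salary_cost = headcount * salary_per_year * years
print(f"${salary_cost / 1e6:.0f}M")  # prints $40M
```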