
duchenne | 1 year ago

Training a 1B model on 1T tokens is cheaper than people might think. An H100 GPU can be rented for $2.50 per hour and can train around 63k tokens per second for a 1B model. So you would need around 4,400 GPU-hours, costing only about $11k. And costs will keep going down.
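The arithmetic above can be sketched in a few lines; the throughput and rental price are the comment's assumed figures, not measured values:

```python
# Back-of-envelope training cost for a 1B-param model on 1T tokens.
# Assumed inputs from the comment: ~63k tokens/sec on one H100, $2.50/hr rental.
TOKENS = 1_000_000_000_000   # 1T tokens
TOKENS_PER_SEC = 63_000      # assumed single-H100 throughput at 1B params
USD_PER_HOUR = 2.50          # assumed H100 rental price

hours = TOKENS / TOKENS_PER_SEC / 3600
cost = hours * USD_PER_HOUR
print(f"{hours:,.0f} GPU-hours, ~${cost:,.0f}")
# prints "4,409 GPU-hours, ~$11,023"
```

The result is linear in all three inputs, so halving the token rate (or doubling the rental price) doubles the cost.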


lumost | 1 year ago

Is there a handy table for this? My napkin math has either underestimated throughput by 2 orders of magnitude or the above estimate is high.

codetrotter | 1 year ago

(1,000,000,000,000/63,000)/(60*60)

(1T tokens / 63k tokens per second) / (60 seconds per minute * 60 minutes per hour)

is approx. 4,400 hours.

So I guess that’s how the calculation went.

Or did you mean a source for the number of tokens per second?