top | item 42474534


bluecoconut | 1 year ago

By my estimates, for this single benchmark, this is comparable in cost to training a ~70B model from scratch today. Literally from zero to a GPT-3-scale model for the compute they spent on 100 ARC tasks.

I double-checked with some FLOP estimates (P100 for 12 hours = the Kaggle limit; they claim ~100-1000x that for O3-low, and 172x for O3-high), so roughly on the order of 10^22-10^23 FLOPs.

Looked at another way: using the H100 market price of ~$2/chip-hour, $350k buys ~175k GPU-hours, or ~10^24 FLOPs in total.

So, huge error bars, but 10^22-10^24 FLOPs is the band I think we can estimate.
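The arithmetic behind that band can be sketched as follows. This is a rough reproduction of the estimate, not OpenAI's actual numbers: the sustained throughputs (~10 TFLOP/s for a P100, ~1 PFLOP/s for an H100), the 100-task count, and the $2/H100-hour price are all assumptions taken from or implied by the comment above.

```python
# Back-of-envelope FLOP estimate for the O3 ARC-AGI runs.
# Assumed sustained throughputs (rough, order-of-magnitude only):
P100_FLOPS = 1e13          # ~10 TFLOP/s, the Kaggle baseline GPU
H100_FLOPS = 1e15          # ~1 PFLOP/s dense BF16
TASKS = 100                # number of ARC tasks in the benchmark run
KAGGLE_HOURS = 12          # per-task Kaggle compute limit

# Baseline: the full Kaggle compute budget across all tasks.
kaggle_flops = P100_FLOPS * KAGGLE_HOURS * 3600 * TASKS

# Claimed multipliers over the Kaggle limit (from the comment above).
o3_low = kaggle_flops * 100       # low end of the ~100-1000x claim
o3_high = kaggle_flops * 1000     # high end

# Cost-based cross-check: $350k at an assumed ~$2 per H100-hour.
gpu_hours = 350_000 / 2
cost_flops = gpu_hours * 3600 * H100_FLOPS

print(f"O3-low:     ~{o3_low:.1e} FLOPs")
print(f"O3-high:    ~{o3_high:.1e} FLOPs")
print(f"cost-based: ~{cost_flops:.1e} FLOPs")
```

Both routes land inside the 10^22-10^24 band, which is what makes the "huge error bars" estimate hang together.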

These are the scale of numbers that show up in the Chinchilla-optimal paper, haha. Truly GPT-3-scale models.
