(no title)
bluecoconut | 1 year ago
I double-checked with some FLOP estimates. The Kaggle limit is a P100 for 12 hours; they claim ~100-1000x that for O3-low, and a further 172x on top for O3-high, so roughly on the order of 10^22-10^23 FLOPs.
Put another way, at an H100 market price of ~$2/chip-hour, $350k buys ~175k chip-hours, or about 10^24 FLOPs in total.
So there's a huge margin, but 10^22-10^24 FLOPs is the band I think we can estimate.
This is the scale of numbers that shows up in the Chinchilla-optimal paper, haha. Truly GPT-3 scale models.
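For anyone who wants to redo the arithmetic, here is a rough sketch of both estimates. The peak throughputs are my assumptions, not figures from the claims above: ~10^13 FLOP/s for a P100 and ~10^15 FLOP/s for an H100, both at 100% utilization, which real workloads won't hit. The function names are made up for illustration, and I'm reading the 172x as a multiplier on top of the O3-low figure.

```python
SECONDS_PER_HOUR = 3600

# Assumed peak throughputs, order of magnitude only (not from the source claims).
P100_FLOPS = 1e13  # roughly 10 TFLOP/s
H100_FLOPS = 1e15  # roughly 1 PFLOP/s at low precision


def kaggle_baseline_flop(hours=12.0, flops=P100_FLOPS):
    """FLOPs available within the Kaggle limit: one P100 for 12 hours."""
    return hours * SECONDS_PER_HOUR * flops


def scaled_flop(baseline, low_multiplier, high_multiplier=172.0):
    """Scale the baseline by the O3-low multiplier, then by 172x for O3-high."""
    return baseline * low_multiplier * high_multiplier


def chip_hours(budget_usd, usd_per_chip_hour=2.0):
    """Chip-hours a budget buys at a flat rental price."""
    return budget_usd / usd_per_chip_hour


def dollars_to_flop(budget_usd, usd_per_chip_hour=2.0, flops=H100_FLOPS):
    """FLOPs a budget buys, assuming full utilization of every chip-hour."""
    return chip_hours(budget_usd, usd_per_chip_hour) * SECONDS_PER_HOUR * flops


if __name__ == "__main__":
    base = kaggle_baseline_flop()
    print(f"Kaggle baseline:    {base:.1e} FLOPs")
    print(f"O3-high, 100x low:  {scaled_flop(base, 100):.1e} FLOPs")
    print(f"O3-high, 1000x low: {scaled_flop(base, 1000):.1e} FLOPs")
    print(f"$350k of H100 time: {dollars_to_flop(350_000):.1e} FLOPs")
```

Under those assumptions the first route gives somewhere around 7e21 to 7e22 FLOPs and the dollar route around 6e23, so the two land within one to two orders of magnitude of each other.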