zardinality | 1 year ago

In the introduction of the paper it says: "Despite its excellent performance, DeepSeek-V3 requires only 2.788M H800 GPU hours for its full training. In addition, its training process is remarkably stable. Throughout the entire training process, we did not experience any irrecoverable loss spikes or perform any rollbacks." They indeed have a very strong infra team.

ComputerGuru | 1 year ago

Do we have two completely different definitions of “infrastructure”?