(no title)

The introduction of the paper says: "Despite its excellent performance, DeepSeek-V3 requires only 2.788M H800 GPU hours for its full training. In addition, its training process is remarkably stable. Throughout the entire training process, we did not experience any irrecoverable loss spikes or perform any rollbacks." They do indeed have a very strong infra team.

unknown|1 year ago

[deleted]

ComputerGuru|1 year ago

Do we have two completely different definitions of “infrastructure”?