grubbs | 6 months ago

Interesting. I work in higher ed and my team runs thousands of GPUs. I've rarely seen a failure, and mostly only when we put consumer-grade GPUs in servers (Nvidia doesn't like this). True server-grade GPUs never have any problems.

ecshafer|6 months ago

Is this for some kind of HPC cluster? What kind of utilization are you at? For an AI company these GPUs are going to be at near 100% utilization 24/7. These kinds of loads destroy hardware quickly.

bluedino|6 months ago

Every site I've worked at has plenty of GPU failures. Not consumer-grade either: H100s and A100s.