
mwambua | 5 months ago

How did you arrive at the decision of not putting the GPU machines in the colo? Were the power costs going to be too high? Or do you just expect to need more physical access to the GPU machines vs the storage ones?


g413n | 5 months ago

When I was working at sfcompute prior to this, we saw multiple datacenters literally catch fire because the industry wasn't experienced with the power density of H100s. Our training chips just aren't a standard package in the way JBODs are.

Symbiote | 5 months ago

Isn't the easy option to spread the computers out, i.e. fill only half of each rack?

A GPU cluster next to my servers has done this; presumably they couldn't have 64 A in one rack, so they've got 32 A in each of two. (230 V 3-phase.)
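Back-of-the-envelope math for that split, as a sketch: assuming 230 V here means phase-to-neutral voltage (the usual European wye arrangement), total rack power is 3 × V × I. The function name and figures below are illustrative, not from the thread:

```python
def three_phase_power_kw(phase_voltage_v, current_a_per_phase):
    """Total power of a balanced 3-phase load, in kW.

    Assumes phase_voltage_v is the phase-to-neutral voltage,
    so total power = 3 * V_phase * I_phase.
    """
    return 3 * phase_voltage_v * current_a_per_phase / 1000

# One fully loaded rack at 64 A per phase: ~44 kW
full_rack = three_phase_power_kw(230, 64)

# Each of two half-loaded racks at 32 A per phase: ~22 kW
half_rack = three_phase_power_kw(230, 32)
```

So splitting the load halves the per-rack draw to roughly 22 kW, which is closer to what typical colo power and cooling allotments per rack can handle.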

lemonlearnings | 5 months ago

Adding the compute story would be interesting as a follow-up.

Where is that done? How many GPUs do you need to crunch all that data? Etc.

Very interesting and refreshing read though. Feels like this is what Silicon Valley is really about, more than the usual: tf apply, then smile and dial.