How did you arrive at the decision not to put the GPU machines in the colo? Were the power costs going to be too high? Or do you just expect to need more physical access to the GPU machines vs the storage ones?
When I was working at sfcompute prior to this, we saw multiple datacenters literally catch on fire because the industry was not experienced with the power density of H100s. Our training chips just aren't a standard package in the way JBODs are.
g413n|5 months ago
Symbiote|5 months ago
A GPU cluster next to my servers has done this; presumably they couldn't have 64A in one rack, so they've got 32A in each of two. (230V three-phase.)
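For a rough sense of scale, here is a back-of-the-envelope sketch of what those feeds work out to. It assumes the 230V figure is phase-to-neutral (i.e. a 400V line-to-line supply, as is common in Europe), a balanced load, and no breaker derating or power-factor correction; none of that is stated above, and `three_phase_kva` is just an illustrative helper name.

```python
def three_phase_kva(phase_voltage_v: float, current_a: float) -> float:
    """Apparent power (kVA) of a balanced three-phase feed.

    Assumes phase_voltage_v is phase-to-neutral and current_a is the
    per-phase current limit. Ignores power factor and derating.
    """
    return 3 * phase_voltage_v * current_a / 1000


# One rack on a single 64A feed vs. the same load split over two 32A racks.
single_64a = three_phase_kva(230, 64)
per_rack_32a = three_phase_kva(230, 32)

print(f"one 64A rack:  {single_64a:.2f} kVA")
print(f"each 32A rack: {per_rack_32a:.2f} kVA")
print(f"two 32A racks: {2 * per_rack_32a:.2f} kVA")
```

Under those assumptions the total is the same either way; splitting the load across two racks only halves what each rack has to deliver and dissipate.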
lemonlearnings|5 months ago
Where is that done? How many GPUs do you need to crunch all that data? Etc.
Very interesting and refreshing read though. Feels more like what Silicon Valley is about than just the usual: tf apply, then smile and dial.