Are there industry options or methods of wiring that allow for a UPS room separate from the rooms where the racks are stored?
It's almost tradition to have a rack with UPSes in the bottom and the rest of the space filled with servers or drive arrays.
We wouldn't ever think of putting a tiny backup generator in the bottom of every rack, so why do we put a battery storage system there? Also, with the advances in battery chemistry that improve reliability and density, it's only a matter of time until lithium-chemistry batteries are standard in UPSes, and that also increases the risk of fire.
Is there any reason not to move backup power to another room, or even to a separate structure, the way backup generators are placed on a pad outside the building?
For the extreme opposite of that, Google famously trolled everyone 10 years ago by announcing that every one of their servers had its own in-chassis 12V battery: https://www.cnet.com/news/google-uncloaks-once-secret-server...
Yes: longer cables.
See Figure 1 in the Schneider-APC white paper, where they have "Electrical Space", "Mechanical Space" (HVAC), and IT Space:
* https://download.schneider-electric.com/files?p_File_Name=VA...
Power is generated hundreds of kilometres from where it is used, so having your UPS room a few dozen metres from your actual DC room isn't a big deal. I-squared-R losses aren't going to be that huge.
Europe uses 400Y/230 for nominal low-voltage distribution (see Table 1 in the white paper above), so stringing some extra 400V copper to the PDUs, which then have 230V at the plugs, isn't a big deal.
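To put rough numbers on the I-squared-R point (my own back-of-envelope, not from the white paper; the cable size and load are assumptions), a minimal sketch in Python:

    # Rough I^2*R loss for a UPS room ~50 m from the IT space.
    # Assumed: 16 mm^2 copper per conductor, 32 A per phase,
    # 400Y/230 three-phase (230 V = 400 V / sqrt(3) phase-to-neutral).
    rho_cu = 1.72e-8        # resistivity of copper, ohm*m
    length = 50.0           # one-way cable run, m
    area = 16e-6            # conductor cross-section, m^2
    current = 32.0          # per-phase current, A
    phases = 3

    r_cond = rho_cu * length / area         # ~0.054 ohm per conductor
    loss = phases * current**2 * r_cond     # ~165 W dissipated in the cable
    load = phases * 230 * current           # ~22 kW delivered
    print(f"{loss:.0f} W lost, {100 * loss / load:.1f}% of load")

That prints roughly "165 W lost, 0.7% of load", i.e. well under a percent for a few dozen metres.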
> "it's only a matter of time until Lithium chemistry batteries are available and that also increases the risk of fire."
Not all Li-ion chemistries are equal. In particular, the increasingly popular LiFePO4 (LFP) technology has much higher energy density, a longer lifespan, a better environmental profile, and similar if not better safety compared to lead-acid.
(Besides sharing lead-acid's very low risk of fire, LiFePO4 also contains no corrosive acids that could damage and short equipment if a leak were to occur.)
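To put a rough number on the density difference (ballpark specific energies, my figures rather than the commenter's: ~35 Wh/kg for lead-acid, ~120 Wh/kg for LFP), a quick sketch:

    # Approximate battery mass for 10 kWh of UPS storage.
    # The Wh/kg figures are ballpark assumptions, not exact specs.
    capacity_wh = 10_000
    for chem, wh_per_kg in [("lead-acid", 35), ("LiFePO4", 120)]:
        print(f"{chem}: ~{capacity_wh / wh_per_kg:.0f} kg")

That works out to roughly 285 kg of lead-acid versus 85 kg of LFP for the same 10 kWh.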
The farther your UPS is from the server, the fewer causes of power loss you can prevent.
I've seen people use UPSes to allow them to rearrange wiring. I've seen them fail by relying on the UPSes as well, of course.
If you wire the entire room with 2-3 separate electrical systems all powered off of separate remote UPSes, you can do whatever you want, but it's harder to change your mind or build out incrementally if you do.
Battery rooms were traditionally separate, used lead acid batteries, were surrounded by thicker walls, and equipped with FM200, just like main datacenter floors. They were typically placed near the transfer and PDU switchgear. I wouldn't put anything more flammable than LiFePO4 in a battery room, much less anywhere near a server.
It's people who decide to throw away conventions, common sense, and building codes because "they know better" who get into trouble.
I suspect this datacenter company could be sued into oblivion.
I've been shown around a DC in the Netherlands where they used a separate room for the UPS batteries. Not only because of safety, but because the reliability of small rack-mount UPSes really sucks (and you have to match them to the power draw of the servers in the rack, or pay for a lot of unused capacity).
All racks and servers were connected to dual power feeds, so even when one of the feeds goes down, the servers should run fine.
In most datacenters, the UPSes are stored in the same column-like structure as their generators. A datacenter is divided into vertical columns, with each column being powered by redundant power feeds, generator(s), and UPSes.
The lack of fire suppression is also very worrying.
If you're running an IT operation this big, you should have both fire detection units and oxygen-displacing suppressants.
A second question: is such a system required for this kind of operation? Maybe?
That should be mandatory. Otherwise, a fire would be very hard to contain, especially when all your servers have RAID controllers with Li-ion batteries, supercapacitors, or other extremely trigger-happy components.
Oh, and the cooling systems: in the early stages, all that airflow just feeds the fire.
I’m pretty confident they have efficient fire suppression systems.
They are hosting at least 400,000 servers. They surely have multiple servers catching fire every single day, and yet this is the first time it has ended in a catastrophic fire.
The fire suppression system either catastrophically failed, or something outside the design envelope happened with one of their inverters, as suggested in the video.
How close the 3 data centers are in SBG: https://cdn.baxtel.com/data-center/ovh-strasbourg-campus/pho...
How hot that fire was. I'm pretty sure the orange spots are holes melted in the walls, which are made from metal shipping containers: https://pbs.twimg.com/media/EwGqV17XMAMF_wa?format=jpg&name=...
I realize it's probably just paranoia on my part, but I am terrified of UPSes and can't bring myself to have one in my house. Gigantic batteries lying around in a flammable environment seems way too scary. My always-on server is read-only about 99% of the time, so I just put up with the outages when they happen (about 2 a year). If for some reason my OS eventually gets hosed because of this, I'll rebuild it.
Maybe one day, if I have a basement and some kind of concrete compartment to put the battery in, I'll feel a bit better about it... but even then, there's not much you can do about gas leakage, if that's a possibility with more recent UPSes.
* http://etler.com/docs/bsp-archive/157/157-601-701_I19.pdf
* http://etler.com/docs/bsp-archive/157/157-601-101_I7.pdf
tl;dr: UPS maintenance was performed by a vendor the day before the fire. The fire department used a thermal camera to isolate the source of the fire; it seemed to originate with 2 UPSes, one of which was the recently maintained one.
Other random bits: SBG-2 was an older-generation datacenter and had ventilation issues(?). They have 4 other datacenters that share a similar design. Others, including SBG-3, have newer designs.
They're building 2500 servers per week.
For the buildings that are offline but not destroyed, they have to rebuild the electrical distribution and the network. It was not clear whether they are also physically moving servers.
Back when I worked in hosting we'd get an email from this particular DC's NOC about either "UPS Maintenance" or generator testing. Our hearts would sink because, during one particular eighteen month period, there was a 50/50 chance our suite would go dark afterwards.
If it turns out it's the UPSes that caught fire, one can't help but wonder if it would be a better idea to house the UPSes/backup power in an adjacent, smaller building, outfitted with sprinklers perhaps?
The problem with water in the same place as high-power equipment is that it instantly turns the room into a death trap for any personnel, since now everything is potentially live.
Also, in the first phase of a lithium battery fire, dropping water on it is quite explosive. It will eventually quench the fire, but in the short run it will make it worse, filling the room with explosive hydrogen and caustic lithium hydroxide. So when your water sprinklers engage over your UPS, you had better be sure there's nobody around: https://www.youtube.com/watch?v=cTJh_bzI0QQ
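For reference, the chemistry behind the hydrogen claim: where water reaches metallic lithium (e.g. lithium plated out inside damaged cells; an intact Li-ion cell holds most of its lithium intercalated rather than as free metal), the reaction is

    2 Li + 2 H2O -> 2 LiOH + H2

which liberates hydrogen gas and lithium hydroxide.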
Not to be rude, but it's really wild to me that even after all this time during the pandemic, the CEO doesn't have a headset he can use so that the audio is intelligible. It's gotta be one of the highest ROI investments you can possibly make at this point.
Headset with mic, a neutral backdrop, a key light, a good camera, and a skincare routine have all paid for themselves 10x since this thing started, in my case.
This also frustrates me so much with all the newly-live-streamed events this year. So many companies are spending so much money putting together virtual conferences, but can’t be bothered to ship their speakers a decent mic or webcam. Heck, Apple’s $20 headphones would make a huge difference. Instead we get audio that sounds like it was recorded in my shower.
edit: That comment was snide, my heart goes out to the OVH team, the message within the video was good, forthright & honest. I hope it will be well received by their customers - just a shame it's a bit difficult to listen to!
This would be one of the inherent differences between smaller and giant players in cloud hosting.
With AWS/Google/Azure, if this happens, there should only be a limited outage for a small fraction of customers. As a matter of fact, Google had such an incident before, and literally no customers (internal or external) noticed.
This is an apples to oranges comparison. OVH largely sells bare metal; their public cloud wasn't really impacted.
If you were using AWS, Google, or Azure, ran a single machine (or multiple machines) inside a single AZ with no backups, and opted out of snapshots, you would face the exact same situation.
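(Opting in to snapshots is a one-liner; a minimal boto3 sketch, where the volume ID and region are made-up placeholders:

    # Create an EBS snapshot of one volume; the IDs here are hypothetical.
    import boto3

    ec2 = boto3.client("ec2", region_name="eu-west-1")
    ec2.create_snapshot(
        VolumeId="vol-0123456789abcdef0",
        Description="nightly backup",
    )

In practice you'd schedule this, e.g. with Data Lifecycle Manager, rather than call it by hand.)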
I can definitely say I see people complaining about how everything they have is down on AWS when us-east-1 goes down periodically, while large players that deploy sanely like Netflix fail over to another region seamlessly.
This [only owning a single machine at all] is what most of their loudest-whinging customers were doing. People that have actual sane production workloads on AWS or GCP are not going to be running 100% of their workload on a single EC2 instance with no backups.
People running on OVH are often running things like gameservers that monopolise 100% of a physical machine and don't support horizontal scaling. You quite literally cannot force a srcds/hlds server to "load balance" dynamically and fail over on heartbeat.
Often they are kids or students too, and the $30/m for a machine with 32-64GB of RAM is all they can afford (though that's no excuse for skipping the extra $1-2/m for offsite backups elsewhere).
You can provision more physical machines with the OVH API and have them up in a different city in a minute or two. You get line-speed bandwidth between OVH DCs. It's up to you to use it.
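As a sketch of what that looks like with the python-ovh client (the hostname and OS template below are hypothetical placeholders, and the exact install endpoint varies by product line):

    # List dedicated servers, then start an OS install on one of them.
    # The hostname and template name are placeholders, not real values.
    import ovh

    client = ovh.Client(endpoint="ovh-eu")  # credentials read from ovh.conf
    for name in client.get("/dedicated/server"):
        print(name)

    client.post(
        "/dedicated/server/ns1234567.ip-1-2-3-4.eu/install/start",
        templateName="debian11_64",
    )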
This is a difference in what you are buying. When you are buying a dedicated server, there isn't exactly a good way to hide that the thing has just gone up in smoke.
When you buy a storage API, sure, failure rates go up and latency increases 100x, but after a few hours it's probably back to normal.
Of course, with the increased abstraction you get more problems. "Availability zones" are useless when most cloud outages are caused by configuration or systemic issues that tend to bring the whole thing down, no matter which AZ you are in. But apparently it's now considered "good enough" to just say "oh, we are down because AWS is down".
It also depends if you're renting a dedicated server, vs cloud/VPS. AWS/Google/Azure deal with virtualized systems that can be moved around to another server easily.
OVH has a lot of dedicated servers as well though, so if you're using one of those then it can't be moved very easily to avoid downtime.
Genuine question: How did that question come to be?
Knowing nothing about OVH, I just typed "ovh datacenters" into Google and the first hit was this: https://www.ovh.com/world/us/about-us/datacenters.xml with the first sentence being "27 data centers around the world, including 2 of the largest ones".