Are there industry options or methods of wiring that allow for a UPS room separate from the rooms where the racks are stored?
It's almost tradition to have a rack with UPSes in the bottom and the rest of the space filled with servers or drive arrays.
We wouldn't ever think of putting a tiny backup generator in the bottom of every rack, so why do we put a battery storage system there? Also, with the advances in battery chemistry that improve reliability and density, it's only a matter of time until lithium-chemistry batteries are standard in UPSes, and that also increases the risk of fire.
Is there any reason not to move backup power to another room, or even to a separate structure, the way backup generators are placed on a pad outside the building?
For the extreme opposite of that, Google famously trolled everyone 10 years ago by announcing that every one of their servers had its own in-chassis 12V battery: https://www.cnet.com/news/google-uncloaks-once-secret-server...
Yes: longer cables.
See Figure 1 in the Schneider-APC white paper, where they have "Electrical Space", "Mechanical Space" (HVAC), and IT Space:
* https://download.schneider-electric.com/files?p_File_Name=VA...
Power is generated hundreds of kilometres from where it is used, so having your UPS room a few dozen metres from your actual DC room isn't a big deal. I-squared-R losses aren't going to be that huge.
Europe uses 400Y/230 for nominal low-voltage distribution (see Table 1 in the white paper above), so stringing some extra 400V copper to the PDUs, which then have 230V at the plugs, isn't a big deal.
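To put rough numbers on the I-squared-R point (my own back-of-envelope, not from the white paper; the cable size and load are assumptions), a minimal sketch in Python:

    # Rough I^2*R loss for a UPS room ~50 m from the IT space.
    # Assumed: 16 mm^2 copper per conductor, 32 A per phase,
    # 400Y/230 three-phase (230 V = 400 V / sqrt(3) phase-to-neutral).
    rho_cu = 1.72e-8        # resistivity of copper, ohm*m
    length = 50.0           # one-way cable run, m
    area = 16e-6            # conductor cross-section, m^2
    current = 32.0          # per-phase current, A
    phases = 3

    r_cond = rho_cu * length / area         # ~0.054 ohm per conductor
    loss = phases * current**2 * r_cond     # ~165 W dissipated in the cable
    load = phases * 230 * current           # ~22 kW delivered
    print(f"{loss:.0f} W lost, {100 * loss / load:.1f}% of load")

That prints roughly "165 W lost, 0.7% of load", i.e. well under a percent for a few dozen metres.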
> "it's only a matter of time until Lithium chemistry batteries are available and that also increases the risk of fire."
Not all Li-ion chemistries are equal. In particular, the increasingly popular LiFePO4 (LFP) technology has much higher energy density, a longer lifespan, a better environmental profile, and similar if not better safety compared to lead-acid.
(Besides sharing lead-acid's very low risk of fire, LiFePO4 also contains no corrosive acids that could damage and short equipment if a leak were to occur.)
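To put a rough number on the density difference (ballpark specific energies, my figures rather than the commenter's: ~35 Wh/kg for lead-acid, ~120 Wh/kg for LFP), a quick sketch:

    # Approximate battery mass for 10 kWh of UPS storage.
    # The Wh/kg figures are ballpark assumptions, not exact specs.
    capacity_wh = 10_000
    for chem, wh_per_kg in [("lead-acid", 35), ("LiFePO4", 120)]:
        print(f"{chem}: ~{capacity_wh / wh_per_kg:.0f} kg")

That works out to roughly 285 kg of lead-acid versus 85 kg of LFP for the same 10 kWh.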
The farther your UPS is from the server, the fewer causes of power loss you can prevent.
I've seen people use UPSes to allow them to rearrange wiring. I've seen them fail by relying on the UPSes as well, of course.
If you wire the entire room with 2-3 separate electrical systems all powered off of separate remote UPSes, you can do whatever you want, but it's harder to change your mind or build out incrementally if you do.
Battery rooms were traditionally separate, used lead acid batteries, were surrounded by thicker walls, and equipped with FM200, just like main datacenter floors. They were typically placed near the transfer and PDU switchgear. I wouldn't put anything more flammable than LiFePO4 in a battery room, much less anywhere near a server.
It's people who decide to throw away conventions, common sense, and building codes because "they know better" who get into trouble.
I suspect this datacenter company could be sued into oblivion.
I've been shown around a DC in the Netherlands where they used a separate room for the UPS batteries. Not only because of safety, but because the reliability of small rack-mount UPSes really sucks (and you have to match them to the power draw of the servers in the rack, or pay for a lot of unused capacity).
All racks and servers were connected to dual power feeds, so even when one of the feeds goes down, the servers should run fine.
In most datacenters, the UPSes are stored in the same column-like structure as their generators. A datacenter is divided into vertical columns, with each column being powered by redundant power feeds, generator(s), and UPSes.
The lack of fire suppression is also very worrying.
If you're running an IT operation this big, you should have both fire detection units and oxygen-displacing suppressants.
A second question: is such a system required for this kind of operation? Maybe?
That should be mandatory. Otherwise, a fire would be very hard to contain, especially when all your servers have RAID controllers with Li-ion batteries, supercapacitors, or other extremely trigger-happy components.
Oh, and the cooling systems: in the early stages, all that airflow just feeds the fire.
I’m pretty confident they have efficient fire suppression systems.
They are hosting at least 400,000 servers. They surely have multiple servers catching fire every single day, and yet this is the first time it has ended in a catastrophic fire.
The fire suppression system either catastrophically failed, or something outside the design envelope happened with one of their inverters, as suggested in the video.
How close the 3 data centers are in SBG: https://cdn.baxtel.com/data-center/ovh-strasbourg-campus/pho...
How hot that fire was. I'm pretty sure the orange spots are holes melted in the walls, which are made from metal shipping containers: https://pbs.twimg.com/media/EwGqV17XMAMF_wa?format=jpg&name=...
I realize it's probably just paranoia on my part, but I am terrified of UPSes and can't bring myself to have one in my house. Gigantic batteries lying around in a flammable environment seems way too scary. My always-on server is read-only about 99% of the time, so I just put up with the outages when they happen (about 2 a year). If for some reason my OS eventually gets hosed because of this, I'll rebuild it.
Maybe one day, if I have a basement and some kind of concrete compartment to put the battery in, I'll feel a bit better about it... but even then, there's not much you can do about gas leakage, if that's a possibility with more recent UPSes.
* http://etler.com/docs/bsp-archive/157/157-601-701_I19.pdf
* http://etler.com/docs/bsp-archive/157/157-601-101_I7.pdf
tl;dr: UPS maintenance was performed by a vendor the day before the fire. The fire department used a thermal camera to isolate the source of the fire; it seemed to originate with 2 UPSes, one of which was the recently maintained one.
Other random bits: SBG-2 was an older-generation datacenter and had ventilation issues(?). They have 4 other datacenters that share a similar design. Others, including SBG-3, have newer designs.
They're building 2500 servers per week.
For the buildings that are offline but not destroyed, they have to rebuild the electrical distribution and the network. It was not clear whether they are also physically moving servers.
Back when I worked in hosting we'd get an email from this particular DC's NOC about either "UPS Maintenance" or generator testing. Our hearts would sink because, during one particular eighteen month period, there was a 50/50 chance our suite would go dark afterwards.
If it turns out it's the UPSes that caught fire, one can't help but wonder if it would be a better idea to house the UPSes/backup power in an adjacent, smaller building, outfitted with sprinklers perhaps?
The problem with water in the same place as high-power equipment is that it instantly turns the room into a death trap for any personnel, since now everything is potentially live.
Also, in the first phase of a lithium battery fire, dropping water on it is quite explosive. It will eventually quench the fire, but in the short run it will make it worse, filling the room with explosive hydrogen and caustic lithium hydroxide. So when your water sprinklers engage over your UPS, you had better be sure there's nobody around: https://www.youtube.com/watch?v=cTJh_bzI0QQ
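For reference, the chemistry behind the hydrogen claim: where water reaches metallic lithium (e.g. lithium plated out inside damaged cells; an intact Li-ion cell holds most of its lithium intercalated rather than as free metal), the reaction is

    2 Li + 2 H2O -> 2 LiOH + H2

which liberates hydrogen gas and lithium hydroxide.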
Not to be rude, but it's really wild to me that even after all this time during the pandemic, the CEO doesn't have a headset he can use so that the audio is intelligible. It's gotta be one of the highest ROI investments you can possibly make at this point.
Headset with mic, a neutral backdrop, a key light, a good camera, and a skincare routine have all paid for themselves 10x since this thing started, in my case.
This also frustrates me so much with all the newly-live-streamed events this year. So many companies are spending so much money putting together virtual conferences, but can’t be bothered to ship their speakers a decent mic or webcam. Heck, Apple’s $20 headphones would make a huge difference. Instead we get audio that sounds like it was recorded in my shower.
edit: That comment was snide, my heart goes out to the OVH team, the message within the video was good, forthright & honest. I hope it will be well received by their customers - just a shame it's a bit difficult to listen to!
This would be one of the inherent differences between smaller and giant players in cloud hosting.
With AWS/Google/Azure, if this happens, there should only be a limited outage for a small fraction of customers. As a matter of fact, Google had such an incident before, and literally no customers (internal or external) noticed.
This is an apples to oranges comparison. OVH largely sells bare metal; their public cloud wasn't really impacted.
If you were using AWS, Google, or Azure, ran a single machine (or multiple machines) inside a single AZ with no backups, and opted out of snapshots, you would face the exact same situation.
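(Opting in to snapshots is a one-liner; a minimal boto3 sketch, where the volume ID and region are made-up placeholders:

    # Create an EBS snapshot of one volume; the IDs here are hypothetical.
    import boto3

    ec2 = boto3.client("ec2", region_name="eu-west-1")
    ec2.create_snapshot(
        VolumeId="vol-0123456789abcdef0",
        Description="nightly backup",
    )

In practice you'd schedule this, e.g. with Data Lifecycle Manager, rather than call it by hand.)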
I can definitely say I see people complaining about how everything they have is down on AWS when us-east-1 goes down periodically, while large players that deploy sanely like Netflix fail over to another region seamlessly.
This [only owning a single machine at all] is what most of their loudest-whinging customers were doing. People that have actual sane production workloads on AWS or GCP are not going to be running 100% of their workload on a single EC2 instance with no backups.
People running on OVH are often running things like gameservers that monopolise 100% of a physical machine and don't support horizontal scaling. You quite literally cannot force a srcds/hlds server to "load balance" dynamically and fail over on heartbeat.
Often they are kids or students too, and the $30/m for a machine with 32-64GB of RAM is all they can afford (though that's no excuse for skipping the extra $1-2/m for offsite backups elsewhere).
You can provision more physical machines with the OVH API and have them up in a different city in a minute or two. You get line-speed bandwidth between OVH DCs. It's up to you to use it.
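As a sketch of what that looks like with the python-ovh client (the hostname and OS template below are hypothetical placeholders, and the exact install endpoint varies by product line):

    # List dedicated servers, then start an OS install on one of them.
    # The hostname and template name are placeholders, not real values.
    import ovh

    client = ovh.Client(endpoint="ovh-eu")  # credentials read from ovh.conf
    for name in client.get("/dedicated/server"):
        print(name)

    client.post(
        "/dedicated/server/ns1234567.ip-1-2-3-4.eu/install/start",
        templateName="debian11_64",
    )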
This is a difference in what you are buying. When you are buying a dedicated server, there isn't exactly a good way to hide that the thing has just gone up in smoke.
When you buy a storage API, sure, failure rates go up and latency increases 100x, but after a few hours it's probably back to normal.
Of course, with the increased abstraction you get more problems. "Availability zones" are useless when most cloud outages are caused by configuration or systemic issues that tend to bring the whole thing down, no matter which AZ you are in. But apparently it's now considered "good enough" to just say "oh, we are down because AWS is down".
It also depends if you're renting a dedicated server, vs cloud/VPS. AWS/Google/Azure deal with virtualized systems that can be moved around to another server easily.
OVH has a lot of dedicated servers as well though, so if you're using one of those then it can't be moved very easily to avoid downtime.
Genuine question: How did that question come to be?
Knowing nothing about OVH, I just typed "ovh datacenters" into Google and the first hit was this: https://www.ovh.com/world/us/about-us/datacenters.xml with the first sentence being "27 data centers around the world, including 2 of the largest ones".