I worked in a datacenter (serving many companies you've heard of) during a catastrophic power failure that lasted almost 24 hours. It's kind of like a plane crash - it's never one thing failing that causes the problem (that's what redundancy is for) - it's the perfect chain of events, multiple 'once in a lifetime' failures lining up, that causes it.
For example, a power outage occurs at the same time the UPS batteries are being changed. The bypass fails and the diesel generators fail to kick in. Circuit breakers blow everywhere, making it extremely difficult to get the generators back online. This all happens in the middle of the night during a winter storm (or lightning storm), which further delays response time.
Been there, done that.
Edit: Also, bureaucracy, lack of documentation, and a manager CF added several hours to the outage. Sometimes you just have to STFU and let the geeks fix the problem.
Linode doesn't run datacenters, they colo with other providers (Hurricane Electric in Fremont).
HE doesn't have the best reputation for resilient datacenter services. However, redundant power is a very complex problem and prone to failure if you can't afford to do it right... which you can't if you're selling colocation for as cheap as HE does.
I really, really wish Linode would launch a sort of premium offering in better datacenters.
I colo'd at HE until last year, when they ran ~400V through my racks, which had 110V circuits; several PDUs and servers were damaged. That was the cherry on top of the sundae, though - there had been 2 power outages before that point and 2 more after it.
Every datacenter I've been in for more than a couple of years over the last 15 years has had a power outage (or two): AT&T, Qwest, Exodus, AIS (San Diego), Layer 42 (San Jose), Media Temple (Los Angeles). It's like cable cuts on your circuit - they happen so reliably that you build these events into your business plan. If you need network redundancy, you always have two (diverse) circuits. If you need data center redundancy, you always have two (diverse) data centers. These events happen so reliably that the surprise is when they _don't_ happen, not when they _do_.
Plan for a minimum of one power outage every two to three years and you won't be disappointed.
I feel for the HE guys - back-to-back power outages have got to be killing them right now.
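The "always have two (diverse) facilities" rule above can be sanity-checked with some back-of-the-envelope availability math. This is only an illustrative sketch: it assumes failures at the two sites are independent, an assumption that a shared grid, shared weather, or a shared provider can easily break.

```python
# Rough availability of n redundant facilities, assuming each has
# availability a and failures are independent (a big assumption).

def combined_availability(a: float, n: int = 2) -> float:
    """Probability at least one of n independent facilities is up."""
    return 1 - (1 - a) ** n

HOURS_PER_YEAR = 24 * 365

single = 0.999                        # one site: ~8.8 hours down per year
dual = combined_availability(single)  # two diverse sites

print(f"one site:  {single:.6f} -> "
      f"{(1 - single) * HOURS_PER_YEAR:.2f} h/yr expected downtime")
print(f"two sites: {dual:.6f} -> "
      f"{(1 - dual) * HOURS_PER_YEAR:.4f} h/yr expected downtime")
```

The point of the sketch is that redundancy multiplies the small failure probabilities, which is why correlated failures (both sites on the same provider, or the same fault line) erase most of the benefit.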
I host directly with HE in Fremont1 and just experienced the power outage. I've had equipment there for 4+ years and this is the first power issue I've had with them. HE isn't perfect, but up until now I've been perfectly happy there. Yes, HE runs a fairly relaxed data center there, for better or worse. HE hasn't communicated about this outage, which I find very disappointing. I would guess this power outage was a result of attempting to fix whatever broke Saturday.
I found this a little disturbing (having just put a new app on a node there - fortunately pre-production), but then I remembered that the last two places I worked paid the premium to be hosted at 365 Main in SF, with its flywheels and diesel generators etc., and that didn't turn out to be magic.
I wouldn't host anything in Hurricane Electric @ Fremont for several reasons:
Back in 2008, an HE-based colo, "McColo", was shut down because it was hosting a HUGE number of botnet controllers, spamming operations, and similar shady operations. When it was shut off, some security firms saw a 50% drop in spam going through their firewalls.
However, that only happened after immense pressure on HE and the other providers involved from Google, the Washington Post, and all sorts of other players.
HE would have been aware, because large IP blocks were being blacklisted (I heard that at one point all of HE Fremont's IPs were blocked by some of the more extreme SBL lists), but they turned a blind eye and/or claimed their Ts & Cs were not being infringed.
I found that highly irresponsible, both in terms of the detriment to their other colo customers, who were sharing the BGP-level bandwidth, and from a wider 'being a good actor' perspective.
Given that they are also on a fault line and that good connectivity to Europe is more important to me than Asia, I would prefer to host on East Coast.
I'm actually in Linode's New Jersey and London Colos, and they are both excellent.
I'm not sure that is a bad thing, actually. Although I do not like spam and botnets, at least they are willing to stand up for their customers and require due process. Others will cave on bogus DMCA notices and assume that the customer is guilty.
I would actually call pulling the plug without due process irresponsible.
Also, I question your claim about it happening only after "immense pressure." Your own link praises them for their response, and some googling suggests similar wording in all coverage I can find.
This is the exception for sure. Word is that the problem is with the datacenter their boxes are in (Hurricane Electric). Of course, ultimately this could cause customer loss for them, so it becomes their issue.
I signed up in September after a year of horrible problems at vps.net, and I've had no issues so far. My personal rule (assuming these aren't mission-critical servers) is to wait 1 month before making a judgement; having a problem on the first day doesn't necessarily define the ongoing experience.
Posting a reference to the previous outage thread here as it's the only place I could track down their support IRC info. (Might need to dig deeper in the Linode support docs)...
These recent outages have been giving me a lot of stress lately. I manage 7 Linodes at the Fremont data center. Fortunately these outages have been at non-peak times, but if one happens during a peak time, I will need to rethink my server architecture.
AWS is a cloud product, with pros and cons: instances can die at any time and data (RAM and disk) won't persist, and the network is slightly strange with their NAT'd IPs, but in return you get a setup that lets you connect to big storage (S3), potentially large clusters (the new GPU core product), etc.
Linode is a VPS, which just presents you with an abstracted server. If the instance or the hypervisor gets bounced, anything on disk is preserved. Networking is normal (a standard IP address) and everything runs as if it were a bare-metal server (more true for Xen-based VPSes like Linode than for SolusVM).
I live down the street from this datacenter and my home experienced this same power outage. Makes me wonder how they handle power failures. My guess is that they don't.
Yikes, makes me happy I chose atlanta for my project. I kind of figured it would be a good idea to have my server a long way away from California, for many reasons.
I noticed two network outages in the Newark data center over the last year. I moved to the London facility in May and have had 100% uptime there since.
" 99.9% uptime, or your lost time is refunded back to your account" - unless my math is wrong that's 7.2 hours a month (for a 30 day month). Most SLA's are terrifying if you do the math.
In their posting about the outage on the 20th they said: "At this point all we know is a severe lightning storm in the area caused a power outage and redundant UPS systems failed." http://status.linode.com/2010/11/possible-power-outage-in-fr...
Redundant UPS systems failed? And now it fails again? What kind of data center are they running in Fremont?
Dallas: 99.951%
Newark: 99.969%
London: 99.986%
Fremont: 99.989%
Atlanta: 99.995%
http://www.365main.com/status_update.html
The moral of the story for me is:
* These things are complicated
* Failures will happen
* You have to be prepared to deal with them
More on the McColo shutdown here: http://news.cnet.com/8301-1009_3-10095730-83.html
http://news.ycombinator.com/item?id=1926368
Actual IRC info...
Server: irc.oftc.net Channel: #linode