Within 3 days another power outage at Linode (Fremont)

[+] jrnkntl|15 years ago|reply

I don't understand what they did the last days if this is another power outage.

In their posting about the outage on the 20th they said: "At this point all we know is a severe lightning storm in the area caused a power outage and redundant UPS systems failed." http://status.linode.com/2010/11/possible-power-outage-in-fr...

Redundant UPS systems failed? And now it fails again? What kind of data center are they running in Fremont?

[+] dholowiski|15 years ago|reply

I worked in a datacenter (serving many companies you've heard of) during a catastrophic power failure that lasted almost 24 hours. It's kind of like a plane crash - it's never one thing failing that causes the problem (that's what redundancy is for) - it's that perfect chain of events, multiple 'once in a lifetime' failures that causes it.

For example, power outage occurs at the same time the UPS batteries are being changed. Bypass fails and Diesel generators fail to kick in. Circuit breakers blow everywhere making it extremely difficult to get the generators back on line. This all happens in the middle of the night in a winter storm (or 'lightning storm) which causes a further delay in response time.

Been there, done that.

Edit: Also, bureaucracy, lack of documentation, and a manager CF added several hours to the outage. Sometimes you just have to STFU and let the geeks fix the problem.

[+] mrkurt|15 years ago|reply

Linode doesn't run datacenters, they colo with other providers (Hurricane Electric in Fremont).

HE doesn't have the best reputation for resilient datacenter services. However, redundant power is a very complex problem and prone to failure if you can't afford to do it right... which you can't if you're selling colocation for as cheap as HE does.

I really, really wish Linode would launch a sort of premium offering in better datacenters.

[+] davidw|15 years ago|reply

Wow, a "severe lightning storm" in the bay area? My friend mentioned that on facebook too. Not an every-day occurrence.

[+] nenolod|15 years ago|reply

you're expecting competence from HE?

i colo'd at HE until last year when they ran ~400V through my racks which had 110V circuits, several PDUs and servers were damaged. that was the cherry on top of the sundae though, there had been 2 power outages at that point and 2 other outages after that point.

[+] unknown|15 years ago|reply

[deleted]

[+] ghshephard|15 years ago|reply

Every Datacenter I've been in for more than a couple years, for the last 15 years has had a power outage (or two). AT&T, Qwest, Exodus, AIS (San Diego), Layer 42 (San Jose), Media Temple (Los Angeles). It's like cable-cuts on your circuit - they happen so reliably that you put these events into your business plan. If you need network redundancy, you always have two (diverse) circuits. If you need data center redundancy you always have two (diverse) data centers. These events happen so reliably that the surprise is when they _don't_ happen, not when they _do_ happen.

Plan for a minimum of one power outage every two-three years and you won't be disappointed.

I feel for the HE guys - back to back power outages has got to be killing them right now.

[+] 2timer|15 years ago|reply

I host directly with HE in Fremont1 and just experienced the power outage. I've had equipment there for 4+ years and this is the first power issues I've had with them. HE isn't perfect, but up until now I've been perfectly happy there. Yes, HE runs a fairly relaxed data center there for better or worse. HE hasn't communicated about this outage--which I find very disappointing. I would guess this power outage was a result of attempting to fix whatever broke Saturday.

[+] jread|15 years ago|reply

Here are my own availability stats for Linode this year. We use Panopta to monitor each data center. All are within the SLA:

Dallas - 99.951% Newark - 99.969% London - 99.986% Fremont - 99.989% Atlanta - 99.995%

[+] scalyweb|15 years ago|reply

Thanks for sharing. We're you able to identify with Linode the outages for each datacenter or do the stats include any false positives?

[+] saikat|15 years ago|reply

Linode's SLA is monthly, so they are under for the month in Fremont. Though they will, of course, give credit.

[+] rbarooah|15 years ago|reply

I found this a little disturbing (having just put a new app on a node there - fortunately pre-production), but then I remembered that the last two places I worked paid the premium to be hosted at 365 Main in SF, with its flywheels and diesel generators etc, and that didn't turn out to be magic:

http://www.365main.com/status_update.html

The moral of the story for me is;

* These things are complicated

* Failures will happen

* You have to be prepared to deal with them

[+] dotBen|15 years ago|reply

I wouldn't host anything in Hurricane Electric @ Fremont for several reasons:

Back in 2008 an HE based colo "McColo" were shut down because they were hosting a HUGE amount of botnet controllers, spamming operations, and similar shady operations. When they were shut off some security firms saw a 50% drop in spam going through their firewalls.

However it only happened after immense pressure on HE and other providers involved by Google, Washington Post and all sorts of other players.

HE would have been aware because large IP blocks were being blacklisted (I heard at one point all of HE Freemont's IPs were blocked by some of the more extreme SBL lists) but they turned a blind eye and/or claimed Ts & Cs were not being infringed.

More here http://news.cnet.com/8301-1009_3-10095730-83.html

I found that highly irresponsible, both in terms of the detriment to their other colo customers who were sharing the BGP-level bandwidth but also from a wider 'being a good actor' perspective.

Given that they are also on a fault line and that good connectivity to Europe is more important to me than Asia, I would prefer to host on East Coast.

I'm actually in Linode's New Jersey and London Colos, and they are both excellent.

[+] updog|15 years ago|reply

I'm not sure that is a bad thing, actually. Although I do not like spam and botnets, at least they are willing to stand up for their customers and require due process. Others will cave on bogus DMCA notices and assume that the customer is guilty.

I would actually call pulling the plug without due process irresponsible.

Also, I question your claim about it happening only after "immense pressure." Your own link praises them for their response, and some googling suggests similar wording in all coverage I can find.

[+] jojopotato|15 years ago|reply

Folks in their irc channel are saying that the problem is with Hurricane Electric.

[+] thethimble|15 years ago|reply

I bought my first Linode on the morning of the first power outage. Does this kind of thing happen regularly there? I'm having regrets...

[+] rbranson|15 years ago|reply

This is the exception for sure. Word is that the problem is with the datacenter their boxes are in (Hurricane Electric). Of course, ultimately this could cause customer loss for them, so it becomes their issue.

[+] hoop|15 years ago|reply

I've had Linodes in Newark, NJ for almost a year now and have not had any issues there at all

[+] citricsquid|15 years ago|reply

I signed up in September after a year of horrible problems at vps.net and I've had no issues so far, my personal rule (assuming this isn't mission critical servers) is to wait 1 month before I make a judgement, having a problem on the first day doesn't necessarily define the ongoing experience.

[+] slig|15 years ago|reply

I host there since jan/09 and never had problems in Atlanta.

[+] davidw|15 years ago|reply

It's been ages since i can recall problems in their New Jersey data center.

[+] Dobbs|15 years ago|reply

This week is the first time in the year that I've owned mine that it has been down.

[+] maushu|15 years ago|reply

Never had any problems in london.

[+] jrnkntl|15 years ago|reply

Resolved after 44 minutes of downtime (according to http://wasitup.com) for my server.

[+] revicon|15 years ago|reply

Posting a reference to the previous outage thread here as it's the only place I could track down their support IRC info. (Might need to dig deeper in the Linode support docs)...

http://news.ycombinator.com/item?id=1926368

Actual IRC info...

Server: irc.oftc.net Channel: #linode

[+] mrinterweb|15 years ago|reply

These recent outages have been giving me a lot of stress lately. I manage 7 Linodes at the Fremont data center. Fortunately these outages have been at non-peak times, but if this happens during a peak time. I will need to rethink my server architecture.

[+] floodfx|15 years ago|reply

Out of curiosity, I am wondering why someone would choose Linode over AWS?

[+] dotBen|15 years ago|reply

Nuanced, but different products.

AWS is a cloud product, with pros and cons -- instances can die at any time and data (RAM and disk) won't persist, network is slightly strange with their NAT IPs but in return you get a setup that lets you connect with big storage (S3), potentially large clusters (the new GPU core product) etc.

Linode is a VPS which is just presenting you with an abstracted server. If the instance or the hypervisor gets bounced anything on disk is preserved. Networking is normal (standard IP address) and everything runs as if it is a bare metal server (more true for XEN based VPS's like Linode rather than SolusVM)

[+] mrinterweb|15 years ago|reply

Performance and value.

[+] cvg|15 years ago|reply

I live down the street from this datacenter and my home experienced this same power outage. Makes me wonder how they handle power failures. My guess is that they don't.

[+] dholowiski|15 years ago|reply

Yikes, makes me happy I chose atlanta for my project. I kind of figured it would be a good idea to have my server a long way away from California, for many reasons.

[+] holdenc|15 years ago|reply

I would really like a way to move my linode out of the Freemont data center. My linode in Newark, New Jersey has been perfect for three years.

[+] coyled|15 years ago|reply

You can, just open a support ticket requesting a migration. They'll queue it up for you, then you click a button to migrate at your leisure.

[+] uggedal|15 years ago|reply

I noticed two network outages in the Newark data center over the last year. I moved to the London facility in May and have had 100% uptime there since.

[+] mike-cardwell|15 years ago|reply

I was under the impression you can migrate Linodes between data centres?

[+] tszming|15 years ago|reply

Linode's SLA: http://www.linode.com/faq.cfm#what-is-your-sla

[+] dholowiski|15 years ago|reply

" 99.9% uptime, or your lost time is refunded back to your account" - unless my math is wrong that's 7.2 hours a month (for a 30 day month). Most SLA's are terrifying if you do the math.

[+] jinhow|15 years ago|reply

[deleted]

[+] ddemchuk|15 years ago|reply

to say this is annoying is an understatement

62 comments