Amazon cloud accused of network slowdown

[+] akamaka|16 years ago|reply

It's pretty disappointing how these types of stories get parroted around the web, without any valuable research added.

As an EC2 user, I'd like to see fewer words and more measurements. Amazon has dozens of datacenters, probably each containing dozens or hundreds of independent sub-nodes, but we don't have any data on who is affected. In fact, Cloudkick hasn't given us any information about how they sampled their data.

I know that my EC2 instance isn't experiencing latency problems, so is it possible that a small number of nodes have developed problems, and that is skewing the average?
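To illustrate how a handful of bad nodes could skew an average, here is a minimal sketch with made-up numbers (not Cloudkick's data). The fleet size, the latencies, the 100 ms threshold, and the `summarize` helper are all assumptions chosen for illustration. It compares the mean with the median and with the share of slow instances:

```python
from statistics import mean, median

def summarize(latencies_ms, slow_threshold_ms=100.0):
    """Summarize per-instance latency samples (milliseconds).

    slow_threshold_ms is an arbitrary cutoff picked for this example.
    """
    slow = [x for x in latencies_ms if x > slow_threshold_ms]
    return {
        "mean": mean(latencies_ms),
        "median": median(latencies_ms),
        "slow_fraction": len(slow) / len(latencies_ms),
    }

# Hypothetical fleet: 95 healthy instances at 1 ms, 5 troubled ones at 1000 ms.
samples = [1.0] * 95 + [1000.0] * 5
stats = summarize(samples)
print(stats)
```

With these invented numbers the mean comes out around 51 ms even though 95% of instances sit at 1 ms, while the median stays at 1 ms. That is why a published average alone can't tell us whether the problem is widespread or confined to a few nodes.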

Anyone with some useful information, please share!

[+] moe|16 years ago|reply

I'm not sure if it's useful (and it's only a guess) but I think there are two mechanisms at work here:

1. Minor technical problems at Amazon affecting [at least] a few vocal customers

2. Rackspace running a PR campaign in favor of their newly launched cloud product

I'm not sure to what degree these two are connected, but the timing seems a bit suspicious, with a bunch of pseudo-benchmarks (paid for by Rackspace) cropping up at almost the same time as these reports about EC2 problems.

As a matter of fact there has always been some variance in EC2 instance performance, as anyone who has run more than a few nodes can confirm. It's the nature of the beast.

None of the reports I have seen provided convincing data for serious large-scale problems. A peek at a few hundred instances simply doesn't cut it when Amazon is said to be running around a million of them. To add a datapoint of my own: we are running almost a hundred 24/7 instances here for batch media processing, and I can't see a difference in performance between now and November '09.

And even if the reports are true I'm not exactly worried. Amazon is the first and largest cloud operator, so it's just natural that they hit scaling barriers before others do. If there are problems they will fix them and move on.

Of course, all this sounds much more exciting when you wrap it up under a sensational headline, along with a few meaningless but colorful graphs...

[+] nettdata|16 years ago|reply

I think the article's headline is a bit unfair.

I read that and thought that they were/are manipulating the network in some way to slow it down, similar to ISPs that traffic-shape torrent clients to minimize the effect on their networks.

It could very well be that an increase in demand has meant a higher utilization of available hardware, and response times have slowed down as a result.

It could be that they're trying new, higher-density configurations to maximize ROI.

It could very well be that they're prioritizing the higher-paying customers, which seems reasonable (to me, anyway).

Sure, response times have apparently slowed down, regardless, but the headline could sound less "evil" or intentional, I guess.

Still, it'd be interesting to know what's really going on, rather than hear the generic "no over-capacity issues here" mantra from them, or the thundering silence when asked specifically about response times.

[+] pquerna|16 years ago|reply

I don't really care whether they are doing any of the above; it is certainly their right as a business to optimize their infrastructure. The problem is their lack of communication about the latency issues.

[+] codexon|16 years ago|reply

> It could very well be that they're prioritizing the higher-paying customers, which seems reasonable

It's their right as a business to do that, just as it's the customer's right to complain and use Rackspace or Google instead.

If a 1+ second internal ping for an entire day is reasonable for Amazon, I personally wouldn't be using EC2.

[+] jcsalterego|16 years ago|reply

Go Cloudkick!
