I use both services heavily at work. The networking in GCP is terrible. We experience minor service degradation multiple times a month due to networking issues in GCP (elevated latency, errors talking to the DB, etc.). We've even had cases where there was packet corruption at the bare metal layer, so we ended up storing a bunch of garbage data in our caches / databases. The networking is also less understandable on GCP compared to AWS. For instance, the external HTTP load balancer uses BGP and magic, so you aren't in control of which zones your LB is deployed to. Some zones don't have any LBs deployed, so there is a constant cross-zone latency hit when using those zones. It took us months to discover this, after consistent denials from Google Cloud support that anything was wrong with the specific zone our service was running in.
AWS, on the other hand, has given us very few problems. When we do have an issue with an AWS service, we're able to quickly get an engineer on the phone who, thus far, has been able to explain exactly what our issue is and how to fix it.
GCP is incredibly bad at communicating when there are problems with their systems. Just terrible. It's only when our apps start to break that we notice something is down; then we look at the status dashboard, which is still green, which is even more infuriating.
Why aren't these on separate systems? I never had the impression that Google cheaps out on things, but this sounds exactly like the sort of shit that happens when people cheap out. Not even a canary system?
AWS has what feel like monthly AZ brownouts (typically degraded performance or other control plane issues), with a yearly-ish regional brown/blackout.
GCP has quarterly-ish global blackouts, and generally on the data plane at that which makes them significantly more severe.
Are there any services that track uptime for various regions and zones across providers? It's rare that everything goes down at once, and so the cloud providers pretend they have almost no downtime.
Obviously we don't know the extent of the issue yet, but afaik there has never been an AWS incident affecting multiple regions where an application had been designed to use them (like using region-specific S3 endpoints). GCP and Azure have both had issues spanning multiple regions that would have affected applications designed for multi-region.
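For what it's worth, the region-pinning pattern described above can be sketched as explicit per-region endpoints with an application-level failover loop. This is a minimal illustration, not real AWS client code: the endpoint list and `stub_fetch` are hypothetical stand-ins.

```python
# Sketch of the multi-region design mentioned above: pin clients to
# region-specific endpoints and fail over explicitly, so one region's
# outage does not take the whole application down.
# Endpoints and the fetch callable are illustrative, not real AWS API calls.

REGION_ENDPOINTS = [
    "https://s3.us-east-1.amazonaws.com",   # primary region
    "https://s3.us-west-2.amazonaws.com",   # failover region
]

def fetch_with_failover(key, fetch, endpoints=REGION_ENDPOINTS):
    """Try each region-specific endpoint in order; raise only if all fail."""
    last_error = None
    for endpoint in endpoints:
        try:
            return fetch(endpoint, key)
        except ConnectionError as err:
            last_error = err  # this region is down; try the next one
    raise RuntimeError("all regions failed") from last_error

# Stubbed fetch simulating a us-east-1 outage:
def stub_fetch(endpoint, key):
    if "us-east-1" in endpoint:
        raise ConnectionError("region unavailable")
    return f"{key} served from {endpoint}"

print(fetch_with_failover("object.txt", stub_fetch))
# → object.txt served from https://s3.us-west-2.amazonaws.com
```

The point is that failover is an application decision: if you only ever talk to a single global or regional endpoint, a regional outage becomes your outage.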
AWS had the S3 incident affecting all of us-east-1: “Other AWS services in the US-EAST-1 Region that rely on S3 for storage, including the S3 console, Amazon Elastic Compute Cloud (EC2) new instance launches, Amazon Elastic Block Store (EBS) volumes (when data was needed from a S3 snapshot), and AWS Lambda were also impacted while the S3 APIs were unavailable.”
There was a terrible day 2-3 months back in us-west-2 where CloudWatch went down for a couple of hours and took AutoScaling out with it, causing services like DynamoDB and EC2 to improperly scale in tables and clusters; then 12 hours later Lambda went down for a couple of hours, degrading or disabling a bunch of other AWS services.
I've also heard similar from a teammate who previously worked with GCP. That said, I know several folks who work for GCP, and they are expending significant resources to improve the product and add features.
jpmattia|6 years ago
I'd love to know how this happens in the modern world. I've seen it myself only once (not GCP, but on our own network with Cisco equipment).
Is something in the chain not checking the packet's CRC?
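As a toy illustration of what CRC checking buys you (using plain `zlib.crc32`, not any particular NIC's on-the-wire CRC): even a single flipped bit changes the checksum, so any hop that actually verifies it should drop the packet rather than forward garbage into caches and databases.

```python
# A CRC32 computed at the sender will not match after one corrupted bit,
# so a verifying hop can detect the corruption and drop the packet.
import zlib

payload = b"row:42,value:3.14159"
checksum = zlib.crc32(payload)

# Flip one bit in the first byte, as faulty hardware might:
corrupted = bytes([payload[0] ^ 0x01]) + payload[1:]

print(zlib.crc32(payload) == checksum)    # → True  (intact payload)
print(zlib.crc32(corrupted) == checksum)  # → False (corruption detected)
```

Corruption that shows up in stored data usually means it happened somewhere the checksum wasn't verified, or after verification (e.g., in memory or on a hop that recomputes the CRC over already-corrupted data).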
votepaunchy|6 years ago
https://aws.amazon.com/message/41926/
theevilsharpie|6 years ago
I'm overall happy with it, but if I needed to run a service with a 99.95% uptime SLA or higher, I wouldn't rely solely on GCP.