I use both services heavily at work. The networking in GCP is terrible. We experience minor service degradation multiple times a month due to networking issues in GCP (elevated latency, errors talking to the DB, etc.). We've even had cases where there was packet corruption at the bare metal layer, so we ended up storing a bunch of garbage data in our caches / databases. The networking is also less understandable on GCP compared to AWS. For instance, the external HTTP load balancer uses BGP and magic, so you aren't in control of which zones your LB is deployed to. Some zones don't have any LBs deployed, so there is a constant cross-zone latency hit when using those zones. It took us months to discover this, after consistent denials from Google Cloud support that anything was wrong with the specific zone our service was running in.
AWS, on the other hand, has given us very few problems. When we do have an issue with an AWS service, we're able to quickly get an engineer on the phone who, thus far, has been able to explain exactly what our issue is and how to fix it.
GCP is incredibly bad at communicating when there are problems with their systems. Just terrible. It's only when our apps start to break that we notice something is down; then we look at the status dashboard, which is still green, which is even more infuriating.
Why aren't these on separate systems? I never had the impression that Google cheaps out on things, but this sounds exactly like the sort of shit that happens when people cheap out. Not even a canary system?
AWS has what feel like monthly AZ brownouts (typically degraded performance or other control plane issues), with a yearly-ish regional brown/blackout.
GCP has quarterly-ish global blackouts, and generally on the data plane at that which makes them significantly more severe.
Are there any services that track uptime for various regions and zones across providers? It's rare that everything goes down at once, and so the cloud providers pretend they have almost no downtime.
Obviously we don't know the extent of the issue yet, but afaik there has never been an AWS incident affecting multiple regions where an application had been designed to use them (like using region-specific S3 endpoints). GCP and Azure have both had issues spanning multiple regions that would have affected applications designed for multi-region.
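For what it's worth, the region-pinning pattern described above can be sketched as explicit per-region endpoints with an application-level failover loop. This is a minimal illustration, not real AWS client code: the endpoint list and `stub_fetch` are hypothetical stand-ins.

```python
# Sketch of the multi-region design mentioned above: pin clients to
# region-specific endpoints and fail over explicitly, so one region's
# outage does not take the whole application down.
# Endpoints and the fetch callable are illustrative, not real AWS API calls.

REGION_ENDPOINTS = [
    "https://s3.us-east-1.amazonaws.com",   # primary region
    "https://s3.us-west-2.amazonaws.com",   # failover region
]

def fetch_with_failover(key, fetch, endpoints=REGION_ENDPOINTS):
    """Try each region-specific endpoint in order; raise only if all fail."""
    last_error = None
    for endpoint in endpoints:
        try:
            return fetch(endpoint, key)
        except ConnectionError as err:
            last_error = err  # this region is down; try the next one
    raise RuntimeError("all regions failed") from last_error

# Stubbed fetch simulating a us-east-1 outage:
def stub_fetch(endpoint, key):
    if "us-east-1" in endpoint:
        raise ConnectionError("region unavailable")
    return f"{key} served from {endpoint}"

print(fetch_with_failover("object.txt", stub_fetch))
# → object.txt served from https://s3.us-west-2.amazonaws.com
```

The point is that failover is an application decision: if you only ever talk to a single global or regional endpoint, a regional outage becomes your outage.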
AWS had the S3 incident affecting all of us-east-1: “Other AWS services in the US-EAST-1 Region that rely on S3 for storage, including the S3 console, Amazon Elastic Compute Cloud (EC2) new instance launches, Amazon Elastic Block Store (EBS) volumes (when data was needed from a S3 snapshot), and AWS Lambda were also impacted while the S3 APIs were unavailable.”
There was a terrible day 2-3 months back in us-west-2 where CloudWatch went down for a couple of hours and took AutoScaling out with it, causing services like DynamoDB and EC2 to improperly scale in tables and clusters; then 12 hours later Lambda went down for a couple of hours, degrading or disabling a bunch of other AWS services.
I've also heard similar from a teammate who previously worked with GCP. That said, I know several folks who work for GCP, and they are expending significant resources to improve the product and add features.
jpmattia|6 years ago
I'd love to know how this happens in the modern world. I've seen it myself only once (not GCP, but on our own network with Cisco equipment).
Is something in the chain not checking the packet's CRC?
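As a toy illustration of what CRC checking buys you (using plain `zlib.crc32`, not any particular NIC's on-the-wire CRC): even a single flipped bit changes the checksum, so any hop that actually verifies it should drop the packet rather than forward garbage into caches and databases.

```python
# A CRC32 computed at the sender will not match after one corrupted bit,
# so a verifying hop can detect the corruption and drop the packet.
import zlib

payload = b"row:42,value:3.14159"
checksum = zlib.crc32(payload)

# Flip one bit in the first byte, as faulty hardware might:
corrupted = bytes([payload[0] ^ 0x01]) + payload[1:]

print(zlib.crc32(payload) == checksum)    # → True  (intact payload)
print(zlib.crc32(corrupted) == checksum)  # → False (corruption detected)
```

Corruption that shows up in stored data usually means it happened somewhere the checksum wasn't verified, or after verification (e.g., in memory or on a hop that recomputes the CRC over already-corrupted data).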
votepaunchy|6 years ago
https://aws.amazon.com/message/41926/
theevilsharpie|6 years ago
I'm overall happy with it, but if I needed to run a service with a 99.95% uptime SLA or higher, I wouldn't rely solely on GCP.