Disclosure: I work at Google Cloud (and directly with Derek and the Twitter team).
I’ll try to edit this page tomorrow when I’m at a computer, but there’s much more information in Derek’s talk at NEXT [1]. They (rightfully) didn’t want to get into a detailed “this is what we saw on <X>”, but Derek alludes to their careful benchmarking across providers.
While you should always assume smart people make economically reasonable decisions, Derek’s point about savings is about list price differences that result from total system performance (and not any sort of special discounting). I’m hoping a follow-up talk will let us say more about how the migration is going, while this talk was focused on the decision to move part of (!) their Hadoop environment to GCP.
FWIW I’m not a big cloud fan, but I was tasked with finding the “least worst” cloud provider in terms of predictable throughput and total cost (inc. dev/ops time), and GCP came out as a clear winner despite heavy financial incentives from both Azure and AWS. (We’re a very large corp.)
Due to our new allegiance to Google Cloud I’ve been given a little more privileged access to engineers (after the fact), and I can tell I definitely made the right choice. The people I spoke to favoured having a clean, clear backend with actual quality-of-life features; but since those don’t fit nicely into feature-comparison charts, people often think that GCP is less mature or featureful.
I’m a real convert, and our company uses all three of the big cloud providers in some fashion- but my team only deals with GCP and we’ve had the least headaches.
What I’m trying to say is: you guys are doing great. I’m really happy with the product for my use-cases.
> Disclosure: I work at Google Cloud (and directly with Derek and the Twitter team).
Thanks for making this community awesome. I want to work at Google or with Google one day. One of the few companies I have always admired and always will.
It is an interesting point to bring up, but nonetheless discount pricing is not out of the question.
It is possible that while GCP offers good value based on the writing on the tin, someone else might beat that by a good margin under special terms.
That being said, I have reason to think very few could beat Google at the compute or storage game, and it is definitely much harder to beat the big G on the networking side of things.
I like the Google storage API the most, honestly. Firebase is really nice, but the storage API is very polished and basically does exactly what you need it to do without the complexity of S3.
> Derek’s point about savings is about list price differences that result from total system performance (and not any sort of special discounting).
Obviously that's the thing you'd choose to highlight in a public Google-hosted presentation after a strategic partnership that is a bonanza in marketing for Google Cloud. It wouldn't work so well to tell a room full of engineers "We moved to Google cuz they gave us a fat discount, one you can't get because you're not Twitter!"
Reading most comments here I feel like nobody wants to give a tiny bit of credit to either Twitter or Google, which I find disenchanting.
It is true that any cloud provider would get good publicity from being able to say that Twitter runs on their cloud, but that is precisely why we shouldn't just shrug this off as a publicity/business decision: just as Google tried to persuade Twitter, other cloud providers will most certainly have tried their best too. If Twitter still went with GCP, then I think it is only fair to assume there must have been some real advantage to it. I don't think anyone who claims this was a pure business decision is being entirely honest with themselves, because a cloud which cannot handle Twitter's volume is no good even if it were entirely free.
I have nothing to do with Twitter (actually I don't even like Twitter, because it is such a toxic social media platform) and I don't like many things which Google does, but as an engineer who has used Azure, AWS and GCP for many years in a commercial setting, I can only say that GCP is indeed years ahead. In my experience the quality, speed and reliability of GCP is second to none, and I honestly couldn't say the same about AWS, and especially not about Azure.
I'm not worried about GCP's quality or technical merits.
Google's management is what scares me. I'd never build a business around a company with so little follow-through, commitment to product longevity, or focus.
> The quality, speed and reliability of GCP is second to none from my experience
Can you be more specific about where GCP's reliability is years ahead of the others? I'm guessing the numbers will be pretty comparable on all platforms, so if we want to declare outright winners it would be useful to cite a number showing where you experienced this.
> The quality, speed and reliability of GCP is second to none?
I hope this is a joke. As someone who started on GCP and got tired of the TERRIBLE quality and reliability of their docs, among other issues, saying that they are ahead is an absolute joke.
I've found some corner cases with AWS, but got quick resolution even on things they have in "beta" - and consistency from documentation through to system is high.
Secondly, AWS seems to support even old tech FOREVER. I have a SimpleDB-based app. That tech is 9 years old now. Still ticking. When you are building stuff up over time and can't afford to rebuild on the new hotness every other year, this is nice.
Anyone know the revenue AWS and GCP generate? Seriously hard to believe GCP is so many years ahead in this space from my own experience.
The scale of these megacorps is crazy! I wonder how you even plan to move that much data. I know that Amazon has a service called Snowmobile for stuff like this. They probably did something similar here.
"You can transfer up to 100PB per Snowmobile, a 45-foot long rugged sized shipping container, pulled by a semi-trailer truck."
Correct me if I'm wrong, but this doesn't say what Twitter moved to the cloud. It could be literally anything, and may not in fact be core tweet or user data.
This strikes me as a team (or teams) outgrowing the storage options that were offered internally and choosing to outsource to fit their needs. Isolated use for one business case, and not indicative of a broad movement to the cloud on the part of the whole org.
I could be wrong, of course.
I'd like to know more. And I'd certainly like to know what the engineers at Twitter think.
This. GCP may be good and all, but it scares me to hell:
1) How liberal they are with blocking payment accounts
2) The fact that your payment account is tied to everything! If it goes down, your entire account is locked. Anything on GCP goes down. Hell, GSuite goes down. Say goodbye to your accounts, your purchases and your identities.
If you also use Google Domains or Fi, also good luck getting your number and email.. You're basically locked out of all your accounts. With minimal chance of recovery.
I've scoured the comments and article, but I might have missed it... What did Twitter migrate from?? Did they have their own data centre before moving to Google Cloud?
Any inside stories on what Twitter discovered when they benchmarked Google against other providers? I imagine they must have done a very exhaustive evaluation.
Yeah, they've failed at every attempt at making a pure social network (they bought YouTube, which is a kind of social network). With Microsoft buying GitHub, it would be prudent of them to buy Twitter before someone else does.
It will be interesting to watch. My hunch is that after looking at the shitstorm that is happening around Facebook, Google may be having second thoughts about being in the social networking business (YouTube excluded).
With Twitter's current state and Google's moves to kill their social side, this is unlikely. Why buy a moderate sized social network when you just killed your similarly sized social network?
Side note: if you want to know why distributed infrastructure engineering has been the place to be recently, consider that engineers making $200-500k/year are making decisions that can save big corporations $1-5mm per year in compute/storage costs.
The issue with GCP and with Google in general is support. I'm certain that's not an issue for Twitter as there are probably multiple people on call just for the Twitter deal. But it might mislead the average company down a bad path believing there is decent support with GCP. Any custom issues (like billing) and it's a headache dealing with Google.
What Twitter is paying for and the level of service they receive from Google, is very different from the typical GCP pricing and service level that would be available for say, your company...
Realistically, it can't be considered as the same service, can it?
One thing to remember when evaluating a cloud provider is that benchmarks are useless if discussed in absolute numbers or in terms of hardware specs. Because these platforms are so horizontally scalable, it's really a question of average cost per compute operation or cost per (peta)byte.
Both AWS and Google clouds are perfectly capable of running Twitter, and running any software that Twitter uses, even if the number of machines is different. The only "benchmark" applicable is actually the total negotiated cost of the machines required to get the job done.
So it's not about being able to do things on Google cloud with fewer processors or less RAM or faster hard drives - Google was willing to give them a lower total cost of ownership for reasons known only to Google.
Do not assume that Google (or anybody else) will give you a similar preferential deal. Ignore the "benchmarks".
I don't think it's purely a measure of cost per compute operation. That assumes you would run the same software on any cloud, using plain VMs or a service that slightly abstracts them. That may be true in this case, since Twitter will continue to use Hadoop/Spark, but sometimes you can get a real advantage from switching to a service that only one cloud provider offers. For instance, someone in these comments pointed out that Twitter already migrated their analytics workload to BigQuery. Evaluating BigQuery vs Redshift vs something else is not as clear cut as cost per compute operation.
If you want to check which cloud is best for you, run your application on all of them, measure your average+median+p95 cost per whatever (tweet/post/reader/dollar of revenue/ticket/?). Then factor in which platform is easier to work with, has better tooling and community support, because until you hit a certain scale your time will always be more expensive.
After you do all that, just start with Heroku and Postgres with some AWS Lambda, then move to ALB + AWS Fargate + RDS|Aurora / DynamoDB as you get bigger, then to NLB + ECS with a cluster of 20-80 On Demand and Spot Instances, then to a cluster of ARM instances on Spot.
If you need to build your own datacenter at that point, you'll know. And you would have built a 12Factor app to make all of the above work, so migrating will be easy.
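The measurement step suggested above (cost per unit across providers, summarized as mean/median/p95) can be sketched minimally in Python; the provider names and cost samples below are purely illustrative, not real pricing:

```python
import statistics

def cost_metrics(costs_per_request):
    """Summarize per-request cost samples (e.g. hourly bill / requests served).

    Returns mean, median and p95 so tail behaviour is visible, not just averages.
    """
    ordered = sorted(costs_per_request)
    p95_index = int(0.95 * (len(ordered) - 1))
    return {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "p95": ordered[p95_index],
    }

# Hypothetical samples: dollars per 1M requests, one sample per hour
provider_a = [1.10, 1.15, 1.09, 1.40, 1.12]
provider_b = [1.05, 1.30, 1.70, 1.08, 1.06]

print("A:", cost_metrics(provider_a))
print("B:", cost_metrics(provider_b))
```

With real data you would collect thousands of samples per provider and then weigh the numbers against tooling and operational overhead, as the comment suggests.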
[1] https://m.youtube.com/watch?v=4FLFcWgZdo4
fhoffa | 7 years ago:
Some numbers from the talk:
- >500k cores
- >300PB storage
- >12,500 cluster size
- >1T messages per day
And there's also this other talk, "How Twitter Migrated its On-Prem Analytics to Google Cloud" - focused on their migration to BigQuery:
- https://www.youtube.com/watch?v=sitnQxyejUg
- 20 TB/ day of raw log data, >100k events/sec
- Loading ~1TB/hour into BigQuery.
- Serving 5,000+ complex queries / second. p99 ~300ms
Disclosure: I'm Felipe Hoffa and I work for Google Cloud https://twitter.com/felipehoffa.
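A quick sanity check on the ingest figures above (my own arithmetic, not from the talk):

```python
# 20 TB/day of raw logs at >100k events/sec implies an average
# event size on the order of a couple of kilobytes.
bytes_per_day = 20e12                         # 20 TB/day
events_per_day = 100_000 * 86_400             # >100k events/sec, all day
avg_event_bytes = bytes_per_day / events_per_day
print(f"~{avg_event_bytes:.0f} bytes/event")  # roughly 2.3 KB per event
```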
burtonator | 7 years ago:
https://getpolarized.io/2019/01/03/building-cloud-sync-on-go...
deanCommie | 7 years ago:
The fact that you allocated a top level URL for it is embarrassing (https://cloud.google.com/twitter/). Can you imagine if Amazon advertised https://aws.amazon.com/netflix/?
pizza | 7 years ago:
"You can transfer up to 100PB per Snowmobile, a 45-foot long rugged sized shipping container, pulled by a semi-trailer truck."
https://www.youtube.com/watch?v=8vQmTZTq7nw
derekalyon | 7 years ago:
For this use case, we set up 800 Gbps of interconnect with Google.
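A rough back-of-the-envelope check, assuming the ~300 PB figure quoted elsewhere in the thread and (unrealistically) 100% sustained utilization of that link:

```python
# How long would ~300 PB take over an 800 Gbps interconnect at full utilization?
data_bits = 300e15 * 8      # 300 PB expressed in bits
link_bps = 800e9            # 800 Gbps
days = data_bits / link_bps / 86_400
print(f"~{days:.0f} days")  # about 35 days of continuous transfer
```

Real transfers would take longer, since links are shared and never run at line rate continuously.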
pojzon | 7 years ago:
First you can try to replicate some data and move the read instances over. After that, step by step, the rest.
It's really not possible to do that overnight, or even in months.
FYI: I'm currently migrating an infrastructure twice Twitter's size from AWS to Azure right now.
fro0116 | 7 years ago:
https://what-if.xkcd.com/31/
andrewguenther | 7 years ago:
The video linked on the page is also from last August: https://www.youtube.com/watch?v=T1zjmNAuMjs
Throw a 2018 tag on this?
egorfine | 7 years ago:
Their credit card fails at Google Payments, the account gets marked for fraud, and no one in support can help them recover. Twitter is gone for good.
joatmon-snoo | 7 years ago:
https://www.datacenterknowledge.com/archives/2015/07/13/cust...
bduerst | 7 years ago:
AFAIK Twitter allows search indexing and is already a publisher on their network.