Bringing Pokémon GO to life on Google Cloud

480 points | bryanmau1 | 9 years ago | cloudplatform.googleblog.com

211 comments

[+] jcastro|9 years ago|reply
A bunch of the comments are already pointing out the launch issues Pokemon Go had, and it's well known that a rep from AWS was also throwing jabs at them during launch for their issues.

It would be naive to assume that a high-traffic launch is all about the cloud underneath and nothing else.

The article didn't mention any technical details of the Pokemon application itself. For all we know, the infrastructure was humming along nicely and the application itself didn't scale. Or it was the other way around, or a combination of both, or any one of the thousands of other moving pieces it takes to launch something.

[+] spacehunt|9 years ago|reply
> it's well known that a rep from AWS was also throwing jabs at them during launch for their issues

That's pretty low. FWIW Simcity 2013 is on AWS and the launch was far more disastrous. Doesn't prove anything.

[+] CorvusCrypto|9 years ago|reply
Upon first read I actually had the same thoughts as many and applied them to the Google services as well as the application itself. However, on reflection, the services Google provided were pretty impressive, especially after re-reading the following:

"Google CRE worked hand-in-hand with Niantic to review every part of their architecture, tapping the expertise of core Google Cloud engineers and product managers"

Architecture analysis may or may not be standard (I don't actually know since I've never had to deal with something like this) but that sounds great to me.

[+] empath75|9 years ago|reply
They scaled from nothing to almost the size of Facebook in a few weeks... I'd say it's pretty impressive even with the speedbumps.
[+] deanCommie|9 years ago|reply
It was actually the CTO of AWS, Werner Vogels. :)
[+] xorgar831|9 years ago|reply
Load testing is really hard, it would be interesting to see more research in this area, such as tooling and design patterns that flag bottlenecks that could be outages at larger scales.
[+] balls187|9 years ago|reply
At the same time, Niantic was spun out of Google, so their choice of using Google Cloud is highly dubious.
[+] 0xdeadbeefbabe|9 years ago|reply
> The article didn't mention any of the technical details

Like contending with Team Rocket

[+] richardlblair|9 years ago|reply
> Not everything was smooth sailing at launch! When issues emerged around the game’s stability, Niantic and Google engineers braved each problem in sequence, working quickly to create and deploy solutions. Google CRE worked hand-in-hand with Niantic to review every part of their architecture, tapping the expertise of core Google Cloud engineers and product managers — all against a backdrop of millions of new players pouring into the game.

IMO This is the most valuable thing in this article. It essentially says what others are pointing out. You can't just press a button and have scale. It's not that easy. You have to tackle many layers. Considering they 50x'd their worst case scenario it would have only taken a few bad queries to fuck shit up.

[+] kevan|9 years ago|reply
At 50x designed max load bad queries would be one of the easier problems to solve. We design systems all the time that have hard scaling limits, and it's only a problem when you operate the system past those limits. e.g. I wrote a service with ACID assumptions from the underlying database but now the largest ACID db box I can buy isn't big enough. Oops. There's a bunch of possible ways around that but they usually involve a nontrivial amount of engineering effort.

I wouldn't expect any service to survive that much unplanned load. Maybe they could've estimated better, but how likely was it for the game to go viral? Worth sinking lots of dev time into and delaying launch? That's a hard question to answer without the benefit of hindsight.
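
To make the "hard scaling limit" point concrete, here is a minimal sketch using a textbook M/M/1 queue as a stand-in for a service with fixed capacity. The request rates and the `CAPACITY` figure are hypothetical; only the 1x / 5x / 50x multiples are taken from the figures quoted in this thread (50x the expected traffic, 10x the worst case). It is not a model of Niantic's actual system.

```python
import math


def mean_latency(arrival_rate: float, service_rate: float) -> float:
    """Mean time in system (seconds) for an M/M/1 queue.

    Once arrivals meet or exceed the service rate the queue grows without
    bound, so the mean latency is reported as infinite.
    """
    if arrival_rate >= service_rate:
        return math.inf
    return 1.0 / (service_rate - arrival_rate)


TARGET = 100.0    # hypothetical expected load, requests/sec
CAPACITY = 600.0  # hypothetical capacity: sized for the 5x worst case plus headroom

for multiple in (1, 5, 50):
    load = multiple * TARGET
    print(f"{multiple:>2}x target -> mean latency {mean_latency(load, CAPACITY)} s")
```

Under these assumptions the service degrades gracefully up to the planned worst case and then has no steady state at all at 50x. That is the sense in which a design limit only becomes a problem once you operate past it.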

[+] riquito|9 years ago|reply
> Considering they 50x'd their worst case scenario

Nitpick: they 10x'd their worst case scenario

[+] user5994461|9 years ago|reply
> You can't just press a button and have scale

Actually you can... if the system was designed for this purpose.

[+] mooman219|9 years ago|reply
Currently on Cloud here at Google. I would like to elaborate on "Google CRE seamlessly provisioned extra capacity on behalf of Niantic to stay well ahead of their record-setting growth."

Having the resources does not by itself make for a service that scales well. As outlined in the post, "Google CRE worked hand-in-hand with Niantic to review every part of their architecture, tapping the expertise of core Google Cloud engineers and product managers". Look past the chart: this wasn't just an estimate for Google, it was also for Niantic. You don't have unlimited development resources, and not every aspect of an application may have had time for a scalable approach to be fleshed out.

Netflix didn't scale out overnight. I'm sure we've seen their techblog about X extremely specialized framework/tool they've built out over the years. I'm impressed with how quickly Niantic achieved a playable experience.

[+] Fiahil|9 years ago|reply
> I'm impressed with how quickly Niantic achieved a playable experience.

I have mixed feelings about this. Didn't they drop more than half their peak user base to achieve a playable experience?

[+] AaronFriel|9 years ago|reply
This is pure marketing that might convince decision makers and execs that didn't play Pokémon Go.

No doubt, the CRE program could prove valuable. But in this case, they are congratulating themselves on a rocky and widely panned launch of a product on their platform. One might wonder, "If this is what deploying a viral app on Google Cloud Platform looks like when you have help from Google engineers, what chance does anyone else have of getting something right on their platform?"

I think that's probably the wrong takeaway, but it's not difficult for me to imagine that being the only conclusion one has.

[+] itcmcgrath|9 years ago|reply
"Niantic phoned in to Google CRE for reinforcements" -> Note this was post-launch on the world's largest ever mobile launch.
[+] anon987|9 years ago|reply
Absolute marketing trash with almost zero technical value.

If I linked an old style "HP Whitepaper about Success with Customer X" it would be downvoted to hell - but that's exactly what this article is, but by Google instead of HP.

[+] kylecordes|9 years ago|reply
From a user's point of view it did not go nearly as smoothly as it is made to sound in this post. I would love to see a follow-up post about what went wrong along the way, and whether the hype surge might have continued longer and stronger if more people picking it up for the first time had had a smooth experience (rather than lots and lots of errors talking to the servers).
[+] FreakyT|9 years ago|reply
Agreed—this line in particular stood out to me as an outright fabrication:

> In response, Google CRE seamlessly provisioned extra capacity on behalf of Niantic to stay well ahead of their record-setting growth.

Anyone who played that game within the first week of launch would certainly not use "seamless" to describe the experience.

[+] vuldin|9 years ago|reply
I think it's possible that the disconnect between how the Google Cloud team describes what happened and the reality for users can be attributed to the fact that they are just one team in control of one aspect of the entire system. From their standpoint, taking into account their responsibilities, maybe it did go as smoothly as they say. A similar post from Niantic would be very interesting.
[+] ultramancool|9 years ago|reply
Yeah, they talked about the number of bugs fixed, but those bugs surely impacted the user experience. My big worry is that if I were to deploy to such a service, I would encounter similar bugs but not have the political pull to get them fixed, and would simply have to hack around them myself.
[+] chimeracoder|9 years ago|reply
> From a user point of view it did not go nearly as smoothly as it is made to sound in this post.

I think a large part of that was due to the non-Google login from Pokemon Trainer Club (which must have been handled by Nintendo's servers, AFAIK), as opposed to the Google OAuth. Both groups of users had problems, but people who used their pre-existing, Nintendo-issued logins had many more problems (and it took a lot longer for those to get fixed).

There was a period of about a week or so during which time people who used the Pokemon Trainer Club accounts were still having just as much trouble logging in and staying logged in, but people who used their Google accounts were fine.

I wouldn't be surprised if providing multiple login options made it harder for them to properly separate their login servers from their game servers, and if this coupling meant that millions of users (effectively) DDoSing the PTC servers ended up impacting the game's uptime more than simply doing all authentication themselves would have.
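
One common way to limit this kind of coupling is a circuit breaker in front of the external login provider, so that a struggling dependency fails fast instead of tying up the game servers. The sketch below is a generic illustration only: `CircuitBreaker` and its parameters are hypothetical names, and nothing in the article says Niantic or Google used this pattern.

```python
import time


class CircuitBreaker:
    """Fail fast after repeated errors from a dependency (illustrative sketch)."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # seconds to wait before a trial call
        self.clock = clock                # injectable so the breaker can be tested
        self.failures = 0
        self.opened_at = None             # None means the circuit is closed

    def call(self, fn, *args):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: skipping dependency call")
            # Cool-down elapsed: allow one trial call. A single failure
            # during the trial re-opens the circuit immediately.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = fn(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0
        return result
```

A game server could wrap each call to the third-party login endpoint in `breaker.call(...)` and fall back to a "login temporarily unavailable" response while the circuit is open, which keeps already-authenticated players unaffected.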

[+] cloudjacker|9 years ago|reply
> Game traffic travels Google’s private fiber network through most of its transit, delivering reliable, low-latency experiences for players worldwide.

"reliable"

"low-latency"

[+] ShakataGaNai|9 years ago|reply
I feel really bad for this article. It is the Kobayashi Maru of sales pitches. Working in IT/DevOps/Servers/Software Dev/Etc all my life, I understand that even if you have the servers, scaling can be hard and time consuming. I also can't even imagine supporting the number of people they have. So they did an awesome job.

However, the Pokemon Go player in me says "Wow, even with all of Google's resources, they still couldn't manage to get this remotely stable for several weeks?".

I'm sure there were many amazing technical feats that occurred, and from a deeply technical level this is a good sales pitch. I'm sure a good salesperson could spin it even better: "50x your expected traffic? Google Cloud can do that!". But beyond that... most people will probably see this as a failure.

[+] wnevets|9 years ago|reply
On one hand, handling such a huge amount of traffic is crazy hard and an amazing accomplishment. On the other hand, the tone of the blog is off-putting because of just how much of a trainwreck it was from the users' point of view.
[+] lnanek2|9 years ago|reply
> paid off when the game launched without incident in Japan, where the number of new users signing up to play tripled the US launch two weeks earlier.

The "without incident" part is hilarious. The game was unusable for over a week when they added more countries. There were memes all over the place about Niantic execs ignoring the burning server and pushing to launch in more countries anyway. I wonder if any of them actually tried to play as a user on the public servers and spent hours trying to log on, only to have it fail or lock up soon after, for a week.

Not to mention they never even got the original tracker functionality (1 footstep, 2 footsteps, 3 footsteps for anything nearby) working again after that. They had to replace it with a lower-load knockoff, which isn't very popular, where you just see what is around a certain location. So not only did they not even keep login working, they cut features too.

[+] Tepix|9 years ago|reply
I think most of the problems predated the Japan launch. I was in Japan at the time and the game was available most of the time. It was unexpected really considering the insanely high interest over there.
[+] cbhl|9 years ago|reply
Those countries were all side-loading US APKs anyway, so not launching in those countries wouldn't have reduced load on the servers much.
[+] daveloyall|9 years ago|reply
To those that played and were not impressed with game/system stability: okay, okay. I wasn't there, I don't know.

...But, according to the nightly news, it was a tremendous success. The word 'ever' came up a lot.

Pokémon GO was the first successful overnight/viral planet-wide/client-side launch. People who had never heard of it saw it on the news and then visited their local app store in response.

And according to the googleblog, it took a tremendous expenditure of money, hardware, electricity, skills, and knowledge to pull it off.

Makes me wonder... Did some other game/app go almost global, but fall short for the want of those very resources described in the blog post?

Something for app devs to think about.

[+] enolan|9 years ago|reply
The broad consensus in the games press was that Pokemon Go is a great example of a strong gameplay loop overriding a massive technical failure.

Google's post is weird because they seem to think the game was a technical success. Google may have done great, it's impossible to tell from the outside, but the actual user experience is - or at least was when I played it - awful.

[+] neves|9 years ago|reply
This was one of the most downloaded apps of all time. It went from zero to gazillions of requests in a single day. Nobody could have planned for this. C'mon, these guys are great.
[+] zzguy|9 years ago|reply
Yeah, the negativity here is overwhelming. Which isn't surprising; HN's comment section isn't the cheeriest place on the internet. But seriously, for a tech news aggregator you'd think more of the users would appreciate at least the difficulty of scaling an app from nothing to THE most popular app ever, in a matter of days. Yeah, they had/still have issues that they could've mentioned in the article, but it doesn't take away from what they DID do.
[+] learningman|9 years ago|reply
Great point. They created an MVP, it worked, and then put resources in when and where the needs declared themselves (stability, scaling...).
[+] randomsofr|9 years ago|reply
Yeah, I never complained about the servers because of that, but the app was also really buggy from the beginning.
[+] Declanomous|9 years ago|reply
I really appreciate this blog post. It gives great insight into what is going on behind the scenes. I was really surprised by how low their worst-case scenario was. The absolute worst case would be every single person capable of running the game playing it. Obviously this won't happen, but for a brand with as much recognition as Pokemon, I think "What if everyone in the world started using this?" is a good place to start, and it's important to think about why it won't happen. "What if everyone who has played Pokemon or wanted to know more about Pokemon downloaded this game?" is still unlikely, but it's less unlikely. It's probably not far off from what actually happened.

I don't want to criticize their model too much, because it's obviously simplified for our benefit. However, it appears that their worst-case scenario was "What if we become the next Bejeweled or [insert popular F2P game here]?" It's a ridiculous assumption, because Pokemon has a much broader appeal than any other casual game: the IP is insanely popular, and the game still appeals to people who just want a casual game. I know it is a lot easier to get fired for spending too much money than it is for not spending enough, but it's a stretch to say their launch traffic was beyond imagination. Niantic should start looking for new analysts now if their current analysts honestly thought this traffic was outside the realm of possibility.

I don't consider the server issues to be much of a problem though. It's hard to ensure everything will work perfectly under that kind of load. You have to accurately predict who will be playing, how much they will be playing, how they will interact with the game, and so much more. However, I do think they need to figure out their communication with the fan base. I know that there will be a vocal portion of any constituency that hates everything. That isn't a good excuse for communicating poorly. Good communication will help almost every relationship.

[+] jdcarter|9 years ago|reply
> I was really surprised by how low their worst-case scenario was.

There's no Y axis on the chart, so we don't know exactly what their estimate actually was. Regardless, I'm pretty sure Pokemon GO exceeded any reasonable expectations of popularity, even accounting for the brand and marketing efforts behind it.

From my own experience, lots of people who never engaged with mobile games before started playing Pokemon GO within days of its release. My entire extended family was playing the game. Local bars have become arenas for Pokemon fights. The adoption of this game was absolutely crazy.

So even given the scaling problems, the features they had to remove from the game, and the bugs they introduced, I think this is still a solid win for Google CRE.

[+] Tepix|9 years ago|reply
Nice, but it's disappointing that they do not mention any hard numbers such as concurrent players, requests per second, traffic, etc.

That makes the article a lot less interesting and worthwhile.

[+] lucb1e|9 years ago|reply
Or anything technical whatsoever. The article sounds very much written by a marketing person.
[+] azurezyq|9 years ago|reply
I would say these numbers are really sensitive and could be used to calculate active user numbers, etc. Yeah, internal reports are way more interesting than public posts.
[+] Perixoog|9 years ago|reply
>... Google Cloud customer...

That's pretty misleading - I believe Google's parent company still owns part of Niantic. So other customers shouldn't expect the (implied) same access to Google resources.

[+] bitmapbrother|9 years ago|reply
Google is just one of many series A investors in Niantic:

Alsop Louie Partners

Cyan Banister

Google

Lucas Nealan

Nintendo

Pokémon

Scott Banister

You & Mr Jones Brandtech Ventures

[+] KirinDave|9 years ago|reply
I can't be the only person who looked at that graph and burst out with a cackle that startled everyone around them. The deep and inescapable dread of that fire burning around you even as you make history must have been quite a feeling.

Or in the vernacular of youth: "This is fine. Everything is fine" as a scaling graph.

[+] daok|9 years ago|reply
So their estimate was that nothing would increase? I am not sure I trust the two parallel lines in the graph. The estimate should have been a spike at the release date, a small drop, and some growth over time, no?
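
For illustration, the launch shape described above (a spike at release, a small drop, then growth) can be sketched as the sum of a decaying launch spike and a slowly growing organic baseline. Every parameter below is made up for the sketch; none of these numbers come from the article or from Niantic.

```python
import math


def estimated_load(day, baseline=1.0, spike=2.0, decay_days=3.0, growth_per_day=0.02):
    """Relative traffic estimate on a given day after launch (hypothetical model)."""
    if day < 0:
        return 0.0  # nothing before launch
    launch_spike = spike * math.exp(-day / decay_days)  # fades over the first days
    organic = baseline * (1.0 + growth_per_day * day)   # slow linear growth
    return launch_spike + organic


for day in (0, 7, 60):
    print(day, round(estimated_load(day), 3))
```

With these assumed parameters the estimate starts high on launch day, dips about a week in, and then climbs again, rather than staying flat as the two lines on the chart do.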
[+] Thaxll|9 years ago|reply
It's easy to blame the cloud when the application server probably had a lot of issues.
[+] randomsofr|9 years ago|reply
Pokemon GO really disappointed me. I'm a big fan of Pokemon and this app really sucks. It is really buggy. I stopped playing two weeks ago because of the GPS instability. I hope they get it right some day. But I'm glad they fixed the server issues.
[+] kriro|9 years ago|reply
Interesting read but a bit too positive overall. I think the biggest failure of the launch was not learning from the initial launch zones before launching the other zones.

The launch in Europe was a catastrophe imo (constant crashes and freezes). I don't know how much of this is to blame on the cloud infrastructure but I suspect it's not nothing. I feel they didn't provide nearly enough infrastructure given the data they should have had from Australia/USA.

All that being said I think they smoothed out everything and the system seems to be running very nicely now given the scale. It's certainly a positive engineering tale overall.

[+] Cyph0n|9 years ago|reply
Anyone know what kind of backend Pokemon Go is running on? I'm guessing Java or Go?