item 19854302

Google’s Cloud TPU Pods are now publicly available in beta

148 points | cokernel_hacker | 7 years ago | cloud.google.com

71 comments

[+] pd0wm|7 years ago|reply
I wonder if they will apply the same terms of service as with their Cloud Machine Learning offerings (Auto ML, Cloud Vision, etc).

A snippet from https://cloud.google.com/terms/service-terms#12-google-cloud...:

  Customer will not, and will not allow third parties to: (i) use these Services
  to create, train, or improve (directly or indirectly) a similar or competing
  product or service or (ii) integrate these Services with any applications for
  any embedded devices such as cars, TVs, appliances, or speakers without Google's
  prior written permission. These Services can only be integrated with
  applications for the following personal computing devices: smartphones, tablets,
  laptops, and desktops. In addition to any other available remedies, Google may
  immediately suspend or terminate Customer's use of these Services based on any
  suspected violation of these terms, and violation of these terms is deemed
  violation of Google's Intellectual Property Rights. Customer will provide Google
  with any assistance Google requests to reasonably confirm compliance with these
  terms (including interviews with Customer employees and inspection of Customer
  source code, model training data, and engineering documentation). These terms
  will survive termination or expiration of the Agreement.
[+] bduerst|7 years ago|reply
This is the TOS specifically for two of the managed services; it says so right in the link: "Google Cloud Machine Learning Group and Google Cloud Machine Learning Engine".

These specific ML-as-a-service products are pretty easy to plug into and resell to third parties, which is why the wording is so restrictive in the TOS.

Many of Google Cloud's other ML products, which are typically more customizable and powerful, do not have these restrictions. The way your comment is worded makes it seem like this restriction is on all of Google Cloud's ML products, but it's not.

[+] grej|7 years ago|reply
Sincerely thank you for the public service of posting that. You’ve probably saved a lot of folks on here from a big mistake.

I had no idea GCP had such terms. I had been considering alternative cloud hosting for a ML SaaS but will definitely not consider GCP.

[+] _hyn3|7 years ago|reply
.. We will force you to admit that we OWN machine learning for all consumer use cases and will happily sue you just for renting our servers, or force you to turn over your Intellectual Property Rights to us. Thanks.
[+] penagwin|7 years ago|reply
> These terms will survive termination or expiration of the Agreement.

Is this legal? Forgive me, I genuinely don't know whether it could be.

[+] inapis|7 years ago|reply
>(ii) integrate these Services with any applications for any embedded devices such as cars, TVs, appliances, or speakers without Google's prior written permission.

Genuinely curious about the restriction regarding embedded devices.

If there's a mishap, it is the responsibility of the owning company, certainly not Google.

So what gives? Why would they do this?

[+] jstanley|7 years ago|reply
> violation of these terms is deemed violation of Google's Intellectual Property Rights

Oh, it's deemed? I guess that's settled then.

The word "deemed" is almost always used by somebody who doesn't actually have the authority to decide something, but wants you to think they do.

[+] amelius|7 years ago|reply
Regardless of the ToS, my clients don't allow sending data to third parties so this service is completely useless to me.
[+] kitotik|7 years ago|reply
Wow! That is even more aggressive than I would have guessed, and I’m cynical.

Do they have actual business customers that agree to those terms?

[+] m0zg|7 years ago|reply
I was taken aback by this one as well. Basically, this is an ineffectual attempt to avoid giving away their AI advantage by letting others use their proprietary models (and thus, indirectly, the gigantic datasets that make those models so good) as a "teacher" for their own models, using the fairly standard model refinement techniques popularized by their own Geoff Hinton. Keeping their cards close to the vest, as it were.

See e.g. https://www.quora.com/What-is-hiybbprqag for an example where Microsoft trained their ranker on Google's long tail and substantially reduced the relevance gap on the cheap, and in a way that makes it impossible for Google to trivially regain the advantage. People misinterpreted this as "Microsoft is copying the results", but that's not what it was. Google was unwillingly teaching Bing how to rank. Google responded with personalized search.

I don't think they can realistically prevent this, though. And it's not going to be anywhere near as traceable as "hiybbprqag" was.
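The "teacher" technique m0zg alludes to is usually called knowledge distillation: a student model is trained to match the teacher's temperature-softened output distribution rather than hard labels. A minimal sketch in plain NumPy (toy logits and the helper names are hypothetical, for illustration only):

```python
import numpy as np

def softmax(logits, T=1.0):
    # Temperature-scaled softmax; higher T softens the distribution,
    # exposing the teacher's relative rankings between classes.
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=4.0):
    # Cross-entropy between the teacher's softened distribution and
    # the student's: the signal a student can extract from a hosted
    # model's outputs, with no ground-truth labels required.
    p_teacher = softmax(teacher_logits, T)
    log_p_student = np.log(softmax(student_logits, T) + 1e-12)
    return -(p_teacher * log_p_student).sum(axis=-1).mean()

# Toy check: a student that matches the teacher's ranking scores a
# lower loss than one that inverts it.
teacher = np.array([[5.0, 2.0, -1.0]])
good_student = np.array([[4.0, 1.5, -0.5]])
bad_student = np.array([[-1.0, 2.0, 5.0]])
assert distillation_loss(good_student, teacher) < distillation_loss(bad_student, teacher)
```

This is presumably the behavior the "create, train, or improve (directly or indirectly)" clause is trying to forbid, and also why it is hard to police: the queries look like ordinary API usage.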

[+] mamon|7 years ago|reply
Google has been using a different trick for the same reason with their own employees: the famous 20% of working time that can be dedicated to personal projects. You might think they did that to increase employee satisfaction, but the real reason is that this way they own the copyright to whatever their employees create in that time, and avoid a Facebook-WhatsApp scenario (i.e. having to spend $19B on something you could have acquired for free).

Now I guess they apply the same principle to their GCP customers.

[+] paulddraper|7 years ago|reply
> create, train, or improve (directly or indirectly) a similar or competing product or service

Talking about Google/Alphabet? Good luck finding something that's not similar or competing.

[+] rryan|7 years ago|reply
Cloud TPU pods are seriously amazing. I'm a researcher at Google working on speech synthesis, and they allow me to flexibly trade off resource usage vs. time to results, with nearly linear scaling thanks to the insanely fast interconnect. TPUs are already fast (a non-pod, i.e. 8 TPU cores, is 10x faster for my task than 8 V100s), but having pods opens up new possibilities I couldn't easily build with GPUs. As a silly example, I can easily train on a batch size of 16k (typical batch size on one GPU is 32) by using one of the larger pod sizes, and it's about as fast as my usual batch size as long as the batch size per TPU core stays constant. Getting TPU pod quota was easily the single biggest productivity speedup my team has ever had.
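The near-linear scaling described above hinges on keeping the per-core batch size fixed, so the global batch grows with the number of cores. A minimal sketch of the arithmetic (the function and numbers are hypothetical; the linear learning-rate scaling heuristic is a common rule of thumb, not something the comment states):

```python
def pod_config(cores, per_core_batch=32, base_lr=1e-3, base_batch=32):
    # Global batch grows linearly with core count when the per-core
    # batch is held constant (the regime where step time stays flat).
    global_batch = cores * per_core_batch
    # Common heuristic: scale the learning rate linearly with the
    # global batch relative to a tuned baseline.
    lr = base_lr * (global_batch / base_batch)
    return global_batch, lr

# 512 cores * 32 per core = the 16k global batch from the comment.
assert pod_config(512) == (16384, 0.512)
```

Whether linear learning-rate scaling actually holds at 16k depends heavily on the model and optimizer; it is a starting point for tuning, not a guarantee.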
[+] totoglazer|7 years ago|reply
How, if at all, do you have to tweak model architecture or hyperparameters for pods vs. single TPUs vs. GPUs?
[+] rytill|7 years ago|reply
16k batch size going that fast... Damn. Use it for good.
[+] sytelus|7 years ago|reply
Are TPUs a drop-in replacement for CUDA if you were using TF? Can I simply change the device from CUDA to TPU and run any TF code? Last I heard, TPUs still had a long way to go towards making this happen...
[+] p1esk|7 years ago|reply
I thought TPUs are comparable in speed to V100. What makes them 10x faster for your task?
[+] oooshha|7 years ago|reply
Is this pretty bad news for Nvidia?
[+] p1esk|7 years ago|reply
Nvidia could have released a DL specific chip a long time ago, if they wanted to. I’m not sure why they haven’t (market not big enough?), but they probably will at some point.
[+] lamchob|7 years ago|reply
No, because many companies run their own clusters with no access to TPU hardware, or are not keen on shipping all their models + data over to Google.
[+] harigov|7 years ago|reply
Can anyone comment on the reliability/availability of these pods?
[+] reilly3000|7 years ago|reply
They have been at this internally for multiple years; I have to imagine they are battle-tested.
[+] m0zg|7 years ago|reply
Unfortunately for Google, NVIDIA's offerings are very strong, and TPUs are a pain in the rear to use. They require TensorFlow, which is itself a pain to use, making it doubly painful, to the extent that using their offering requires a significant degree of desperation or not knowing any better.
[+] akoumis|7 years ago|reply
Have you ever used Keras?