top | item 11954988

How Google Is Remaking Itself for “Machine Learning First”

274 points | steven | 9 years ago | backchannel.com

116 comments

[+] arbre|9 years ago|reply
I don't believe in "everyone should work on machine learning". I worked on several deep learning models but I don't really like it. It is a very different job than software engineering in my opinion. ML is more about gathering data and tuning the models as opposed to building stuff. I have spent months working on models and barely wrote any code. It is more efficient to have ML experts focus on the modeling and software engineers use the model.

I do believe however that some experience is needed to understand what is possible and best benefit from existing tools or to be able to communicate with machine learning engineers about your needs.

[+] giardini|9 years ago|reply
I concur. ML isn't programming per se; it is experimental problem-solving with a particular dataset and algorithm. Your result may/not work well, may/not generalise, and will almost undoubtedly not contribute anything new to any discipline, even to ML. When all ML work is done we'll have great pattern recognizers but nothing remotely akin to thought. And we won't understand how they work or the best way to build the next one. It isn't AI, although it is a part of AI, just as the visual system is part of AI.

I was reading Domingos' "The Master Algorithm" several days ago and a mathematician inquired about the book. He knew a group of ML developers. His opinion was that "ML doesn't look very interesting: all you do is play with the parameters, turn the knobs, and/or change the model until something works. There's no real progress there; nothing substantial."

Rather than sending a battalion of bright developers into the ML swamp where they will largely be frustrated, learn little and contribute less, I'd be tempted to guide them into other fields.

[+] etangent|9 years ago|reply
I don't know. I know some engineers who have spent months going back-and-forth over communication protocols while barely writing any code, yet somehow their job is considered to be quite core to software engineering. I don't really see how fine-tuning communication protocols is fundamentally different from fine-tuning machine learning models. But overall, I agree with your sentiment: different things are different and appropriate for different people.
[+] personjerry|9 years ago|reply
Hi arbre, would you mind explaining what is possible and what benefits from existing tools in machine learning at the moment? I am clueless and find ML rather frustrating to get into.
[+] dgacmu|9 years ago|reply
Absolutely. In general, ML needs a collaboration between ML expertise and application domain expertise. It's very helpful if there's someone who can help bridge those two - enough app experience to understand the domain deeply, and enough ML experience to know what questions to ask of the ML gurus and what pitfalls to expect. As I see it, that's one of the goals of the ML ninja program.
[+] zmj|9 years ago|reply
This is how software eats your job.
[+] srtjstjsj|9 years ago|reply
> "everyone should work on machine learning"

> software engineers use the model.

You aren't disagreeing.

[+] uola|9 years ago|reply
"Moving data around" is what a lot of software engineering is these days. Facebook, Google etc. are more data companies than software companies (and probably closer to media than communications companies).
[+] yomly|9 years ago|reply
Articles like this for me tend to vindicate Google's notorious hiring processes.

While it is true that most people will not need to whiteboard a binary tree inversion in their day-to-day work, Google seems to expect its engineers to be able to throw themselves at any problem they're given, pivot in skillset quickly, and have an appreciation of all the developments going on around them so they can apply any novel ideas developed internally to what they are currently working on.

In those cases, hiring based on sound knowledge of CS fundamentals seems like a good bet...

60k engineers is a pretty terrifying number though.

[+] arcanus|9 years ago|reply
I'm skeptical nevertheless. In my experience, most programming is very different from R&D, which often requires significant concentrated training; without it, even the smartest will spin their wheels.

It's hard to describe, but research (which the vast majority of ML remains) is something for which even a sound knowledge of fundamentals might not remotely be enough.

[+] TulliusCicero|9 years ago|reply
60k is total number of full-time employees. It includes non-engineers, and does not include contractors.
[+] gonyea|9 years ago|reply
Google's largely moved away from those BS questions. Now they just bias toward people who memorize answers on LeetCode but aren't actually capable of producing anything.
[+] raverbashing|9 years ago|reply
> Articles like this for me tend to vindicate Google's notorious hiring processes.

No, because they have rejected ML experts if they can't do their stupid dog & pony show.

> hiring based on sound knowledge of CS fundamentals seems like a good bet...

Too bad many of them can't get their heads around the ML math.

[+] dj-wonk|9 years ago|reply
"probably almost half of its 60,000 headcount are engineers"
[+] xenihn|9 years ago|reply
Anyone happen to have a suggested self-teaching path for Machine Learning? I.e. books and courses. I know that Andrew Ng's course is a great resource, but I know that I'm not ready to start it yet. I'm actually way behind on the mathematical pre-requisites, so recommendations for that would be greatly appreciated as well. I've never taken a statistics course, and never received any formal education for mathematics past trig. I know that I'm looking at a good 6 months to a year just to get caught up on the math alone.
[+] TDL|9 years ago|reply
I'm sure others in this thread will have some good advice on the math front. You will want to be comfortable with statistics (as it seems you already are aware), but you will also want to be comfortable with linear algebra. Andrew Ng's course has a quick tutorial on linear algebra; you might also want to check out codingthematrix.com. Khan Academy is a decent place for stats, probability, linear algebra, & calculus. I know there has been some criticism of K.A. in the past, but I think it's a good resource to get an intro-level understanding of those topics.

As an intro to ML, I am a fan of Coursera's ML specialization done by the University of Washington (https://www.coursera.org/specializations/machine-learning). It's free, except for the capstone, and the instructors do a good job of giving both theoretical & practical grounding in various aspects of ML.

I am sure others will have good suggestions as well. Good luck.

[+] bhntr3|9 years ago|reply
I actually think the hardest part about ML is the lingo. It's very alienating that even simple concepts seem to have their own lingo. A lot of the ideas are just what you as a developer might do intuitively if you had to implement it. But the language tends to be a bit mathy and obscure. So, when you try to read something without understanding the lingo, it seems impenetrable. But once you know things like "quantization is basically rounding" . . . it becomes easier.
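To make the "quantization is basically rounding" point concrete, here is a tiny sketch of my own (not from the comment): snapping continuous values onto a small, evenly spaced grid of levels, which is essentially what weight quantization does with floats.

```python
# Quantization as rounding: map a continuous value onto the nearest of a
# small number of evenly spaced levels between lo and hi.
def quantize(value, levels=8, lo=0.0, hi=1.0):
    step = (hi - lo) / (levels - 1)              # spacing between grid points
    return round((value - lo) / step) * step + lo

print(quantize(0.40))  # snaps to the nearest of the 8 levels (3/7, about 0.4286)
```

The same idea scales up to, e.g., 8-bit quantization of neural-network weights, which is just this with levels=256.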

Since ML comes from statistics, math, programming, but also other scientific fields, it can even have many terms for essentially the same thing.

For me, as a developer, it was actually easiest to just read some tutorials like the docs for scikit learn and then just start digging through the code of a bunch of libraries. How people name the classes tells you what they think things should be called. But the code tells you what it actually does. I just bounced back and forth between code, tutorials/blogs and books. After a few months, I can actually have a reasonable conversation with our ML people in the language they use and everything else I look at seems easier because I understand most of the terms.

I think asking how to learn ML is a lot like asking how to learn German. It might feel like you need to start with the grammar rules. But I think immersion is the best way. Get the vocabulary, then come back to the rules. I also find that having a burning question in my mind helps me with immersion. So, if you can find a project that drives you, maybe that will help.

So starting with the math fundamentals as a developer seems like an easy way to burn yourself out. But everyone does learn differently. If not, there wouldn't be so many ML algorithms, right? Right?

[+] sputknick|9 years ago|reply
Six months ago, I would have said Kaggle, Jupyter, Python, figure things out. I've since discovered Microsoft's ML Studio. It allows you to start out with drag and drop (no code to learn) and, most importantly, you can visually see the output of your experiments. For example, if you run a binary decision tree algorithm, you can actually look at images of the 1,000 trees it created and what's in their nodes. Not important for practical functioning in the real world, but I like it a lot as a tool to learn.
[+] nighthawk454|9 years ago|reply
Check out this (free) book http://ciml.info. I used it in my ML course (professor was the author), and remember it being one of the better textbooks I've read. Covers a variety of topics in a relatively easy-to-read and succinct manner, given the subject matter.

Not exactly light on math, so you may want to read up on some multivariate Calculus and Linear Algebra before the later chapters. First few sections should be approachable regardless.

[+] krosaen|9 years ago|reply
I'm taking time off to study ML and keep an ongoing list of curriculum resources, as well as a blog of my day to day, here:

http://karlrosaen.com/ml/

[+] nobullet|9 years ago|reply
There was HN thread about this: https://news.ycombinator.com/item?id=11859165

Below is my favorite response by vaibkv:

vaibkv 15 days ago

Here's a tentative plan:

1. Do Andrew Ng's course from Coursera fully.
2. Do a course called AnalyticsEdge by MIT folks on edx.org. I can't recommend this course highly enough. It's a gem. You will learn practical stuff like ROC curves, and what not. Note that for a few things you will need to google and read on your own, as the course might just give you an overview.
3. Keep the book "Elements of Statistical Learning" by Trevor Hastie handy. You will need to refer to this book a lot.
4. There is also a course that Professor Hastie runs, but I don't know the link for it. I highly recommend it as it gives a very good grounding in things like GBM, which is used a lot in practical scenarios.
5. Pick up Twitter/Enron emails/product reviews datasets and do sentiment analysis on them.
6. Pick up a lot of documents on some topic and make a program for automatically producing a summary of those documents - first read some papers on it.
7. Don't do Kaggle. It's something you do when you have considerable expertise with ML/AI.
8. Pick up flights data and do prediction for flight delays. Use different algorithms, compare them.
9. Make a recommendation system to recommend books/music/movies (or all).
10. Make a neural network to predict moves in a tic-tac-toe game.

These are a few things that can get you started. This is a vast field, but once you've done the above in earnest I think you have a good grounding. Pick a topic that interests you and write a paper on it - it's not such a big deal.
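To give a flavor of what step 5 looks like at its very simplest, here is a from-scratch Naive Bayes sentiment classifier. The four-example "dataset" and its labels are invented for illustration; real work would use the Twitter/Enron/product-review data the comment suggests.

```python
# Minimal Naive Bayes sentiment classifier (no libraries beyond the stdlib).
import math
from collections import Counter

train = [
    ("great product works perfectly", "pos"),
    ("love it absolutely great", "pos"),
    ("terrible waste of money", "neg"),
    ("broke after one day terrible", "neg"),
]

# Count word frequencies per class and class priors.
word_counts = {"pos": Counter(), "neg": Counter()}
class_counts = Counter()
for text, label in train:
    class_counts[label] += 1
    word_counts[label].update(text.split())

vocab = {w for counts in word_counts.values() for w in counts}

def predict(text):
    scores = {}
    for label in class_counts:
        # log prior + sum of log likelihoods with add-one (Laplace) smoothing
        score = math.log(class_counts[label] / sum(class_counts.values()))
        total = sum(word_counts[label].values())
        for word in text.split():
            score += math.log((word_counts[label][word] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

print(predict("great value love it"))  # -> pos
print(predict("terrible it broke"))    # -> neg
```

With a real dataset you would swap in proper tokenization and a train/test split, but the smoothing-plus-log-probabilities core stays the same.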

[+] argonaut|9 years ago|reply
You should start with an intro calculus class (e.g. Calculus I). Andrew Ng's Coursera course teaches you the necessary linear algebra. After his Coursera course it'll be worthwhile to take a linear algebra class.
[+] vecter|9 years ago|reply
If the only math you know is up to trig, you're probably multiple years away from getting caught up on the math.

You need to first learn calculus and linear algebra, and learn them very well. I would also recommend having a good understanding of probability. Learning all of these well will take at least a year, if not longer. For instance, I took one year of calculus in high school and then one semester each of linear algebra and probability, which adds up to two years.

You'll need calculus so you can do optimization (i.e. at the simplest level, take a derivative, set it to 0, and solve. Of course there's more you can do with calculus in Machine Learning). You'll need linear algebra for almost everything in Machine Learning. Lastly, probability will be useful for understanding very basic methods like Naive Bayes[0]. There are other methods built on probability also[1].
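As a toy illustration of that "take a derivative, set it to 0, and solve" step (my own example, not from the comment): minimizing f(x) = (x - 3)^2 analytically gives f'(x) = 2(x - 3) = 0, so x = 3, and gradient descent reaches the same answer numerically.

```python
# Minimize f(x) = (x - 3)^2.  Calculus: f'(x) = 2(x - 3) = 0  =>  x = 3.
def f_prime(x):
    return 2 * (x - 3)

x = 0.0               # arbitrary starting point
lr = 0.1              # learning rate
for _ in range(100):  # gradient descent: step opposite the derivative
    x -= lr * f_prime(x)

print(x)  # converges to the analytic minimum, x = 3
```

This is the simplest case; most ML loss functions have no closed-form solution, which is why the iterative version is the one that actually gets used.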

If you skimp on learning any of these, you will never be able to understand Machine Learning at a deep level, or even a shallow one.

[0] https://en.wikipedia.org/wiki/Naive_Bayes_classifier

[1] https://en.wikipedia.org/wiki/Graphical_model

[+] kriro|9 years ago|reply
"Python Machine Learning" is a pretty good book. I also like "Natural Language Annotation" which is a bit specialized but there aren't all that many books on the annotation process.
[+] matt_wulfeck|9 years ago|reply
And my anecdotal experience is that it's working extremely well. Take the Google Photos app that does automatic image recognition and tagging. The other day I was looking for a picture we took of our cat the first night we brought him home. I remembered we left him with a blanket in the bathroom but couldn't remember much else.

"kitten bathroom 2013"

And there was a picture of the cat sitting in the tub on a blanket. Simply amazing.

[+] hoodoof|9 years ago|reply
I seem to recall Google focusing the entire company on social/GooglePlus. Is this saying the company is now being focused on machine learning in the same way?

Reminds me of the Ballmer/Gates strategy of everything must be Windows, which seemed flawed to me.

[+] Bjorkbat|9 years ago|reply
That's an interesting way to look at it.

I would argue that Google+ didn't work out because Google was trying to play catch-up in a field that it just lacked knowledge in (social networks).

Whereas with machine learning, they're not playing catch-up, everyone else is. Of all the other tech titans out there, they're the ones really leading the pack.

That remark aside though, I agree with you. An attempt to go hard on machine learning and apply it everywhere will probably work out pretty badly. As fascinating as ML is, I just haven't bothered to learn it yet because I haven't the slightest idea what new and novel problem I'd solve with it that doesn't have a better solution through a more straightforward approach.

[+] srtjstjsj|9 years ago|reply
In between, they focused the entire company on switching from Desktop to Mobile.
[+] xg15|9 years ago|reply
I was kind of surprised this article leads with that relatively small "Ninja" workshop. My impression so far was that Google more or less created the whole machine learning movement (out of necessity, from its two core fields, search and ads/analytics) and employs several authorities in the field.

After Google Now, DeepDream and all the self driving car hype, reading about that workshop being the start of the big transformation seems strange.

[+] glx1441|9 years ago|reply
Peter Domingos? Really? Did they mean Pedro?

Sigh. Another instance of pop science getting most everything wrong (and I haven't even bothered to write anything about the technical content in the article).

[+] z92|9 years ago|reply
That's a good change from "social first" from a few years back. Google was never a social company to start with. Remember Orkut?

AI is Google's leverage. It should keep exploring that path.

[+] Dowwie|9 years ago|reply
I find this article alarming.

Jeff Dean said, "The more people who think about solving problems in this way, the better we'll be". I sincerely hope that Sundar emphasizes the thoughtful application of ML and does not allow black-box algorithms to take too central a role.

This kind of hubris swept through wall street banks during the structured products boom, ultimately leading to products such as synthetic collateralized debt obligations. Taking Jeff Dean's opinion about whether machine learning would be a good thing is like taking the opinion of the creator of synthetic CDOs whether they were a good thing. The authors and evangelists are blinded by optimism and opportunity.

Is Sundar Pichai swept away by the opportunities of machine learning and too biased to be aware of the risks? Is Sundar acting like Stan O'Neal did when he pulled out all the stops at Merrill Lynch and went all-in on CDOs? I hope he isn't. It does not seem to be the case, as he mentions thoughtful use of ML.

Nonetheless, caution should be taken.

[+] nborwankar|9 years ago|reply
Bit of a self-plug here - LearnDataScience http://learnds.com has been well received as a starting point for newcomers. It's a set of Jupyter notebooks with a lot of hand holding. Git repo has data sets included so you can clone and go. All Python.
[+] DrNuke|9 years ago|reply
Not sure where it is going at all: evolutionary leaps often come from outliers and sometimes from serendipity. What about this reinforced confirmation bias?
[+] rhizome|9 years ago|reply
On first blush, my sense is that a translation could go something like "we're prioritizing the analytics API over the results API." Not analytics in the webserver sense, but the OLAP/DW one. So, e.g. ad targeting fidelity over results presentation algorithms. Backend biz vs frontend.
[+] entee|9 years ago|reply
This is a really great idea, especially when done right. The difficulty with machine learning and AI is understanding the pitfalls inherent in selecting data and training systems. You can fool yourself pretty easily into thinking you've got something that works when you really don't. That said it sounds like they're doing things well, I have no doubt this will have a positive impact in demystifying the "magic" of ML/AI and making all those Google products I use better!
[+] tdkl|9 years ago|reply
I guess now we know who's responsible for asinine UI decisions lately (YouTube apps, Material wastespace design). /s
[+] jdeisenberg|9 years ago|reply
The article says that Mr. Giannandrea is no longer head of the machine learning division; out of curiosity, who has taken that position? It's not clear from the article.
[+] ycosynot|9 years ago|reply
Maybe I'm talking nonsense, but the term "machine learning" could be detrimental to learning it, because it feels so machinesque... It's a cool term, but also very vague and mystical, and through its anthropomorphism it kinda implies the engineer is a teacher, or a translator. You're not even started, and you're already confused.

Surely it is better to talk of learning deep neural nets, and such things. Or maybe "machine training" would be less intimidating. But I guess we're stuck with it, and it's not so bad.

[+] srtjstjsj|9 years ago|reply
When will they move past the "Slogan First" magpie direction-switching?
[+] StevePerkins|9 years ago|reply
Great article, but I can't help but CRINGE at the "ninja" references. I think that's already played out within the industry... and although pop-tech writers tend to lag a few years behind, it will sound extremely dated in the mainstream within a few years.
[+] eplanit|9 years ago|reply
Agreed, and I've been waiting over a decade now for the demise of tiresome qualifiers like "ninja" and "on steroids" (we could all add a few, I'm sure). I was really, really tired of "uber", too, but now that one seems here to stay for quite a while longer. Oh, well...
[+] lugg|9 years ago|reply
> “The tagline is, Do you want to be a machine learning ninja?”

I don't really like the word, but I don't really give a flop either.

I'm not sure how it's better or worse than guru, rockstar, or any other lame word recruiters like to use to make us feel like the special snowflakes we are.

Which word would you like to see in place of 'ninja'?

[+] vimota|9 years ago|reply
That first paragraph almost made me stop reading.
[+] holografix|9 years ago|reply
Reading shit like this makes me wanna drop everything and start a Maths degree and get seriously into Machine Learning. Can you imagine being picked at work to study something AWESOME while being paid for it?!? She must be a genius.