Machine Learning Fairy Dust

[+] law|14 years ago|reply

What makes me really nervous is that we're nearing the point when Google's Prediction API and its knock-offs will increasingly pervade web sites much in the way that AJAX and other technologies have. While overuse of AJAX and the Facebook "Like" button is extremely annoying, it's still pretty harmless.

Machine learning, on the other hand, isn't innocuous. In order to use the Prediction API, you need a large corpus of data, which will just further incentivize web sites to ignore the privacy implications of their actions. Machine learning is far too abstract and too much of an "umbrella" term for it to be anything but careless to refer to it as some sort of panacea.

If you thought that Facebook's "Beacon" was a slap in the face to online privacy, just wait until you see what the future holds. Once machine learning libraries with extremely robust, completely unsupervised classifiers become more abundant, we're going to see an exponential increase in the market for data. Banner advertisements will be replaced with much more terrifying 'targeted' ads, and we will enter into an age where we are judged not by the empirical evidence of our actions, but by the inferences made from people who behave like us.

[+] bluekeybox|14 years ago|reply

> Banner advertisements will be replaced with much more terrifying 'targeted' ads

Can someone please explain to me why ads for stuff I might actually be willing to buy (as opposed to hyper-annoying junk thrown at me every day) terrify so many people?

Not that I am indifferent to privacy issues; just playing devil's advocate here.

[+] the_cat_kittles|14 years ago|reply

If you act on the internet like you would in real life, it doesn't seem like such a big deal. How is getting relevant ads a bad thing?

[+] Estragon|14 years ago|reply

Do you really think such a turnkey ML service is possible, though?

[+] hammock|14 years ago|reply

There are NO shortcuts. This is a fantastic article, and he lists a lot of good examples- machine learning, "social", crowdsourcing, AJAX, real-time.

I would add to that list "create a forum." Maybe that's part of "social." In marketing I hear it all the freaking time- you get a half-ass mediocre idea and it always includes some type of "forum" your customers will recruit themselves into somehow, and start to form a community. Most of these people have never been on a forum so I can't blame them for not knowing how it works, but it is a challenge.

[+] dholowiski|14 years ago|reply

I was nodding my head until I got to the end - it seems like the Google Prediction API _is_ the magic fairy dust we've been waiting for?

[+] hooande|14 years ago|reply

I think your comment is an example of the point of view that the author was talking about. The Google Prediction API won't automate the process of grouping comments or stories by content. Someone has to do the work of collecting the data and preparing the corpus, determining the best way to analyze it and prepping the inputs and outputs. There are levels of understanding and effort between having an idea involving machine learning and getting accurate predictions.

The Google Prediction API takes care of the code for algorithmic computation. While that's handy, it's only one step of a much larger process. The scale of that process is something that many people don't fully understand about machine learning (yet).

[+] dougws|14 years ago|reply

Machine learning toolkits have been readily available for decades--if you want to use an SVM in your project, you just need to grab an implementation and get going. The trick, as others have described, is getting your data into a usable format, choosing features, and experimenting to find the method that gives the best results. As far as I can tell, the Prediction API doesn't do much to make any of that easier.

[+] athst|14 years ago|reply

I think his point is that it's just a tool that makes it a little easier for startups to incorporate machine learning into their products - like he said, it may be appropriate for some types of problems, but not all. But I'm sure we'll start to see more tools like that become more widely used.

When AJAX first came out, not everyone knew how to do it - but now, everyone can drop in jQuery and do all sorts of complex things relatively easily.

[+] wccrawford|14 years ago|reply

I think 'machine learning' is so complex that people just don't feel like trying to explain it. That, or their business secrets are tied up in it, and they don't want to give away the golden goose.

[+] _delirium|14 years ago|reply

That's an explanation for some of the examples, but I think a lot of the time it's actually really simple, along the lines of, "we sift through some data and correlate it". The odd thing is, that often works, especially for user-facing perceptual stuff where there's a strong placebo effect, even more so if you salt liberally with some hand-tuned biasing. Sort of how The Sims is able to use some super-simple algorithms to give the impression of interesting characters.

However, if you do need some real magic to be done, and your product really won't work without it, then things get trickier; bad statistics, or at least statistics not really used correctly, is really common in the innards of these kinds of products.

[+] hooande|14 years ago|reply

The problem with explaining machine learning is that it's not only complex, but it goes against the way that people normally think. Humans generally aren't built for making statistical calculations and going with highest probable outcomes. People are wired to understand narratives and compelling stories. To explain machine learning, you have to bridge a conceptual gap while also discussing a very technical idea.

Personally, I think that the benefits of a product should be so evident that people don't care if machine learning was used or not. The pitch shouldn't be "This aggregator is awesome because of machine learning", but "This aggregator is awesome (oh and we used machine learning)".

[+] stonemetal|14 years ago|reply

> I think 'machine learning' is so complex that people just don't feel like trying to explain it.

I think it is very simple (outside of the secret sauce part) to people who know it, so they don't feel the need to explain it. People who haven't sat down and thought about it see it as magical.

As an example (lifted from Programming Collective Intelligence by Segaran), say you want to recommend movies to people. You have them rate movies. Then you take people in pairs and compare movies they have both rated to produce a distance between those two people. When you want to recommend a movie to Joe, you take the people who are closest to Joe and then find a movie that they rate highly that Joe has not rated, and suggest that to Joe. The secret sauce is in coming up with the distance function.
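
A rough sketch of that recipe in Python, for anyone who wants to see how little code it takes. The names, ratings, and movie titles are made up for illustration, and plain Euclidean distance over shared ratings is just one possible choice of distance function (an assumption here, not necessarily what the book or any real product uses).

```python
from math import sqrt

# Hypothetical ratings: user -> {movie: score on a 1-5 scale}
ratings = {
    "Joe": {"Alien": 5, "Brazil": 4, "Clue": 1},
    "Ann": {"Alien": 5, "Brazil": 5, "Clue": 1, "Dune": 5, "Heat": 2},
    "Bob": {"Alien": 1, "Brazil": 2, "Clue": 5, "Dune": 1, "Heat": 5},
}


def distance(a, b):
    """Euclidean distance over movies both users rated; inf if none are shared."""
    shared = set(ratings[a]) & set(ratings[b])
    if not shared:
        return float("inf")
    return sqrt(sum((ratings[a][m] - ratings[b][m]) ** 2 for m in shared))


def recommend(user, k=1):
    """Suggest the movie the user has not rated that the k closest users rate highest."""
    others = [u for u in ratings if u != user]
    neighbors = sorted(others, key=lambda u: distance(user, u))[:k]

    # Best score any neighbor gave to each movie the user has not rated.
    candidates = {}
    for neighbor in neighbors:
        for movie, score in ratings[neighbor].items():
            if movie not in ratings[user]:
                candidates[movie] = max(score, candidates.get(movie, 0))

    return max(candidates, key=candidates.get) if candidates else None


print(recommend("Joe"))
```

Swapping out `distance` (Pearson correlation, cosine similarity, something hand-tuned) changes who counts as "close" to Joe, which is where the real work is.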

[+] j_baker|14 years ago|reply

I think this is usually a case of marketing having a bit too much say in product discussions. In the publishing industry, it seems like "My widget does X" doesn't get as strong a reaction from publishers as "My widget does X and it adapts to your readers".

The problem being (of course) that people forget how hard a problem machine learning can be.

[+] the_cat_kittles|14 years ago|reply

It's fine if people want to say that ML will take care of the "details"... let them try to use ML right and they will see you need to spend a long time understanding how to do things right. Most of the time, you can't use linear regressions right out of the box, let alone SVMs.

[+] arasraj|14 years ago|reply

Agreed. The use of ML is highly dependent on the data. Having something like the Prediction API is fine, but it seems like the use cases would be rigid.

[+] danw|14 years ago|reply

The cloud will solve it

34 comments