top | item 11436501

Facebook’s automatic alt-text for images

196 points | boyter | 10 years ago | facebook.com | reply

77 comments

[+] hacker_9|10 years ago|reply
Accurate, automatic descriptions of snapshots of people's lives? This is surely sending shock waves down the data mining community. Additionally, Facebook are saying this was built as an aid for blind people, but this is surely just a cover for taking targeted ads to the next level.
[+] ma2rten|10 years ago|reply
> This is surely sending shock waves down the data mining community.

Not really. From a technical point of view there is nothing impressive about this, except that they spent the time and money to collect all the training data. Also, the fact that it says "this image may contain:" tells you something about how accurate this actually is.
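That hedged "may contain" wording maps naturally to a confidence threshold over the classifier's outputs. A minimal sketch of how such phrasing might be generated (the function name, threshold, and labels are all hypothetical, not Facebook's actual pipeline):

```python
def alt_text(predictions, threshold=0.5, top_k=3):
    """Turn (label, confidence) pairs into hedged alt text.

    Only labels above `threshold` are kept, mirroring the cautious
    "may contain" wording rather than asserting anything outright.
    """
    kept = [label
            for label, conf in sorted(predictions, key=lambda p: p[1], reverse=True)
            if conf >= threshold][:top_k]
    if not kept:
        return "Image may contain: no description available."
    return "Image may contain: " + ", ".join(kept) + "."

print(alt_text([("outdoor", 0.92), ("two people", 0.81), ("cat", 0.12)]))
# → Image may contain: outdoor, two people.
```

The low-confidence "cat" label is dropped rather than risk a wrong description, which is exactly the trade-off the "may contain" phrasing hints at.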

[+] coroutines|10 years ago|reply
I kind of want to take this at face-value and say how great it is that they're making visually-impaired viewers a priority.
[+] hanspeter|10 years ago|reply
If that was the case wouldn't they choose to simply not announce anything?

I'm not ruling out that they might use it for targeting eventually, but if this was solely done as a cover it would be the equivalent of a terrorist entering an airport shouting out "My suitcase is just really heavy, I do not have a bomb in it at all!"

[+] zappo2938|10 years ago|reply
Could be worse. Google has me training AI to do the same thing every time I choose all the pictures with train locomotives in a reCaptcha.
[+] leoalves|10 years ago|reply
Google, Facebook, Twitter... are all doing this in their backend. If you are worried about it, don't use their services.

The problem I see here is that now they are giving this data to spammers.

[+] DonHopkins|10 years ago|reply
Well if 98% of your images are pictures of cats, you don't really have to solve the hard general problem.
[+] iaw|10 years ago|reply
This has been possible for, at a minimum, three years. There's an effort gap between building an ad-targeting version and building an accessibility version for blind users.

The only new thing here is that Facebook released it to the public.

[+] speedyapoc|10 years ago|reply
I always find Facebook's example feed to be funny since it's a completely unrealistic depiction of what their site actually is for most users.

Just quickly looking at the top few posts in my feed, I see someone celebrating their two year friendship with someone I don't know, one person sharing a link to a new airplane, four people sharing videos, one person liking a sponsored video, and finally one person updating their profile picture.

I wish I could see actual updates from people instead of being kept abreast as to what piece of third party content they've liked at some point in time, what third party content they're sharing, etc.

[+] manigandham|10 years ago|reply
How is this unrealistic? This is exactly the kind of stuff my feed shows. The people you follow probably aren't posting any updates (that you'd like to see)... and sharing content is an update; that's what that person decided to post.

Your feed is what you make it.

[+] Toenex|10 years ago|reply
> I wish I could see actual updates from people instead of being kept abreast as to what piece of third party content they've liked at some point in time, what third party content they're sharing, etc.

This nicely captures the problem. Facebook (and probably most social media systems - I'm looking at you LinkedIn) are primarily interested in keeping you up to date with Facebook via the medium that is your human relationships. So Facebook wants you to know what your friends did on Facebook today so that you might do that same Facebook thing. This increases Facebook interactions which in turn become more information to propagate to others on Facebook. In the limit there is no need for Facebook because all anyone is ever doing is Facebook.

[+] _qbjt|10 years ago|reply
The first thing I thought was, "I wonder how this will work with memes."
[+] seanalltogether|10 years ago|reply
Interesting, when I go to facebook (which admittedly is not very often) I only see updates and photos posted by friends in my feed.
[+] ospfer|10 years ago|reply
Last July, I led a project in support of a federal agency to analyze current business processes and identify weaknesses in the agency's Section 508 office. My work focused primarily on externally accessible internet sites, and one of the most common 508 violations we encountered was the lack of alt text on images. This agency utilized a number of automated scanning tools and processes, but lacked any ability to efficiently remediate these errors. While we never went beyond a conceptual discussion, a coworker and I talked about something along the lines of what Facebook has accomplished here through the use of neural networks. Very cool to see this advancement come to life.
[+] dr_zoidberg|10 years ago|reply
These kinds of systems/algorithms also allow assigning a certain semantic component to images (taken with a grain of salt, of course), which might enable further developments that weren't considered possible yet.

Sadly, it also brings a whole new set of cases to the oh-so-annoying "but Facebook/Google/Twitter/Amazon does it!" clichés that we'll now have to deal with...

[+] verusfossa|10 years ago|reply
I'm waiting for the day this is just a library you pass an image to and it returns an array. No, not a SaaS. Then on my own pump.io, Diaspora, redMatrix etc. it just works. My data, my images, my network. I'm not against the tech at all though. Neat.
[+] ma2rten|10 years ago|reply
There are already pre-trained networks out there. TensorFlow comes with an example command-line tool: you can pass it any image and it will tell you what is in the image.

The classes that it can detect are from ImageNet, so that might be limiting.
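The "tell you what is in the image" step in such tools is essentially a top-k lookup over the network's 1000-way ImageNet softmax output. A stdlib-only sketch of that decoding stage (the tiny label table here is a hypothetical stand-in; the real tool ships the full ImageNet class mapping):

```python
import heapq

# Tiny stand-in for the 1000-entry ImageNet label table the real tool ships.
LABELS = {0: "tabby cat", 1: "golden retriever", 2: "espresso", 3: "laptop"}

def decode_predictions(probs, top_k=3):
    """Return the top-k (label, probability) pairs from a softmax vector."""
    top = heapq.nlargest(top_k, enumerate(probs), key=lambda ip: ip[1])
    return [(LABELS.get(i, f"class_{i}"), p) for i, p in top]

print(decode_predictions([0.05, 0.7, 0.05, 0.2], top_k=2))
# → [('golden retriever', 0.7), ('laptop', 0.2)]
```

The ImageNet limitation mentioned above shows up directly here: anything the network was never trained on can only be mis-assigned to one of the fixed classes.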

[+] sidcool|10 years ago|reply
With all due respect to conspiracies, this is a cool feature.
[+] bla2|10 years ago|reply
Warning, that page has an auto-play video with sound.
[+] shogun21|10 years ago|reply
This might be asking for too much, but why not use more of the image meta-data than these computer vision techniques?

If I were blind, I really wouldn't care that this is an image of "two people, smiling". Facebook has facial recognition, tagging, and locations. It would be much more valuable to me to say "Peter and Laura smiling at Channel Islands State Park."
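Combining existing metadata in this way is mostly string assembly once the face tags, expression, and location are available. A sketch under the assumption that such fields already exist in Facebook's metadata (all function and field names here are hypothetical):

```python
def rich_alt_text(tags, expression=None, location=None):
    """Compose alt text from face tags, an inferred expression, and a
    location, falling back to generic wording when no one is tagged.
    Inputs are assumed to come from existing metadata, not raw vision."""
    if tags:
        who = tags[0] if len(tags) == 1 else ", ".join(tags[:-1]) + " and " + tags[-1]
    else:
        who = "people"  # fallback mirrors the generic "two people" style
    text = who
    if expression:
        text += ", " + expression
    if location:
        text += " at " + location
    return text

print(rich_alt_text(["Peter", "Laura"], "smiling", "Channel Islands State Park"))
# → Peter and Laura, smiling at Channel Islands State Park
```

When tagging or location data is missing, the function degrades back toward the generic computer-vision caption, so the two approaches are complementary rather than competing.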

[+] chippy|10 years ago|reply
I'd like to compare Facebook's image tagging with the Google Cloud Vision API (https://cloud.google.com/vision/). I think it would be interesting to see which one is more accurate or verbose.
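One simple way to run such a comparison, once you have label lists from both services for the same image, is set overlap plus label counts. A sketch (the sample label lists are made up for illustration):

```python
def label_overlap(labels_a, labels_b):
    """Jaccard similarity between two case-insensitive label sets —
    a rough proxy for how similarly two tagging services describe an image."""
    a = {l.lower() for l in labels_a}
    b = {l.lower() for l in labels_b}
    if not (a or b):
        return 1.0  # two empty descriptions agree trivially
    return len(a & b) / len(a | b)

fb = ["two people", "smiling", "outdoor"]
gcv = ["Smiling", "Person", "Tree", "Outdoor"]
print(f"verbosity: {len(fb)} vs {len(gcv)} labels, overlap: {label_overlap(fb, gcv):.2f}")
# → verbosity: 3 vs 4 labels, overlap: 0.40
```

Verbosity is just the label count per image; accuracy would still need a human-judged ground truth.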
[+] TazeTSchnitzel|10 years ago|reply
I suppose it'll be like YouTube's automatic subtitles for audio. It'll do a bad, but passable, job: at least the blind and visually impaired have some idea of what the image contains.
[+] whatever_dude|10 years ago|reply
"Cat. Cats. Cat. Baby. Dog. Baby. Cat. Baby with dog."
[+] visarga|10 years ago|reply
Bag. Duck face. Nails. Duck face.
[+] SimeVidas|10 years ago|reply
Glad to see Mark explaining what a screen reader is to millions of people :-D
[+] buro9|10 years ago|reply
This is really what I wanted to use the Google Image API to do.

But it's way too expensive.

All I wanted was keywords for alt-text, dimensions for placeholder, and the dominant colour for placeholder background.

https://cloud.google.com/vision/

The price for that would be $7.50 per 1,000 images for the first million images.

I have some 60,000 images on the site I run and don't happen to have $450 in loose change lying around (the whole site costs less than that to run each month).

I guess I don't care about alt tags that much.
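Of the three wants listed above, only the keywords actually need a vision API; dimensions and dominant colour can be computed locally for free. A stdlib sketch of the dominant-colour part plus the pricing arithmetic quoted above (the quantisation bucket size is an arbitrary choice):

```python
from collections import Counter

def dominant_colour(pixels, bucket=32):
    """Most common colour after coarse quantisation, suitable for a
    placeholder background. `pixels` is a list of (r, g, b) tuples."""
    counts = Counter((r // bucket, g // bucket, b // bucket) for r, g, b in pixels)
    (r, g, b), _ = counts.most_common(1)[0]
    # Return the centre of the winning bucket.
    return (r * bucket + bucket // 2,
            g * bucket + bucket // 2,
            b * bucket + bucket // 2)

def batch_cost(n_images, price_per_1000=7.50):
    """Cost at the quoted first-tier rate of $7.50 per 1,000 images."""
    return n_images / 1000 * price_per_1000

print(batch_cost(60_000))                               # → 450.0
print(dominant_colour([(255, 0, 0)] * 3 + [(0, 0, 255)]))  # → (240, 16, 16)
```

So only the alt-text keywords would incur the $450; the other two placeholder values are a one-pass computation over each image.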

[+] fudged71|10 years ago|reply
Re-upload them to facebook then scrape the generated descriptions ;)
[+] dflock|10 years ago|reply
Do you have $7.50 per month? Just do a thousand a month.
[+] cphoover|10 years ago|reply
Is this tool open source? It would be a great contribution to the accessibility community.
[+] tlrobinson|10 years ago|reply
Very cool.

Obvious next step: build this into the OS/browser/screen reader.

[+] odinduty|10 years ago|reply
Ah, the Twitter app for Android (beta version) recently added a feature that allows you to add a description to pictures you upload, for visually impaired users.