top | item 11436501

Facebook’s automatic alt-text for images

196 points | boyter | 10 years ago | facebook.com | reply

77 comments

[+] hacker_9|10 years ago|reply
Accurate, automatic descriptions of snapshots of people's lives? This is surely sending shock waves down the data mining community. Additionally, Facebook are saying this was built as an aid for blind people, but this is surely just a cover for taking targeted ads to the next level.
[+] ma2rten|10 years ago|reply
> This is surely sending shock waves down the data mining community.

Not really. From a technical point of view there is nothing impressive about this, except that they spent the time and money to collect all the training data. Also, the fact that it says "this image may contain:" tells you something about how accurate this actually is.
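That hedged "may contain" wording maps naturally to a confidence threshold over the classifier's outputs. A minimal sketch of how such phrasing might be generated (the function name, threshold, and labels are all hypothetical, not Facebook's actual pipeline):

```python
def alt_text(predictions, threshold=0.5, top_k=3):
    """Turn (label, confidence) pairs into hedged alt text.

    Only labels above `threshold` are kept, mirroring the cautious
    "may contain" wording rather than asserting anything outright.
    """
    kept = [label
            for label, conf in sorted(predictions, key=lambda p: p[1], reverse=True)
            if conf >= threshold][:top_k]
    if not kept:
        return "Image may contain: no description available."
    return "Image may contain: " + ", ".join(kept) + "."

print(alt_text([("outdoor", 0.92), ("two people", 0.81), ("cat", 0.12)]))
# → Image may contain: outdoor, two people.
```

The low-confidence "cat" label is dropped rather than risk a wrong description, which is exactly the trade-off the "may contain" phrasing hints at.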

[+] coroutines|10 years ago|reply
I kind of want to take this at face-value and say how great it is that they're making visually-impaired viewers a priority.
[+] hanspeter|10 years ago|reply
If that was the case wouldn't they choose to simply not announce anything?

I'm not ruling out that they might use it for targeting eventually, but if this was solely done as a cover it would be the equivalent of a terrorist entering an airport shouting out "My suitcase is just really heavy, I do not have a bomb in it at all!"

[+] zappo2938|10 years ago|reply
Could be worse. Google has me training AI to do the same thing every time I choose all the pictures with train locomotives in a reCaptcha.
[+] leoalves|10 years ago|reply
Google, Facebook, Twitter... are all doing this in their backend. If you are worried about it, don't use their services.

The problem I see here is that now they are giving this data to spammers.

[+] DonHopkins|10 years ago|reply
Well if 98% of your images are pictures of cats, you don't really have to solve the hard general problem.
[+] iaw|10 years ago|reply
This has been possible for, at a minimum, three years. There's an effort gap between building an ad-targeting version and building an accessibility version for blind users.

The only new thing here is that Facebook released it to the public.

[+] speedyapoc|10 years ago|reply
I always find Facebook's example feed to be funny since it's a completely unrealistic depiction of what their site actually is for most users.

Just quickly looking at the top few posts in my feed, I see someone celebrating their two year friendship with someone I don't know, one person sharing a link to a new airplane, four people sharing videos, one person liking a sponsored video, and finally one person updating their profile picture.

I wish I could see actual updates from people instead of being kept abreast as to what piece of third party content they've liked at some point in time, what third party content they're sharing, etc.

[+] manigandham|10 years ago|reply
How is this unrealistic? This is exactly the kind of stuff my feed shows. The people you follow probably aren't posting any updates (that you'd like to see)... and sharing content is an update; that's what that person decided to post.

Your feed is what you make it.

[+] Toenex|10 years ago|reply
> I wish I could see actual updates from people instead of being kept abreast as to what piece of third party content they've liked at some point in time, what third party content they're sharing, etc.

This nicely captures the problem. Facebook (and probably most social media systems - I'm looking at you LinkedIn) are primarily interested in keeping you up to date with Facebook via the medium that is your human relationships. So Facebook wants you to know what your friends did on Facebook today so that you might do that same Facebook thing. This increases Facebook interactions which in turn become more information to propagate to others on Facebook. In the limit there is no need for Facebook because all anyone is ever doing is Facebook.

[+] _qbjt|10 years ago|reply
The first thing I thought was, "I wonder how this will work with memes."
[+] seanalltogether|10 years ago|reply
Interesting, when I go to facebook (which admittedly is not very often) I only see updates and photos posted by friends in my feed.
[+] ospfer|10 years ago|reply
Last July, I led a project in support of a federal agency to analyze current business processes and identify weaknesses in the agency's Section 508 office. My work focused primarily on externally accessible internet sites, and one of the most common 508 violations we encountered was the lack of alt text on images. This agency utilized a number of automated scanning tools and processes, but lacked any ability to efficiently remediate these errors. While we never went beyond a conceptual discussion, a coworker and I talked about something along the lines of what Facebook has accomplished here through the use of neural networks. Very cool to see this advancement come to life.
[+] dr_zoidberg|10 years ago|reply
These kinds of systems/algorithms also allow assigning a certain semantic component to images (taken with a grain of salt, of course), which might enable further developments that weren't considered possible yet.

Sadly, it also brings a whole new set of cases to the oh-so-annoying "but Facebook/Google/Twitter/Amazon does it!" clichés that we'll now have to deal with...

[+] verusfossa|10 years ago|reply
I'm waiting for the day this is just a library you pass an image to and it returns an array. No, not a SaaS. Then on my own pump.io, Diaspora, redMatrix etc. it just works. My data, my images, my network. I'm not against the tech at all though. Neat.
[+] ma2rten|10 years ago|reply
There are already pre-trained networks out there. TensorFlow comes with an example command-line tool: you can pass it any image and it will tell you what is in the image.

The classes that it can detect are from ImageNet, so that might be limiting.
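The "tell you what is in the image" step in such tools is essentially a top-k lookup over the network's 1000-way ImageNet softmax output. A stdlib-only sketch of that decoding stage (the tiny label table here is a hypothetical stand-in; the real tool ships the full ImageNet class mapping):

```python
import heapq

# Tiny stand-in for the 1000-entry ImageNet label table the real tool ships.
LABELS = {0: "tabby cat", 1: "golden retriever", 2: "espresso", 3: "laptop"}

def decode_predictions(probs, top_k=3):
    """Return the top-k (label, probability) pairs from a softmax vector."""
    top = heapq.nlargest(top_k, enumerate(probs), key=lambda ip: ip[1])
    return [(LABELS.get(i, f"class_{i}"), p) for i, p in top]

print(decode_predictions([0.05, 0.7, 0.05, 0.2], top_k=2))
# → [('golden retriever', 0.7), ('laptop', 0.2)]
```

The ImageNet limitation mentioned above shows up directly here: anything the network was never trained on can only be mis-assigned to one of the fixed classes.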

[+] sidcool|10 years ago|reply
With all due respect to conspiracies, this is a cool feature.
[+] bla2|10 years ago|reply
Warning, that page has an auto-play video with sound.
[+] shogun21|10 years ago|reply
This might be asking for too much, but why not use more of the image meta-data than these computer vision techniques?

If I were blind, I really wouldn't care that this is an image of "two people, smiling". Facebook has facial recognition, tagging, and locations. It would be much more valuable to me to say "Peter and Laura smiling at Channel Islands State Park."
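Combining existing metadata in this way is mostly string assembly once the face tags, expression, and location are available. A sketch under the assumption that such fields already exist in Facebook's metadata (all function and field names here are hypothetical):

```python
def rich_alt_text(tags, expression=None, location=None):
    """Compose alt text from face tags, an inferred expression, and a
    location, falling back to generic wording when no one is tagged.
    Inputs are assumed to come from existing metadata, not raw vision."""
    if tags:
        who = tags[0] if len(tags) == 1 else ", ".join(tags[:-1]) + " and " + tags[-1]
    else:
        who = "people"  # fallback mirrors the generic "two people" style
    text = who
    if expression:
        text += ", " + expression
    if location:
        text += " at " + location
    return text

print(rich_alt_text(["Peter", "Laura"], "smiling", "Channel Islands State Park"))
# → Peter and Laura, smiling at Channel Islands State Park
```

When tagging or location data is missing, the function degrades back toward the generic computer-vision caption, so the two approaches are complementary rather than competing.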

[+] chippy|10 years ago|reply
I'd like to compare Facebook's image tagging with the Google Cloud Vision API (https://cloud.google.com/vision/). I think it would be interesting to see which one is more accurate or verbose.
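One simple way to run such a comparison, once you have label lists from both services for the same image, is set overlap plus label counts. A sketch (the sample label lists are made up for illustration):

```python
def label_overlap(labels_a, labels_b):
    """Jaccard similarity between two case-insensitive label sets —
    a rough proxy for how similarly two tagging services describe an image."""
    a = {l.lower() for l in labels_a}
    b = {l.lower() for l in labels_b}
    if not (a or b):
        return 1.0  # two empty descriptions agree trivially
    return len(a & b) / len(a | b)

fb = ["two people", "smiling", "outdoor"]
gcv = ["Smiling", "Person", "Tree", "Outdoor"]
print(f"verbosity: {len(fb)} vs {len(gcv)} labels, overlap: {label_overlap(fb, gcv):.2f}")
# → verbosity: 3 vs 4 labels, overlap: 0.40
```

Verbosity is just the label count per image; accuracy would still need a human-judged ground truth.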
[+] TazeTSchnitzel|10 years ago|reply
I suppose it'll be like YouTube's automatic subtitles for audio. It'll do a bad, but passable, job: at least the blind and visually impaired have some idea of what the image contains.
[+] whatever_dude|10 years ago|reply
"Cat. Cats. Cat. Baby. Dog. Baby. Cat. Baby with dog."
[+] visarga|10 years ago|reply
Bag. Duck face. Nails. Duck face.
[+] SimeVidas|10 years ago|reply
Glad to see Mark explaining what a screen reader is to millions of people :-D
[+] buro9|10 years ago|reply
This is really what I wanted to use the Google Image API to do.

But it's way too expensive.

All I wanted was keywords for alt-text, dimensions for placeholder, and the dominant colour for placeholder background.

https://cloud.google.com/vision/

The price for that would be $7.50 per 1,000 images for the first million images.

I have some 60,000 images on the site I run and don't happen to have $450 in loose change lying around (the whole site costs less than that to run each month).

I guess I don't care about alt tags that much.
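Of the three wants listed above, only the keywords actually need a vision API; dimensions and dominant colour can be computed locally for free. A stdlib sketch of the dominant-colour part plus the pricing arithmetic quoted above (the quantisation bucket size is an arbitrary choice):

```python
from collections import Counter

def dominant_colour(pixels, bucket=32):
    """Most common colour after coarse quantisation, suitable for a
    placeholder background. `pixels` is a list of (r, g, b) tuples."""
    counts = Counter((r // bucket, g // bucket, b // bucket) for r, g, b in pixels)
    (r, g, b), _ = counts.most_common(1)[0]
    # Return the centre of the winning bucket.
    return (r * bucket + bucket // 2,
            g * bucket + bucket // 2,
            b * bucket + bucket // 2)

def batch_cost(n_images, price_per_1000=7.50):
    """Cost at the quoted first-tier rate of $7.50 per 1,000 images."""
    return n_images / 1000 * price_per_1000

print(batch_cost(60_000))                               # → 450.0
print(dominant_colour([(255, 0, 0)] * 3 + [(0, 0, 255)]))  # → (240, 16, 16)
```

So only the alt-text keywords would incur the $450; the other two placeholder values are a one-pass computation over each image.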

[+] fudged71|10 years ago|reply
Re-upload them to facebook then scrape the generated descriptions ;)
[+] dflock|10 years ago|reply
Do you have $7.50 per month? Just do a thousand a month.
[+] cphoover|10 years ago|reply
Is this tool open source? It would be a great contribution to the accessibility community.
[+] tlrobinson|10 years ago|reply
Very cool.

Obvious next step: build this into the OS/browser/screen reader.

[+] odinduty|10 years ago|reply
Ah, the Twitter app for Android (beta version) recently added a feature that allows you to add a description to pictures you upload, for visually impaired users.