Ask HN: How do I keep child porn out of my site?
252 points | VexedSiteOwner | 10 years ago
A year or two ago I started an image sharing site that's been modestly successful in terms of traffic (a blessing). No money or fame, but it's nice to see movement.
I try to filter user uploads to at least classify the sexual stuff (80% of it) as nsfw and feature good stuff on the homepage. This is excruciatingly time consuming with 6000 galleries posted per day, but I suffer through it as I can.
Sadly, I've noticed a huge amount of extremely taboo photos on the site, from rape and bdsm, which I can kind of tolerate, all the way to extreme child porn. The latter is extremely disturbing.
Amazingly, these people post this openly.
I never see the press talking about the nsfw side of Youtube, Tumblr, Reddit, Imgur, and others. How do those sites deal with this problem? What kind of content filtering systems do they use to keep the visible parts of the site clean? How many interns are flagging photos all day long? Is it wise to allow these pages to be indexed? What's my legal burden under Safe Harbor?
And, more importantly, how does the organic traffic in the nsfw sections play into the strategy of these huge user-generated content sites?
NB. I've attempted to build user profiles and a kind of self-moderation system, akin to how Reddit flagging works, but my users seem to be mostly interested in "one thing," and no community-focused members have emerged so far. I still have hope, but need a solution that I can use now.
[+] [-] VieElm|10 years ago|reply
http://www.missingkids.com/Contact
[+] [-] unclebucknasty|10 years ago|reply
[+] [-] kanamekun|10 years ago|reply
NCMEC has protocols around how to report the images/video, and how to delete it on your end.
I would highly recommend against calling the FBI. You should work with NCMEC, as they have experience working with this stuff and their CyberTipline is one of the major ways that Congress has mandated that online service providers should report this stuff. Plus talking to law enforcement employed by the federal government has a host of risks associated with it:
https://en.wikipedia.org/wiki/Making_false_statements
[+] [-] Elepsis|10 years ago|reply
Hopefully this can help you.
(Disclosure: I work at Microsoft but not on PhotoDNA.)
[+] [-] kyledrake|10 years ago|reply
I'm happy that Microsoft is providing this as a free service. It's going to be a lot less painful for me to use it than to figure out how to run my own (or in this case, figure out how to even get it).
[+] [-] eli|10 years ago|reply
https://www.law.cornell.edu/uscode/text/18/2258A
http://www.ncsl.org/research/telecommunications-and-informat...
[+] [-] eli|10 years ago|reply
The Laborers Who Keep Dick Pics and Beheadings Out of Your Facebook Feed http://www.wired.com/2014/10/content-moderation/
[+] [-] klunger|10 years ago|reply
Anyway, I think your best bet is to outsource this kind of work to the sort of company described in the article. It seems to be a regrettable necessity for any sizable user-generated content site.
Also, of course, please try and get in touch with the relevant authorities mentioned in other comments and assist their efforts in tracking users who try distributing that kind of... content.
[+] [-] brudgers|10 years ago|reply
"Everybody who signs up" isn't a community. There has to be some higher order interest...and what you're finding is that unfortunately the higher order interest of the community for your site is child porn.
There's no fixing this. If child porn is not what you want, your site is broken. Shut it down. The sort of users you want either don't care enough to keep out the bad or are overwhelmed by its volume, just as you are. They are or will be moving on. You have my sympathies.
Yeah it sucks but you have learned some things:
Consider this version 0.1. You've gotten feedback, and it says that the product (not the code) has failed by your definition of "fail" because it has not attracted the market segment you want. You have a platform from which to relaunch. Good luck.
[0] https://en.wikipedia.org/wiki/Attractive_nuisance_doctrine
[+] [-] orionblastar|10 years ago|reply
Someone brought it to my attention that Bing's cache is full of CP: after the offending websites are taken down, Bing keeps the images for a long time. The Rapidshare-style sites are also full of it, and uploaders password-protect RAR files so admins cannot peek inside. It is a major problem with no solution yet. People run Wordpress blogs, and spambots leave comments that link to CP sites.
This has become a hot topic issue because that Jared guy from Subway had a manager of his foundation that was found with CP, and they raided Jared's computers and found more evidence.
My ethics and morals won't allow me to look at porn, but it is a big industry, and there are all kinds of porn out there. CP is the worst of it, and a lot of children are trafficked as sex slaves for it. Some victims grow up with criminal and sex-offender records, and by the time they expunge those records they are in their 40s and can't find work. I was contacted on GitHub, during the Opal CoC debates, by a woman in that situation. She is trying to get out of it by programming and cannot find work because of her record.
This CP stuff ruins the lives of the children who suffer abuses for it. Once they grow up they have a hard time in life trying to make ends meet. Some have serious psychological problems that are hard to treat and deal with.
I remember that in some cases the website is found responsible for the content that users post. Laws in your nation may vary on that. If you find illegal content you should remove it, lest you be found liable for it. Make sure to report the IP address of the poster to the government or a non-government agency that handles it.
[+] [-] VexedSiteOwner|10 years ago|reply
[+] [-] njloof|10 years ago|reply
[+] [-] mirimir|10 years ago|reply
Edit: Other advantages: 1) you never risk viewing stuff that you can't unsee; and 2) you outsource content review to concerned users and other third parties.
[+] [-] chx|10 years ago|reply
> They say “the lawyers” tell them they can’t edit out an obscenity or remove a rude or abusive post without bringing massive legal liability upon themselves [...] That’s not true, and hasn’t been true since 1996.
[+] [-] chmike|10 years ago|reply
[+] [-] thescriptkiddie|10 years ago|reply
[+] [-] frigg|10 years ago|reply
[+] [-] tacostakohashi|10 years ago|reply
Yes, you need a way of finding and flagging this stuff. Algorithms help, but people always need to be involved, and that's problematic. It can be hard to find people who want to be exposed to this material as their full-time job, and it's a liability headache. Even if some employees are ok with being exposed to it as part of their jobs, other employees might have a legitimate expectation of not having to be exposed at their workplace, and it's difficult to contain.
Yes, you will need to develop a relationship with law enforcement. They have a number of programs for submitting evidence, they're actually quite easy-to-use, and they are cooperative if you follow their rules. Even so, it's time consuming, and if you don't maintain a good relationship and comply fully, then you can become a target for enforcement.
You say you've become moderately successful in terms of traffic, but there's a big proportion of dubious content. Frankly, this means that certain people have noticed that your site is not as good at identifying, flagging, and reporting this content, so they're gravitating to you, having been kicked out of facebook, etc. That's fine in the short term, but in the long term it's unsustainable from a business and legal perspective. Either you'll need to devote more resources to fighting this (instead of development, marketing, and more interesting things) and find a way to attract more legitimate users, or you will become the next attractive target for legal issues.
This is not a simple problem that can be solved with mechanical turk, an algorithm, etc. It's a never-ending game of cat and mouse, walls and ladders, and a fundamental problem to be dealt with on any site that allows sharing. It's not just sexual stuff, there's also copyright - the music and movie industries are pretty keen about finding targets too.
It might be feasible to compete with facebook on product, or popularity with niche audiences, but competing with them on their ability to keep bad content off their site so that it's palatable for a wide audience is a lot harder. That's their core business, and they employ a lot of humans to make it work.
[+] [-] subb|10 years ago|reply
Google's "safe image search" has the additional help of searching the content of the page the image is used on. You might be able to do the same, up to some limit, by checking the HTTP Referer header to know where requests are coming from, then scanning the referring page for certain keywords. This might give you a better idea of the context the image is used in. Note that this might be tricky, since you probably don't want traffic going out from your server to some child porn site.
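A rough sketch of that referer check. Everything here is an assumption for illustration: the host blocklist and keyword list are placeholders, and the referring page's HTML is assumed to have been fetched out-of-band (e.g. through a sandboxed proxy, for exactly the reason above).

```python
from urllib.parse import urlparse

# Placeholder lists -- a real deployment would curate and update these.
BLOCKED_HOSTS = {"bad-example.invalid"}
SUSPECT_KEYWORDS = {"keyword-a", "keyword-b"}

def classify_referer(referer_url, page_text=""):
    """Return 'blocked', 'suspect', or 'ok' for an incoming image request.

    page_text is the HTML of the referring page, fetched separately
    (ideally via a proxy, so requests don't leave your server directly
    toward a hostile site).
    """
    if not referer_url:
        return "ok"  # direct hit or stripped referer; handle separately
    host = urlparse(referer_url).netloc.lower()
    if host in BLOCKED_HOSTS:
        return "blocked"
    text = page_text.lower()
    if any(kw in text for kw in SUSPECT_KEYWORDS):
        return "suspect"
    return "ok"
```

A "suspect" result might hide the image pending review rather than delete it outright.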
That said, those are just some ideas. Youtube has a good community that flags videos, but also an army of reviewers who look at the flagged content.
http://mobile.nytimes.com/2010/07/19/technology/19screen.htm...
Another way to look at it would be to try to manually select some images as "front page worthy", instead of trying to filter the bad stuff.
[+] [-] kanamekun|10 years ago|reply
The fine for not complying started off fairly low, but has been increased in subsequent legislation. In my experience though, NCMEC is mostly just interested in getting regular reports uploaded to their system. I met with them once, and they have a rough sense of how many reports should be sent over for a site of a certain activity/traffic level, and if the number of reports is zero... then they know you're not in compliance.
Their reporting interface is beyond awful though. Maybe they've improved it in recent years; when I last saw it, everything had to be uploaded and reported manually.
[+] [-] mirimir|10 years ago|reply
[+] [-] BorisMelnik|10 years ago|reply
The PhotoDNA API looks absolutely brilliant. That is one reason why I love Bill Gates: he always (or at least often) gets involved with projects that truly help people.
Don't think of it as a problem, it is an opportunity to help a child or a parent that may not know a relative, teacher, or stranger is hurting their child.
Many times these crimes are committed by loved ones and the children are not abducted, they are lured / tricked by people near and dear to them.
[+] [-] AnotherWebmster|10 years ago|reply
1. Monitor only the most-viewed pages, since 99% of images nobody will ever see again: not the uploader, nor law enforcement. A page must have some traffic to be discovered. Just make a "top 200 today" page and have a look from time to time.
2. A "report nsfw" button does not work. The pedophiles do not report, and the rest have no chance of hitting the pedo page.
3. Almost all the pedo-uploaders use Tor. Check how many non-pedophiles use Tor and consider blocking the IPs of exit nodes (or making Tor-uploaded images initially hidden until reviewed).
4. Own your IP address, or set up a relationship with the owner of your site's IP. Law enforcement sends email to them (or to you with a CC to them). If "whois $YOURIP" shows not your email but, for example, [email protected] or [email protected], then your server has a good chance of being disconnected hours before you find out why.
5. About the big players - at least Twitter has a lot of pedo-content (my service is screenshot-oriented and I have seen many screenshots of Twitter pages with CP). "how does the organic traffic in the nsfw sections play into the strategy of these huge user-generated content sites" - very good question, I would like to know as well.
6. About the advice in the comments to check images against the surrounding porn context: it does not work. SFW images are very clickable when surrounded by NSFW content (think of a celebrity portrait in that context).
PS. My advice may look like half-measures, but it provides the same level of quality as machine-learning or Mechanical Turk solutions (which are not complete solutions either) at a lower price.
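Point 1 above is cheap to build. A minimal sketch, assuming a page-view log table; the table and column names (`page_views`, `gallery_id`, `viewed_at`) are placeholders, shown here with sqlite:

```python
import sqlite3

def top_viewed_today(conn, limit=200):
    """Return (gallery_id, views) for today's most-viewed galleries,
    most viewed first, for a daily manual review pass."""
    return conn.execute(
        """
        SELECT gallery_id, COUNT(*) AS views
        FROM page_views
        WHERE viewed_at >= date('now')
        GROUP BY gallery_id
        ORDER BY views DESC
        LIMIT ?
        """,
        (limit,),
    ).fetchall()
```

Rendering that list as an internal "top 200 today" page gives a reviewer the highest-impact galleries first.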
[+] [-] daenz|10 years ago|reply
[+] [-] rendx|10 years ago|reply
[+] [-] dsl|10 years ago|reply
[+] [-] jagermo|10 years ago|reply
Thanks for bringing that up. Would it be ok if I use your question to build around it and see what it is like for non-US websites?
[+] [-] nness|10 years ago|reply
dHash is fairly simple to implement, and you might even be able to offload the hash checking to the database level. Comparing dHashes is just a matter of XOR'ing the two hashes and counting the differing bits (the Hamming distance).
Obviously, as the sample size increases, so will the computation time. You could help the process by prioritising checks against new accounts, certain IP ranges (if you're seeing more or less content of a certain type from different countries or VPN providers), or accounts with a history of uploads.
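A minimal pure-Python sketch of that comparison. The decode-and-resize step is assumed to have already happened (in practice you'd use a library like Pillow to shrink the image to a (hash_size+1) x hash_size grayscale grid); here `pixels` is just that grid as a row-major list of brightness values.

```python
def dhash(pixels, hash_size=8):
    """Difference hash over a (hash_size+1) x hash_size grayscale grid.

    Each bit records whether brightness increases left-to-right between
    two horizontally adjacent pixels, so the hash survives rescaling and
    mild recompression of the same image.
    """
    bits = 0
    for row in range(hash_size):
        for col in range(hash_size):
            left = pixels[row * (hash_size + 1) + col]
            right = pixels[row * (hash_size + 1) + col + 1]
            bits = (bits << 1) | (1 if left < right else 0)
    return bits

def hamming_distance(h1, h2):
    """Count differing bits; a small distance flags a near-duplicate."""
    return bin(h1 ^ h2).count("1")
```

A distance of 0 is the same image; small thresholds (say, under 10 of 64 bits) catch resized or re-encoded copies of a known-bad image.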
It's a horrible problem to have. Best of luck!
[+] [-] michaelmior|10 years ago|reply
[0] https://www.crowdflower.com/type-content-moderation
[+] [-] hayksaakian|10 years ago|reply
is there any incentive to participate in your community?
With moderators you feed the "power tripper".
With karma you feed people obsessed with points.
This is a bit complicated: what if you had some sort of captcha that required users to classify images as nsfw/sfw/illegal?
[+] [-] Houshalter|10 years ago|reply
[+] [-] Bill_Dimm|10 years ago|reply
If the user normally encounters photos on the site by requesting them (e.g., by entering a search query or browsing a friend's album) rather than having random photos thrown in their face (like HotOrNot.com), I would think you could run into some very upset users (and possibly legal problems) if you are throwing random photos that might contain disturbing images in their faces. I mean, if you go to a website intending to browse photos your friend took of his boat and the site throws up some random child porn on your screen, you'd be pretty annoyed, right?
[+] [-] VexedSiteOwner|10 years ago|reply
I do support Twitter login which is fairly common, and have had a few hundred users sign up (out of like >10mil uniques), but I wouldn't say that they've exhibited overly engaged behavior after doing so. They're basically the same from a stats perspective from what I've observed.
I'm afraid to make user logins compulsory, especially considering the kinds of knuckleheads that are on my site.
How do I get from "eh, have fun as a guest" to "everyone's got an id" without destroying my stats?
[+] [-] chmike|10 years ago|reply
See if you could turn it into a game where people gain karma points by properly labelling/tagging images. Make it people's choice to participate instead of forcing them into it with a captcha.
Your image stack would gain significant value by being labelled and searchable by label.
See https://www.cs.cmu.edu/~biglou/ESP.pdf
[+] [-] aurizon|10 years ago|reply
One way is to make contact with the police and get permission to list the names of the police agencies that are allowed to inspect the site via a backdoor, etc. Of course this might enrage some users, so a middle ground might be to quietly approach the police for advice.
[+] [-] pessimizer|10 years ago|reply
I can't tell you what to do to meet the minimum legal standard of covering your ass, but that's going to vary by jurisdiction and current whims over time. I can tell you, though, that by the time somebody has stumbled over a terrible image and reported it, they 1) will be horrified by your site and never use it again, and 2) the poster will have shared the url with everyone they wanted to and the image will already have been distributed as far as it was intended to be. If the number of terrible galleries is increasing, you're probably becoming well known within tiny circles as a convenient place to share the stuff.