
Ask HN: How do I keep child porn out of my site?

252 points | VexedSiteOwner | 10 years ago

(Pardon this disturbing subject interfering with your Friday night rest, and my (very necessary) throwaway account.)

A year or two ago I started an image sharing site that's been modestly successful in terms of traffic (a blessing). No money or fame, but it's nice to see movement.

I try to filter user uploads to at least classify the sexual stuff (80% of it) as nsfw and feature the good stuff on the homepage. This is excruciatingly time-consuming with 6000 galleries posted per day, but I suffer through it as best I can.

Sadly, I've noticed a huge amount of extremely taboo photos on the site. From rape and bdsm, which I can kind of tolerate, all the way to extreme child porn. The latter is extremely disturbing.

Amazingly, these people post this openly.

I never see the press talking about the nsfw side of YouTube, Tumblr, Reddit, Imgur, and others. How do those sites deal with this problem? What kind of content filtering systems do they use to keep the visible parts of the site clean? How many interns are flagging photos all day long? Is it wise to allow these pages to be indexed? What's my legal burden under Safe Harbor?

And, more importantly, how does the organic traffic in the nsfw sections play into the strategy of these huge user-generated content sites?

NB. I've attempted to build user profiles and a kind of self-moderation system, akin to how Reddit flagging works, but my users seem to be mostly interested in "one thing," and no community-focused members have emerged so far. I still have hope, but need a solution that I can use now.

186 comments

[+] VieElm|10 years ago|reply
If you're in the United States you should call the National Center for Missing & Exploited Children[1]. They already work with internet service providers to help identify unencrypted images depicting abuse transported over their network. They do this, I think, at an automated level. They should have the information you need. You should probably also call the FBI.

http://www.missingkids.com/Contact

[+] unclebucknasty|10 years ago|reply
But, should he/she contact legal counsel prior to contacting the FBI or anyone else? Personally, I think I would want to understand my potential culpability and other factors here.
[+] kanamekun|10 years ago|reply
You should report any child porn to the CyberTipline, run by NCMEC: https://report.cybertip.org/index.htm

NCMEC has protocols around how to report the images/video, and how to delete it on your end.

I would highly recommend against calling the FBI. You should work with NCMEC, as they have experience working with this stuff and their CyberTipline is one of the major ways that Congress has mandated that online service providers should report this stuff. Plus talking to law enforcement employed by the federal government has a host of risks associated with it:

https://en.wikipedia.org/wiki/Making_false_statements

[+] Elepsis|10 years ago|reply
Microsoft made an automated system (PhotoDNA) for detecting known child pornography images available to the public a few years ago and it's probably a good starting point: http://www.microsoft.com/en-us/PhotoDNA/

Hopefully this can help you.

(Disclosure: I work at Microsoft but not on PhotoDNA.)

[+] kyledrake|10 years ago|reply
PhotoDNA is the gold standard for this. I tried to get access to this via the NCMEC to use with Neocities, but the process was, frankly, very convoluted. I signed at least 10 forms and still didn't end up getting what I needed.

I'm happy that Microsoft is providing this as a free service. It's going to be a lot less painful for me to use it than to figure out how to run my own (or in this case, figure out how to even get it).

[+] eli|10 years ago|reply
You may be interested in this article:

The Laborers Who Keep Dick Pics and Beheadings Out of Your Facebook Feed http://www.wired.com/2014/10/content-moderation/

[+] klunger|10 years ago|reply
I came here to share this article.

Anyway, I think your best bet is to outsource this kind of work to the sort of company described in the article. It seems to be a regrettable necessity for any sizable user-generated content site.

Also, of course, please try and get in touch with the relevant authorities mentioned in other comments and assist their efforts in tracking users who try distributing that kind of... content.

[+] brudgers|10 years ago|reply
By default, any site that allows users to share content will devolve toward an attractive nuisance [0]. Like any security issue, passive measures are a Maginot Line awaiting blitzkrieg, even all the resources of a Google or Facebook aren't enough to automate all these things...they depend on communities to report issues [e.g. webmasters for Google]. And that's the only defense in depth: community.

"Everybody who signs up" isn't a community. There has to be some higher order interest...and what you're finding is that unfortunately the higher order interest of the community for your site is child porn.

There's no fixing DNS. If child porn is not what you want, your site is broken. Shut it down. The sort of users you want either don't care enough to keep out the bad or are overwhelmed by its volume, just as you are. They are, or will be, moving on. You have my sympathies.

Yeah it sucks but you have learned some things:

  1. Community is the hard part.
  2. Technology is necessary but not sufficient.
  3. You can build something that scales to the point where
     it becomes useful to a community.

Consider this version 0.1. You've gotten feedback, and it says that the product (not the code) has failed by your definition of "fail": it has not attracted the market segment you want. You have a platform from which to relaunch.

Good luck.

[0] https://en.wikipedia.org/wiki/Attractive_nuisance_doctrine

[+] orionblastar|10 years ago|reply
Until there is a Machine Learning algorithm that can detect CP, you'll have to have human beings flag it and then other human beings view it and remove it.

Someone brought it to my attention that Bing's cache is full of CP; after the offending websites are taken down, Bing keeps the images for a long time. The Rapidshare sites are also full of it, and uploaders password-protect the RAR files so admins cannot peek inside. It is a major problem that has no solution yet. People run Wordpress blogs, and spambots leave comments that link to CP sites.

This has become a hot-button issue because the manager of the foundation run by that Jared guy from Subway was found with CP, and when they raided Jared's computers they found more evidence.

My ethics and morals won't allow me to look at porn, but it is a big industry, and there are all kinds of porn out there. CP is the worst of it, and a lot of children are trafficked as sex slaves for it. Some grow up with a criminal record and a sex-offender record, and by the time they get the record expunged they are in their 40s and can't find work. I was contacted on Github by a woman in that situation during the Opal CoC debates. She is trying to get out of her situation by programming and cannot find work because of it.

This CP stuff ruins the lives of the children who suffer abuse for it. Once they grow up, they have a hard time making ends meet, and some have serious psychological problems that are hard to treat and deal with.

I remember that in some cases a website has been found responsible for the content its users post. Laws in your nation may vary on that. If you find illegal content you should remove it, lest you be found liable for it. Make sure to report the IP address of the poster to the government or to a non-government agency that handles it.

[+] VexedSiteOwner|10 years ago|reply
Are there any good ML algorithms for detecting porn at all? I tried to implement the standard "pink detector" with mixed results.
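
For reference, a minimal sketch of what such a "pink detector" might look like. The RGB thresholds below are illustrative guesses, not tuned values; expect plenty of false positives on faces and beaches, so treat it as a pre-filter at best:

    # Naive "skin ratio" NSFW heuristic; all constants are untuned guesses.
    from PIL import Image

    def skin_ratio(path, sample_size=(128, 128)):
        # Fraction of pixels falling inside a rough RGB skin-tone range.
        img = Image.open(path).convert("RGB").resize(sample_size)
        pixels = list(img.getdata())
        skin = 0
        for r, g, b in pixels:
            # Classic rule-of-thumb RGB skin test.
            if (r > 95 and g > 40 and b > 20
                    and max(r, g, b) - min(r, g, b) > 15
                    and abs(r - g) > 15 and r > g and r > b):
                skin += 1
        return skin / len(pixels)

    def looks_nsfw(path, threshold=0.35):
        # 0.35 is an arbitrary cutoff; tune it against hand-labeled samples.
        return skin_ratio(path) > threshold
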
[+] njloof|10 years ago|reply
You'd think HN would spot a market opportunity like that and exploit it. Good programmers with unfair criminal records at below market rates?
[+] mirimir|10 years ago|reply
Your safest bet is running a system where you have no way of knowing what users upload. Depending on jurisdiction, reviewing and moderating content may increase your civil and/or criminal liability. There's typically a "safe harbor" for service providers. You just need to respond to LEA and DMCA takedown requests.

Edit: Other advantages: 1) you never risk viewing stuff that you can't unsee; and 2) you outsource content review to concerned users and other third parties.

[+] chmike|10 years ago|reply
Wouldn't this result in providing a service for pedophiles or other sick people to propagate and amplify their voice?
[+] thescriptkiddie|10 years ago|reply
This is the best answer so far. While some jurisdictions protect web admins from the actions of their users, others don't. So if you don't want to be extradited to some fascist state, you had better make sure that you can prove you have no ability to moderate or even know what content is being uploaded.
[+] frigg|10 years ago|reply
Doesn't there need to be a "report abuse" button on each image? Users could report it, and only then would he use something like Microsoft's PhotoDNA (which a user mentioned above).
[+] tacostakohashi|10 years ago|reply
I used to work for an also-ran social network (20m users), and this was a big problem for them too, particularly when they found that they were a popular option for sharing CP. When I say a big problem, it's really an existential threat for any kind of user generated content sharing site.

Yes, you need a way of finding and flagging this stuff. Algorithms help, but people always need to be involved, and that's problematic. It can be hard to find people who want to be exposed to this material as their full-time job, and it's a liability headache. Even if some employees are ok with being exposed to it as part of their jobs, other employees might have a legitimate expectation of not being exposed at their workplace, and it's difficult to contain.

Yes, you will need to develop a relationship with law enforcement. They have a number of programs for submitting evidence, they're actually quite easy to use, and they are cooperative if you follow their rules. Even so, it's time consuming, and if you don't maintain a good relationship and comply fully, then you can become a target for enforcement.

You say you've become moderately successful in terms of traffic, but there's a big proportion of dubious content. Frankly, this means that certain people have noticed that your site is not as good at identifying, flagging, and reporting this content, so they're gravitating to you, having been kicked out of facebook, etc. That's fine in the short term; in the long term it's unsustainable from a business and legal perspective. Either you'll need to devote more resources to fighting this (instead of development, marketing, and more interesting things) and find a way to attract more legitimate users, or you will become the next attractive target for legal issues.

This is not a simple problem that can be solved with Mechanical Turk, an algorithm, etc. It's a never-ending game of cat and mouse, walls and ladders, and a fundamental problem to be dealt with on any site that allows sharing. It's not just sexual stuff; there's also copyright, and the music and movie industries are pretty keen on finding targets too.

It might be feasible to compete with facebook on product, or popularity with niche audiences, but competing with them on their ability to keep bad content off their site so that it's palatable for a wide audience is a lot harder. That's their core business, and they employ a lot of humans to make it work.

[+] subb|10 years ago|reply
This is just an idea, since I've never built such a filter, but you could automate a large part of the NSFW image filtering. A quick Google search led to this paper: http://cs229.stanford.edu/proj2005/HabisKrsmanovic-ExplicitI... Once you have that in place, I guess it's better to make it aggressive and accept false positives being classified as NSFW.

Google "safe image search" has the additional help of searching the content of the page the image is used. You might be able to do the same, up to some limit, by checking the http referer header field to know where requests are coming from. You could scan the referer's page for some keywords. This might give you a better idea of the context where the image is used. Note that this might be tricky, since you probably don't want traffic coming out of your server to some child porn site.

That said, those are just some ideas. YouTube has a good community that flags videos, but also an army of reviewers who look at the flagged content.

http://mobile.nytimes.com/2010/07/19/technology/19screen.htm...

Another way to look at it would be to try to manually select some images as "front page worthy", instead of trying to filter the bad stuff.

[+] kanamekun|10 years ago|reply
Other posters are correct; you are obligated under US law to report child porn to the NCMEC CyberTipline.

The fine for not complying started off fairly low, but has been increased in subsequent legislation. In my experience, though, NCMEC is mostly just interested in getting regular reports uploaded to their system. I met with them once; they have a rough sense of how many reports should be sent over for a site of a certain activity/traffic level, and if the number of reports is zero... then they know you're not in compliance.

Their reporting interface is beyond awful though. Maybe they've improved it in recent years; when I last saw it, everything had to be uploaded and reported manually.

[+] mirimir|10 years ago|reply
Would it be enough to report content flagged by other users as CP? Is there a requirement to review content before submission? That's not something I'd ever want to do. I don't think that I'd want to pay someone else to do it either.
[+] BorisMelnik|10 years ago|reply
As a father this horrifies me. If this were my site, hobby or not, I would spend a great deal of time implementing a system to report to the authorities. Just imagine how good it would feel if you managed to help or save even one kid.

The PhotoDNA API looks absolutely brilliant. That is one reason why I love Bill Gates: he always (or at least a lot of the time) gets involved with projects that truly help people.

Don't think of it as a problem; it is an opportunity to help a child, or a parent who may not know a relative, teacher, or stranger is hurting their child.

Many times these crimes are committed by loved ones; the children are not abducted but lured or tricked by people near and dear to them.

[+] AnotherWebmster|10 years ago|reply
Speaking from the experience of running a similar site:

1. Monitor only the most-viewed pages, as 99% of images will never be seen by anyone again, not the uploader nor the law agencies. A page must have some traffic to be discovered. Just make a "top 200 today" page and have a look at it from time to time.

2. "report nsfw" button does not work. The pedophiles do not report, the rest have no chance to hit the pedo-page.

3. Almost all the pedo-uploaders use Tor. Check how many non-pedophiles use Tor and consider blocking the IPs of exit nodes (or making Tor-uploaded images initially hidden until reviewed); see the sketch after this list.

4. Own your IP address, or set up a relationship with the owner of your server's IP. Law enforcement sends email to them (or to you with them CC'd). If "whois $YOURIP" shows not your email but, for example, [email protected] or [email protected], then your server has a good chance of being disconnected hours before you find out why.

5. About the big players: at least Twitter has a lot of pedo-content (my service is screenshot-oriented, and I have seen many screenshots of Twitter pages with CP). "How does the organic traffic in the nsfw sections play into the strategy of these huge user-generated content sites" is a very good question; I would like to know as well.

6. About the advice in the comments to judge images by their porn context: it does not work. SFW images are very clickable when surrounded by NSFW ones (think of a portrait of a celebrity in that context).

PS. My advice may look like half-measures, but it provides the same level of quality as machine learning or Mechanical Turk solutions (which are no definitive solutions either) at a lower price.
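
A minimal sketch of point 3, assuming the Tor Project's published bulk exit list (the URL below is the current one and may change; in practice you'd cache it and refresh periodically):

    # Hold uploads from Tor exit nodes for manual review.
    import urllib.request

    EXIT_LIST_URL = "https://check.torproject.org/torbulkexitlist"

    def load_tor_exits():
        # One exit-node IP per line.
        with urllib.request.urlopen(EXIT_LIST_URL) as resp:
            text = resp.read().decode("utf-8")
        return {line.strip() for line in text.splitlines() if line.strip()}

    TOR_EXITS = load_tor_exits()

    def upload_visibility(uploader_ip):
        # Images uploaded via Tor start hidden until a human approves them.
        return "hidden_pending_review" if uploader_ip in TOR_EXITS else "visible"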

[+] daenz|10 years ago|reply
If there isn't already, maybe there should be some kind of public perceptual hash database (http://www.phash.org/) for this kind of stuff.
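
For illustration, the Python imagehash library implements pHash-style perceptual hashes; checking an upload against such a shared database might look like the sketch below (the blocklist file and distance cutoff are hypothetical):

    from PIL import Image
    import imagehash  # pip install imagehash

    def load_blocklist(path="known_bad_phashes.txt"):
        # One hex-encoded pHash per line (a hypothetical shared database dump).
        with open(path) as f:
            return [imagehash.hex_to_hash(line.strip()) for line in f if line.strip()]

    def matches_blocklist(image_path, blocklist, max_distance=8):
        h = imagehash.phash(Image.open(image_path))
        # Subtracting two hashes yields their Hamming distance.
        return any(h - bad <= max_distance for bad in blocklist)
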
[+] dsl|10 years ago|reply
There is a government-run database of all known child-exploitation images. For obvious reasons you need to show a reasonable need for access. Contact the National Center for Missing and Exploited Children.
[+] jagermo|10 years ago|reply
No need to apologize, this is not only an interesting topic and problem, but also a very good discussion.

Thanks for bringing this up. Would it be ok if I use your question as a starting point and look at what the situation is like for non-US websites?

[+] nness|10 years ago|reply
There are already some interesting solutions posted here. If you want to tackle the issue with a stop-gap in the meantime, you could add an image-hashing step to the upload process to identify images that have already been flagged as NSFW or worse.

dHash is fairly simple to implement, and you might even be able to offload the hash checking to the database level. Comparing dHashes is just a matter of XOR'ing the two hashes and counting the differing bits (the Hamming distance).
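
A bare-bones sketch of dHash and that comparison, assuming PIL (the 9x8 grayscale layout is the standard dHash construction):

    from PIL import Image

    def dhash(path, size=8):
        # Shrink to (size+1) x size grayscale and record whether each
        # pixel is brighter than its right-hand neighbour.
        img = Image.open(path).convert("L").resize((size + 1, size))
        px = list(img.getdata())  # row-major: index = row * (size+1) + col
        bits = 0
        for row in range(size):
            for col in range(size):
                left = px[row * (size + 1) + col]
                right = px[row * (size + 1) + col + 1]
                bits = (bits << 1) | (left > right)
        return bits

    def hamming(a, b):
        # XOR the hashes and count the set bits; a small distance
        # (say <= 10 out of 64) suggests a near-duplicate.
        return bin(a ^ b).count("1")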

Obviously, as the sample size increases, so will the computation time. You could help the process by prioritising checks against new accounts, certain IP ranges (if you're seeing more content of a certain type from particular countries or VPN providers), or accounts with a history of flagged uploads.

It's a horrible problem to have. Best of luck!

[+] hayksaakian|10 years ago|reply
At least with reddit, there's community moderation (read: free employees) that enforces the content rules of each section.

Is there any incentive to participate in your community?

With moderators you feed the "power tripper".

With karma you feed people obsessed with points.

This is a bit complicated: what if you had some sort of captcha that required users to classify images as nsfw/sfw/illegal?

[+] Houshalter|10 years ago|reply
Reddit doesn't allow users to upload images. Just links. And they ban problematic domains.
[+] Bill_Dimm|10 years ago|reply
"This is a bit complicated: what if you had some sort of captcha that required users to classify images as nsfw/sfw/illegal?"

If users normally encounter photos on the site by requesting them (e.g., by entering a search query or browsing a friend's album) rather than being shown random photos (like on HotOrNot.com), I would think you could run into some very upset users (and possibly legal problems) by throwing random, potentially disturbing images in their faces. I mean, if you go to a website intending to browse photos your friend took of his boat and the site throws up some random child porn on your screen, you'd be pretty annoyed, right?

[+] VexedSiteOwner|10 years ago|reply
I'm not sure why it never caught on with my users.

I do support Twitter login, which is fairly common, and have had a few hundred users sign up (out of like >10mil uniques), but I wouldn't say they've been especially engaged after doing so; from what I've observed, they're basically the same from a stats perspective.

I'm afraid to make user logins compulsory, especially considering the kinds of knuckleheads that are on my site.

How do I get from "eh, have fun as a guest" to "everyone's got an id" without destroying my stats?

[+] chmike|10 years ago|reply
Another possibility is to add a service to label/tag images.

See if you could turn it into a game where people gain karma points by properly labelling/tagging images. Make it people's choice to participate instead of forcing them into it with a captcha.

Your image stack would gain significant value by being labelled and searchable by label.

See https://www.cs.cmu.edu/~biglou/ESP.pdf
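
A toy sketch of the agreement mechanic from that paper: a label only counts, and earns karma, once two independent users submit the same label for an image (all names and storage here are hypothetical):

    from collections import defaultdict

    pending = defaultdict(dict)   # image_id -> {user_id: label}
    confirmed = {}                # image_id -> agreed label
    karma = defaultdict(int)      # user_id -> points

    def submit_label(image_id, user_id, label):
        # Award karma only when two independent users agree on a label,
        # so a lone troll can't poison the tags.
        for other_user, other_label in pending[image_id].items():
            if other_user != user_id and other_label == label:
                confirmed[image_id] = label
                karma[user_id] += 10
                karma[other_user] += 10
                return "confirmed"
        pending[image_id][user_id] = label
        return "pending"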

[+] aurizon|10 years ago|reply
If you give people a way to send passworded links to paying subscribers, then none of those subscribers will rat them out. If you make all images open to view, you can appeal to people to flag items for removal, or just auto-remove them after, say, 2 or 3 flags, and trust your nicer clients to police the site. If your clients all sign up with throwaways and then load a huge block of images, each with its own password, they can sit far away and sell passwords all day and never emerge to be caught; if they want to add more images, it's a new throwaway account every day. Full accountability is the answer, so all images can be tracked back to a real address and name. Sadly, only good people comply with this, but it might be a way to thin the crowd. A secret, untrackable photo site will also soon attract the police as they hunt for child porn sellers, so sooner or later they will come knocking on your door.

One way is to make contact with the police and get permission to list the names of the police agencies that are allowed to inspect the site via a backdoor, etc. Of course this might enrage some users, so some sort of middle ground might be to quietly approach the police for advice.

[+] pessimizer|10 years ago|reply
Paid moderation. Putting up an open image-sharing site with no moderation is akin to opening a nightclub with nobody checking IDs at the door and no security.

I can't tell you what to do to meet the minimum legal standard of covering your ass, but that's going to vary by jurisdiction and current whims over time. I can tell you, though, that by the time somebody has stumbled over a terrible image and reported it, they 1) will be horrified by your site and never use it again, and 2) the poster will have shared the url with everyone they wanted to and the image will already have been distributed as far as it was intended to be. If the number of terrible galleries is increasing, you're probably becoming well known within tiny circles as a convenient place to share the stuff.