Fake News Challenge

[+] padseeker|9 years ago|reply

The problem is not "fake news". The problem is people are treating the news like a product in the capitalist paradigm.

If you don't like the way your local bakery makes their bread you go somewhere else. Now if the media reports on something you don't like someone else will tell you what you want to hear.

This issue happens on both sides of the political spectrum BUT it does seem to be a lot worse on the conservative side of things. All the "liberal" news sources like the NYT and The WashPo seem to be doing lots of reporting on Trump supporters, discussing their fears and hopes, concerns and motivations. Even more conservative sources like the WSJ, not including their editorial page, are trying to take a bigger view of the world.

On the other hand my Apple News feed seems to perpetually have a FOX News story. The one this morning was "Is Trump Bashing the new Celeb nude selfie?"

All I know is the people who seem prone to believing the stupidest stories also believe global warming is a hoax and Obama was not born in this country. A significant portion of the electorate has been drinking from the well and cannot be reached.

[+] enknamel|9 years ago|reply

> The problem is not "fake news". The problem is people are treating the news like a product in the capitalist paradigm.

I would say a huge part of the problem is that news is no longer news. It's all commentary. News organizations are not only incentivized to to sensationalize stories to make money but they are also ran by people who have extreme political views and want to essentially mind control the population. They don't want to report the facts of a story. They want to make you accept their interpretation of the facts.

A great example is how different news organizations cover the recent executive order on immigration. Depending on where you get your news you were told there was a Muslim ban, a ban from 7 Muslim majority countries, only accepting Christian refugees from the Middle East, 7 countries that harbor terrorists were banned, etc.

Wherever you fall on that issue there is a news org trying to force you into a certain viewpoint one way or the other.

[+] digler999|9 years ago|reply

> A significant portion of the electorate has been drinking from the well and cannot be reached.

How is that any different than people in the Bible Belt who believe the earth was made in 7 days ? Or elderly people who become vulnerable and fall prey to exploitation (because they believe phone scams, or sob stories from strangers, etc) ? (Edit: or young people who think they're going to grow up and be an astronaut or NBA star). There will always be a sizable proportion of the population who believes wacky stories.

I know you weren't implying this, but NYT and WaPo's hands aren't clean either. I really didn't appreciate how they pushed the "Russia HACKED the election" narrative. That's predatory on the same level: take advantage of the nebulous word "hacking" and misapply it to something that doesn't even rely on computers, do some hand-waiving, and voila, there's some "fake news" to dish up to the masses [which is ironic when your other stories are about the spread of "fake news"].

[+] tokenizer|9 years ago|reply

And yet, I'm sure none of your news, your sources, have been compromised.

I'm sure one of these anti science boogeymen you've created would think you are equally brainwashed and they are equally correct in their views...

[+] dalbasal|9 years ago|reply

I don't know if I'd class that as a capitalism-paradigm problem. If the press is free, choices exist and you can find news that tells you what you want to hear.

I think if we compare today's new to a decade ago's, the change is not more capitalism (ie, for-profit, corporate, quarterly-report focused...). The change is more choice, more information, more volume and the chaotic, many-to-many distribution systems of the social internet.

The clickbait imperative has got a quasi-capitalist single mindedness to it, but I think we'll be chasing down the wrong culprit by characterising the problem as a capitalist one.

In fact, I think it's best if we try not to jump to conclusions at all. We're hitting some problems with the way our democracies work and I haven't really heard a narrative yet that feels satisfying. Lets not jump to conclusions. We need to be right about it.

[+] crispyambulance|9 years ago|reply

Forgive me if I exercise a brand new buzzword but this project might better be classified as an exercise in "computational journalism" ?

A fake news detector isn't going to do anything by itself to solve the problem. Fake news consumers are going to guzzle the junk regardless of warning tags, scores, or nutritional labeling.

In the hands of a skilled journalist/activist/thought-leader whose abilities are "augmented" by this tool, however, it could really be a powerful antidote.

[+] nkkar|9 years ago|reply

> On the other hand my Apple News feed seems to perpetually have a FOX News story. The one this morning was "Is Trump Bashing the new Celeb nude selfie?"

> All I know is the people who seem prone to believing the stupidest stories

It goes both ways I think. Consider the possibility that there are more people that believe that other 'stupid' people believe these stories than there are 'stupid' people who actually believe these stories.

[+] RandyRanderson|9 years ago|reply

Noam called it manufacturing consent and it's been a thing for quite a while[0].

[0] https://en.wikipedia.org/wiki/Manufacturing_Consent

[+] imron|9 years ago|reply

> All I know is the people who seem prone to believing the stupidest stories also believe global warming is a hoax and Obama was not born in this country

Really, because not more than a few weeks ago I remember a whole bunch of people believing the "we can't verify, but we're going to press anyway" stories about Trump and golden showers in Russia, not to mention a whole bunch of other similar fake news regarding Trump and Trump supporters.

I think you (and many others) are selectively applying a bias to what is and isn't fake news and the type of people who believe it based on your own political preferences.

And therein lies the real problem.

[+] ConfuciusSay02|9 years ago|reply

The problem is that we ever believed the nonsense that there's some information source that is NOT a product, complete with biases and agendas.

The day the printing press was invented was the day print journalism was subverted to push agendas.

[+] lend000|9 years ago|reply

> The problem is people are treating the news like a product in the capitalist paradigm.

News is a product, but it should be noted that fraud is a crime in a free market. Not that it discounts the rest of what you're saying.

[+] dgrealy|9 years ago|reply

Classification is still very needed. I'm interested to see what comes out of this. Especially with regard to eliminating biases of a classification system.

Capitalism isn't going anywhere obviously and shouldn't. On the other end is state-run media, e.g. North Korea. We wouldn't want things to swing in that direction or even toward RT or many other examples.

I think we should be looking to replicate whatever magic is and has been in PBS, early BBC, ITN, and other earlier UK and French programming.

[+] unknown|9 years ago|reply

[deleted]

[+] microcolonel|9 years ago|reply

If you don't pay for the news, who does the news agency have a responsibility to? Certainly not you.

You can see this right now. The mainstream media falsifies or omits the truth on behalf of their backers or the organizations they fear: major political parties, subsidiaries of their parent companies, major investors, etc. I would argue that they do this because the public is not their customer, it's their target demographic.

[+] MichaelMoser123|9 years ago|reply

https://www.youtube.com/watch?v=6r_6hBIX71Y&t=111s I saw an interesting talk the other day (talks at google); Often there is a problem with misrepresentation of facts; (i think the general standard of journalism is quite low); amazing how a subtle editorial slant can twist things... Will Moy is speaking for an organization that does some fact checking and is giving some examples in the talk.

[+] spdustin|9 years ago|reply

Apple News does try to increase visibility of sources/topic you read a lot. If you swipe the story tile to the left, you can see more actions for that source.

[+] khana|9 years ago|reply

[deleted]

[+] Kenji|9 years ago|reply

Now if the media reports on something you don't like someone else will tell you what you want to hear.

I don't see the problem. I don't want to hear lies, therefore I create a demand for true news. I don't need a nanny facebook/google/state telling me what news I like, and neither does the rest of the population.

[+] JacobJans|9 years ago|reply

You'll notice so many posts here that deny the existence of fake news, or attempt to redefine it as "bias."

Frankly, I find it disturbing that so many well educated people aren't able to objectively think about an actually objective problem.

The fake news that started the concept of "fake news" is not a subjective problem. The problem is literal invention of facts not even related to reality, combined with the mass distribution of those invented facts. It is a problem of mass deception.

The classic example: Millions of people shared a post saying the Pope endorsed Donald Trump.

This is objectively false. It's not about liberal vs. conservative. It's not about whether it fits your worldview or not. It simply isn't true.

During the election there were so many literally false stories that got a very large amount of attention.

That is the problem. And yet, so many people seem to think it is a political issue.

It's a sad world we live in where even objective facts no longer matter.

I think that is why so many people are terrified of the direction things are headed.

[+] DanielBMarkham|9 years ago|reply

It makes me sad to see yet another tech team go down the road of "machines will help us filter the truth"

They will not, and the reason has to do with language. Ludwig Wittgenstein tackled this 100 years ago. The best that machines will do for you is to label something as true or not as if you had consumed the article and decided on your own

That is a completely different thing from identifying fake news or truth

There's some value here. There are also some hard stops. You'll find them :) Best of luck, guys!

[+] euyyn|9 years ago|reply

As an example, the fake news about the guy that found the warehouse full of fake ballot boxes for Clinton:

A Google Image Search of the picture in the fake article would have found the picture is much older, and isn't what the caption described.

If I'm tech-savvy and skeptical, I might have done that when consuming the article, and decided on my own it was fake. Machines can do that automatically for people that aren't tech-savvy, or that don't start reading it with a skeptical mindset.

[+] smsm42|9 years ago|reply

I don't think it has to do with language (unless by "language" you mean what I mean, in which case sorry for duplicating). I think it has to do with that a lot of what people call "truth" is a product of their opinions, feelings, hopes, desires, etc.

There are facts of course - like such and such event happened or not, such and such statistical data is showing this number of not. But it is never served raw, it is always garnished with opinion. E.g. the fact is that Trump signed certain executive order. But you can say it's "Muslim ban" or "temporary restriction on admittance of nationals from terror-ridden countries". Not the same.

And various "factchecking" sites are fully complicit in this and inject opinion as often as not - to the point exactly the same thing said by two different people can be deemed "mostly true" or "mostly false", moreover, exactly the same thing said by the same person can be "mostly true" and "false".

You can't hope to have robots make sense of it if we can't. You can, of course, build a neural network and train it, but it would just put "robot approved" stamp on biases of whoever trained it.

[+] astrodust|9 years ago|reply

How about a challenge to try and educate people better, to get them "immunized" against propaganda and fake news?

We need more cynical takes on things.

It's utterly appalling that on one side we have right-wing extremists yelling "FAKE NEWS" about CNN because CNN might get something wrong one time in ten, implying it's no different than Breitbart or Fox News which are wrong nine times in ten, and on the other we have left-leaning people beating up CNN for making mistakes and suggesting their trust is completely eroded.

News organizations offer a view. You should be prepared to vet everything you read from anyone. We need to give people a toolkit to help verify stories, to develop their bullshit instinct.

[+] kirykl|9 years ago|reply

An anecdote on language that happened to me earlier today: I wrote an email and said I would re-transmit something. I made a typo and wrote re-re-transmit.

"re-transmit" sounds neutral but "re-re-transmit" sounds passive aggressive. This would be very hard for a machine to interpret

[+] debt|9 years ago|reply

Wittgenstein wasn't around when a president used Twitter to both control the media and get elected.

Maybe news could rely on first principles approach. Quantify things like political stability, information availability, cultural problems, etc. and maybe apply a first principles approach and maybe an algorithm could tell use what the hell is going on at any given moment and maybe give us some insight into what may go on in the future as well.

I would assume intelligence agencies have something similar to the above. The problem is things happen in realtime and if a regime falls or a state fails or whatever, we have to adjust our models immediately.

Throwing out hands up and saying Wittgenstein solved it seems lazy but idk.

[+] lend000|9 years ago|reply

How about a 'subjectiveness' ranking instead, with notations for "unable to find link to a source of factual claims"? That's the best we could ever safely do, and we would likely find interesting results (WaPo might be labeled more subjective than Fox News, for example, even though both would generally have low numbers of notations for 'unable to find link to source of factual claim').

As I said in another post: The only thing you can possibly check for with any reliability is the validity of the base, source event, if there is one. Any higher derivation in the wrong hands (Facebook, for example) is systemic censorship. How can we safely do that? Recursive link tracking, perhaps, to a "primary source," which relies on the AI matching the factual claim to an appropriate source, either a video, image, or text that can be considered the legitimate source based on context (for example, an article about Apple's earnings could link to a primary source of their press release).

Regardless, the most dangerous form of "fake news" is by the selection/omission of stories, and that seems pretty impossible to quantify with today's technology (and of course the most prevalent on all of our 'real news' sources).

[+] eplanit|9 years ago|reply

> Regardless, the most dangerous form of "fake news" is by the selection/omission of stories

This. Very true, and woefully absent from most discussions regarding "fake news".

[+] VLM|9 years ago|reply

Your arguments are strong, however have you considered an algorithm based on the laziness and cheapness of humans? Consider that a lot of propaganda is heavily coordinated and not creative in the sense of being centrally created and issued.

An algorithm could watch, say, 50 sources and see if they simultaneously start using identical phrases, given access to a thesaurus aren't those odds a little unusual? This happened a couple times in the last election. You'd have a much simpler task of quote analysis of course, to exclude directly quoted content.

Then factor in something to detect press releases and video news releases where the verbiage of large sections is identical, this strategy is almost too easy.

Another factor is "large known propaganda sources" simultaneously releasing the story but small propaganda sources don't operate at all because fundamentally there is a size cutoff. The Wash Post will be in every propaganda operation because they are huge, but the "nowheresville register" certainly will not. Real news would of course be covered by both, so it should be very easy to detect organic self organizing behavior vs small scale conspiratorial behavior.

Now algo-proof propaganda could emerge much like handwritten letters to your congressional representative, before being inevitably ignored, are considered more impressive than mere form letters or signing a list. But it would detect at least some fake news based on fake news being part of a coordinated centralized mass produced campaign rather than trying to detect fake news based on its inherent fake-ness itself. Sorta like swatting bugs that bite you or try to, rather than identifying each individual bug then swatting only the mosquitos. Its very difficult to detect if something is a Walmart product, but its easy to detect if its mass produced, if the quality is very low, if it was made in China, if it contains chemicals banned in the USA for safety reasons, and in the long run works just as well.

[+] smsm42|9 years ago|reply

> How about a 'subjectiveness' ranking instead, with notations for "unable to find link to a source of factual claims"?

What would that mean? Let's say I write an article where I claim White House is controlled by reptiloids from planet Nibiru. Would it check that I have any links in my article at all? Or that the links actually prove my claims? Or that these proofs can be trusted (I may just link to my own website or one created by my friend)? I could insert a hundred links to a most respectable austronomy sites to prove that planet Nibiru exists, but how would you know whether these links actually prove it? I could refer to a lot of actual news content as the proof White House occupants can't actually be human. How you evaluate if these links actually support my claim?

[+] 23443463453|9 years ago|reply

[deleted]

[+] zeteo|9 years ago|reply

>It should be possible to build a prototype post-facto “truth labeling” system [...] Such a system would tentatively label a claim or story as true/false based on the stances taken by various news organizations on the topic, weighted by their credibility.

And of course nobody would love this technology more than the Chinese and Russian governments. Is a system aimed at quickly identifying obscure blogs that disagree with "high-credibility" sources supposed to help democracy?

[+] niftich|9 years ago|reply

Their headline is a bit hyperbolic (oh, the irony, given stance detection!) but the FAQ [1] covers what's really going on:

Q: Why did you choose the stance detection task rather than the task of labeling a claim, headline or story True/False, which seems to be what the fake news problem is all about?

A: (...) Our extensive discussions with journalists and fact checkers made it clear both how difficult "truth labeling" of claims really is, and how they'd rather have reliable semi-automated tool to help them in do their job better rather than fully-automated system whose performance will inevitably fall far short of 100% accuracy. (...)

Q: OK, but what does stance detection have to do with detecting fake news?

A: (...) From our discussions with real-life fact checkers, we realized that gathering the relevant background information about a claim or news story, including all sides of the issue, is a critical initial step in a human fact checker's job. One goal of the Fake News Challenge is to push the state-of-the-art in assisting human fact checkers, by helping them quickly gather the information they need to make their assessment.

In particular, a good Stance Detection solution would allow a human fact checker to enter a claim or headline and instantly retrieve the top articles that agree, disagree or discuss the claim/headline in question. They could then look at the arguments for and against the claim, and use their human judgment and reasoning skills to assess the validity of the claim in question. Such a tool would enable human fact checkers to be fast and effective. (...)

This means they're very much aware that 'solving the fake news issue' in a fully-automated way is a folly, so they are instead looking for tools to classify and retrieve corroborating or dissenting reports about the same topic. I feel this approach demonstrates an awareness of the problem, addresses some of the criticisms raised in this thread, and could lead to useful tools and datasets down the road.

[1] http://www.fakenewschallenge.org/#faq

[+] 3princip|9 years ago|reply

Mainstream media and their corporate backers have already lost this battle. This whole fake news attempt to remain relevant has backfired already, since it's much easier to prove that the peddlers of fake news are those who uncritically backed and still excuse the Iraq war, Libya catastrophe, attempted overthrow of the Syrian government just to name a few recent adventures.

They've been beaten at their own game by an opponent better at using modern/popular methods of disseminating information.

[+] draw_down|9 years ago|reply

Yes, that is about the size of it.

[+] hueving|9 years ago|reply

>Such a system would tentatively label a claim or story as true/false based on the stances taken by various news organizations on the topic, weighted by their credibility.

Oh right, because we can all agree on which news organizations are credible. /s

[+] rdtsc|9 years ago|reply

Alas we are not supposed to use that term anymore:

https://www.washingtonpost.com/lifestyle/style/its-time-to-r...

Also somehow CNN shows in the front page of result for "fake news" image search on Google. It wouldn't be totally wrong, depending on how the term is defined.

[+] billfor|9 years ago|reply

Probably because the washington post has gotten caught enough times writing fake news: http://www.forbes.com/sites/kalevleetaru/2017/01/01/fake-new...

[+] chippy|9 years ago|reply

I'd prefer to see computer learning to identify political propaganda and state the biases in the text as that would be truly unpartisan.

Does anyone know if this challenge has anything to do with Media Matters / Shareblue?

[+] HoppedUpMenace|9 years ago|reply

Not to sound cliche but I believe the bigger problem in this country is education, which apparently has failed many in this country on both sides of the political spectrum, if the past couple of years are any indication. I guarantee you fix people's problems with English, Math, and History in the classroom, you will have no need for machines or algorithms to filter junk that influences people online.

[+] msielski|9 years ago|reply

I am I missing something or is this challenge to produce a system which determines if the headline agrees with a body of text? Many so called "fake news" sites would pass this test with flying colors as their fake headlines are supported by equally fake stories. This seems more like a test against clickbait.

[+] antiffan|9 years ago|reply

It seems like Wikipedia would be familiar with a lot of the same challenges that come with identifying fake news articles. Anyone more familiar than I am with Wikipedia care to comment?

[+] necessity|9 years ago|reply

Wikipedia has a policy to only accept reliable secondary sources on controversial political articles. What comprises a "reliable source" is loosely defined, and what sources are considered reliable is ultimately decided by the editors in a somewhat per-article basis, with some global definitions which are not centrally listed anywhere but are enforced by senior editors and their admin friends. Because of the systematic bias that Wikipedia suffers from there is strong partisan bias in this selection, which resulted in a number of wiki clones with their own bias (e.g. Conservapedia), forums dedicated to show how bad this is (e.g. /r/wikiinaction) and has - among other things - been leading to a steady decline in the number of editors in the last couple years - everyone eventually gets bullied away from that place by a clique of editors + admins.

Quoting Wikipedia policy itself:

>If Wikipedia had been available around the sixth century B.C., it would have reported the view that the Earth is flat as a fact and without qualification. And it would have reported the views of Eratosthenes (who correctly determined the earth's circumference in 240BC) either as controversial, or a fringe view. Similarly if available in Galileo's time, it would have reported the view that the sun goes round the earth as a fact, and if Galileo had been a Vicipaedia editor, his view would have been rejected as 'originale investigationis'.

[+] fooker|9 years ago|reply

Their approach only works with reasonable people unfortunately..

[+] nl|9 years ago|reply

I've been following this and somewhat involved on the Slack channel since late last year.

A few comments based on what I've seen here so far:

"Fake News" might not be "the problem", but it is a problem. There is a real set of completely fake stories with no supporting evidence. Taking this away from politics for a minute, the archetype of this is "celebrities XX is moving to town YY"[1]

Comments around "this is just mainstream news attempting to justify its role" absolutely and completely have a point. There is plenty of room for new players to fix this problem (and hopefully be rewarded for it), on both sides of the political spectrum.

Comments around "The NYT does fake news - see their Iraq war advocacy" are somewhat misguided IMHO. The NYT has a number of problems, but no one should make the mistake of confusing opinion, analysis and forecasts for news. In the NYT Iraq War case, their analysis and forecasts were completely wrong and their opinion was misguided because of that. It is completely fair to make the point that their reputation suffered because of this, but we need to separate that from news reporting. This is something that news organizations don't do very well at the moment, IMHO.

As has been pointed out, this won't fix the whole problem. Most of the "Yes, but..." discussions here have been had when trying to come up with a reasonable but helpful task. I think stance detection is a reasonable subset of the problem.

[1] http://www.snopes.com/celebrity-moving-small-towns/

[+] lossolo|9 years ago|reply

This is not solvable by humans or machine learning. For example "leonardo dicaprio charity linked to 3 bil money laundering scandal", "raport says trump spent night with hookers in Russia", "angelina jolie divorcig brad pitt", "terrorist attack in Paris, 200 dead" etc. Some of those headlines seems like clickbait and false news but they are not, good luck with classifying those and similar.

[+] wyager|9 years ago|reply

To detect "fake news", all you have to do is solve all problems in the field of epistemology, especially the ill-founded ones.

[+] drewkarri|9 years ago|reply

‪#fakenews formula, sensation+celebrity+cause=fakenews‬ Use twitter to label all #fakenews (a code hashtag that isn't visible to viewers by changing the twitter foundation. The articles are then required to be vetted following rules from a panel that provides clear check lists to be considered fake news. Anytime news Is to be broadcast it must include standardized format, citations, rules for ad revenue and a list of all contributors to prevent a conflict of interest. Articles are not allowed to be written about companies who contribute to an outlet. There also needs to be a limit on how many ads can be displayed as well as how many a company can buy across platforms. Do not allow them to perform any "news broadcasting" unless followed and if it is violated, fine the culprit. The party identifying the "#codeword" would get paid the fines minus an administrative fee paid to the ethics committee as a commission. It's a system that ensures checks and balances in the journalism world. This would have a political component basically requiring places like twitter to follow these rules and guidelines. Otherwise they will not do it because this solution takes away a major source of their income and then they will actually have to compete for your subscriptions instead of just pumping out continuously overwhelming sensationalism

[+] Lord_Yoda|9 years ago|reply

I applaud this initiative. I am sure something good will come out of it, even though it does not completely resolve the problem. At least acknowledging and stepping in that direction is more than doing nothing.

That said, I think the other part of this problem is how the news is consumed. Our lazy minds are trained to read the shorter version if there is one available. Every day more people are liking the 140 character version of news. The more you read (from more people), the more you're reinforced in that belief. So while we train the models on news articles, we should also attempt to train on tweets. Just my two cents.

[+] beatpanda|9 years ago|reply

I don't think they're going about this the right way.

The first step to identifying 'fake news' (even in NYT or WSJ!) is to be able to pull out attributions from the source text.

This would give you some idea of where the information in an article actually came from. "Fake news" will tend to be either unattributed or misrepresent the thing it's quoting.

But a good first step toward better media literacy would just be giving people a representation of who actually said the the things in an article, and that seems much easier.

[+] brianbreslin|9 years ago|reply

I would like to see a test/quiz that shows you headlines or how they would be shared on facebook news feed and asks you to identify which you think are real/fake.

220 comments