Grover – A State-of-the-Art Defense Against Neural Fake News

[+] YeGoblynQueenne|6 years ago|reply

>> Our study presents a surprising result: the best way to detect neural fake news is to use a model that is also a generator.

This could indicate the model is actually representing a grammar.

In abstract terms, a grammar is at the same time a recogniser and a generator. The fact that, in practice, encoded grammars are only used in one mode or the other is only an artifact of the implementation. And as I've said here before, there is at least the example of Definite Clause Grammars in Prolog, which are both generators and recognisers out of the box [1].

Anyway, a grammar capable of generating fake news text would also be capable of recognising fake news text.
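To illustrate the point that one grammar can serve both modes, here is a minimal sketch in Python rather than Prolog (so not a DCG): a tiny, hypothetical context-free grammar stored as a single rule table, which one function walks to enumerate sentences and another walks to recognise them.

```python
from itertools import product

# Hypothetical toy grammar: nonterminals map to lists of alternatives;
# any symbol that is not a key is treated as a terminal word.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["the", "N"]],
    "VP": [["V", "NP"]],
    "N": [["cat"], ["dog"]],
    "V": [["sees"], ["chases"]],
}

def generate(symbol="S"):
    """Generator mode: yield every token list derivable from `symbol`."""
    if symbol not in GRAMMAR:
        yield [symbol]
        return
    for alternative in GRAMMAR[symbol]:
        # Combine every expansion of each symbol in this alternative.
        for parts in product(*(list(generate(s)) for s in alternative)):
            yield [tok for part in parts for tok in part]

def parse(symbols, tokens):
    """Yield the leftover tokens after matching `symbols` against a prefix of `tokens`."""
    if not symbols:
        yield tokens
        return
    head, rest = symbols[0], symbols[1:]
    if head not in GRAMMAR:
        if tokens and tokens[0] == head:
            yield from parse(rest, tokens[1:])
        return
    for alternative in GRAMMAR[head]:
        yield from parse(alternative + rest, tokens)

def recognise(tokens):
    """Recogniser mode: True if the whole token list is derivable from S."""
    return any(leftover == [] for leftover in parse(["S"], tokens))

for sentence in generate():
    print(" ".join(sentence), recognise(sentence))
```

In a Prolog DCG both modes fall out of one definition through unification; here they are two small interpreters over the same rule table, a weaker but still illustrative version of the idea.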

________________

[1] https://en.wikipedia.org/wiki/Definite_clause_grammar#Exampl...

[+] backpropaganda|6 years ago|reply

It indicates that the model is learning a good likelihood function, and that it has not yet been trained to convergence, so its samples get lower likelihood scores than real text. That's all that's needed to make it a good recognizer.
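A minimal sketch of why a likelihood function doubles as a recogniser, assuming a toy add-one-smoothed bigram model in place of a transformer: text that resembles the training data gets a higher average log-likelihood per token, so a threshold on that score is a crude detector. The corpus and function names are hypothetical, and this is not how the paper's detector is built.

```python
import math
from collections import Counter

def train_bigram(tokens):
    """Count bigrams and left contexts from a token list."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    contexts = Counter(tokens[:-1])
    return bigrams, contexts, set(tokens)

def avg_log_likelihood(model, tokens):
    """Mean log P(w_i | w_{i-1}) with add-one smoothing."""
    bigrams, contexts, vocab = model
    v = len(vocab) + 1  # +1 reserves probability mass for unseen words
    logps = [
        math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + v))
        for prev, cur in zip(tokens, tokens[1:])
    ]
    return sum(logps) / len(logps)

corpus = "the cat sat on the mat . the dog sat on the rug .".split()
model = train_bigram(corpus)

natural = "the cat sat on the mat .".split()
scrambled = "mat the on sat cat the .".split()
print(avg_log_likelihood(model, natural) > avg_log_likelihood(model, scrambled))  # True
```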

[+] evolutionas|6 years ago|reply

If this is the state of the art, then I have some bad news. I recently finished writing my thesis and copy-pasted several paragraphs that did not include any mathematical formulas. Most of them (8 out of 12) were classified as written by a machine. This might be because I am not a native speaker. The model seemed to struggle most with paragraphs that discuss mathematical details; on topics where I wrote more freely, such as the conclusions and some analysis, it classified the text as written by a human.

[+] schmmd|6 years ago|reply

Thanks for sharing! If your content (such as a thesis) is out of domain (not a news article) all bets are off on how the model will perform.

[+] tasubotadas|6 years ago|reply

Same here - articles that I've written (not a native speaker) before were classified as written by a machine :-D.

Edit: Nevertheless, it is still an amazing piece of work. The quality of the generated text is astounding.

[+] etaioinshrdlu|6 years ago|reply

To state the obvious, detecting anything generated with a neural net by using another neural net is an arms race, because the generator was trained explicitly to fool a detector...

If you get your detector working well enough today, next week is another story.

[+] gwern|6 years ago|reply

Not in this case, as Grover/GPT-2 are trained purely with a max-likelihood/predictive loss; they don't do anything like using one Grover to fine-tune another. So no adversarial dynamics... yet.

[+] rat9988|6 years ago|reply

There is a fallacy in your reasoning: you discount one side's effort and focus on the other's.

>If you get your detector working well enough today, next week is another story.

Yes, you need to do some work to be able to catch up. And you have no guarantee you'll succeed.

> the generator was trained explicitly to fool a detector

Here again, you have some work to do to fool the detector, and each time the detector gets smarter, you have no guarantee of success either.

Moreover, the generator wasn't trained explicitly to fool a detector but to fool humans. On the other side, the detector was trained explicitly to unmask the generator.

[+] QuadrupleA|6 years ago|reply

Really impressive - you can tell after a paragraph or two that things don't quite make sense, ideas aren't really carried through, but on a handful-of-sentences level I have trouble distinguishing it from a real author. Not to mention it creates content much more quickly.

Generative neural networks these days are both fascinating and depressing - feels like we're finally tapping into how subsets of human thinking & creativity work. But that knocks us off our pedestal, and threatens to make even the creative tasks we thought were strictly a human specialty irrelevant; I know we're a long way off from generalized AI, but we seem to be making rapid progress, and I'm not sure society's mature enough or ready for it. Especially if the cutting edge tools are in the service of AdTech and such, endlessly optimizing how to absorb everybody's spare attention.

Perhaps there's some bright future where we all just relax and computers and robots take care of everything for us, but I can't help feeling like some part of the human spirit is dying.

[+] erichocean|6 years ago|reply

> Especially if the cutting edge tools are in the service of AdTech and such, endlessly optimizing how to absorb everybody's spare attention.

The simple and obvious solution is a universal ban on advertising, in all forms, outright. No more p-hacking humans, period.

[+] im3w1l|6 years ago|reply

> you can tell after a paragraph or two that things don't quite make sense, ideas aren't really carried through, but on a handful-of-sentences level I have trouble distinguishing it from a real author. Not to mention it creates content much more quickly.

So it could wreak havoc on Twitter?

[+] cortesoft|6 years ago|reply

Why would machines being able to be creative hurt the human spirit? Creating is not a job that can be taken; we don't create art because we are filling some market need, we create art because we like to create art.

[+] antisemiotic|6 years ago|reply

Machines battling to death in hopes of evolving to become more human somewhat reminds me of Nier: Automata, just without the cute robot girls and boys. Instead, we've got adversarial networks generating ads. As always, science fiction sets overly romantic standards.

[+] jasonhansel|6 years ago|reply

(1) Generate 100 fake news articles

(2) Remove the 92 articles that Grover detects (given its 92% accuracy rate)

(3) Choose the best of the remaining 8 articles
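The three steps can be sketched as a filter loop. Everything here is a hypothetical stub: an "article" is just a random quality score, and the detector flags each article independently with probability 0.92, which simplifies what an accuracy figure actually means.

```python
import random

def generate_article(rng):
    """Hypothetical generator stub: an 'article' is just a random quality score."""
    return {"quality": rng.random()}

def grover_flags(article, rng, detect_rate=0.92):
    """Hypothetical detector stub: ignores the content and flags each
    article independently with probability detect_rate."""
    return rng.random() < detect_rate

def best_undetected(n=100, seed=0):
    rng = random.Random(seed)
    candidates = [generate_article(rng) for _ in range(n)]           # (1) generate n
    survivors = [a for a in candidates if not grover_flags(a, rng)]  # (2) drop flagged ones
    if not survivors:
        return None, 0
    best = max(survivors, key=lambda a: a["quality"])                # (3) keep the best
    return best, len(survivors)

best, count = best_undetected()
print(count, best)
```

On average about 100 * (1 - 0.92) = 8 candidates survive, matching the arithmetic above; any single run varies around that.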

[+] hippiecow|6 years ago|reply

"Note that, even if Grover fails to detect a given piece as fake, our findings suggest that releasing many such articles taken together would be relatively easy to spot. Thus, if a source of Neural Fake News disseminates a large number of articles, Grover will be increasingly capable of spotting these articles as malicious."

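One way to read the quoted claim, sketched under loudly stated assumptions: if per-article detector scores are independent and Gaussian, with human text centred at 0 and machine text at some mu > 0, then averaging the scores of everything a source publishes shrinks the noise by a factor of sqrt(n), so a source-level verdict gets more reliable as the source publishes more. The Gaussian model, mu = 0.5 and the midpoint threshold are hypothetical, not the paper's analysis.

```python
from math import sqrt
from statistics import NormalDist

def source_detection_rate(n, mu=0.5, sigma=1.0):
    """P(mean score of n machine articles exceeds the midpoint threshold mu/2).

    Assumed model: per-article scores are independent, human ~ N(0, sigma),
    machine ~ N(mu, sigma), so the mean of n machine scores ~ N(mu, sigma/sqrt(n)).
    """
    threshold = mu / 2
    return 1 - NormalDist(mu, sigma / sqrt(n)).cdf(threshold)

for n in (1, 10, 100):
    print(n, round(source_detection_rate(n), 3))
```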
[+] steve19|6 years ago|reply

The generated news also needs to look like it was written by a human and be readable by the average human.

So you need a step that filters out obvious junk, maybe by eyeballing or with another net.

It is possible that the only way to beat Grover is rubbish such as contorted but valid grammar, or overuse of synonyms.

[+] theblackcat1002|6 years ago|reply

I tried feeding in a random CNN news article I found [1], whose first few paragraphs are as follows:

"London (CNN)The interminable US process for picking a President has its faults but as a democratic exercise it outshines the closed-door system being used in Britain to choose a new prime minister.

Theresa May resigned as Conservative leader last week after an ill-starred premiership and triggered a party leadership race that will select the next resident of 10 Downing Street by late July."

As I added more paragraphs, it started to shift from "quite sure this was written by a machine." to "quite sure this was written by a human.".

Does this mean the model has learned to exploit the fact that short texts tend to be machine-generated while longer ones tend to be human-written? A similar effect can be reproduced with this article as well [2].
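One way to test this hypothesis is to score growing prefixes of the same article and watch whether the verdict drifts with length alone. A minimal sketch: `detector` is a hypothetical callable returning a probability of machine authorship (no real demo interface is assumed), stubbed here with a deliberately length-biased toy.

```python
from typing import Callable, List, Tuple

def prefix_scores(paragraphs: List[str],
                  detector: Callable[[str], float]) -> List[Tuple[int, float]]:
    """Score the first k paragraphs for k = 1..len(paragraphs).

    Returns (word_count, machine_probability) pairs so any trend with
    length is easy to see.
    """
    results = []
    for k in range(1, len(paragraphs) + 1):
        text = "\n\n".join(paragraphs[:k])
        results.append((len(text.split()), detector(text)))
    return results

def toy_length_biased_detector(text: str) -> float:
    """Hypothetical detector that only looks at length: shorter means 'more machine'."""
    return 1.0 / (1.0 + len(text.split()) / 50)

article = ["First paragraph " * 20, "Second paragraph " * 20, "Third paragraph " * 20]
for words, p in prefix_scores(article, toy_length_biased_detector):
    print(words, round(p, 2))
```

A real check would also reorder or swap paragraphs, so that a length effect can be separated from a content effect.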

[1] https://edition.cnn.com/2019/06/09/politics/uk-prime-ministe...

[2] https://nypost.com/2019/06/08/mexican-troop-deployment-may-h...

[+] Zak|6 years ago|reply

paulgraham.com

Why Java is a great programming language

June 6, 2019 - Paul Graham

Java, by its very nature, is a programming language. If you write a script in a language like CSS, you are coding in Java, as well. But in Java you can use a GUI to interact with various components of your code. You can edit and re-enable widgets or restrict the content of a template. And perhaps most importantly, Java lets you interact with native applications, like the iPhone, in a non-traditional way.

Given that functionality, one would assume that we are better off in the iOS or Windows universe. But Java does work in both environments, albeit at vastly different speeds. While iPhone development can be done just as smoothly in Java, Windows isn’t the best place to be right now.

(truncated)

[+] didibus|6 years ago|reply

Is it just me, or do all these efforts seem like they are, by definition, always going to be insufficient?

It's just a matter of time before algorithm-written text is actually similar to some human-written text. At that point, there is no longer a way to distinguish them, no matter how smart the detector is. If the texts are actually written the same way, there's no secret pattern to pick up on, and the fight is over.

I think to combat fake news, especially the algorithmic kind, we'll need to innovate around authentication mechanisms that can effectively prove who you are and how much effort you put into writing something. Digital signatures or things like that.
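As a sketch of the digital-signature idea: a publisher signs each article with a private key, and readers verify it against the publisher's public key. The example uses a Lamport one-time signature only because it needs nothing beyond Python's standard library; it is a real hash-based scheme, but each key pair may safely sign just one message, and a production system would use a standard scheme such as Ed25519 from a vetted library. A signature proves who published a text, not how much effort went into it.

```python
import hashlib
import secrets

def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def keygen():
    """Private key: 256 pairs of random secrets. Public key: their hashes."""
    sk = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    pk = [(_h(a), _h(b)) for a, b in sk]
    return sk, pk

def _bits(message: bytes):
    """The 256 bits of SHA-256(message), most significant bit first."""
    digest = _h(message)
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(256)]

def sign(sk, message: bytes):
    """Reveal one secret from each pair, chosen by the message-digest bit."""
    return [sk[i][bit] for i, bit in enumerate(_bits(message))]

def verify(pk, message: bytes, signature) -> bool:
    """Each revealed secret must hash to the matching public-key entry."""
    return len(signature) == 256 and all(
        _h(sig) == pk[i][bit]
        for i, (sig, bit) in enumerate(zip(signature, _bits(message)))
    )

article = "Theresa May resigned as Conservative leader last week...".encode()
sk, pk = keygen()
sig = sign(sk, article)
print(verify(pk, article, sig), verify(pk, article + b"!", sig))
```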

[+] stri8ed|6 years ago|reply

As these generators get better, the number of false positives will increase, eventually rendering the classifier useless.
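The false-positive argument can be illustrated with a toy model, assuming detector scores for human and machine text are Gaussian with equal spread and the threshold is set to catch 92% of machine text. As the generator improves, the machine-score mean `mu` moves toward the human mean of 0, and the false-positive rate at that fixed recall climbs toward 92%, at which point humans are flagged as often as machines. The distributions and numbers are assumptions, not measurements of Grover.

```python
from statistics import NormalDist

def false_positive_rate(mu, recall=0.92, sigma=1.0):
    """FPR on human text when the threshold catches `recall` of machine text.

    Assumed model: human scores ~ N(0, sigma), machine scores ~ N(mu, sigma);
    text is flagged as machine-written when its score exceeds the threshold.
    """
    # Threshold t such that P(machine score > t) == recall.
    threshold = NormalDist(mu, sigma).inv_cdf(1 - recall)
    return 1 - NormalDist(0, sigma).cdf(threshold)

for mu in (4.0, 2.0, 1.0, 0.0):  # generator getting closer to human text
    print(mu, round(false_positive_rate(mu), 3))
```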

A sybil-resistant method of authentication, where each entity is tied to a single human, seems to be the only way. I suppose you could still pay people to publish under their credentials, or steal private keys, but this comes at a cost, and such accounts can be blacklisted.

Also, I don't think it's correct to equate machine-written news with fake news; the two need not coincide. Eventually, I think the only way to deter fake news is authentication plus holding people accountable.

[+] jpindar|6 years ago|reply

>It's just a matter of time for algorithm written text to actually be similar to some human written text.

https://xkcd.com/810/

[+] nabla9|6 years ago|reply

This is very amusing and entertaining:

domain: whitehouse.gov

author: Mike Pence

headline: Indecency of Naked Animals act

result: (best parts)

... Last week, our country’s capital took a major step to curb this despicable animal abuse. On Monday, the House voted to end the, “The Indecency of Naked Animals Act.” The legislation, which was first passed in 1894, protects protected animals under the title “Afternoon of a Faggot.” The act states that, “No person in this Congress shall engage in any indecent, obscene, or repulsive conduct or dance with, direct or indirect, to, or on any dog, cat, or donkey, or any other animal thereof, or who is dressed and styled in any manner and dressed or styled with such exhibitions of nudity and proclivities that seek to serve as a prop to any entertainment or sport that if indulged in the same misconduct other than the now-remedied sex offense, would be deemed indecent, obscene, or repulsive.” The House voted to kill the act because it was unnecessary. The federal government doesn’t have the power to regulate every kind of cultural expression. That should be up to states.

... The act is an insult to federalist conservatives and originalists. It spits in the face of originalism, a fundamental tenet of the conservative movement.

[+] api|6 years ago|reply

Hey that sounds like a solid piece of legislation to me. I for one am tired of my elected representatives buggering goats.

[+] debbiedowner|6 years ago|reply

BadGAN found that to improve the classifier, you need a generator whose outputs are not on the support of the classes you are trying to distinguish. Eventually, to get your classifier a bit better at distinguishing digits, your generator has to start producing non-digits, like Korean characters. At first I didn't believe it, but then I confirmed it myself.

This was in the limited-labelled-data regime, though.

[+] tachyonbeam|6 years ago|reply

Can you elaborate what you mean by "not on the support of the classes you are trying to distinguish"? In the case of your digit classifier, in order to improve your classifier, you want to train with an extra output that says "not a digit"?
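In the semi-supervised GAN setup of Salimans et al. (2016), which BadGAN builds on, the classifier does get an extra "fake" class: K real-class logits plus one for generated samples. A common trick fixes the fake logit at 0, so the probability that an input is real becomes Z / (Z + 1), where Z is the sum of exp over the K logits. This is a hedged plain-Python illustration of that formulation, not BadGAN's code; the logits are made-up numbers.

```python
import math
from typing import List

def k_plus_one_probs(logits: List[float]) -> List[float]:
    """Softmax over K real-class logits plus a fake class with logit fixed at 0.

    Returns K+1 probabilities; the last entry is P(fake | x).
    """
    extended = logits + [0.0]
    m = max(extended)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in extended]
    total = sum(exps)
    return [e / total for e in exps]

def p_real(logits: List[float]) -> float:
    """P(real | x) = Z / (Z + 1), where Z = sum_k exp(logit_k)."""
    z = sum(math.exp(l) for l in logits)
    return z / (z + 1.0)

digit_like = [6.0] + [0.0] * 9   # confidently "this is a 0"
korean_like = [-4.0] * 10        # no digit class fits
print(round(p_real(digit_like), 3), round(p_real(korean_like), 3))
```

Inputs far from every digit class, like the Korean characters mentioned above, get a low P(real) and can be trained toward the extra class.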

[+] jaytaylor|6 years ago|reply

It's hard to get a sense for how cherry-picked the examples are with such a limited set of neural-generated articles exposed for our inspection.

With that said, this is definitely interesting work! I've researched, published, and presented (at the Web Conference in SF, just last month) on using NLP with Discourse Analysis to detect lies and deception in product reviews [0]. I wonder if improvements in accuracy could be achieved by using both techniques in concert.

[0] https://jaytaylor.com/WWW19COMPANION-138--An_Anatomy_of_a_Li...

[+] schmmd|6 years ago|reply

You can generate your own examples with the "Generate" button. It will fill in that field based on generated text using the other fields as input.

For example, you could clear all the fields, enter a headline, and generate an article for that headline.

[+] pelagic_sky|6 years ago|reply

If you want to have fun with the generator, use domains such as theonion.com or cracked.com with an outlandish headline about someone you know, or yourself.

Also, this seems great for coming up with plots for bad horror movies. :)

[+] imvetri|6 years ago|reply

Would it be possible for each AI to figure out the other AI's technique?

Don't bring up any gate-wall/firewall stories, please. What if there is a missed case?

Here is an example.

1. News about AI-driven fake news gets published.
2. AI reads it and understands it.
3. EXPLAIN HERE: HOW IS THE SITUATION KEPT UNDER CONTROL?

It's an intellectual competition. Just as Facebook's bots got into a fight that we didn't understand, the example I shared may also reach that stage.

[+] zitterbewegung|6 years ago|reply

The problem is not whether you can detect or defend against fake news: if someone has already read it and been convinced, you have already lost.

Using this as a filter to flag fake news and/or add a disclaimer when the text comes from an automated system would produce false positives, as discussed in the paper. This kind of blocking would also have tertiary effects.

[+] wilde|6 years ago|reply

In the vision of the future this research is responding to, won’t we have neural real news too? (Looking at you AP.) I’m confused as to why they’re focusing on writing style rather than pushing the state of the art on reading comprehension to actually fact check things.

[+] cr4zy|6 years ago|reply

Wow, the generator is amazing. Two out of three articles were beyond what I thought was possible, maybe even beyond GPT-2. Is there a browser extension that detects and warns about possible fake news that this could be built into?

1.

  seed: nytimes.com
  seed: AI will certainly be good
  seed: June 6, 2019 - Paul Krugman
  generated: ∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨ 
  Artificial intelligence – by definition, computer code – can provide
  more than a speed boost. Many researchers now believe that intelligence,
  better defined as capacity to learn, will be increasingly provided
  by machines. Autonomy, already on the march, has become even more
  central. Future programs will likely be as familiar as Adam Smith’s
  invisible hand.

  If true, such predictions open up fundamentally new possibilities
  for future development. We will be able to quickly develop services
  based on or even built around lots of self-learning software. We’ll
  be able to use AI-based services without physically owning the software.
  As sensors in buildings, cars, buildings and cities start recording
  our activities, we will be able to experience “smart cities” and
  “smart cities” the moment they materialize.
  
  This future, however, is not yet in the cards. The gains are probably
  years or even decades away. Even in the areas where AI might dominate,
  such as physical warehousing, improving automation has lagged behind
  improvements in other areas of business. And our thinking, though
  improving, is still far from sophisticated enough to take full
  advantage of AI.
  
  Also:
  
  Please take a moment to watch the full program of this interview.
  Every panelist will speak out in favor of AI.

2.

  seed: nytimes.com
  seed: Uncertainty around AI is overhyped
  seed: June 6, 2019 - Paul Krugman
  generated: ∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨
  There’s no denying that AI, in many ways, is in its infancy. It isn’t
  something that belongs on a refrigerator or wall clock; it’s a whole 
  new area of research and development. While it may be commonplace 
  within a few years, AI will likely be a long way off; if it even 
  becomes an important factor in the economy as a whole, well, 
  probably about a century from now.
  
  It’s also true that many worry about the potential for 
  superintelligent machines, who could threaten all sorts of things 
  from human liberty to national security. But that’s not the right 
  question to ask.
  
  There is no “right” or “wrong” answer here — nobody really knows 
  how AI will evolve. And there’s no “theory” (although there are 
  several, ranging from quantum theory to AI techniques employed 
  by biologists, and from genetics to postulates to mathematical 
  models) that will explain it. Neither are there explanations 
  for the values, levels or usefulness of AI models. So, what 
  we’re left with is sheer hard scientific uncertainty.
  
  But let’s say we do know a little. The odds of a superintelligent 
  agent that can become sentient, one that would harm other humans 
  or destroy our civilization, seem minuscule. If you think that 
  chance is small, and that AI is likely to be less likely to 
  cross that threshold than is real intelligence, then the effects 
  on your consciousness and your mental space and our experience 
  of it may seem inconsequential. And that’s no reason to spout 
  half-baked fear mongering.

3.

  seed: nytimes.com
  seed: AI Uncertainty is Exaggerated
  seed: June 6, 2019 - Paul Krugman
  generated: ∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨
  As noted, Donald Trump uses the expression “AI Doom” – in part because 
  he understands AI but mostly because he responds to the media’s reports 
  on AI, which Trump himself frequently repeats, frequently without 
  evidence.
  
  AI may have risks – as in the threats from robots and remotely piloted 
  aircraft – but most analysts who follow AI say it has such promise 
  that its damage to people’s security and quality of life is likely 
  to be contained. Indeed, the main ethical question with AI is what 
  to do when machines decide that killing their human masters is a 
  good idea.
  
  It’s easy to exaggerate AI’s immediate threat. The machine that 
  driverless cars are supposed to replace is still pretty active. 
  And if, as predicted, the development of AI renders it easier for 
  people to work, then it’s also likely to give them more jobs.
  
  The main threat with AI is what it could do to workers, although 
  the more automation, the more all sorts of jobs become easily 
  replaceable. But a small dose of AI is actually a very good thing.

[+] antpls|6 years ago|reply

Check that out :

nytimes.com The Earth is flat. June 6, 2019 - Paul Krugman

generated: ∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨

Fox & Friends, NBC’s First Read, the Washington Post, the conservative National Review, and even the dreaded left-wing ThinkProgress have all suggested that world leaders must either be fools or lying propagandists, or else confused young men who want to overthrow the government. Not only is every country’s first priority mastering an original strategy for global domination and deploying its military for whatever the moment might call for, but all that has to be tested and refined in clandestine underground units!

Of course none of that makes much sense. But it’s one of the many crazy arguments put forward by the dumb doubters of climate change. And other dumb arguments we’ve had over the years in our ever-growing culture wars: whether guns were really necessary for self-defense in an age of mass shootings and real militias; whether Obama was really a Muslim-American socialist and wasn’t actually born in the United States; whether Harry Reid was a tool of the “Iran-Contra” gang or of a real anti-communist who just happened to be hiding it from the voters.

Both Republicans and Democrats should be able to give up these nonsense claims and instead trust that after generations, people will just no longer believe them. How else can one explain the failure of sane arguments to take hold, for climate science or other issues?

We should figure out what we’re afraid of and try a lot of sensible things instead. We should challenge authority and embrace instability, because there’s no meaningful alternative — and we should try and invent better, healthier versions of the original ideas that create stability and happiness. Even if those ideas don’t pan out, we should try them anyway.

It’s the method of living that counts.

#end of generation

[+] ve55|6 years ago|reply

Corresponding paper: Defending Against Neural Fake News https://arxiv.org/abs/1905.12616

[+] ForHackernews|6 years ago|reply

Their generated example is:

> nytimes.com

> Why Bitcoin is a great investment

> June 6, 2019 - Paul Krugman

If you know anything about bitcoin or Paul Krugman, this is obviously fake or satire just from the headline and the author's name.

[+] anderskaseorg|6 years ago|reply

I'm guessing that is deliberate: the article text can be generated from a human-written author and headline.

[+] SlowRobotAhead|6 years ago|reply

There has to be a website or record somewhere of someone doing the polar opposite of everything Krugman says and making money.

[+] jjoergensen|6 years ago|reply

The first text I wrote (myself), it detected as being written by a machine.

[+] malicioususer11|6 years ago|reply

can grover please write me a cheesecake recipe written by woody allen called 'i made this cheesecake, but i only want to have it.'?

:3

[+] schmmd|6 years ago|reply

You could try yourself using the interactive demo, but it was trained on news text so it might struggle.

[+] mdonahoe|6 years ago|reply

generated: vvvvvvvvvvvvvvvvvvvvvv

http://www.songlyrics.com

June 6, 2019 - Woody Allen

I made this cheesecake, but I only want to have it

I like slightly toasted almond crust, so I baked it. I served it with nice whipped cream and nuts.

Some calls to lighten the cream cheese, I did it as smooth and dark as I could. A butter caramel was on top of the cream cheese, vanilla and cinnamon.

I just made one and I wanted to share it. It’s fancy, it’s good, but it’s mostly delicious.

Addictive Chocolate Chip Cheesecake with Vanilla Caramel

Ingredients

For crust:

1 cup chopped almonds

2 cups all-purpose flour

1 teaspoon baking powder

1/2 teaspoon baking soda

1/2 teaspoon salt

1/4 cup unsalted butter

1/4 cup packed brown sugar

1/4 cup granulated sugar

3 large eggs

1/2 cup brandy

1 cup confectioners’ sugar

Preheat oven to 350 degrees. Spray a 9-inch springform pan with vegetable spray.

For filling:

1 cup (2 sticks) unsalted butter, cut into 1/4-inch cubes

4 tablespoons packed brown sugar

2 cups whole milk

1/2 cup chocolate chips

For caramel sauce:

3 ounces unsalted butter, cut into 1/4-inch cubes

1/2 cup packed brown sugar

1/2 teaspoon vanilla extract

1/2 cup confectioners’ sugar

1 teaspoon vanilla extract

Special equipment:

Bake brush, silicone piping bag, piping bag filled with 2 teaspoons sugar

For crust:

Place almonds in a medium bowl. In a medium bowl, sift together flour, baking powder, baking soda and salt. In a medium bowl, combine butter and sugars. Mix with a stand mixer on medium speed until well combined.

Add brandy to mixing bowl. Add to almond mixture. Add eggs one at a time, mixing until just combined. Add flour mixture, and mix well. Fill springform pan with dough, tapping down sides to remove excess.

Bake for approximately 1 hour, until the edges are golden brown. Cool at room temperature for 10 minutes. Refrigerate for 10 minutes.

For filling:

In a medium saucepan, combine butter and sugar. Let sit for 1 hour. Add milk, chocolate chips and confectioners’ sugar. Let sit for 1 hour. Stir in vanilla.

Pipe cream cheese mixture into crust. Pour caramel sauce on top. Place in freezer for 10 minutes, until cooled.

From Cooking Light

Chef Woody Allen is a Virginia Beach native and lives with his wife and four daughters in the comfort of Newport News.

[+] danielovichdk|6 years ago|reply

Rofl
