
OpenAI Releases Largest GPT-2 Text Generation Model

266 points | epoch_100 | 6 years ago | openai.com

158 comments

[+] vicaya|6 years ago|reply
Just like various chatbot incarnations, it's reasonably OK at generating generic sentences without real understanding. I made several attempts with this leading fragment: "Is borrow checking unique to Rust? Probably not. But". The results are not pretty (the best one follows). Not impressed at all with this giant model :|

"Is borrow checking unique to Rust? Probably not. But the Rust library is written by and for Rust programmers, so if you're looking for a library that's designed for the language you already know, this might be your first choice.

The library is still in a fairly early stage. It does not support the standard libraries.

If you're interested in reading about how the library works, the docs are up on GitHub for reference!

How can I get this for my project?

Get it here: https://github.com/paulp/rustbinder

I've just read your blog post and want to use it, what can I do?

If you're looking for a project to use this library for, check out the docs for the "RustBinder project".

What are the current limitations and how will future changes affect this library?

This is still a work in progress. The library currently does not support the standard Rust library. There are a few work in progress"

[+] csomar|6 years ago|reply
Am I the only one impressed by the generated text? Sure, it doesn't have any understanding, but are you factoring in that 1. most people in the world do not know that Rust is a programming language, and 2. a single person cannot have that much general knowledge? Sure, they can know about the Rust borrow checker, but they will not be able to expand that much on another subject.
[+] nl|6 years ago|reply
Language models actually do "understand" things in the sense that they make decent foundations for knowledge bases (not forgetting that this is NOT what they are designed to do).

See for example https://www.aclweb.org/anthology/D19-1250.pdf (released today), which shows that the BERT language model performs extremely competitively with specialised knowledge bases and KB construction methods.
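
For anyone curious, here's a minimal sketch of the kind of cloze-style probing that paper uses, written with the Hugging Face transformers library (the pipeline API and model name are my choice, not the paper's code):

  # Probe BERT's factual "knowledge" with a cloze statement.
  from transformers import pipeline

  fill = pipeline("fill-mask", model="bert-base-uncased")
  for guess in fill("Dante was born in [MASK]."):
      print(guess["token_str"], guess["score"])

The ranked completions are exactly the "knowledge base" behaviour being measured.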

[+] ssivark|6 years ago|reply
Wow! That is meaningless, but difficult to distinguish from the real thing at a glance, especially when read by someone not familiar with the subject.

Are we going to see such auto generated content take over as the primary pillar of the SEO content farms?

... Kinda like an automated (text) version of Siraj Raval videos :-P

[+] etaioinshrdlu|6 years ago|reply
Love the fake link to GitHub... Which model was this? Was it trained on software-type discussion?
[+] stared|6 years ago|reply
Well, for detailed knowledge it is not enough.

But for Internet ramblings about anything (politics, religion, capitalism vs socialism), I bet it is well beyond the average human level. (If you want to protest, go to some random Facebook, YouTube, Reddit or Twitter thread. No, not HN, or specialized interest groups, or anything dominated by academics or IT specialists.)

Also, somewhat related: https://news.ycombinator.com/item?id=21438318 "Undercover reporter reveals life in a Polish troll farm"

A few friends of mine became parents and started participating in some parenting FB groups. Hearing about those groups from them was a shocking encounter with the world outside the intellectual bubble.

I would be really interested in judging the quality of GPT-2-generated texts against human texts. Questions like "does the person know what they're talking about?" and "are they smart?", controlling for knowledge of a particular subject (e.g. do they know Rust?), would give some insight into the effective level of AI text generation.

[+] londons_explore|6 years ago|reply
Everyone knows that PaulP only writes Scala and Boa - Rust just isn't his style! So unrealistic!
[+] rm_-rf_slash|6 years ago|reply
At a credibility score of 6.91/10, many people will rightly judge that the full GPT-2 model will remain insufficient for malicious use in creating fake news.

However, even the smaller models are already good enough for spamming/trolling/astroturfing. It doesn’t take a Shakespearean soliloquy to convince people of a point. Just enough of a flood of short 1-3 sentence pro/con comments on a forum can drastically affect the perceived public opinion of an issue. Those comments can then spur real people to reply, which could result in an ultimately organic but directed propaganda vector. Propaganda directors will carefully craft something for people to look at, and the GPT-2 bots will move people’s eyes in that direction.

You can see the same happen on r/subsimulatorgpt2, where the longer titles and prompts and replies eventually sprawl into incoherence, but the shorter sentences from the finetuned bots in the comments section are effectively indistinguishable from the kinds of short comments you would find on their respective subreddits.

Or in other words, the malicious uses for GPT-2 won’t be a tidal wave, but a flash flood.

[+] krick|6 years ago|reply
Wow, some samples are frighteningly good. I was impressed by previous models and I don't know if I'm just lucky this time, but... wow. Can anybody who is not into climbing even tell this is all fake?

Jain Kim is an experienced climber.

In 2006, she became the first woman from Korea to climb all five 8,000 meters (24,064 ft) peaks in the Swiss alpine ski run Alps in 24 hours. In 2009, she made history again by setting the record for the fastest time to climb an 8,000 meter peak with a team from China and South Korea.

She made the first ascent of 8,832-meter K2 in China, the second highest mountain in the world, in 2009 and the third highest mountain in Europe. She also is the first female Korean to summit a world-class peak.

During her two years as a mountaineering professor at Sogang University in Korea, she established two new routes in the Yalu River area. The first of these routes is a 3,547-meter peak named K2 on Mount Long in China. Her second route is on the same mountain, called the Lomonosov Ridge, at 3,632 meters.

[+] pure-awesome|6 years ago|reply
> Can anybody who is not into climbing even tell this is all fake?

Yes, quite clearly from the following:

> She made the first ascent of 8,832-meter K2 in China, the second highest mountain in the world, in 2009 and the third highest mountain in Europe.

Firstly, this sentence scans poorly. I'm guessing it should be:

> In 2009, she made the first ascent of 8,832-meter K2 in China, the second highest mountain in the world, and the third highest mountain in Europe.

Second, how can a mountain in China be the third highest mountain in Europe? How can the second highest mountain in the world be the third highest in Europe?

If I came across this in the wild, then even if I didn't think it was fake, I'd definitely think it was poorly proofread.

[+] bonoboTP|6 years ago|reply
> Can anybody who is not into climbing even tell this is all fake?

Hm, I'm pretty sure it's hard to climb five 8000 meter peaks in 24 hours :)

[+] vpzom|6 years ago|reply
Well, it gives two different heights and locations for a single mountain
[+] rfhjt|6 years ago|reply
Prompt: "Real things don't exist unconditionally and things that exist unconditionally are not real. However the reality has an essense. It is"

Response: "an actual thing, and it is not the thing to which we attach meaning. It is not real because it is not a thing. And therefore, it does not possess the qualities that are inherent in all real things."

Just wow. Sure, there are a few logical mistakes here, but this response serves as a good prompt for my bio-GPT. In other words, we usually need some starting points or hints for analysis, and discovering those hints is non-trivial, because whatever we can think of ourselves is not very new to us. This GPT just gave me an answer that smells like serious wisdom, and I'll surely dig in that direction to see if the idea has any substance.

Edit: what's happening here is that while I can't ask this model for a short and concise summary of a topic, I can still interrogate it and find out what it has seen in the training set. I can't possibly read all the books in the training set, but now I can rapidly navigate the multidimensional meaning space: I tell it where to start, and it tells me what it sees in close proximity to my prompt. This is a breakthrough.
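
(If you want to interrogate it the same way, here's a minimal sketch using the Hugging Face transformers library; whether the "gpt2-xl" checkpoint corresponds to the newly released 1.5B weights is my assumption, so check the model hub:)

  # Feed GPT-2 a leading fragment and sample a continuation.
  from transformers import GPT2LMHeadModel, GPT2Tokenizer

  tok = GPT2Tokenizer.from_pretrained("gpt2-xl")
  model = GPT2LMHeadModel.from_pretrained("gpt2-xl")

  ids = tok.encode("Real things don't exist unconditionally and", return_tensors="pt")
  out = model.generate(ids, max_length=120, do_sample=True, top_k=40)
  print(tok.decode(out[0]))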

[+] buboard|6 years ago|reply
> (CTEC) found that extremist groups can use GPT-2 for misuse, specifically by fine-tuning GPT-2 models on four ideological positions: white supremacy, Marxism, jihadist Islamism, and anarchism. CTEC demonstrated that it’s possible to create models that can generate synthetic propaganda for these ideologies

I wonder how they tested that

[+] rq1|6 years ago|reply
Is hegelianism a better ideology? I don’t understand the underlying message.
[+] chaz6|6 years ago|reply
Surely we are not far off from models capable of producing submission-quality essays, which will enable a new generation of cheating.
[+] JRKrause|6 years ago|reply
From my observation, even the largest GPT-2 model has difficulty retaining any long-range relationship information. In the "unicorn" writing example that was published originally, the model 'forgets' where the researchers are (climbing a mountain versus being beside a lake iirc) after just a few sentences. Because of this, it's hard to imagine models of this type being able to write long-form coherent papers. Now if we could somehow constrain the generated text to conform to a predefined graph structure that isn't forgotten so quickly...
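
A toy sketch of what I mean (entirely my own illustration; a random number generator stands in for the model): at every step, mask the next-token distribution down to whatever the graph allows, so a fact like "mountain, not lake" can't be forgotten:

  import numpy as np

  # Toy sketch: constrain sampling to the edges of a predefined graph.
  vocab = ["researchers", "climb", "mountain", "set_up", "camp", "."]
  allowed = {           # token index -> indices the graph permits next
      0: [1],           # "researchers" -> "climb"
      1: [2],           # "climb" -> "mountain"
      2: [3, 5],        # "mountain" -> "set_up" or "."
      3: [4],           # "set_up" -> "camp"
      4: [5],           # "camp" -> "."
  }

  rng = np.random.default_rng(0)

  def model_logits(history):
      # Stand-in for the language model's next-token scores.
      return rng.normal(size=len(vocab))

  tokens = [0]
  while tokens[-1] != 5:  # stop at "."
      logits = model_logits(tokens)
      masked = np.full(len(vocab), -np.inf)
      ok = allowed[tokens[-1]]
      masked[ok] = logits[ok]  # graph constraint: everything else is banned
      probs = np.exp(masked - masked[ok].max())
      probs /= probs.sum()
      tokens.append(int(rng.choice(len(vocab), p=probs)))

  print(" ".join(vocab[t] for t in tokens))
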
[+] FillardMillmore|6 years ago|reply
As another commenter mentioned, these models currently can't maintain a believably cohesive train of thought for longer than 3 or 4 sentences. They are great at estimating which words are statistically likely to complete a sentence and how punctuation should properly be used, but so far they have proven quite lacking in the ability to replicate true human creativity. The Economist ran an excellent article on this very concern recently:

https://www.economist.com/books-and-arts/2019/10/31/dont-fea...

[+] ebj73|6 years ago|reply
I think it's very far off. You can see how this model drifts off into gibberish after only 3 or 4 words. It does not really understand the topic, even within a sentence, and much less within a whole paragraph.

For it to understand the topic throughout a whole essay, that would probably require it to have full general intelligence on par with humans. And that's very, very far into the future, still.

[+] taneq|6 years ago|reply
I'd say generating an entire coherent essay like this would constitute a solid pass of the Turing test. I don't think we're that close to that yet (or alternately the singularity is right around the corner, because you could feed that same bot a corpus of AI papers and have it write new publishable ones.)
[+] TaupeRanger|6 years ago|reply
We are very far off, unless plagiarism is considered fair game.
[+] rfhjt|6 years ago|reply
Prompt: "The coming global recession is a real possibility and"

Response: "The coming global recession is a real possibility and the Fed is playing games, creating artificial market conditions to make a recovery seem possible in the short-term. The Fed has an option to change its monetary policies but it will not make the problem go away, so it is in their best interest to pretend it won't happen."

Change "and" to "however" and you'll get another stereotyped opinion. It really just composes pieces of text it has seen around the prompt, but it does this really well.

Most of the news agencies can now fire most of their monkey typewriters: this GPT will outperform them on every metric.

[+] k8si|6 years ago|reply
Omfg can we stop making these things bigger PLEASE

Like, who cares??

* What I mean is, text gen models are big enough. We need controllable text generation; like, so it can talk about a specific THING sensibly. Rather than spew statistically plausible nonsense.

[+] oaskmutboard|6 years ago|reply
I think this could make a great Tinder feature to suggest chat lines.
[+] YeGoblynQueenne|6 years ago|reply
Oh, I can imagine a few:

<input> "If I told you your body is hot, would you hold it against me?" <input>

<output> It was hot and the body of a young woman was lying in a bloody hell. Hell was hot and was full of beautiful young women. The body was lying in the entrance of the lobby and there was a small crowd gathering. it was a hot day in hell </output>

Could work on the right person though.

[+] ionwake|6 years ago|reply
Sorry for asking but is there an example output and an example input?
[+] 490d0aff0ee8|6 years ago|reply
Tangent rant.

I'm skimming over some of the code at https://github.com/openai/gpt-2/blob/master/src/model.py and I can't help but feel frustrated at how unreadable this stuff is.

1. Why is it acceptable to have single-letter variable names everywhere?

2. There's little to almost no documentation in the code itself. It's unclear what the parameters of any given function mean.

3. There are magic constants everywhere.

4. Function names are so terse... ("gelu", "attn")

[+] moultano|6 years ago|reply
The notation in the code will be very familiar to anyone comfortable with the underlying research and math. The "conceptual" documentation is in the literature.

What you're asking for is the rough equivalent of asking a C programmer to name their loop variables "index" instead of "i". Everyone familiar with the conventions of C programming knows what "i" means in the context of a for loop. Similarly, everyone familiar with transformers knows what "gelu" and "attn" mean.

[+] high_derivative|6 years ago|reply
My professional observation (as ml researcher at big tech):

These companies hire a lot of engineers straight out of undergrad/master's degrees. The interviews test leetcode knowledge, and today lots of degrees are heavy on Python-scripted ML homework.

The result is companies with billion-dollar funding and world-changing goals having a lot of their code look like complete spaghetti.

And these are the engineers who are meant to clean up research scientists' code. Scientists generally don't feel it's their responsibility to write strong code.

Systems-side teams/orgs have better code, but essentially as soon as you enter the 'ml engineer/research engineer/research scientist' layer, it's doomed.

[+] WnZ39p0Dgydaz1|6 years ago|reply
I actually disagree with you here. I don't think the code is unreadable, it follows standard notation used in Machine Learning. If you read scientific papers you will notice that e.g. variable names are the same as those used in mathematical formulas that everyone in the field is familiar with. The same goes for parameters, function names, and so on. They are standard notation/naming and only look confusing to people outside of the ML field. Giving them long uncommon names would actually be more confusing.

As someone with experience in ML research I think this code is quite well written compared to what you typically see (a single function with hundreds of lines and dozens of if statements). I can immediately see what any of the functions does, and I haven't even read the paper.

[+] TTPrograms|6 years ago|reply
It's a specification of essentially a complex graph of mathematical operations. If there's a function called

  def mult(a, b): return a * b
it's not much more informative to write:

  def mult(activation_a, activation_b): return activation_a * activation_b
Many of these functions are not much more complex than that, and the names along with their comments are more than sufficient given familiarity with the literature. If you think familiarity with the literature is unreasonable, it's still not clear what could improve code like this in reasonable space. "This is a linear function, which means that it satisfies f(x+a)=f(x)+f(a)"? "This is the attention head, it acts as a mask on the sequence input"? It would be like complaining that someone made a tree class and didn't put a comment explaining what a leaf node is. Code readability always assumes some reader context and minimum pre-existing knowledge (as do all forms of technical communication).
[+] make3|6 years ago|reply
I understand what you mean, but please understand that this code is targeted at people who have at least some background knowledge, like having read the seminal Transformer paper, "Attention Is All You Need": https://arxiv.org/abs/1706.03762

Most of the code becomes really straightforward once you have. A lot of the magic constants are the result of multi-page derivations (like the GELU constant) that would be impractical to reproduce in the code.
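
(For instance, the tanh approximation of GELU is one line of code, but the 0.044715 constant in it falls out of the GELU paper's derivation. Roughly, modulo the repo's exact TensorFlow calls:)

  import numpy as np

  def gelu(x):
      # Tanh approximation of the Gaussian Error Linear Unit
      # (Hendrycks & Gimpel); 0.044715 comes from the paper's derivation.
      return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))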

Deep learning research really is a field that requires some amount of background knowledge, and it's normal not to automatically understand state-of-the-art code. Here is the GPT-2 paper: https://d4mucfpksywv.cloudfront.net/better-language-models/l...

[+] slimsag|6 years ago|reply
In my experience, this is the norm in the ML scene. Giant globs of unreadable and in no way understandable code -- unless of course you already understand everything.

I think this is because the "product" so to speak is often the papers themselves, not the code, but I'm not sure.

[+] bredren|6 years ago|reply
Could these functions just be implementations of math with matching variable names?
[+] buboard|6 years ago|reply
Because a lot of it is meant to correspond to math equations, so variable names like w, u, v, b, g match the equations in the papers? I actually think it's pretty readable, as long as you know what it is supposed to implement (I don't, but I imagine they are implementing a complex graph), and short names help you figure out where things go in and out in one screenful.

Complex graphs are literally a spaghetti of arrows, and this format actually is pretty readable (even though in PyTorch it would be more readable). I guess they leave comments out because it's not really possible to understand each line on its own (unless it's an implementation detail); you have to read the paper to know what's going on.
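
To make it concrete, here's a layernorm in paper notation (my own sketch, not the repo's exact code); it reads straight off the equation once you know g is the gain and b the bias:

  import numpy as np

  def norm(x, g, b, eps=1e-5):
      # Layer norm: u = mean, s = variance, g = gain (gamma), b = bias (beta).
      u = x.mean(axis=-1, keepdims=True)
      s = ((x - u) ** 2).mean(axis=-1, keepdims=True)
      return g * (x - u) / np.sqrt(s + eps) + b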

[+] fermenflo|6 years ago|reply
I agree, a lot of the code could be improved. But some of what you mentioned is fairly standard, like "gelu" for Gaussian Error Linear Units, w/b for weights/biases, etc.
[+] m463|6 years ago|reply
Mirrors my thoughts regarding all math textbooks and published papers.

I remember reading that a famous scientist (Newton, maybe) published a really accessible book on a subject; it was read by lots of laypersons and opened him up to lots of unwanted public attention.

So publishing in a more inscrutable way might be a way of ensuring peer-to-peer communication.

Either that, or it's a labor of love where cleaning things up would detract from the forward momentum.

[+] JoeMayoBot|6 years ago|reply
Having worked with math/research folks in the past, this isn't surprising. That said, from a software-engineering perspective, these are exactly the things a typical code review would flag; they are immediately noticeable.