item 27731266

GPT-J: GPT-3 Democratized

118 points | appleskimer | 4 years ago | p3r.one

39 comments


rorykoehler|4 years ago

Tenoke|4 years ago

Yeah, twice in one day (and 3+ times in a month) is a bit much and this post adds little new except some oddities like singling kubernetes in particular as 'the technology that helped train GPT-3'.

Kelamir|4 years ago

Is it a dupe though if it's a blog post, as opposed to the earlier Github submission?

zitterbewegung|4 years ago

I run GPT-J on a Titan RTX, where I am writing a novel with it. Generating about 20k tokens, or two pages of content, takes a few minutes. I would say the output quality is comparable to other language models.

Note that refinement or transfer learning doesn't really apply anymore; it's more like using a zero-shot classifier. In other words, you have to craft the input the way you would for Siri or Wolfram Alpha, but expect text back instead.
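Zero-shot prompting of that kind boils down to string templating. A minimal sketch (the template and helper below are illustrative, not the commenter's actual setup):

```python
def build_prompt(instruction: str, examples=()) -> str:
    """Build a zero-/few-shot prompt: optional demonstrations, then the task.

    Each example is a (question, answer) pair shown to the model before
    the real instruction, which is left with an open "A:" to complete.
    """
    parts = []
    for question, answer in examples:
        parts.append(f"Q: {question}\nA: {answer}")
    parts.append(f"Q: {instruction}\nA:")
    return "\n\n".join(parts)

# Zero-shot: the model sees only the task itself.
prompt = build_prompt("Continue the story: The door creaked open and")
```

The resulting string would then be sent to the model endpoint as-is; the trailing "A:" invites the model to continue rather than restate the question.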

It runs in about 15 GB of VRAM, and https://www.eleuther.ai/ released it under the Apache license.

I use this endpoint written by kinoc using FastAPI https://gist.github.com/kinoc/f3225092092e07b843e3a2798f7b39... which is released under the MIT license.

drusepth|4 years ago

It sounds like we're using very similar setups for writing. :)

I've primarily been using GPT-3 (and burning through millions of tokens), so I've been experimenting with GPT-J more lately. I've found it makes significantly more basic logic errors (e.g. mixing up pronouns, "forgetting" characters, introducing new characters), which makes me lean towards constantly regenerating text rather than revising it. I feel like I'm boxed into babysitting ~100-200 tokens at a time instead of generating significantly more in one go, as I do with GPT-3.

I also built a quick tool that lets me adjust how much context I'm using in the prompt and generate 2-3 side-by-side completions to pick and choose from (just to speed up the flow of `click best suggestion` --> `keep generating from there`), but I haven't integrated GPT-J yet since it just feels... lower quality (it feels similar to ~Babbage, IMO).
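The `click best suggestion` --> `keep generating from there` flow described above can be sketched as a simple best-of-N loop. Here `generate` is a stand-in for whatever model call you use (an HTTP request to a GPT-J endpoint, the OpenAI API, etc.) and `pick` stands in for the human choosing a completion; none of this is the commenter's actual tool:

```python
import random

def generate(prompt: str, seed: int) -> str:
    """Placeholder for a real model call; returns the prompt plus a continuation."""
    rng = random.Random(seed)
    return prompt + " " + rng.choice(["and then...", "suddenly...", "meanwhile..."])

def best_of_n(prompt: str, n: int = 3, pick=lambda options: options[0]) -> str:
    """Generate n candidate continuations side by side and keep the chosen one."""
    candidates = [generate(prompt, seed=i) for i in range(n)]
    return pick(candidates)

story = "The door creaked open"
for _ in range(2):  # keep generating from whichever continuation was picked
    story = best_of_n(story)
```

With a real model behind `generate`, varying the sampling seed or temperature is what makes the n candidates differ; the loop then grows the story one accepted chunk at a time.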

But being comparatively free, I'm still excited about GPT-J. Do you have any tips or processes you've found to make it spit out higher-quality text? 20k tokens at a time is quite a lot -- do you also have problems with winding paths / staying on a general "plot"?

Would love to hear any suggestions you have, because I'd sure love to move off GPT-3 to something comparable in quality!

mmastrac|4 years ago

Can I run it on an RTX-3090? Where would I find information on how to do it?

EDIT: Reply below pointed out that the gist linked above specifically mentions 3090 at the top.

gurchik|4 years ago

> I am writing a novel with it.

Could you say more?

la6471|4 years ago

However, the fact that it is trained with 6B parameters compared to GPT-3's 175B indicates that open-source GPT still has a lot of catching up to do. Godspeed, and full speed ahead!

c7DJTLrn|4 years ago

That seems ridiculously slow. How is machine learning supposed to scale like this?

likecarter|4 years ago

Ironic that we have to create open source versions of things from “OpenAI”…

smnrchrds|4 years ago

There is open and there is open. OpenAI is more like the latter, similar to OpenVMS.

EDIT: Or OpenWindows desktop environment.

mhuffman|4 years ago

I was told that GPT-3 was too much power for mere mortals (without a paid subscription!) so what terror is this going to bring upon us?

robbedpeter|4 years ago

Transparent and competent automated content moderation, maybe, easily available for anyone to run their own communities by their own standards. Once matured, you can easily envision people sharing policies and templates, or providing moderation as a service, for any sort of social text interaction.

captainmuon|4 years ago

How much resources would it take to train something like this? Let's say in RTX-3090 months, or in $ running on AWS (or whoever offers TPUs)?

The link says training took 1.5e22 FLOPs, and an RTX 3090 peaks at roughly 285 TFLOPS for (sparse) tensor operations. If you could actually calculate it like that, a single GPU would take about 20 months, or 1000 GPUs about 15 hours.

https://www.google.com/search?q=1.5e22+%2F+%281000+*+285e9+%...
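That back-of-the-envelope estimate is easy to check directly (peak-throughput assumption, ignoring real-world efficiency losses, interconnect overhead, etc.):

```python
TRAINING_FLOPS = 1.5e22   # reported GPT-J training compute
GPU_FLOPS = 285e12        # RTX 3090 peak tensor throughput, FLOP/s (with sparsity)

seconds_one_gpu = TRAINING_FLOPS / GPU_FLOPS
months_one_gpu = seconds_one_gpu / (30 * 24 * 3600)

hours_1000_gpus = seconds_one_gpu / 1000 / 3600

print(f"1 GPU:     {months_one_gpu:.0f} months")
print(f"1000 GPUs: {hours_1000_gpus:.1f} hours")
```

In practice, sustained training throughput is a fraction of peak, so real wall-clock time would be substantially longer than this lower bound.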

Working with grid computing at CERN, I've had access to some pretty big computing resources myself, but the scale of this is mind-boggling...

loosetypes|4 years ago

Does anyone else take issue with the increasingly commonplace usage of the word "democratize" as applied to technology?

For every valid case, I see others that make my head hurt. Is it just a buzzword for telling a story with emotional appeal to users and investors?

Making something available to someone who didn’t have it earlier isn’t democratizing. And ignoring future considerations is just lazy.

If guns were invented today, they'd probably be touted as democratizing violence.

mkl95|4 years ago

I had a History teacher who told me democracy often doesn't really mean anything. I've been thinking about it ever since. "Free GPT-3" would probably be better.

jfengel|4 years ago

That's pretty much exactly what "democratizing" means, at least etymologically. It brings power (kratia) to the people (demos).

"Democracy" refers to a political system, but "democratizing" has pretty much always meant to open things up more widely. Example:

"The State wishes to democratize instruction by its "French instruction" and the standard must inevitably be the lowering of the very standard it sets up." -- 1894

https://www.google.com/books/edition/Education_from_a_Nation...

And yeah -- guns do democratize violence. That's precisely why they were invented. A Google search for "guns democratize violence" turns up several hits.