Yeah, twice in one day (and 3+ times in a month) is a bit much, and this post adds little new except some oddities like singling out Kubernetes in particular as 'the technology that helped train GPT-3'.
I run GPT-J on a Titan RTX, where I am writing a novel with it. Generating about 20k tokens, or two pages of content, takes a few minutes. I would say the output is comparable in quality to other language models.
It sounds like we're using very similar setups for writing. :)
However, the fact that it is trained with 6B parameters compared to GPT-3's 175B indicates that open-source GPT still has a lot of catching up to do. Godspeed and full speed!!!
Transparent and competent automated content moderation, maybe, easily available for anyone to run their own communities by their own standards. Once matured, you can easily envision people sharing policies and templates, or providing moderation as a service, for any sort of social text interaction.
How many resources would it take to train something like this? Let's say in RTX 3090 months, or in $ running on AWS (or whoever offers TPUs)?
I had a History teacher who told me democracy often doesn't really mean anything. I've been thinking about it ever since. "Free GPT-3" would probably be better.
That's pretty much exactly what "democratizing" means, at least etymologically. It brings power (kratia) to the people (demos).
"Democracy" refers to a political system, but "democratizing" has pretty much always meant to open things up more widely. Example:
"The State wishes to democratize instruction by its "French instruction" and the standard must inevitably be the lowering of the very standard it sets up." -- 1894
And yeah -- guns do democratize violence. That's precisely why they were invented. A Google search for "guns democratize violence" turns up several hits.
rorykoehler|4 years ago
Tenoke|4 years ago
Kelamir|4 years ago
zitterbewegung|4 years ago
Note that refinement or transfer learning doesn't apply anymore; it's more like using a zero-shot classifier. In other words, you have to craft the input as you would for Siri or Wolfram Alpha, but expect text back instead.
It runs in about 15 GB of VRAM, and https://www.eleuther.ai/ released it under the Apache license.
I use this endpoint by kinoc, written with FastAPI, https://gist.github.com/kinoc/f3225092092e07b843e3a2798f7b39... which is released under the MIT license.
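As a rough sanity check on that ~15 GB VRAM figure (my own back-of-envelope, assuming fp16 weights and ~6B parameters; the remainder is activation buffers and framework overhead):

```python
# Back-of-envelope estimate of GPT-J's weight memory alone.
N_PARAMS = 6e9        # GPT-J has roughly 6B parameters
BYTES_PER_PARAM = 2   # fp16 storage

weights_gb = N_PARAMS * BYTES_PER_PARAM / 1e9
print(f"weights alone: {weights_gb:.1f} GB")  # ~12 GB; activations and
# buffers plausibly account for the rest of the ~15 GB observed
```

So the reported footprint is about what you'd expect for a half-precision 6B-parameter model.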
drusepth|4 years ago
I've primarily been using GPT-3 (and burning through millions of tokens), so I've been experimenting with GPT-J more lately. I've found it makes significantly more basic logic errors (e.g. mixing up pronouns, "forgetting" characters, introducing new characters), which makes me lean more towards constantly regenerating text versus revising it. I feel backed into a box of having to babysit ~100-200 tokens at a time instead of generating significantly more at once like I do with GPT-3.
I also built a quick tool that lets me adjust how much context I'm using in the prompt and generate 2-3 side-by-side completions to pick and choose from (just to speed up the flow of `click best suggestion` --> `keep generating from there`), but I haven't integrated GPT-J yet since it just feels... lower quality (it feels similar to ~Babbage, IMO).
But being comparatively* free, I'm still excited about GPT-J. Do you have any tips or processes you've found to make it spit out higher-quality text? 20k tokens at a time is quite a lot -- do you also have problems with winding paths / staying on a general "plot"?
Would love to hear any suggestions you have, because I'd sure love to move off GPT-3 to something comparable in quality!
mmastrac|4 years ago
EDIT: Reply below pointed out that the gist linked above specifically mentions 3090 at the top.
gurchik|4 years ago
Could you say more?
la6471|4 years ago
c7DJTLrn|4 years ago
likecarter|4 years ago
smnrchrds|4 years ago
EDIT: Or OpenWindows desktop environment.
imvetri|4 years ago
mhuffman|4 years ago
robbedpeter|4 years ago
captainmuon|4 years ago
In the link it says training FLOPs: 1.5e22, and an RTX 3090 does 285 TFLOPS for tensor operations. If you can actually calculate it like that, a single GPU would take about 20 months.
https://www.google.com/search?q=1.5e22+%2F+%281000+*+285e9+%...
Working with grid computing at CERN, I've had access to some pretty big computing resources myself, but the scale of this is mind-boggling...
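Plugging in those numbers (1.5e22 training FLOPs, 285e12 FLOPS per card, and perfect utilization and scaling, which real training never achieves) gives roughly 20 months on a single card, i.e. hours rather than months across 1,000 GPUs:

```python
TRAIN_FLOPS = 1.5e22   # figure quoted from the linked model card
GPU_FLOPS = 285e12     # RTX 3090 peak tensor throughput

seconds_one_gpu = TRAIN_FLOPS / GPU_FLOPS
months_one_gpu = seconds_one_gpu / 86400 / 30.44  # ~30.44 days/month
hours_1000_gpus = seconds_one_gpu / 1000 / 3600

print(f"1 GPU:     ~{months_one_gpu:.0f} months")  # ~20 months
print(f"1000 GPUs: ~{hours_1000_gpus:.0f} hours")  # ~15 hours
```

Actual wall-clock time would be several times higher once you account for achievable utilization, communication overhead, and restarts.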
hekec|4 years ago
Could anyone recommend a tutorial on how to run this? I need to generate answers to 1,000 questions for my app. I've been waiting months for a GPT-3 invite.
Kelamir|4 years ago
https://6b.eleuther.ai/ https://bellard.org/textsynth/
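For the batch use case above (answers to ~1,000 questions), a minimal driver loop might look like the sketch below. `generate` is a stand-in for whichever backend you end up with (a local GPT-J endpoint, a hosted API, etc.); it's stubbed here since I'm not assuming any particular API, and the `Q:/A:` prompt shape is just one common zero-shot convention:

```python
import json

def answer_all(questions, generate, out_path="answers.json"):
    """Run every question through `generate` and save the results to JSON."""
    answers = {}
    for q in questions:
        # Zero-shot style: phrase the task directly in the input text.
        prompt = f"Q: {q}\nA:"
        answers[q] = generate(prompt)
    with open(out_path, "w") as f:
        json.dump(answers, f, indent=2)
    return answers

# Example with a stub backend standing in for a real model:
stub = lambda prompt: "(model output here)"
answer_all(["What is GPT-J?"], stub, out_path="demo_answers.json")
```

Saving as you go (or per-question) is worth adding for 1,000 questions, so a crash partway through doesn't lose completed work.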
loosetypes|4 years ago
For every valid case, I see others that make my head hurt. Is it just a buzzword for telling a story with emotional appeal to users and investors?
Making something available to someone who didn’t have it earlier isn’t democratizing. And ignoring future considerations is just lazy.
If guns were invented today, they'd probably be touted as democratizing violence.
mkl95|4 years ago
jfengel|4 years ago
"Democracy" refers to a political system, but "democratizing" has pretty much always meant to open things up more widely. Example:
"The State wishes to democratize instruction by its "French instruction" and the standard must inevitably be the lowering of the very standard it sets up." -- 1894
https://www.google.com/books/edition/Education_from_a_Nation...
And yeah -- guns do democratize violence. That's precisely why they were invented. A Google search for "guns democratize violence" turns up several hits.