The future is not as dark as the megacorp rat race makes it seem.
You can use reduced versions of language models with extremely good results.
I was involved in training the first-ever GPT-2 for the Bengali language, albeit with only 117 million parameters.
It took a month's effort (training + writing code + setup) and about $6k in TPU cost, but Google Cloud covered it.
Anyway, it is surprisingly good. We fine-tuned the model for several downstream tasks and we were shocked when we saw the quality of generated text.
I fine-tuned this model to write Bengali poems with a dataset of just about 2k poems, and ran the training for 20 minutes on a GPU instance of Colab Pro.
I was really blown away by the quality.
The main training was done in JAX, which is much faster and more seamless than PyTorch/XLA, and better than TensorFlow in every way.
So, my point is: although everyone is talking about hundreds of billions of parameters and millions of dollars in training cost, you can still derive practical value from language models, and at low cost.
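The 20-minute Colab fine-tune described above can be sanity-checked with back-of-envelope arithmetic. All the numbers below (batch size, step throughput) are assumptions for illustration, not figures from the original post:

```python
# Rough sanity check: is 20 minutes on a Colab GPU a plausible budget
# for fine-tuning a 117M-parameter GPT-2 on ~2k poems?
# batch_size and steps_per_second are assumed values, not measured ones.

poems = 2_000
batch_size = 8                                  # assumed per-step batch
steps_per_epoch = poems // batch_size           # 250 steps per pass
steps_per_second = 3                            # assumed throughput for a 117M model
budget_seconds = 20 * 60

total_steps = steps_per_second * budget_seconds  # 3600 optimizer steps
epochs = total_steps / steps_per_epoch           # ~14 passes over the data

print(steps_per_epoch, total_steps, round(epochs, 1))
```

Under these assumptions the budget allows roughly 14 epochs over the 2k-poem dataset, which is consistent with a useful fine-tune of a small model.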
Good to know. We're attempting something similar [1] but for Tamil. I'm also surprised how well the open-source AI4Bharat language models and library [2] perform on NLP tasks against SoTA systems.
Is there a way to contact you?
[1] https://vpt.ai/posts/about-us/
[2] https://ai4bharat.org/projects/
> about $6k in TPU cost, but Google Cloud covered it.
I'm glad this all worked out for you. This is unrelated, but I just want to say that I hate how many people Google managed to convert to TPU with their research program and that their managed TPU/GPU offerings are absolutely horrible and infuriating to work with unless you somehow get on their radar.
The GPT-3 family is still too expensive to use and too big to fit in memory on a single machine. Prices need to come down before large-scale adoption, or someone needs to invent a chip that can hold it (cheaply).
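The "too big to fit in memory" point follows from simple arithmetic on the published parameter count (175B), counting only the weights in half precision, before activations or optimizer state:

```python
# Parameter memory for GPT-3's largest model, weights only.
# fp16/bf16 inference weights take 2 bytes per parameter.
params = 175e9
bytes_per_param = 2
weights_gb = params * bytes_per_param / 1e9   # 350 GB

print(weights_gb)  # far beyond the 40-80 GB of a single large accelerator
```

Even ignoring everything except the weights, 350 GB is several times the memory of any single GPU of that era, which is why serving it requires model parallelism across many devices.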
The most exciting part is that it shows us there is a path forward through scaling and prompting, but you can still do much better with a smaller model and a bit of training data (which can come from the expensive GPT-3 as well).
What I expect from the next generation: multi-modality, larger context, retrieval to augment the input with fresh information, tuning on thousands of supervised tasks so it generalizes better to new ones, and some efficient way to keep it up to date and fine-tune it. On the data side: more data, more languages; a lot of work.
+1. Does the new generation match or exceed GPT-3 in terms of relevance? Is there a way for a non-AI-researcher to understand how the benchmarks measure this? Bigger does not mean better.
Whenever I've seen language modeling metrics, GPT-3's largest model has been at the top. If you see a writeup that doesn't include accuracy-type metrics, you're reading a sales pitch, not an honest comparison.
>However, the ability of people to build upon GPT-3 was hampered by one major factor: it was not publicly released. Instead, OpenAI opted to commercialize it and only provide access to it via a paid API. This made sense given OpenAI’s for profit nature, but went against the common practice of AI researchers releasing AI models for others to build upon. So, since last year multiple organizations have worked towards creating their own version of GPT-3, and as I’ll go over in this article at this point roughly half a dozen such gigantic GPT-3 esque models have been developed.
Seems like aside from Eleuther.ai you can’t use the models freely either, correct me if I’m wrong.
Number of parameters aside, I am really surprised that we haven't yet reached hundreds of TB of training data. In particular, the Chinese model used less than 10 TB.
Actually, using this class of models (large transformer-based language models) to generate text is, to me, the least interesting use case.
They can also all be adapted and fine-tuned for other tasks: content classification, search, discovery, etc. Think facial recognition for topics. Want to mine a whole social network for anywhere people are talking about _______, even indirectly, with a very low false-negative rate? You want to fine-tune a transformer model.
BERT tends to get used for this more because it is freely available, well established, and not too expensive to fine-tune, but I suspect this is what Microsoft licensing GPT-3 is all about.
At https://reviewr.ai we're using GPT-3 to summarize product reviews into simple bullet-point lists. Here's an example with backpack reviews: https://baqpa.com
Hey, for a long time I was also very sceptical. However, I can refer you to a really cool application: https://www.youtube.com/watch?v=kP-dXK9JEhY. They basically use clever GPT-3 prompting to create a dataset, then train another model on it.
Besides, you can prompt these models to get (depending on the use case) really good few-shot performance.
And finally, GitHub Copilot is another pretty neat application.
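Few-shot prompting, as mentioned above, amounts to packing labeled examples into the prompt and letting the model complete the label for a new input. A minimal sketch (the sentiment task and example reviews are hypothetical; the resulting string would be sent to a completion API such as OpenAI's):

```python
# Minimal few-shot prompt construction: in-context examples followed by
# the query, ending at "Sentiment:" so the model completes the label.

def few_shot_prompt(examples, query):
    """Build a few-shot classification prompt from (text, label) pairs."""
    lines = [f"Review: {text}\nSentiment: {label}\n" for text, label in examples]
    lines.append(f"Review: {query}\nSentiment:")
    return "\n".join(lines)

demo = [
    ("Great battery life, would buy again.", "positive"),
    ("Broke after two days.", "negative"),
]
prompt = few_shot_prompt(demo, "Shipping was slow but the product is solid.")
print(prompt)
```

The same pattern generalizes to the dataset-generation trick from the video: prompt the big model for labeled outputs, collect them, and fine-tune a much smaller model on the result.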
It's good for the university-industrial-business complex: people writing papers about a model they can't even run themselves. It practically prints money in journal articles, travel per diems, and conference honoraria, not even counting the per-API-call rates.
Automatic generation of positive fake customer reviews on Amazon, landing pages about topics that redirect to attack and ad sites, fake "journalism" with auto-generated articles mixed with genuine press releases and viral marketing content, generating fake user profiles and automated karma farming on social media sites, etc. etc.
People were blaming cryptocurrency miners for the price of GPUs, when in fact it was the AI researchers who bought all the GPUs. :D
I wonder what would happen if somebody designed an electronic currency rewarded as payment for general GPU computation instead of just computing hashes? You pay some dollars to train your model, and the miner gets some coins.
Everyone is happy, no electricity is wasted, and the GPUs get used for a reasonable purpose.
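The proposal above can be sketched as a toy ledger: clients pay dollars into a training job, a miner runs it, and the ledger mints coins proportional to the verified compute. Everything here (the minting rate, the verification flag) is hypothetical; a real proof-of-useful-work scheme would also need a way to cheaply verify that the training actually happened, which is the hard part:

```python
# Toy "coins for useful GPU work" ledger. Rates and the verification
# step are hypothetical placeholders for illustration.

COINS_PER_GPU_HOUR = 0.5   # assumed minting rate

class Ledger:
    def __init__(self):
        self.balances = {}

    def settle_job(self, miner, gpu_hours, verified=True):
        """Credit a miner with coins for verified GPU hours."""
        if not verified:
            return 0.0   # unverified work mints nothing
        minted = gpu_hours * COINS_PER_GPU_HOUR
        self.balances[miner] = self.balances.get(miner, 0.0) + minted
        return minted

ledger = Ledger()
ledger.settle_job("miner_a", gpu_hours=10)                  # credits coins
ledger.settle_job("miner_b", gpu_hours=4, verified=False)   # nothing minted
print(ledger.balances)
```

The design choice that makes hash-based mining work (verification is trivially cheap) is exactly what is missing here: checking that a model was genuinely trained costs nearly as much as training it, unless the scheme accepts spot-checks or trusted hardware.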
I have argued for my beliefs on this topic in these threads to the point of exhausting myself. The tools we use to find these agents are the underpinning of AGI; it's coming far faster than even most people here appreciate, and this development is intrinsically against the interest of human beings. Please stop and think, please.
I argue it's very much in the interest of human beings. It has been since we first picked up a rock and used it as a hammer. It's the ultimate tool and has the potential to bring unprecedented prosperity.
The underlying methods seem impractical. GPT-n is an existence proof: it is possible to make parrot-like software that generates realistic text. But with these methods it is not practical.
Maybe that is a good thing, maybe a bad thing, but unless there is a breakthrough in methods this is a dead end. Impressive though.
Just wait until NVidia comes out with a "Neural AppStore" and corresponding restrictions. Then wait until the other GPU manufacturers follow suit.
That said, what I'm really curious about is how those other models stack up against GPT-3 in terms of performance -- does anyone know of any comparisons?
The answer is at https://github.com/kingoflolz/mesh-transformer-jax
It has detailed comparisons and a full breakdown of the performance, courtesy of Eleuther.
Get it, cause it's a generative transformer? Hah
OpenAI has strict requirements for the usage of GPT-3. For instance, you cannot automate posting to social media without a human in the middle.
OpenAI is pivoting to corporate evil, and to do that properly they need proprietary assets to rent out.
https://studio.ai21.com/
Only by analyzing the page title (from the bookmark, not by re-fetching the URL), and possibly also the domain name.
https://sdtimes.com/monitor/using-gpt-3-for-root-cause-incid...
Of course this will be another blow for journalists, who rely on this skill for their income.