The future is not as dark as the megacorp rat race makes it seem.
You can use reduced versions of language models with extremely good results.
I was involved in training the first-ever GPT-2 for the Bengali language, albeit with only 117 million parameters.
It took a month's effort (training + writing code + setup) and about $6k in TPU cost, but Google Cloud covered it.
Anyway, it is surprisingly good. We fine-tuned the model for several downstream tasks and we were shocked when we saw the quality of generated text.
I fine-tuned this model to write Bengali poems with a dataset of just about 2k poems, and ran the training for 20 minutes on a GPU instance of Colab Pro.
I was really blown away by the quality.
The main training was done in JAX, which is much faster and more seamless than PyTorch/XLA, and better than TensorFlow in every way.
So, my point is: although everyone is talking about hundreds of billions of parameters and millions of dollars in training cost, you can still derive practical value from language models, and at low cost.
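The 20-minute Colab fine-tune described above can be sanity-checked with back-of-envelope arithmetic. All the numbers below (batch size, step throughput) are assumptions for illustration, not figures from the original post:

```python
# Rough sanity check: is 20 minutes on a Colab GPU a plausible budget
# for fine-tuning a 117M-parameter GPT-2 on ~2k poems?
# batch_size and steps_per_second are assumed values, not measured ones.

poems = 2_000
batch_size = 8                                  # assumed per-step batch
steps_per_epoch = poems // batch_size           # 250 steps per pass
steps_per_second = 3                            # assumed throughput for a 117M model
budget_seconds = 20 * 60

total_steps = steps_per_second * budget_seconds  # 3600 optimizer steps
epochs = total_steps / steps_per_epoch           # ~14 passes over the data

print(steps_per_epoch, total_steps, round(epochs, 1))
```

Under these assumptions the budget allows roughly 14 epochs over the 2k-poem dataset, which is consistent with a useful fine-tune of a small model.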
Good to know. We're attempting something similar [1] but for Tamil. I'm also surprised how well the open-source AI4Bharat language models and library [2] perform on NLP tasks against SoTA systems.
Is there a way to contact you?
[1] https://vpt.ai/posts/about-us/
[2] https://ai4bharat.org/projects/
> about $6k in TPU cost, but Google Cloud covered it.
I'm glad this all worked out for you. This is unrelated, but I just want to say that I hate how many people Google managed to convert to TPU with their research program and that their managed TPU/GPU offerings are absolutely horrible and infuriating to work with unless you somehow get on their radar.
The GPT-3 family is still too expensive to use and too big to fit in memory on a single machine. Prices need to come down before large-scale adoption, or someone needs to invent a chip that can hold it (cheaply).
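The "too big to fit in memory" point follows from simple arithmetic on the published parameter count (175B), counting only the weights in half precision, before activations or optimizer state:

```python
# Parameter memory for GPT-3's largest model, weights only.
# fp16/bf16 inference weights take 2 bytes per parameter.
params = 175e9
bytes_per_param = 2
weights_gb = params * bytes_per_param / 1e9   # 350 GB

print(weights_gb)  # far beyond the 40-80 GB of a single large accelerator
```

Even ignoring everything except the weights, 350 GB is several times the memory of any single GPU of that era, which is why serving it requires model parallelism across many devices.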
The most exciting part is that it shows us there is a path forward through scaling and prompting, but you can still do much better with a smaller model and a bit of training data (which can come from the expensive GPT-3 as well).
What I expect from the next generation: multi-modality, larger context, retrieval to augment the input with fresh information, tuning on thousands of supervised tasks so it generalizes better to new ones, and some efficient way to keep it up to date and fine-tune it. On the data side: more data, more languages; a lot of work.
+1. Does the new generation match or exceed GPT-3 in terms of relevance? Is there a way for a non-AI-researcher to understand how the benchmarks measure this? Bigger does not mean better.
Whenever I've seen language modeling metrics, GPT-3's largest model has been at the top. If you see a writeup that doesn't include accuracy-type metrics, you're reading a sales pitch, not an honest comparison.
>However, the ability of people to build upon GPT-3 was hampered by one major factor: it was not publicly released. Instead, OpenAI opted to commercialize it and only provide access to it via a paid API. This made sense given OpenAI’s for profit nature, but went against the common practice of AI researchers releasing AI models for others to build upon. So, since last year multiple organizations have worked towards creating their own version of GPT-3, and as I’ll go over in this article at this point roughly half a dozen such gigantic GPT-3 esque models have been developed.
Seems like aside from Eleuther.ai you can’t use the models freely either, correct me if I’m wrong.
Number of parameters aside, I am really surprised that we haven't yet reached hundreds of TB of training data. In particular, the Chinese model used less than 10 TB.
Actually, using this class of models (large transformer-based language models) to generate text is, to me, the least interesting use case.
They can also all be adapted and fine-tuned for other tasks: content classification, search, discovery, etc. Think facial recognition for topics. Want to mine a whole social network for anywhere people are talking about _______, even indirectly, with a very low false-negative rate? You want to fine-tune a transformer model.
BERT tends to get used for this more because it is freely available, well established, and not too expensive to fine-tune, but I suspect this is what Microsoft licensing GPT-3 is all about.
At https://reviewr.ai we're using GPT-3 to summarize product reviews into simple bullet-point lists. Here's an example with backpack reviews: https://baqpa.com
Hey, for a long time I was also very sceptical. However, I can refer you to a really cool application: https://www.youtube.com/watch?v=kP-dXK9JEhY. They basically use clever GPT-3 prompting to create a dataset, then train another model on it.
Besides, you can prompt these models to get (depending on the use case) really good few-shot performance.
And finally, GitHub Copilot is another pretty neat application.
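Few-shot prompting, as mentioned above, amounts to packing labeled examples into the prompt and letting the model complete the label for a new input. A minimal sketch (the sentiment task and example reviews are hypothetical; the resulting string would be sent to a completion API such as OpenAI's):

```python
# Minimal few-shot prompt construction: in-context examples followed by
# the query, ending at "Sentiment:" so the model completes the label.

def few_shot_prompt(examples, query):
    """Build a few-shot classification prompt from (text, label) pairs."""
    lines = [f"Review: {text}\nSentiment: {label}\n" for text, label in examples]
    lines.append(f"Review: {query}\nSentiment:")
    return "\n".join(lines)

demo = [
    ("Great battery life, would buy again.", "positive"),
    ("Broke after two days.", "negative"),
]
prompt = few_shot_prompt(demo, "Shipping was slow but the product is solid.")
print(prompt)
```

The same pattern generalizes to the dataset-generation trick from the video: prompt the big model for labeled outputs, collect them, and fine-tune a much smaller model on the result.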
It's good for the university-industrial-business complex: people writing papers about a model they can't even run themselves. It practically prints money in journal articles, travel per diems, and conference honoraria, not even counting the per-API-call rates.
Automatic generation of positive fake customer reviews on Amazon, landing pages about topics that redirect to attack and ad sites, fake "journalism" with auto-generated articles mixed with genuine press releases and viral marketing content, generating fake user profiles and automated karma farming on social media sites, etc. etc.
People were blaming cryptocurrency miners for the price of GPUs, when in fact it was the AI researchers who bought all the GPUs. :D
I wonder what would happen if somebody designed an electronic currency rewarded as payment for general GPU computation instead of just computing hashes? You pay some dollars to train your model, and the miner gets some coins.
Everyone is happy, no electricity is wasted, and the GPUs get used for a reasonable purpose.
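The proposal above can be sketched as a toy ledger: clients pay dollars into a training job, a miner runs it, and the ledger mints coins proportional to the verified compute. Everything here (the minting rate, the verification flag) is hypothetical; a real proof-of-useful-work scheme would also need a way to cheaply verify that the training actually happened, which is the hard part:

```python
# Toy "coins for useful GPU work" ledger. Rates and the verification
# step are hypothetical placeholders for illustration.

COINS_PER_GPU_HOUR = 0.5   # assumed minting rate

class Ledger:
    def __init__(self):
        self.balances = {}

    def settle_job(self, miner, gpu_hours, verified=True):
        """Credit a miner with coins for verified GPU hours."""
        if not verified:
            return 0.0   # unverified work mints nothing
        minted = gpu_hours * COINS_PER_GPU_HOUR
        self.balances[miner] = self.balances.get(miner, 0.0) + minted
        return minted

ledger = Ledger()
ledger.settle_job("miner_a", gpu_hours=10)                  # credits coins
ledger.settle_job("miner_b", gpu_hours=4, verified=False)   # nothing minted
print(ledger.balances)
```

The design choice that makes hash-based mining work (verification is trivially cheap) is exactly what is missing here: checking that a model was genuinely trained costs nearly as much as training it, unless the scheme accepts spot-checks or trusted hardware.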
I have argued for my beliefs on this topic in these threads to the point of exhausting myself. The tools we use to find these agents are the underpinning of AGI; it's coming far faster than even most people here appreciate, and this development is intrinsically against the interest of human beings. Please stop and think, please.
I argue it's very much in the interest of human beings. It has been since we first picked up a rock and used it as a hammer. It's the ultimate tool and has the potential to bring unprecedented prosperity.
The underlying methods seem impractical. GPT-n is an existence proof: it is possible to make parrot-like software that generates realistic text. But with these methods it is not practical.
Maybe that is a good thing, maybe a bad thing, but unless there is a breakthrough in methods this is a dead end. Impressive though.
Just wait until NVidia comes out with a "Neural AppStore" and corresponding restrictions. Then wait until the other GPU manufacturers follow suit.
That said, what I'm really curious about is how those other models stack up against GPT-3 in terms of performance -- does anyone know of any comparisons?
The answer is at https://github.com/kingoflolz/mesh-transformer-jax
It has detailed comparisons and a full breakdown of the performance, courtesy of Eleuther.
Get it, cause it's a generative transformer? Hah
OpenAI has strict requirements for the usage of GPT-3. For instance, you cannot automate posting to social media without a human in the middle.
OpenAI is pivoting to corporate evil, and to do that properly they need proprietary assets to rent out.
https://studio.ai21.com/
Only by analyzing the page title (from the bookmark, not by re-fetching the URL), and possibly also the domain name.
https://sdtimes.com/monitor/using-gpt-3-for-root-cause-incid...
Of course this will be another blow for journalists, who rely on this skill for their income.