I think LaMDA would be really fun. If you asked ChatGPT what movies it likes, it would tell you that it is a large language model trained by OpenAI and it can't have opinions, yada yada yada.
The total data that the page will have to load on startup (probably using the Fetch API) is:
- 74 MB for the Whisper tiny.en model
- 240 MB for the GPT-2 small model
- Web Speech API: built into modern browsers, so no download needed
cool but im now wondering what it would take to bring this down enough to put this in real apps? anyone talking about this?
Unfortunately these smaller models also perform poorly; the GPT-2 small model in particular is really unsuitable for the task of generating text. The largest publicly available models, which are nowhere near GPT-3 Davinci level, are tens of GBs.
We may be able to reduce the size without sacrificing performance, but that's an area of active research still.
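One concrete size reduction is post-training quantization: store weights as int8 plus a per-tensor scale instead of fp32, cutting the download roughly 4x. A minimal numpy sketch of symmetric 8-bit quantization (illustrative only, not the scheme any particular runtime actually uses):

```python
import numpy as np

def quantize_q8(w: np.ndarray):
    """Symmetric per-tensor 8-bit quantization: int8 weights + one fp32 scale."""
    scale = float(np.abs(w).max()) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_q8(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

# One GPT-2-small-sized weight matrix (768x768), fp32
w = np.random.randn(768, 768).astype(np.float32)
q, scale = quantize_q8(w)

print(w.nbytes // q.nbytes)                               # 4 (fp32 -> int8)
print(np.abs(dequantize_q8(q, scale) - w).max() <= scale) # True: error within one step
```

The quality question is whether that bounded per-weight error compounds into worse generations, which is exactly the active-research part.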
Listening to that demo, it's incredible how far we've come!
Or, not.
Racter was commercially released for Mac in December 1985:
Racter strings together words according to "syntax directives", and the illusion of coherence is increased by repeated re-use of text variables. This gives the appearance that Racter can actually have a conversation with the user that makes some sense, unlike Eliza, which just spits back what you type at it. Of course, such a program has not been written to perfection yet, but Racter comes somewhat close.
Since some of the syntactical mistakes that Racter tends to make cannot be avoided, the decision was made to market the game in a humorous vein, which the marketing department at Mindscape dubbed "tongue-in-chip software" and "artificial insanity".
What's amazing is that ChatGPT, backed by GPT-3, is the first thing since then to do enough better that everyone is engaged.
I owned that in 1985, and having studied AI/ML previously I've been (and remain something of) an AGI skeptic. But now in 2022, I finally think “this changes everything” ... not because it's AI, but because it's making the application of matching probabilistic patterns across mass knowledge practical and useful for everyday work, particularly as a structured synthesis assistant.
It looks like the ChatGPT APIs that work well are the ones implemented as browser extensions, reusing the bearer token you get by signing into ChatGPT in the same browser. I'm guessing, since you're using pyttsx3, that you wrote a Python app rather than something in the browser?
Technically this seems to work, and mad props to the author for getting to this point. On my computer (MacBook Pro) it's very slow but there are enough visual hints that it's thinking to make the wait ok. I have plenty of complaints about the output but most of that is GPT-2's problem.
It's almost the same model architecture, but GPT3 is much better trained. GPT3 is coherent, while GPT2 is prone to generating gibberish or getting stuck in a loop. The advantage is pretty significant for longer generations.
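The "stuck in a loop" failure is largely a decoding artifact: under greedy decoding, as soon as the most-likely continuation cycles back to an earlier state, the output repeats forever. A toy sketch with made-up probabilities (nothing to do with GPT-2's real distributions):

```python
# Toy next-token distribution. The most-likely path cycles, so greedy
# decoding loops forever -- mimicking GPT-2's repetition failure mode.
probs = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "ran": 0.3},
    "sat": {"the": 0.9, "down": 0.1},  # most-likely continuation loops back
}

def greedy(start, steps):
    out, tok = [start], start
    for _ in range(steps):
        tok = max(probs[tok], key=probs[tok].get)  # always pick argmax
        out.append(tok)
        if tok not in probs:
            break
    return out

print(greedy("the", 9))
# ['the', 'cat', 'sat', 'the', 'cat', 'sat', ...] -- the same 3-token cycle
```

Sampling with temperature breaks such cycles probabilistically, which is one reason better-trained models plus better decoding feel so much more coherent.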
That being said, neither GPT3 nor GPT2 are "efficient" models.
On the one hand, they use inefficient architectures: from the BPE tokenizer, to dense attention without any modifications, to the decoder-only design. Research has come up with many fancier ideas for making all of this run better and with less compute.
But there is a reason GPT-2/3 are architecturally simple and inefficient: we know how to train these models reliably (more or less) on thousands of GPUs, whereas the same may not be true for more modern and efficient designs. For instance, when training OPT, Facebook started out with fancier ideas but ultimately went back to GPT-3-esque basics, simply because training on thousands of machines is a lot harder than it seems in theory.
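"Dense attention without any modifications" means every token attends to every other token, so the score matrix is n-by-n and cost grows quadratically with context length; that quadratic term is what sparse/linear attention research attacks. A minimal single-head sketch in numpy (illustrative, not an actual GPT implementation):

```python
import numpy as np

def dense_attention(x, Wq, Wk, Wv):
    """Single dense self-attention head. The (n, n) score matrix is the
    quadratic cost that 'efficient attention' variants try to avoid."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])   # shape (n, n): O(n^2) in context length
    scores = np.exp(scores - scores.max(axis=-1, keepdims=True))  # stable softmax
    attn = scores / scores.sum(axis=-1, keepdims=True)
    return attn @ v

n, d = 64, 32  # tiny toy sizes; GPT-3 uses n up to 2048, d_model 12288
rng = np.random.default_rng(0)
x = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

out = dense_attention(x, Wq, Wk, Wv)
print(out.shape)  # (64, 32)
```

Double the context and the score matrix quadruples, per head, per layer.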
On the other hand, these models have far too many parameters compared to the data they were trained on. You might say they are undertrained - or that they lean heavily on available compute to make up for missing data. In any case, much smaller models (like DeepMind's Chinchilla) match their performance with fewer parameters (and hence less compute and a smaller model) by using more and better data.
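Chinchilla's rough rule of thumb is about 20 training tokens per parameter. Taking public ballpark figures (175B parameters and roughly 300B training tokens for GPT-3), a back-of-the-envelope check shows how undertrained GPT-3 is by that standard:

```python
def chinchilla_optimal_tokens(n_params):
    # Hoffmann et al.'s rule of thumb: ~20 tokens per parameter
    return 20 * n_params

gpt3_params = 175e9
gpt3_tokens = 300e9  # approximate published training-set size

optimal = chinchilla_optimal_tokens(gpt3_params)
print(optimal / 1e12)                    # 3.5 -- ~3.5T tokens "wanted"
print(round(gpt3_tokens / optimal, 2))   # 0.09 -- trained on under a tenth of that
```

By the same ratio, Chinchilla itself (70B parameters, 1.4T tokens) sits right on the frontier.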
In closing, there are better models for edge devices. This includes GPT clones like GPT-J in 8-bit, or distilled versions thereof. Similarly, there are still a lot of gains to be had when all the numerous efficiency improvements get implemented in a model that operates at the data/parameter efficiency frontier.
Still, even considering efficient models like Chinchilla, and then even more architecturally efficient versions thereof, we are still talking about a lot of $$$ to train these models. And so we are further from having open-source implementations of them than we are from someone (like DeepMind) having them...
With time, you can expect to run coherent models on your edge device. But not quite yet.
Size of the model is a big one: GPT-3 has over 10x as many parameters, for example. Training data would be another huge one. Architecturally they aren't that different, if I recall correctly; both are decoder stacks of transformer-style self-attention blocks.
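For scale: a decoder-only transformer's non-embedding parameter count is roughly 12 · n_layers · d_model² (about 4d² for attention plus 8d² for the MLP, per block). Plugging in the published configs roughly recovers the well-known totals (this approximation ignores biases and layer norms):

```python
def approx_params(n_layers, d_model, vocab=50257, n_ctx=1024):
    per_block = 12 * d_model**2             # ~4*d^2 attention + ~8*d^2 MLP
    embeddings = (vocab + n_ctx) * d_model  # token + position embeddings
    return n_layers * per_block + embeddings

print(round(approx_params(12, 768) / 1e6))                # 124 -- GPT-2 small, ~124M
print(round(approx_params(96, 12288, n_ctx=2048) / 1e9))  # 175 -- GPT-3, ~175B
```

So the "over 10x" gap between GPT-2 small and GPT-3 is really more than three orders of magnitude.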
Real world capability has GPT-3 giving much better answers, it was a big step up from GPT-2.
I've been thinking of doing something like this but hooked up to ChatGPT / GPT-3 davinci-003. Obviously the model will not load in the browser, but we can call the API. Could be a neat way to interact with the bot.
atum47 | 3 years ago
tomthe | 3 years ago
dr_kiszonka | 3 years ago
sheeeep86 | 3 years ago
swyx | 3 years ago
agolio | 3 years ago
Being on a limited-bandwidth contract, I hate when I click a link and it instantly starts downloading a huge file.
Great work OP!
arcturus17 | 3 years ago
CGamesPlay | 3 years ago
addandsubtract | 3 years ago
justanotheratom | 3 years ago
fulafel | 3 years ago
make3 | 3 years ago
Terretta | 3 years ago
https://www.mobygames.com/game/macintosh/racter
https://www.myabandonware.com/game/racter-4m/play-4m
make3 | 3 years ago
Centigonal | 3 years ago
https://en.wikipedia.org/wiki/AI_winter
Rickvst | 3 years ago
edit: whisper is awesome
localhost | 3 years ago
lhuser123 | 3 years ago
rahimnathwani | 3 years ago
A) ggml https://github.com/ggerganov/ggml/tree/master/examples/gpt-2
B) Fabrice Bellard's GPT2C https://bellard.org/libnc/gpt2tc.html
ggerganov | 3 years ago
iandanforth | 3 years ago
boredemployee | 3 years ago
zwaps | 3 years ago
mcbuilder | 3 years ago
bilater | 3 years ago
simonw | 3 years ago
(LOVE this demo.)
hanoz | 3 years ago
ggerganov | 3 years ago
Currently, the strategy is to simply prepend 8 lines of text (prompt/context) and keep appending every new transcribed line at the end:
https://github.com/ggerganov/whisper.cpp/blob/master/example...
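That strategy (a fixed 8-line prompt, with every transcribed line appended after it) can be sketched as follows. This is hypothetical Python with stand-in prompt text; the actual code in the repo is C++:

```python
# Sketch of the described context strategy: a fixed 8-line prompt,
# with each newly transcribed line appended to the end.
PROMPT_LINES = [f"context line {i}" for i in range(8)]  # stand-in for the real prompt

transcript = []

def build_context(new_line):
    """Append the new transcribed line and rebuild the full model input."""
    transcript.append(new_line)
    return "\n".join(PROMPT_LINES + transcript)

ctx = ""
for i in range(3):
    ctx = build_context(f"transcribed line {i}")

print(ctx.count("\n") + 1)  # 11 lines: 8 prompt + 3 transcribed
```

Note that the context grows without bound this way; a longer session would eventually need trimming to stay inside the model's context window.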
thundergolfer | 3 years ago