I think LaMDA would be really fun. If you asked ChatGPT what movies it likes, it would tell you that it is a large language model trained by OpenAI and it can't have opinions, yada yada yada.
The total data that the page will have to load on startup (probably using the Fetch API) is:
- 74 MB for the Whisper tiny.en model
- 240 MB for the GPT-2 small model
- Web Speech API: built into modern browsers, so no download needed
cool but im now wondering what it would take to bring this down enough to put this in real apps? anyone talking about this?
Unfortunately these smaller models also perform poorly; the GPT-2 small model in particular is really unsuitable for the task of generating text. The largest publicly available models, which are nowhere near GPT-3 Davinci level, are tens of GBs.
We may be able to reduce the size without sacrificing performance, but that's an area of active research still.
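One concrete size reduction is post-training quantization: store weights as int8 plus a per-tensor scale instead of fp32, cutting the download roughly 4x. A minimal numpy sketch of symmetric 8-bit quantization (illustrative only, not the scheme any particular runtime actually uses):

```python
import numpy as np

def quantize_q8(w: np.ndarray):
    """Symmetric per-tensor 8-bit quantization: int8 weights + one fp32 scale."""
    scale = float(np.abs(w).max()) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_q8(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

# One GPT-2-small-sized weight matrix (768x768), fp32
w = np.random.randn(768, 768).astype(np.float32)
q, scale = quantize_q8(w)

print(w.nbytes // q.nbytes)                               # 4 (fp32 -> int8)
print(np.abs(dequantize_q8(q, scale) - w).max() <= scale) # True: error within one step
```

The quality question is whether that bounded per-weight error compounds into worse generations, which is exactly the active-research part.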
Listening to that demo, it's incredible how far we've come!
Or, not.
Racter was commercially released for Mac in December 1985:
Racter strings together words according to "syntax directives", and the illusion of coherence is increased by repeated re-use of text variables. This gives the appearance that Racter can actually have a conversation with the user that makes some sense, unlike Eliza, which just spits back what you type at it. Of course, such a program has not been written to perfection yet, but Racter comes somewhat close.
Since some of the syntactical mistakes that Racter tends to make cannot be avoided, the decision was made to market the game in a humorous vein, which the marketing department at Mindscape dubbed "tongue-in-chip software" and "artificial insanity".
What's amazing is that ChatGPT, backed by GPT-3, is the first thing since then to do enough better that everyone is engaged.
I owned that in 1985, and having studied AI/ML previously I've been (and remain something of) an AGI skeptic. But now in 2022, I finally think “this changes everything” ... not because it's AI, but because it's making the application of matching probabilistic patterns across mass knowledge practical and useful for everyday work, particularly as a structured synthesis assistant.
It looks like the ChatGPT APIs that work well are the ones implemented as browser extensions, reusing the bearer token you get by signing into ChatGPT in the same browser. I'm guessing, since you're using pyttsx3, that you wrote a Python app rather than something in the browser?
Technically this seems to work, and mad props to the author for getting to this point. On my computer (MacBook Pro) it's very slow but there are enough visual hints that it's thinking to make the wait ok. I have plenty of complaints about the output but most of that is GPT-2's problem.
It's almost the same model architecture, but GPT3 is much better trained. GPT3 is coherent, while GPT2 is prone to generating gibberish or getting stuck in a loop. The advantage is pretty significant for longer generations.
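The "stuck in a loop" failure is largely a decoding artifact: under greedy decoding, as soon as the most-likely continuation cycles back to an earlier state, the output repeats forever. A toy sketch with made-up probabilities (nothing to do with GPT-2's real distributions):

```python
# Toy next-token distribution. The most-likely path cycles, so greedy
# decoding loops forever -- mimicking GPT-2's repetition failure mode.
probs = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "ran": 0.3},
    "sat": {"the": 0.9, "down": 0.1},  # most-likely continuation loops back
}

def greedy(start, steps):
    out, tok = [start], start
    for _ in range(steps):
        tok = max(probs[tok], key=probs[tok].get)  # always pick argmax
        out.append(tok)
        if tok not in probs:
            break
    return out

print(greedy("the", 9))
# ['the', 'cat', 'sat', 'the', 'cat', 'sat', ...] -- the same 3-token cycle
```

Sampling with temperature breaks such cycles probabilistically, which is one reason better-trained models plus better decoding feel so much more coherent.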
That being said, neither GPT3 nor GPT2 are "efficient" models.
On the one hand, they use inefficient architectures: from the BPE tokenizer, to dense attention without any modifications, to the decoder-only design. Research has come up with many fancier ideas for making all of this run better and with less compute.
But there is a reason GPT-2/3 are architecturally simple and inefficient: we know how to train these models reliably (more or less) on thousands of GPUs, whereas the same may not be true for more modern and efficient designs. For instance, when training OPT, Facebook started out with fancier ideas but ultimately went back to GPT-3-esque basics, simply because training on thousands of machines is a lot harder than it seems in theory.
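"Dense attention without any modifications" means every token attends to every other token, so the score matrix is n-by-n and cost grows quadratically with context length; that quadratic term is what sparse/linear attention research attacks. A minimal single-head sketch in numpy (illustrative, not an actual GPT implementation):

```python
import numpy as np

def dense_attention(x, Wq, Wk, Wv):
    """Single dense self-attention head. The (n, n) score matrix is the
    quadratic cost that 'efficient attention' variants try to avoid."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])   # shape (n, n): O(n^2) in context length
    scores = np.exp(scores - scores.max(axis=-1, keepdims=True))  # stable softmax
    attn = scores / scores.sum(axis=-1, keepdims=True)
    return attn @ v

n, d = 64, 32  # tiny toy sizes; GPT-3 uses n up to 2048, d_model 12288
rng = np.random.default_rng(0)
x = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

out = dense_attention(x, Wq, Wk, Wv)
print(out.shape)  # (64, 32)
```

Double the context and the score matrix quadruples, per head, per layer.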
On the other hand, these models have far too many parameters compared to the data they were trained on. You might say they are undertrained - or that they lean heavily on available compute to make up for missing data. In any case, much smaller models (like DeepMind's Chinchilla) match their performance with fewer parameters (and hence less compute and a smaller model) by using more and better data.
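Chinchilla's rough rule of thumb is about 20 training tokens per parameter. Taking public ballpark figures (175B parameters and roughly 300B training tokens for GPT-3), a back-of-the-envelope check shows how undertrained GPT-3 is by that standard:

```python
def chinchilla_optimal_tokens(n_params):
    # Hoffmann et al.'s rule of thumb: ~20 tokens per parameter
    return 20 * n_params

gpt3_params = 175e9
gpt3_tokens = 300e9  # approximate published training-set size

optimal = chinchilla_optimal_tokens(gpt3_params)
print(optimal / 1e12)                    # 3.5 -- ~3.5T tokens "wanted"
print(round(gpt3_tokens / optimal, 2))   # 0.09 -- trained on under a tenth of that
```

By the same ratio, Chinchilla itself (70B parameters, 1.4T tokens) sits right on the frontier.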
In closing, there are better models for edge devices. This includes GPT clones like GPT-J in 8-bit, or distilled versions thereof. Similarly, there are still a lot of gains to be had when all the numerous efficiency improvements get implemented in a model that operates at the data/parameter efficiency frontier.
Still, even considering efficient models like Chinchilla, and then even more architecturally efficient versions thereof, we are still talking about a lot of $$$ to train these models. And so we are further from having open-source implementations of them than we are from someone (like DeepMind) having them...
With time, you can expect to run coherent models on your edge device. But not quite yet.
Size of the model is a big one: GPT-3 has over 10x as many parameters, for example. Training data would be another huge one. Architecturally they aren't that different, if I recall correctly; both are decoder stacks of transformer-style self-attention blocks.
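For scale: a decoder-only transformer's non-embedding parameter count is roughly 12 · n_layers · d_model² (about 4d² for attention plus 8d² for the MLP, per block). Plugging in the published configs roughly recovers the well-known totals (this approximation ignores biases and layer norms):

```python
def approx_params(n_layers, d_model, vocab=50257, n_ctx=1024):
    per_block = 12 * d_model**2             # ~4*d^2 attention + ~8*d^2 MLP
    embeddings = (vocab + n_ctx) * d_model  # token + position embeddings
    return n_layers * per_block + embeddings

print(round(approx_params(12, 768) / 1e6))                # 124 -- GPT-2 small, ~124M
print(round(approx_params(96, 12288, n_ctx=2048) / 1e9))  # 175 -- GPT-3, ~175B
```

So the "over 10x" gap between GPT-2 small and GPT-3 is really more than three orders of magnitude.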
Real world capability has GPT-3 giving much better answers, it was a big step up from GPT-2.
I've been thinking of doing something like this but hooked up to ChatGPT / GPT-3 davinci-003. Obviously the model will not load in the browser, but we can call the API. Could be a neat way to interact with the bot.
atum47 | 3 years ago
tomthe | 3 years ago
dr_kiszonka | 3 years ago
sheeeep86 | 3 years ago
swyx | 3 years ago
agolio | 3 years ago
Being on a limited-bandwidth contract, I hate when I click a link and it instantly starts downloading a huge file.
Great work OP!
arcturus17 | 3 years ago
CGamesPlay | 3 years ago
addandsubtract | 3 years ago
justanotheratom | 3 years ago
fulafel | 3 years ago
make3 | 3 years ago
Terretta | 3 years ago
https://www.mobygames.com/game/macintosh/racter
https://www.myabandonware.com/game/racter-4m/play-4m
make3 | 3 years ago
Centigonal | 3 years ago
https://en.wikipedia.org/wiki/AI_winter
Rickvst | 3 years ago
edit: whisper is awesome
localhost | 3 years ago
lhuser123 | 3 years ago
rahimnathwani | 3 years ago
A) ggml https://github.com/ggerganov/ggml/tree/master/examples/gpt-2
B) Fabrice Bellard's GPT2C https://bellard.org/libnc/gpt2tc.html
ggerganov | 3 years ago
iandanforth | 3 years ago
boredemployee | 3 years ago
zwaps | 3 years ago
mcbuilder | 3 years ago
bilater | 3 years ago
simonw | 3 years ago
(LOVE this demo.)
hanoz | 3 years ago
ggerganov | 3 years ago
Currently, the strategy is to simply prepend 8 lines of text (prompt/context) and keep appending every new transcribed line at the end:
https://github.com/ggerganov/whisper.cpp/blob/master/example...
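That strategy (a fixed 8-line prompt, with every transcribed line appended after it) can be sketched as follows. This is hypothetical Python with stand-in prompt text; the actual code in the repo is C++:

```python
# Sketch of the described context strategy: a fixed 8-line prompt,
# with each newly transcribed line appended to the end.
PROMPT_LINES = [f"context line {i}" for i in range(8)]  # stand-in for the real prompt

transcript = []

def build_context(new_line):
    """Append the new transcribed line and rebuild the full model input."""
    transcript.append(new_line)
    return "\n".join(PROMPT_LINES + transcript)

ctx = ""
for i in range(3):
    ctx = build_context(f"transcribed line {i}")

print(ctx.count("\n") + 1)  # 11 lines: 8 prompt + 3 transcribed
```

Note that the context grows without bound this way; a longer session would eventually need trimming to stay inside the model's context window.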
thundergolfer | 3 years ago