jegp | 2 years ago
Regarding RWKV, someone actually trained a "SpikeGPT": https://arxiv.org/abs/2302.13939 That's a neat insight, which will be great for porting these models onto energy-efficient devices. But the learning problem is still the most interesting open question to me. If we crack that, we can scale down GPT-like models by several orders of magnitude, since we can "re-learn" subproblems instead of "hardcoding" a silly number of permutations, like the present models do. Neuromorphic hardware (brains included) lends itself incredibly well to learning. We just don't know how to exploit that yet.
Regarding the optical layers, are you referring to optical chips like this one https://www.nature.com/articles/s41467-020-20719-7 ? That would be an example of using optics to implement your stateful transfer functions (https://en.wikipedia.org/wiki/Optical_neural_network), but there are several other incredibly promising technologies, such as memristors (https://en.wikipedia.org/wiki/Memristor), quantum materials (https://arxiv.org/abs/2204.01832), and even biologically based chips (https://en.wikipedia.org/wiki/Wetware_computer). My take on this is that these technologies exploit different principles of physics to "compute" in some way. But I like to think that our computational theories and principles are independent of the implementation substrates.
There's still a long way to go, but practically speaking, I'm convinced this kind of hardware will have profound consequences for the way that we compute today. We're talking at least 3 orders of magnitude in compute. Imagine ChatGPT running 1000 times as fast. It's ridiculous.
lucubratory | 2 years ago
And you are so right that it would be ridiculous. It's funny, before I read your comment this morning I was watching a livestream of a comedian talking to an AI-generated character ("Slunt"). I don't know the implementation details, but it would have been something simple: an open source or commercial speech-to-text program, maybe even Whisper, then the text passed through to the OpenAI API (probably GPT-4) with a prompt wrapper to set up the character and setting, then the response turned into speech with ElevenLabs or something like that. I can't link it as it was a livestream, but it was the same sort of thing that was used to make this demo: https://youtu.be/u_Zn89_g7ok.

Anyway, the whole time I was watching her talk to this AI character, it was taking quite a while to respond, taking a while to recognise her voice, and so on, and I was thinking about what would be required for that interface to be truly conversational. It's just speed. If it were running even ten times faster, that would be closer, but a hundred times faster is probably what would be required for a genuinely conversational interface. What you need is for your voice to be recognised and converted to text basically instantly, then have the LLM go over it and respond basically instantly, then have the TTS program start saying it basically instantly as well. And you need the software wrapper ready to hear you if you interrupt it and respond to that interruption appropriately, or to interrupt you if it has something to add. That's what would be required for truly natural conversation, because that's the only way you can interrupt it or be interrupted by it the way humans do when talking to each other, with the sort of responsiveness that makes it fluid rather than like an intercontinental phone call. I don't think we're going to get that sort of performance improvement any time soon by just continuing to scale regular silicon or ASICs. We need new, specific hardware.
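The latency argument above can be sketched as arithmetic over a sequential STT → LLM → TTS pipeline. The per-stage numbers below are illustrative assumptions, not measurements, and the 200 ms conversational-gap figure is a rough ballpark for human turn-taking, just to show why ~100x matters more than ~10x:

```python
def total_latency(stt_s: float, llm_s: float, tts_s: float,
                  speedup: float = 1.0) -> float:
    """End-to-end delay of a sequential voice pipeline at a given
    hardware speedup factor. Stages: speech-to-text, LLM response,
    text-to-speech. All inputs are in seconds."""
    return (stt_s + llm_s + tts_s) / speedup

# Hypothetical per-stage latencies for today's stack (seconds).
today = total_latency(stt_s=1.0, llm_s=3.0, tts_s=1.0)
ten_x = total_latency(stt_s=1.0, llm_s=3.0, tts_s=1.0, speedup=10)
hundred_x = total_latency(stt_s=1.0, llm_s=3.0, tts_s=1.0, speedup=100)

print(f"today:  {today:.2f} s")      # ~5 s: clearly a machine
print(f"10x:    {ten_x:.2f} s")      # ~0.5 s: better, still a pause
print(f"100x:   {hundred_x:.3f} s")  # ~0.05 s: under a human gap

# Only the ~100x case dips below a rough ~0.2 s human turn-taking gap,
# which is the point: 10x improves the pause, 100x removes it.
```

Streaming each stage (starting TTS on the first tokens, for instance) would shrink these numbers without new hardware, but the same budget logic applies to each overlapped stage.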
And I know a lot of people might think of conversational ease of use as not really important, not compared to the capabilities. There's an element of truth to that. But here's the thing: ChatGPT was primarily a UX/UI invention, not a technological one, and ChatGPT is what has driven this insane amount of interest, new use cases, and hype. GPT-3 was nearly as powerful; it was just much clunkier, and various other points of friction meant you couldn't just go and use it. Making it easier to use is what made it so much more valuable to people that they actually wanted to use it for their problems. And it will go a long way beyond just making it conversational, too. The 2020s are going to be an absurd decade, and we're not even halfway through.