top | item 40346373

(no title)

bamazizi | 1 year ago

I wonder how the just announced "GPT-4o" with real-time voice impacts projects like this?

The demo on real-time multi language translation conversation blew me away!

discuss

order

kwindla|1 year ago

Here's a translation demo in Pipecat using the now ancient and arthritic GPT-4 Turbo model. :-) https://github.com/pipecat-ai/pipecat/tree/main/examples/tra...

As soon as GPT-4o audio input is available through the APIs, we'll add 4o support to Pipecat. For bidirectional real-time audio, I think they'll need to make new WebSocket or WebRTC endpoints available.

jshreder|1 year ago

Just letting you know it's available right now, just specify `gpt-4o` -- for text streaming anyway. I'd hazard a guess that the audio endpoints are open now, just not documented (like most of the last launches)...

avarun|1 year ago

Yeah, same question here.

Building pipelines for bridging LLMs and TTS and STT models with lower latency is fine and all, but when you compare to a natively multimodal model like GPT-4o it seems strictly inferior. The future is clearly voice-native models that are able to understand nuances in voice and speech patterns, and it's not exactly a distant future.