As soon as GPT-4o audio input is available through the APIs, we'll add 4o support to Pipecat. For bidirectional real-time audio, I think they'll need to make new WebSocket or WebRTC endpoints available.
Just letting you know it's available right now, just specify `gpt-4o` -- for text streaming anyway. I'd hazard a guess that the audio endpoints are open now, just not documented (like most of the last launches)...
Building pipelines for bridging LLMs and TTS and STT models with lower latency is fine and all, but when you compare to a natively multimodal model like GPT-4o it seems strictly inferior. The future is clearly voice-native models that are able to understand nuances in voice and speech patterns, and it's not exactly a distant future.
kwindla|1 year ago
As soon as GPT-4o audio input is available through the APIs, we'll add 4o support to Pipecat. For bidirectional real-time audio, I think they'll need to make new WebSocket or WebRTC endpoints available.
jshreder|1 year ago
avarun|1 year ago
Building pipelines for bridging LLMs and TTS and STT models with lower latency is fine and all, but when you compare to a natively multimodal model like GPT-4o it seems strictly inferior. The future is clearly voice-native models that are able to understand nuances in voice and speech patterns, and it's not exactly a distant future.
unknown|1 year ago
[deleted]