We're doing a lot with the realtime models. Happy to see a new release.
beklein|5 days ago
- $4 input, $0.4 cached input, $16 output
- 32,000-token context window
- 4,096 max output tokens
- Sep 30, 2024 knowledge cutoff
I love the models, their speed, and their capabilities. It's just sad that they aren't getting much publicity or adoption right now, but hopefully that will change.
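For a rough sense of what those prices mean per call, here is a minimal sketch. It assumes the listed figures are USD per 1M tokens (the unit isn't stated above) and that cached tokens are a subset of the input tokens, billed at the cached rate. `Pricing` and `estimate_cost` are hypothetical names, not part of any SDK.

```python
from dataclasses import dataclass

# Assumption: the listed prices are quoted per 1M tokens.
TOKENS_PER_PRICE_UNIT = 1_000_000


@dataclass(frozen=True)
class Pricing:
    """Prices copied from the list above, in USD per price unit."""
    input_usd: float = 4.0
    cached_input_usd: float = 0.4
    output_usd: float = 16.0


def estimate_cost(input_tokens: int, cached_tokens: int, output_tokens: int,
                  pricing: Pricing = Pricing()) -> float:
    """Estimate the USD cost of one request.

    cached_tokens is treated as the part of input_tokens billed at the
    cached rate (an assumption about how billing works).
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")

    fresh_tokens = input_tokens - cached_tokens
    total = (fresh_tokens * pricing.input_usd
             + cached_tokens * pricing.cached_input_usd
             + output_tokens * pricing.output_usd)
    return total / TOKENS_PER_PRICE_UNIT


if __name__ == "__main__":
    # Made-up token counts, purely for illustration.
    cost = estimate_cost(input_tokens=20_000, cached_tokens=15_000, output_tokens=2_000)
    print(f"estimated cost: ${cost:.4f}")
```

Under these assumptions, caching matters a lot for long sessions: a cached input token costs a tenth of a fresh one.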
hectormalot|6 days ago
My initial impression from a few calls is that it handles alphanumeric inputs better. The voice seems consistent. Recognition seemed somewhat better in a few tests, and it did much better on the two 8-bit, 8 kHz mu-law calls I tried.
It still struggles a bit with some language-specific details (e.g., in Dutch and German, 53 is spoken units-first, effectively 'three-and-fifty' rather than 'fifty-three').
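To make the ordering issue concrete, here is a small sketch that converts between a number and its units-first form, using English glosses in place of the actual Dutch or German words. `units_first` and `parse_units_first` are hypothetical helpers for illustration only, covering 21 to 99 excluding round tens; they say nothing about how the model handles this internally.

```python
UNITS = ["one", "two", "three", "four", "five", "six", "seven", "eight", "nine"]
TENS = {20: "twenty", 30: "thirty", 40: "forty", 50: "fifty",
        60: "sixty", 70: "seventy", 80: "eighty", 90: "ninety"}

# Reverse lookups for parsing.
UNIT_VALUES = {word: i + 1 for i, word in enumerate(UNITS)}
TENS_VALUES = {word: value for value, word in TENS.items()}


def units_first(n: int) -> str:
    """Gloss n in units-first order, e.g. 53 -> 'three-and-fifty'."""
    if not 21 <= n <= 99 or n % 10 == 0:
        raise ValueError("only 21..99 with a non-zero units digit are supported")
    tens, units = divmod(n, 10)
    return f"{UNITS[units - 1]}-and-{TENS[tens * 10]}"


def parse_units_first(text: str) -> int:
    """Parse a units-first gloss back into a number."""
    units_word, sep, tens_word = text.strip().lower().partition("-and-")
    if not sep or units_word not in UNIT_VALUES or tens_word not in TENS_VALUES:
        raise ValueError(f"not a units-first number: {text!r}")
    return TENS_VALUES[tens_word] + UNIT_VALUES[units_word]


if __name__ == "__main__":
    print(units_first(53))
    print(parse_units_first("three-and-fifty"))
```

A recognizer that emits digits in the order they are heard would turn "three-and-fifty" into 35 instead of 53, which is the kind of error to watch for when testing these languages.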