top | item 36443711

ml_basics | 2 years ago

> We introduce AudioPaLM, a large language model for speech understanding and generation. AudioPaLM fuses text-based and speech-based language models, PaLM-2 [Anil et al., 2023] and AudioLM [Borsos et al., 2022], into a unified multimodal architecture that can process and generate text and speech with applications including speech recognition and speech-to-speech translation. AudioPaLM inherits the capability to preserve paralinguistic information such as speaker identity and intonation from AudioLM and the linguistic knowledge present only in text large language models such as PaLM-2.

Direct link to a demo video showing speech-to-speech translation: https://google-research.github.io/seanet/audiopalm/examples/... (see the website for more examples)
