pyryt | 2 years ago
Knowing when to speak is actually a prediction task in itself. See e.g. https://arxiv.org/abs/2010.10874. It would indeed be great to get something like this integrated with Whisper, an LLM, and TTS.
zachthewf | 2 years ago
Hard for me to imagine that this could be solved in text space. I think the prediction task needs to be done on the audio.
stiffler01 | 2 years ago
We thought about doing this in Whisper itself, since it's already working in the audio space.
stiffler01 | 2 years ago
Yes, this is something we want to look into in more detail. We really appreciate you sharing the research.