cellis | 4 months ago
Could this be used to train a text -> audio model? I'm thinking of an architecture that uses RVQ. Would RVQ still be necessary?
diyer22 | 4 months ago
I believe DDN is capable of handling TTS (text-to-speech) tasks, because the text condition significantly narrows the generation space. I'd also recommend combining it with an autoregressive model (GPT) for more powerful modeling capability.
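For readers unfamiliar with the RVQ (residual vector quantization) the question refers to, here is a minimal pure-Python sketch of the general technique, not of DDN or of any particular TTS system. Each stage quantizes whatever error the previous stages left behind, and the reconstruction is the sum of the chosen codewords. The two tiny 2-D codebooks are hypothetical, hand-picked values for illustration only; real audio codecs learn much larger codebooks over high-dimensional frame embeddings.

```python
from typing import List, Tuple

Vector = List[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rvq_encode(x: Vector, codebooks: List[List[Vector]]) -> Tuple[List[int], Vector]:
    """Quantize x stage by stage; each stage encodes what earlier stages missed.

    Returns one codeword index per stage plus the final leftover residual.
    """
    residual = list(x)
    indices: List[int] = []
    for codebook in codebooks:
        # Pick the codeword nearest to the current residual.
        idx = min(range(len(codebook)), key=lambda i: sq_dist(residual, codebook[i]))
        indices.append(idx)
        # Subtract it so the next stage only sees the remaining error.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return indices, residual


def rvq_decode(indices: List[int], codebooks: List[List[Vector]]) -> Vector:
    """Reconstruct by summing the chosen codeword from every stage."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(indices, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Hypothetical codebooks: a coarse first stage and a finer second stage.
CODEBOOKS: List[List[Vector]] = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]],
    [[0.0, 0.0], [0.25, -0.25], [0.25, 0.25]],
]

if __name__ == "__main__":
    x = [1.2, 0.8]
    idx, res = rvq_encode(x, CODEBOOKS)
    print(idx, rvq_decode(idx, CODEBOOKS), res)
```

In RVQ-based audio codecs, each frame's embedding becomes a short stack of such indices (one per stage), and those stacked tokens are what a downstream model predicts. Whether DDN's own discrete hierarchy makes this extra quantization step unnecessary is the open question in this thread; the sketch does not answer it.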