
ForwardTacotron – Generating speech without attention

3 points| datitran | 6 years ago |github.com

1 comment

[+] datitran|6 years ago|reply
We've just open-sourced our first text-to-speech project! It's also our first public PyTorch project. Inspired by Microsoft's FastSpeech, we modified Tacotron (forked from fatchord's WaveRNN) to generate speech in a single forward pass without using any attention. Hence, we call the model ⏩ ForwardTacotron.

The model has several advantages:

* Robustness: No repeated words or failed attention alignments on complex sentences

* Speed: Generating a spectrogram takes about 0.04 s on an RTX 2080

* Controllability: You can control the speaking rate of the synthesized speech

* Efficiency: No attention, so memory grows linearly with text length
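To illustrate the core idea behind attention-free synthesis (as in FastSpeech), here is a minimal sketch of a "length regulator": instead of attention aligning text to audio frames, each input symbol's encoder features are simply repeated for its predicted duration, and scaling those durations is what makes the speaking rate controllable. This is an illustrative pure-Python sketch, not the actual ForwardTacotron API; all names are hypothetical.

```python
def length_regulate(features, durations, speed=1.0):
    """Expand per-symbol features to frame rate without attention.

    features:  list of per-symbol feature vectors (one per input symbol)
    durations: list of predicted mel frames per symbol
    speed:     values > 1.0 shrink durations, speeding the speech up
    """
    out = []
    for feat, dur in zip(features, durations):
        # Scaling the predicted duration controls the speaking rate.
        n_frames = max(0, round(dur / speed))
        # Repeat the symbol's features for its duration: an explicit,
        # monotonic alignment, so no repeats or attention failures.
        out.extend([feat] * n_frames)
    return out

# 4 symbols with dummy 2-dim features and predicted durations
feats = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
durs = [2, 3, 1, 4]
frames = length_regulate(feats, durs)  # 2+3+1+4 = 10 output frames
```

In the real model a duration predictor network supplies `durations`; at inference, the expanded sequence is decoded into a mel spectrogram in one forward pass.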

We also provide a Colab notebook to try out our pre-trained model, trained for 100k steps on LJSpeech, along with some audio samples. Check it out!

* Github: https://github.com/as-ideas/ForwardTacotron

* Samples: https://as-ideas.github.io/ForwardTacotron/

* Colab notebook: https://colab.research.google.com/github/as-ideas/ForwardTac...