There are actually a few recent papers that show minor improvements from integrating LPC prediction into deep methods ([0], [1]). In my experience (some of it from reproducing these, some from my own experiments), this isn't actually too useful and offers at most a minor modeling benefit.
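
For anyone unfamiliar with the term, here is a minimal sketch of what "LPC prediction" means: each sample is predicted as a fixed linear combination of the few samples before it, and whatever is left over (the residual) is what a model would still have to account for. This is a plain least-squares fit in pure Python, not the procedure used in [0] or [1]; the function names (`solve_linear`, `fit_lpc`, `lpc_residual`) and the test signal are my own assumptions for illustration.

```python
import math


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting.

    Assumes the system is square and non-singular (sketch only).
    """
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_lpc(signal, order):
    """Least-squares coefficients a so that x[t] ~= sum_k a[k] * x[t-1-k]."""
    n = len(signal)
    steps = range(order, n)
    # Normal equations of the least-squares problem.
    gram = [
        [sum(signal[t - 1 - i] * signal[t - 1 - j] for t in steps)
         for j in range(order)]
        for i in range(order)
    ]
    rhs = [sum(signal[t - 1 - i] * signal[t] for t in steps)
           for i in range(order)]
    return solve_linear(gram, rhs)


def lpc_residual(signal, coeffs):
    """Prediction error for every sample that has `order` predecessors."""
    order = len(coeffs)
    return [
        signal[t] - sum(coeffs[k] * signal[t - 1 - k] for k in range(order))
        for t in range(order, len(signal))
    ]


if __name__ == "__main__":
    # A pure sinusoid obeys x[t] = 2*cos(w)*x[t-1] - x[t-2],
    # so an order-2 predictor should leave almost no residual.
    w = 0.3
    tone = [math.sin(w * t) for t in range(200)]
    coeffs = fit_lpc(tone, order=2)
    print("coefficients:", coeffs)
    print("largest residual:", max(abs(r) for r in lpc_residual(tone, coeffs)))
```

Real speech is nowhere near this predictable, so the residual is far from zero there; the sketch only shows the mechanics of the predictor.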

The main difference between something like Festival and what we have now is the amount of domain-specific engineering. (This is generally the promise of deep learning -- replace hand-engineered features with simple-to-understand features and a deep model.) If you go and read the Festival manual, you're going to find tons of domain-specific rules, heuristics, and subroutines; for example, there's a page on writing letter-to-sound rules as a grammar [2]. Nowadays, we may have a pipeline that resembles Festival at a high level, but each step of the pipeline is learned as a deep model from data rather than being carefully hand-engineered by many people over the course of years. This yields much more fluid speech as well as much faster iteration and experimentation, which in turn leads to faster progress.
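
To give a feel for what hand-written letter-to-sound rules look like, here is a toy sketch of context-sensitive rewrite rules in Python. It is not Festival's syntax (Festival's rules are written in Scheme), and the rule list, the phone labels, and the `letters_to_phones` name are all made up for illustration; a real rule set is far larger.

```python
# Hypothetical toy rules, NOT taken from Festival.
# Each rule: (left context, letters to rewrite, right context, output phones).
# "#" marks a word boundary. The first matching rule wins, so order matters.
RULES = [
    ("", "ch", "", ["ch"]),
    ("", "c", "e", ["s"]),   # "c" before "e" is soft
    ("", "c", "i", ["s"]),   # "c" before "i" is soft
    ("", "c", "", ["k"]),    # otherwise "c" is hard
    ("", "e", "#", []),      # word-final "e" is silent
    ("", "a", "", ["ae"]),
    ("", "e", "", ["eh"]),
    ("", "i", "", ["ih"]),
    ("", "n", "", ["n"]),
    ("", "p", "", ["p"]),
    ("", "t", "", ["t"]),
]


def letters_to_phones(word, rules=RULES):
    """Rewrite a word into phones by scanning left to right over the rules."""
    text = "#" + word.lower() + "#"
    pos = 1  # skip the leading boundary marker
    phones = []
    while pos < len(text) - 1:
        for left, letters, right, out in rules:
            end = pos + len(letters)
            if (text.startswith(letters, pos)
                    and text[:pos].endswith(left)
                    and text.startswith(right, end)):
                phones.extend(out)
                pos = end
                break
        else:
            raise ValueError(f"no rule covers {text[pos]!r} in {word!r}")
    return phones


if __name__ == "__main__":
    for word in ["cat", "cent", "chat", "pence"]:
        print(word, letters_to_phones(word))
```

Every exception to a rule means another hand-written rule placed carefully ahead of it, which is exactly the kind of engineering effort a learned model replaces.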

[0] https://arxiv.org/abs/1811.11913

[1] https://people.xiph.org/~jm/demo/lpcnet/

[2] http://www.festvox.org/docs/manual-2.4.0/festival_13.html#Le...