zebproj | 2 years ago

> Does that mean that the similarity in sound to formant-based speech synthesis is because they're both using a sawtooth wave, noise, or other relatively simple sound as the raw input?

Essentially, yes. Both are known as "source-filter" models. A sawtooth, narrow pulse, or impulse train is a good approximation of glottal excitation for the source signal, though many articulatory speech models use a more specialized source model that's analytically derived from real waveforms produced by the glottis. The Liljencrants-Fant (LF) derivative glottal waveform model is the most common, but a few others exist.
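To make the "simple source" idea concrete, here's a rough Python sketch of two of the sources mentioned above. The function names and parameters are made up for illustration, the sawtooth is naive (not band-limited, so it will alias), and this is not an LF model, just the cheap approximations.

```python
import math

def sawtooth(freq_hz, sample_rate, n_samples):
    """Naive sawtooth in [-1, 1). Not band-limited: expect aliasing."""
    out = []
    phase = 0.0  # normalized phase in [0, 1)
    for _ in range(n_samples):
        out.append(2.0 * phase - 1.0)
        phase += freq_hz / sample_rate
        phase -= math.floor(phase)  # wrap back into [0, 1)
    return out

def impulse_train(freq_hz, sample_rate, n_samples):
    """One unit impulse per pitch period (period rounded to whole samples)."""
    period = max(1, round(sample_rate / freq_hz))
    return [1.0 if i % period == 0 else 0.0 for i in range(n_samples)]

# Hypothetical usage: a 110 Hz source at 44.1 kHz, 100 ms long.
source = sawtooth(110.0, 44100, 4410)
```

Either signal is harmonically rich, which is what matters: the filter stage needs energy across the spectrum to shape.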

In formant synthesis, the formant frequencies are known ahead of time and are explicitly added to the spectrum using some kind of peak filter. With waveguides, those formants are implicitly created based on the shape of the vocal tract (the vocal tract here is approximated as a series of cylindrical tubes with varying diameters).
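A rough sketch of the difference, in Python. Everything here is illustrative: the names are hypothetical, the two-pole resonator is just one simple kind of peak filter, and the reflection-coefficient sign convention is one common choice (others flip it). The first function places a formant explicitly at a chosen frequency; the second only describes the tube junctions, and the formants would fall out of running a waveguide built from them.

```python
import math

def resonator(signal, center_hz, bandwidth_hz, sample_rate):
    """Two-pole resonator: an explicit spectral peak at center_hz."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)  # pole radius
    theta = 2.0 * math.pi * center_hz / sample_rate      # pole angle
    a1 = 2.0 * r * math.cos(theta)
    a2 = -r * r
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def reflection_coefficients(diameters):
    """Junction reflection coefficients for a chain of cylindrical tubes.

    Assumed convention: k = (A_i - A_next) / (A_i + A_next),
    where A is the cross-sectional area of each section.
    """
    areas = [math.pi * (d / 2.0) ** 2 for d in diameters]
    return [(a - b) / (a + b) for a, b in zip(areas, areas[1:])]

# Formant synthesis: the frequency is chosen up front (values are made up).
impulse = [1.0] + [0.0] * 999
ringing = resonator(impulse, center_hz=500.0, bandwidth_hz=80.0, sample_rate=8000)

# Waveguide: only the tract shape is specified (diameters are made up).
ks = reflection_coefficients([1.0, 2.0, 2.0, 0.5])
```

A uniform tube gives all-zero coefficients (no junction reflections), so in the waveguide case any change in the resonances comes from changing the diameters rather than from setting frequencies directly.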
