top | item 36222536

narrationbox | 2 years ago

Since it does the signal processing in the Fourier domain, does this suffer from audio artefacts, e.g. hissing in the output? Torch's inverse STFT uses Griffin-Lim, which is probabilistic, and if you don't train it sufficiently you may sometimes get noise in the output.

https://pytorch.org/docs/stable/generated/torch.istft.html#t...

An alternative would be to use a vocoder network (or just target a neural speech codec like SoundStream).

thatsadude|2 years ago

Not all spectral methods have such artifacts. The kind of artifact you mention shows up when you need to do phase retrieval, or when you try to reconstruct a waveform from a mel spectrogram. DeepFilterNet does spectral masking on the complex spectrogram, so there is no need for phase retrieval.
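
To make the distinction concrete, here is a minimal pure-Python sketch (standard library only, so it is slow and only meant for tiny signals). It is not DeepFilterNet's code: the `stft`/`istft` helpers, the 8-sample frame, the test sinusoid, and the constant `gain` standing in for a predicted mask are all assumptions for illustration. It applies a real-valued gain to the complex bins and inverts with plain overlap-add, then repeats the inversion after throwing the phase away.

```python
import cmath
import math

N_FFT, HOP = 8, 4
# Periodic Hann window; its squared overlap sum is nonzero away from the edges.
WIN = [0.5 - 0.5 * math.cos(2 * math.pi * n / N_FFT) for n in range(N_FFT)]

def stft(x):
    """Naive complex STFT: one windowed DFT per frame."""
    frames = []
    for start in range(0, len(x) - N_FFT + 1, HOP):
        seg = [x[start + n] * WIN[n] for n in range(N_FFT)]
        frames.append([
            sum(seg[n] * cmath.exp(-2j * math.pi * k * n / N_FFT) for n in range(N_FFT))
            for k in range(N_FFT)
        ])
    return frames

def istft(frames):
    """Deterministic inverse: per-frame inverse DFT, then windowed overlap-add."""
    length = HOP * (len(frames) - 1) + N_FFT
    out, norm = [0.0] * length, [0.0] * length
    for i, spec in enumerate(frames):
        for n in range(N_FFT):
            val = sum(spec[k] * cmath.exp(2j * math.pi * k * n / N_FFT) for k in range(N_FFT))
            out[i * HOP + n] += (val.real / N_FFT) * WIN[n]
            norm[i * HOP + n] += WIN[n] ** 2
    return [o / w if w > 1e-8 else 0.0 for o, w in zip(out, norm)]

def max_err(a, b):
    """Largest absolute difference, skipping one frame length at each edge."""
    return max(abs(p - q) for p, q in zip(a[N_FFT:-N_FFT], b[N_FFT:-N_FFT]))

# Hypothetical test signal: a single sinusoid with a nonzero starting phase.
x = [math.sin(2 * math.pi * n / 8 + 0.3) for n in range(32)]
spec = stft(x)

# Real-valued gain on the complex bins (stand-in for a predicted mask): phase is untouched.
gain = 0.5
masked = [[gain * c for c in frame] for frame in spec]
err_masked = max_err(istft(masked), [gain * v for v in x])

# Keep only the magnitudes, as you would with a magnitude or mel target.
mag_only = [[complex(abs(c), 0.0) for c in frame] for frame in spec]
err_mag = max_err(istft(mag_only), x)

print(f"complex mask error:   {err_masked:.2e}")
print(f"magnitude-only error: {err_mag:.2e}")
```

The first error should sit at floating-point noise level, since the inverse just reuses the phase that was already in the bins. The second is large because the phase has to come from somewhere, which is where an iterative estimator or a vocoder would enter.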