(no title)
bartwr | 1 year ago
In images, scrambling phase yields a completely different image. A single edge will have the same spectral content as pink/brown~ish noise, but they look completely unlike one another.
bartwr | 1 year ago
In images, scrambling phase yields a completely different image. A single edge will have the same spectral content as pink/brown~ish noise, but they look completely unlike one another.
nowayno583|1 year ago
So when generating audio I think the next chunk needs to be continuous in phase to the last chunk, where in images a small discontinuity in phase would just result in a noisy patch in the image. That's why I think it should be somewhat like video models, where sudden, small phase changes from one frame to the next give that "AI graininess" that is so common in the current models
benanne|1 year ago
I have an example audio clip in there where the phase information has been replaced with random noise, so you can perceive the effect. It certainly does matter perceptually, but it is tricky to model, and small "vocoder" models do a decent job of filling it in post-hoc.
nyanpasu64|1 year ago