lukasga | 3 years ago
Particularly relevant to the noisy training you mentioned earlier is the alternative timestep sampling procedure they propose, which seems to reduce gradient noise significantly, judging from their experiments. Would love to hear about (or discuss) any other design changes you've found that improved training or sample quality :)
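For anyone unfamiliar with the idea, here is a minimal sketch of loss-aware timestep sampling. It assumes the scheme from Improved DDPM (Nichol & Dhariwal, 2021), where timesteps are drawn with probability proportional to sqrt(E[L_t^2]) estimated from a short per-timestep loss history, and each sample is reweighted by 1 / (T * p_t) so the loss estimate stays unbiased. Whether this is exactly the procedure referenced above is an assumption, and the class name and parameters (`history_len`, `uniform_mix`) are hypothetical.

```python
import math
import random
from collections import deque


class LossAwareTimestepSampler:
    """Hypothetical sketch: importance-sample timesteps with p(t) ~ sqrt(E[L_t^2]).

    Assumption: modeled on the Improved DDPM resampler (10-entry loss history
    per timestep, small uniform mixing), not on any specific codebase.
    """

    def __init__(self, num_timesteps, history_len=10, uniform_mix=0.001, seed=None):
        self.T = num_timesteps
        self.history_len = history_len
        self.uniform_mix = uniform_mix
        # one bounded loss history per timestep
        self.history = [deque(maxlen=history_len) for _ in range(num_timesteps)]
        self.rng = random.Random(seed)

    def warmed_up(self):
        # only trust the estimates once every timestep has a full history
        return all(len(h) == self.history_len for h in self.history)

    def weights(self):
        if not self.warmed_up():
            return [1.0 / self.T] * self.T
        # root-mean-square loss per timestep
        w = [math.sqrt(sum(x * x for x in h) / len(h)) for h in self.history]
        total = sum(w)
        if total == 0.0:
            return [1.0 / self.T] * self.T
        w = [x / total for x in w]
        # mix in a little uniform mass so no timestep's probability reaches zero
        return [(1.0 - self.uniform_mix) * x + self.uniform_mix / self.T for x in w]

    def sample(self, batch_size):
        p = self.weights()
        ts = self.rng.choices(range(self.T), weights=p, k=batch_size)
        # multiply each per-sample loss by 1 / (T * p_t) to keep the estimate unbiased
        iw = [1.0 / (self.T * p[t]) for t in ts]
        return ts, iw

    def update(self, timesteps, losses):
        # record the raw (unweighted) per-sample losses
        for t, loss in zip(timesteps, losses):
            self.history[t].append(loss)
```

In a training loop you would call `sample`, multiply each per-sample loss by its importance weight before averaging, then pass the unweighted losses to `update`. Timesteps whose loss fluctuates most get sampled more often, which is where the variance reduction comes from. In the Improved DDPM paper this was applied to the variational-bound term specifically.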
radarsat1 | 3 years ago
Some of the results I've had have come from trying to apply it with 1D U-Nets (also for audio). I'm getting slightly better results now using larger (and more standard) 2D U-Nets, but they take a really long time to train, especially given that I'm still experimenting with a subset of my data.
I'm beginning to suspect that because it's learning to predict very small signal residuals, output quality improves very incrementally, in a way that isn't directly tied to the size or nature of the dataset. Even if I just train it on sinusoids, it takes a really long time to improve (compared to a GAN approach). None of these conclusions are very formal, mind you; I'd love to hear them confirmed. The training dynamics just seem very different from what I'm used to with either an MSE or a discriminative loss.
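To put a rough number on the "very small residuals" intuition, here is a sketch of the standard DDPM forward process, x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, with the linear beta schedule from Ho et al. (2020) (beta from 1e-4 to 0.02 over T = 1000), applied to a toy sinusoid. The function names and the sinusoid parameters are hypothetical, and the setup discussed above may well use a different schedule or parameterization; this only shows how little the signal is perturbed at the low-noise end.

```python
import math
import random


def linear_alpha_bars(T=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t for a linear beta schedule (Ho et al. style)."""
    alpha_bars, prod = [], 1.0
    for i in range(T):
        beta = beta_start + (beta_end - beta_start) * i / (T - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def noised_sinusoid(alpha_bar, n=64, freq=3.0, seed=0):
    """Hypothetical toy example: noise a sine wave to level alpha_bar."""
    rng = random.Random(seed)
    x0 = [math.sin(2 * math.pi * freq * i / n) for i in range(n)]
    eps = [rng.gauss(0.0, 1.0) for _ in range(n)]
    xt = [math.sqrt(alpha_bar) * a + math.sqrt(1.0 - alpha_bar) * e
          for a, e in zip(x0, eps)]
    return x0, xt


def rms_diff(a, b):
    """Root-mean-square difference between two equal-length signals."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


alpha_bars = linear_alpha_bars()
for t in (0, 10, 100, 999):
    x0, xt = noised_sinusoid(alpha_bars[t])
    noise_std = math.sqrt(1.0 - alpha_bars[t])
    print(f"t={t:4d}  noise std={noise_std:.4f}  rms(x_t - x_0)={rms_diff(xt, x0):.4f}")
```

At t = 0 the noise standard deviation is sqrt(1e-4) = 0.01, so in that regime the model has to resolve perturbations of about one percent of the signal amplitude, whereas near t = T the input is almost pure noise.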