I would think that by comparison to image models synthetic data would be relatively easy to generate for audio model training. I’m curious then why it continues to be so difficult to build a nearly flawless audio separation model. Is synthetic data being widely used? Is it just too hard of a problem to train even with this data? I don’t have a good sense of what the most challenging aspects are of audio models.
herogary|1 year ago