picozeta | 3 years ago
In my experience (CNN-based image segmentation), proven architectures (e.g. U-Net) performed similarly with or without fine-tuning from existing models (mostly pretrained on ImageNet, Cityscapes, etc.) IF the target domain was rather different.
At least in the field of image segmentation, there is not much point in fine-tuning an off-the-shelf model on, say, medical imagery.
So maybe it's the same for the Stable Diffusion model. I don't see how knowledge about the relationship between a prompt and images depicting that prompt should help the model map a prompt to a spectrogram of it.