The original paper [0] that this article is based on raises a few questions for me. It compares the authors' new technique against Stable Diffusion but doesn't specify which version of SD was used for that comparison. It also doesn't say how the example outputs were chosen (were they cherry-picked?). For non-square images, the authors seem to have specifically chosen resolutions that the other models weren't trained to output (e.g., 384 x 512) without also including ones that they were trained on (e.g., 896 x 1152). I wonder how this new technique would compare with all of that accounted for.

[0] https://openaccess.thecvf.com/content/CVPR2024/papers/Haji-A...