Hm, I wonder which CLIP model they'll use. A big part of what makes DALL-E 2 so good is the huge, unreleased CLIP model. To train the diffusion prior, they may first need to replicate that CLIP model.
Isn't the VQ-VAE/dVAE generator approach in the DALL-E models quite a bit cheaper computationally than latent diffusion models?
My understanding was that diffusion models were quite a bit more expensive, but yielded richer latent distributions and better images (for some definition of better).
p1esk|3 years ago
ALittleLight|3 years ago
teruakohatu|3 years ago
Jack000|3 years ago
nullc|3 years ago
cfcf14|3 years ago