This isn’t true. The quality of images generated by DALL-E is really good, but they are an incremental improvement based on a long chain of prior work. See e.g. https://github.com/CompVis/latent-diffusion
See also Make-A-Scene, which is in some ways noticeably better than DALL-E 2 (faces, editing, and control of layout through semantic segmentation conditioning): https://arxiv.org/abs/2203.13131#facebook
gwern|3 years ago