(no title)
jlukecarlson | 1 year ago
The key research contribution from the related paper is that with a moderate amount of data (eg. 12M image pairs from CC12M) and a moderate amount of compute (single node of 8 A-100 GPUs for example) anyone can train a good text to image model using the unique multi scale nested u-net pipeline.
Hope this can help level the playing field for researchers everywhere.
No comments yet.