I'm one of the contributors to this repo, so feel free to comment with any feedback!

The key research contribution of the related paper is that, with a moderate amount of data (e.g. 12M image-text pairs from CC12M) and a moderate amount of compute (for example, a single node of 8 A100 GPUs), anyone can train a good text-to-image model using the unique multi-scale nested U-Net pipeline.

Hope this can help level the playing field for researchers everywhere.