DALL·E (also written DALL-E) is a machine learning model which, given a caption and optionally the top portion of an image (some amount of the top, not necessarily half), generates an image that continues what it was given, in a way that is supposed to match the caption.
From the results they published, it looks quite convincing. You can do things like specify the art style, or ask for novel combinations of things, and it will often produce a good-looking image of exactly that.
One of the examples shown was a chair that looks like an avocado; it produced a number of such images.
It seems rather impressive imo.
_____
A VAE is a "variational autoencoder". aiui, a VAE is a neural net with an encoder part and a decoder part: the encoder's input is the full thing (in this context, the picture, or maybe a small square within a picture), and that same space is what the decoder outputs into.
The space the encoder outputs into, in between, is much smaller than the space at either end.
Err, let me rephrase that.
There's a high-dimensional space, like the space of possible pictures, but you want to reduce it to a low-dimensional space corresponding to the sort of pictures in your dataset,
so you have the neural net take in a picture, map it into the low-dimensional space (this is the encoder), and then map it back to the original high-dimensional space (this is the decoder), training it so that what comes out the far end is as close as possible to the picture that went in.
Once you've gotten that working well, you can just grab a random location in the low-dimensional space, decode it, and it will look like a picture of the same type that came from your dataset?
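That encode/decode loop can be sketched with a toy autoencoder in numpy — to be clear, this is a plain linear autoencoder, not an actual VAE (a real one adds nonlinearities and a variational sampling step), and the names `W_enc`/`W_dec` and the toy data are made up for illustration:

```python
import numpy as np

# Toy "pictures": 16-dim vectors that secretly live on a 2-dim subspace,
# standing in for the idea that real pictures occupy a low-dimensional
# region inside a huge pixel space.
rng = np.random.default_rng(0)
basis = rng.normal(size=(2, 16))           # hidden 2-dim structure
codes = rng.normal(size=(200, 2))
data = (codes @ basis) / 4.0               # 200 samples, 16-dim each

# Linear encoder (16 -> 2) and decoder (2 -> 16), trained so that
# decode(encode(x)) lands as close to x as possible.
W_enc = rng.normal(size=(16, 2)) * 0.1
W_dec = rng.normal(size=(2, 16)) * 0.1

def recon_loss():
    recon = data @ W_enc @ W_dec
    return np.mean((recon - data) ** 2)

lr = 0.1
initial = recon_loss()
for _ in range(1000):
    z = data @ W_enc                       # encode into the small space
    recon = z @ W_dec                      # decode back out
    err = recon - data                     # reconstruction error
    # gradient descent on the mean squared reconstruction error
    g_dec = z.T @ err / len(data)
    g_enc = data.T @ (err @ W_dec.T) / len(data)
    W_dec -= lr * g_dec
    W_enc -= lr * g_enc

final = recon_loss()
print(initial, final)                      # the loss should have dropped
```

Since the toy data really is 2-dimensional under the hood, a 2-dim bottleneck can reconstruct it almost perfectly; the "grab a random location and decode it" trick is then just feeding a random `z` into the decoder.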
uh, I'm simplifying as a result of not knowing all the details myself.
drdeca|5 years ago