top | item 25652641

desideratum | 5 years ago

Some truly impressive results. I'll make my usual point here, as I do whenever a fancy new (generative) model comes out, and I'm sure some of the other commenters have alluded to this: the examples shown are likely from a set of well-defined (read: lots of data, high bias) input classes for the model. What would be really interesting is how the model generalizes to /object concepts/ that have yet to be seen, and which have abstract relationships to the examples it has seen. Another commenter here mentioned "red square on green square" working, but "large cube on small cube" not working. Humans are able to infer and understand such abstract concepts from very few examples, and this is something AI is not as close to as it might seem.

sendtown_expwy|5 years ago

It seems unlikely the model has seen "baby daikon radishes in tutus walking dogs," or cubes made out of porcupine textures, or any number of the other examples the post gives.

m3at|5 years ago

It might not have seen that specific combination, but finding an anthropomorphized radish sure is easier than I thought: type "大根アニメ" ("daikon anime") into your search engine and you'll find plenty of results.

Alex3917|5 years ago

If you type different plants and animals into GIS, you don't even get the right species half the time. If GPT-3 has solved this problem, that would be substantially more impressive than drawing the images.

spyder|5 years ago

Yeah, with these kinds of generative examples, they should always include the closest matches from the training set, so we can see how much the model just "copied".

jonesn11|5 years ago

This is a spot-on point. My prediction is that it wouldn't be able to. Given its difficulty generating correct counts of glasses, it seems it still struggles with systematic generalization and compositionality. As a point of reference, cherry-picking aside, it could model the obscure but probably well-defined "baby daikon radish in a tutu walking a dog", but couldn't model red on green on blue cubes. Maybe more sequential perception, action, and video data, or a System 2-like paradigm, would help, but it remains to be seen.

adsche|5 years ago

Yes, I don't really see impressive language (i.e. GPT-3) results here. It seems to morph the images of the nouns in the prompt in an aesthetically pleasing and almost artifact-free way (very cool!).

But it does not seem to 'understand' anything in the way some other commenters have suggested. Try '4 glasses on a table' and you will rarely see 4 glasses, even though that is a very well-defined input. I would be more impressed by the language model if it had a working prompt like: "A teapot that does not look like the image prompt."

I think some of these examples trigger a kind of bias, where we think: "Oh wow, that armchair does look like an avocado!" But morphing an armchair and an avocado will almost always look like both, because they have similar shapes. And it does not 'understand' what you called 'object concepts', otherwise it should not produce armchairs you clearly cannot sit in because of the avocado stone (or the stem, in the flower-related 'armchairs').

ralfd|5 years ago

> I would be slightly more impressed about the language model if it had a working prompt like: "A teapot that does not look like the image prompt."

Slightly? Jesus, you guys are hard to please.

viggity|5 years ago

I'm in the OpenAI beta for GPT-3, and I don't see how to play with DALL-E. Did you actually try "4 glasses on a table"? If so, how? Is there a separate beta? Do you work for OpenAI?

hanniabu|5 years ago

Sounds like the perfect use case for a new captcha system. Generate a random phrase, fetch or generate images for it, show the user those results along with some distractors, and ask them to select all images matching the description.
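The server-side logic of such a captcha could be sketched roughly like this (a minimal Python sketch; the function names, image-ID scheme, and grid sizes are all illustrative assumptions, and sourcing the matching/distractor images is left out):

```python
import random

def build_challenge(phrase, matching_ids, distractor_ids,
                    n_total=9, n_matching=3, rng=None):
    """Assemble one captcha round: a phrase plus a shuffled grid of image IDs,
    mixing images that match the phrase with distractors that don't."""
    rng = rng or random.Random()
    matching = rng.sample(matching_ids, n_matching)
    distractors = rng.sample(distractor_ids, n_total - n_matching)
    grid = matching + distractors
    rng.shuffle(grid)
    # The answer key stays server-side; only phrase + grid go to the user.
    return {"phrase": phrase, "grid": grid, "answer": set(matching)}

def verify(challenge, selected_ids):
    """The user passes iff they select exactly the matching images."""
    return set(selected_ids) == challenge["answer"]

ch = build_challenge("baby daikon radish in a tutu walking a dog",
                     matching_ids=[f"m{i}" for i in range(5)],
                     distractor_ids=[f"d{i}" for i in range(10)],
                     rng=random.Random(0))
print(verify(ch, list(ch["answer"])))  # a correct selection passes
```

Of course, this only stays a useful captcha for as long as humans remain better than models at matching such phrases to images, which is exactly what the thread is debating.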