That's the point. The old models all failed to produce a wine glass that is completely full to the brim, because you can't find much of that in the data they used for training.
I obviously have no idea whether they added real or synthetic data to the training set specifically for the full-to-the-brim wine glass test, but I fully expect that this prompt is now compromised: because it is being discussed in the public sphere, it has inherently become part of the test suite.
Remember the old internet adage that the fastest way to get a correct answer online is to post an incorrect one? I'm not entirely convinced this type of iterative gap finding and filling is really much different from natural human learning behavior.
Even if they did, I’d assume the association of “full” with this correct representation would benefit other areas of the model. I.e., there could (/should?) be a general improvement for prompts where objects have unusual adjectives.
So maybe training for litmus tests isn’t the worst strategy in the absence of another entire internet of training data…
A lot of other things are rare in datasets, let alone correctly labeled: overturned cars (showing the underside), views from under a table, people walking on the ceiling with plausible upside-down hair, clothes, facial features, etc.
There is no one correct way to interpret 'full'. If you go to a wine bar and ask for a full glass of wine, they'll probably interpret that as a double. But you could also interpret it the way a friend would at home, which is about 2-3 cm from the rim.
Personally I would call a glass of wine filled to the brim 'overfilled', not 'full'.
I think you're missing the context everyone else has: this video is where the "AI can't draw a full glass of wine" meme got traction https://www.youtube.com/watch?v=160F8F8mXlo
The prompts (some generated by ChatGPT itself, since it's instructing DALL-E behind the scenes) include phrases like "full to the brim" and "almost spilling over" that are not up to interpretation at all.
People were telling the models explicitly to fill it to the brim, and the models were still producing images where the glass was filled to approximately the halfway point.
Commenters in this thread (all 11 months ago): colecut, gorkish, HelloImSteven, orbital-decay, myaccountonhn, nefarious_ends, sejje, jorvi, kalleboo, and drdeca.