That's the point. The old models all failed to produce a wine glass that is completely full to the brim, because you can't find much of that in the data they used for training.
I obviously have no idea whether they added real or synthetic data to the training set specifically for the full-to-the-brim wine glass test, but I fully expect that this prompt is now compromised: because it is being discussed in the public sphere, it has inherently become part of the test suite.
Remember the old internet adage that the fastest way to get a correct answer online is to post an incorrect one? I'm not entirely convinced this type of iterative gap finding and filling is really much different from natural human learning behavior.
Even if they did, I’d assume the association of “full” with this correct representation would benefit other areas of the model. I.e., there could (/should?) be a general improvement for prompts where objects have unusual adjectives.
So maybe training for litmus tests isn’t the worst strategy in the absence of another entire internet of training data…
A lot of other things are rare in datasets, let alone correctly labeled: overturned cars (showing the underside), views from under a table, people walking on the ceiling with plausible upside-down hair, clothes, facial features, etc.
There is no one correct way to interpret 'full'. If you go to a wine bar and ask for a full glass of wine, they'll probably interpret that as a double. But you could also interpret it the way a friend would at home, which is about 2-3 cm from the rim.
Personally I would call a glass of wine filled to the brim 'overfilled', not 'full'.
I think you're missing the context everyone else has: this video is where the "AI can't draw a full glass of wine" meme got traction https://www.youtube.com/watch?v=160F8F8mXlo
The prompts (some generated by ChatGPT itself, since it's instructing DALL-E behind the scenes) include phrases like "full to the brim" and "almost spilling over" that are not up to interpretation at all.
People were telling the models explicitly to fill it to the brim, and the models were still producing images where the glass was filled to approximately the halfway point.
Commenters in this thread (all 11 months ago): colecut, gorkish, HelloImSteven, orbital-decay, myaccountonhn, nefarious_ends, sejje, jorvi, kalleboo, and drdeca.