top | item 37893109

kjander79 | 2 years ago

> but how much actual understanding does it have?

That's always the question, isn't it? The article does a pretty convincing job of showing that, at least in the given examples, it has a pretty good "understanding" of what's taking place in the scenes and what makes them remarkable to people. And 7 years is a long way back for a comparison; much of the most interesting progress has come in just the last 2 or 3 years.

Image segmentation, object detection, and tracking are all already on display here.

RetroTechie | 2 years ago

Just speculating on how the "understanding" may come about:

In the given images above, it may be clear from the context (text? tags? EXIF info, etc.?) of images in its training data that it's unusual for people to be dragged on a rope behind a horse, very unusual / dangerous for 747-sized airplanes to fly on their side, or for houses to be lying on their side on a beach. And hence it describes such a view with "unusual", "dramatic", etc. Would it even need to understand the conceptual meaning of those words? Apply a label, done.
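That "apply a label, done" speculation can be sketched as a toy co-occurrence counter. Everything below (the feature sets, the caption words) is made-up illustrative data, not anything from the article or a real model:

```python
from collections import Counter, defaultdict

# Toy "training set": (scene features, descriptive caption words) pairs.
# All data here is invented purely to illustrate the speculation above.
training = [
    ({"airplane", "sideways"}, {"unusual", "dangerous"}),
    ({"airplane", "runway"}, {"routine"}),
    ({"house", "beach", "sideways"}, {"dramatic", "unusual"}),
    ({"house", "street"}, {"ordinary"}),
]

# Count how often each descriptive word co-occurs with each scene feature.
cooccur = defaultdict(Counter)
for features, words in training:
    for f in features:
        for w in words:
            cooccur[f][w] += 1

def label(features):
    """Pick the descriptive word most associated with the scene's features."""
    votes = Counter()
    for f in features:
        votes.update(cooccur[f])
    word, _ = votes.most_common(1)[0]
    return word

# A sideways house comes out "unusual" purely from co-occurrence statistics,
# with no concept of houses, gravity, or danger anywhere in the code.
print(label({"house", "sideways"}))  # -> unusual
```

The point of the sketch is that the output word is chosen without any model of *why* a sideways house is unusual; whether real captioning models reduce to something like this is exactly the open question in the thread.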

Don't people work the same way? Over the years we rarely see a house burning in person. We see news reports of such events mentioning people dead or severely burned. So after a while that 'training set' is enough to say: "person stumbling out of a burning house = something bad happened".

Yes, humans may then reflect on how they would feel if placed in the unlucky person's shoes, and rush to alleviate that person's pain. Or cringe at the thought of it.

But in the end: maybe, just maybe, what human brains do isn't so special after all? Just training data, pattern matching against external input, and using the results to self-reflect.

(that last step not -yet- covered by GPT & co)