I did a CAPTCHA the other day that asked the user to select which items could fit inside the sample item (a handbag). You'd think a multimodal deep learning model could figure out which objects fit inside other objects if it's going to cure cancer or whatever, but no, I'm assuming it needs to be taught that explicitly.
Garlef|6 months ago
So size/scale is not as easy a concept to model in our minds as we might assume.
https://m.youtube.com/watch?v=OtngSHtz-cc
Nextgrid|6 months ago