
Contrastive Self-Supervised Learning

97 points | ankeshanand | 6 years ago | ankeshanand.com

13 comments


fxtentacle|6 years ago

They kind of sweep it under the rug that for the PASCAL VOC tests, unsupervised learning was only used as pre-training and was then followed by supervised training before evaluation. That's the difference between "this course will teach you Spanish" and "this course is good preparation to do before you start your actual Spanish course".

Also, while it is laudable that they attempt to learn slow higher-level features, the result of contrastive loss functions is still very much detail-focused; it is just detail-focused in a translationally invariant way.

A common problem for image classification is that the AI will learn to recognize high-level fur patterns, as opposed to learning the shape of the animal. Using contrastive loss terms like in their example will drive the network towards having the same feature vector for adjacent pixels, meaning that the fur-pattern detector needs to become translation-invariant. But the contrastive loss term will NOT prevent the network from recognizing the fur, rather than the shape, as is claimed in this article.
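Roughly what that kind of contrastive objective looks like in practice, as a minimal InfoNCE-style sketch (the tensor names, shapes, and temperature here are placeholders, not taken from the article):

    # Minimal InfoNCE-style contrastive loss sketch (placeholder, not the article's code).
    # z1 and z2 are embeddings of two "views" of the same images (e.g. nearby patches
    # or augmentations); the other items in the batch act as negatives.
    import torch
    import torch.nn.functional as F

    def info_nce_loss(z1, z2, temperature=0.1):
        z1 = F.normalize(z1, dim=1)                # unit-norm embeddings
        z2 = F.normalize(z2, dim=1)
        logits = z1 @ z2.t() / temperature         # [N, N] similarity matrix
        targets = torch.arange(z1.size(0))         # positives sit on the diagonal
        return F.cross_entropy(logits, targets)

    # toy usage: 8 pairs of 128-d embeddings
    loss = info_nce_loss(torch.randn(8, 128), torch.randn(8, 128))

Note that nothing in this objective cares about shape versus texture; it only asks that the two views map to similar vectors.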

ankeshanand|6 years ago

Sorry if it wasn't clear, I do mention the linear classification protocol several times in the post. If you want to evaluate performance on a classification task, you have to show it labels during evaluation, otherwise it's an impossible task. Note that the encoder is frozen during evaluation, and only a linear classifier is trained on top. Now, even when evaluated on a limited set of labels (as low as 1%), contrastive pretraining outperforms purely supervised training by a large margin (check out Figure 1 in the Data-Efficient CPC paper: https://arxiv.org/abs/1905.09272).
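For anyone unfamiliar with that protocol, a rough sketch of what linear evaluation looks like (the encoder below is just a placeholder backbone, not CPC itself):

    # Linear evaluation sketch: freeze the pretrained encoder, train only a linear
    # classifier on top of its features. `encoder` is a stand-in for any backbone.
    import torch
    import torch.nn as nn

    feature_dim, num_classes = 2048, 1000
    encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, feature_dim))  # placeholder

    for p in encoder.parameters():
        p.requires_grad = False                    # encoder stays frozen
    encoder.eval()

    linear_head = nn.Linear(feature_dim, num_classes)
    optimizer = torch.optim.SGD(linear_head.parameters(), lr=0.1)

    images = torch.randn(16, 3, 32, 32)            # toy labeled batch
    labels = torch.randint(0, num_classes, (16,))

    with torch.no_grad():
        features = encoder(images)                 # no gradients flow into the encoder
    loss = nn.functional.cross_entropy(linear_head(features), labels)
    loss.backward()
    optimizer.step()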

Unfortunately, I did not get the second part. Could you elaborate and clarify whether you are talking about a specific paper?

jph00|6 years ago

There's a lot to like in this article, but I don't quite agree with the setup. I think it's better to think of "contrastive" approaches as being orthogonal to basic self-supervised learning methods - they represent an additional piece you can add to your loss function that results in very significant improvements. This approach can be combined with existing self-supervised pretext tasks.

I've discussed these ideas here, for those that are interested in learning more: https://www.fast.ai/2020/01/13/self_supervised/
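To make the "additional piece you can add to your loss function" point concrete, here is a rough sketch of combining an existing pretext task (rotation prediction, as one example) with a contrastive term; the weighting and all names here are illustrative, not a specific paper's setup:

    # Sketch: an existing self-supervised pretext task (rotation prediction) plus a
    # contrastive term added to the same training loss. Everything is a placeholder.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))  # placeholder backbone
    rotation_head = nn.Linear(128, 4)              # predicts one of 4 rotations

    def contrastive_term(z1, z2, temperature=0.1):
        z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
        logits = z1 @ z2.t() / temperature
        return F.cross_entropy(logits, torch.arange(z1.size(0)))

    x1, x2 = torch.randn(8, 3, 32, 32), torch.randn(8, 3, 32, 32)  # two views of a batch
    rot_labels = torch.randint(0, 4, (8,))

    z1, z2 = encoder(x1), encoder(x2)
    pretext_loss = F.cross_entropy(rotation_head(z1), rot_labels)
    total_loss = pretext_loss + 1.0 * contrastive_term(z1, z2)     # contrastive as an add-on
    total_loss.backward()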

jph00|6 years ago

BTW, one thing which makes it a bit hard to get into self-supervised learning is that the most common benchmarking task involves pretraining on Imagenet, which is too slow and expensive for development.

I recently created a little dataset that is specifically designed to allow for testing out self-supervised techniques, called Image网 ("Imagewang"). I'd love to see some folks try it out, and submit strong baselines to the leaderboard: https://github.com/fastai/imagenette#image%E7%BD%91
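If it helps anyone get started, a rough sketch of grabbing the dataset; this assumes Imagewang is registered in fastai's dataset URLs as IMAGEWANG_160 (check the repo's README for the authoritative download links if that name differs):

    # Assumes fastai v2 registers the dataset as URLs.IMAGEWANG_160 (160px variant);
    # see the linked repo for the actual download links if this assumption is wrong.
    from fastai.vision.all import untar_data, URLs, get_image_files

    path = untar_data(URLs.IMAGEWANG_160)
    files = get_image_files(path)
    print(len(files), "images under", path)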

ankeshanand|6 years ago

I didn't mean to convey that we should abandon generative self-supervised methods, but I can see how comparing them gives that impression.

Agree that using them in conjunction would make sense, since generative methods could capture some features better and vice versa.

allovernow|6 years ago

Great post. For an ML engineer, HN can be a goldmine sometimes! I've gotten a bunch of ideas for work from submissions. The pace at which ML is expanding is phenomenal, no doubt in part thanks to the open nature of arXiv. As the sum of so many centuries of achievement, it really makes me proud to be human... and I'm excited to watch as it changes the world.

bobosha|6 years ago

Great write-up. I especially liked the section on Contrastive Predictive Coding; I think that's going to be the next iteration of ML.

p1esk|6 years ago

What’s the current iteration of ML?

pequalsnp|6 years ago

I hadn’t heard of this before. Cool. Going to share this with my team on Monday.