top | item 30022156


alarak | 4 years ago

I don't understand how this is different from BYOL. I'd appreciate it if someone could give a short explanation.


> Similar to our work, both BYOL (Grill et al., 2020) and DINO (Caron et al., 2021) regress neural network representations of a momentum encoder, but our work differs in that it uses a masked prediction task and we regress multiple neural network layer representations instead of just the top layer which we find to be more effective. Moreover, we demonstrate that our approach works for multiple modalities.

From the related work section of the paper.
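
To make the two differences in that quote concrete, here is a rough NumPy sketch, not the paper's code. The toy tanh encoder, the zeroed-out mask positions, the plain squared-error loss, the choice of k=2, and every function name are assumptions made for illustration. It also assumes the teacher sees the unmasked input while the student sees the masked one.

```python
import numpy as np

def ema_update(teacher, student, tau=0.999):
    """Momentum-encoder update: teacher <- tau * teacher + (1 - tau) * student."""
    return [tau * t + (1.0 - tau) * s for t, s in zip(teacher, student)]

def forward_layers(weights, x):
    """Toy encoder (hypothetical): returns the output of every layer."""
    outputs = []
    h = x
    for w in weights:
        h = np.tanh(h @ w)
        outputs.append(h)
    return outputs

def top_layer_target(layer_outputs):
    """BYOL/DINO-style target: the top layer's representation only."""
    return layer_outputs[-1]

def multi_layer_target(layer_outputs, k):
    """Target in the spirit of the quote: average of the top k layers."""
    return np.mean(layer_outputs[-k:], axis=0)

def masked_regression_loss(student_pred, target, mask):
    """Squared error averaged over masked positions only (a simplification)."""
    diff = (student_pred - target)[mask]
    return float(np.mean(diff ** 2))

rng = np.random.default_rng(0)
seq_len, dim, n_layers = 6, 4, 3
student = [rng.normal(size=(dim, dim)) for _ in range(n_layers)]
teacher = [w.copy() for w in student]  # teacher starts as a copy of the student

x = rng.normal(size=(seq_len, dim))
mask = np.array([False, True, False, True, True, False])
x_masked = x.copy()
x_masked[mask] = 0.0  # stand-in for a learned mask token

teacher_outputs = forward_layers(teacher, x)          # full input
student_pred = forward_layers(student, x_masked)[-1]  # masked input

target = multi_layer_target(teacher_outputs, k=2)
loss = masked_regression_loss(student_pred, target, mask)
teacher = ema_update(teacher, student)  # no gradient flows into the teacher
```

Swapping `multi_layer_target` for `top_layer_target` and scoring every position instead of only the masked ones gives something closer to the BYOL-style setup, minus the augmented views that BYOL uses in place of masking.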