top | item 42694688

(no title)

DinoV2 is an unsupervised model. It learns both a high quality global image representation and local representations with no labels. It's becoming strikingly clear that foundation models are the go to choice for common data types of natural images, text, video, and audio. The labels are effectively free, the hard part now is extracting quality from massive datasets.

discuss

No comments yet.