top | item 45926690

efavdb | 3 months ago

Are you suggesting using the CLIP embedding of the text as a feature to train a standard ML model on?

daemonologist | 3 months ago

I think they're suggesting doing that with BERT for text and CLIP for images. Which in my experience is indeed quite effective (and easy/fast).
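For anyone unfamiliar with the pattern, here is a minimal sketch of "embedding as features, then a simple model". Everything named here is an assumption for illustration: `toy_embed` is a hypothetical hashed bag-of-words stand-in, not BERT or CLIP, and the nearest-centroid classifier is just one example of a "standard" model. In practice you would swap `toy_embed` for a function that returns the frozen model's embedding vector.

```python
import hashlib
import math
from collections import defaultdict

DIM = 64  # arbitrary size for the placeholder embedding


def toy_embed(text):
    """Hypothetical stand-in for a real encoder (BERT for text, CLIP for images).

    Hashes each whitespace token into one of DIM buckets and L2-normalises.
    """
    vec = [0.0] * DIM
    for token in text.lower().split():
        idx = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % DIM
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a, b):
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def fit_centroids(items, labels, embed=toy_embed):
    """Embed every training item and average the vectors per label."""
    sums = {}
    counts = defaultdict(int)
    for item, label in zip(items, labels):
        vec = embed(item)
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {lab: [s / counts[lab] for s in total] for lab, total in sums.items()}


def predict(item, centroids, embed=toy_embed):
    """Return the label whose centroid is most similar to the item's embedding."""
    vec = embed(item)
    return max(centroids, key=lambda lab: cosine(vec, centroids[lab]))
```

The `embed` parameter is the only part that touches the encoder, so the downstream model (here nearest-centroid, but it could equally be logistic regression or gradient-boosted trees) never needs to know where the vectors came from.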

There have been some recent developments in the image-of-text/other-than-photograph area, though. For instance, from Meta (although they seem unsure of what exactly their AI division is called): https://arxiv.org/abs/2510.05014 and from Qihoo360: https://arxiv.org/abs/2510.27350.

PaulHoule | 3 months ago

I think he is. I do things like that plenty.