visekr | 4 months ago

Yes, although the techniques aren't perfect.

I'm using a YOLO-World-XL object detection model, which lets me detect objects from text prompts. This is the initial filter that scans for agents. Once agents are detected and outlined with bounding boxes, the entire image and each cropped bounding box are sent to ChatGPT to confirm that the image looks legit. Once an image passes those checks, I create an image embedding of each agent using CLIP and store it in a vector DB. Each new agent is then compared against the DB and matched.
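As a rough illustration of the last step only, here is a minimal sketch of nearest-neighbor matching over embeddings. It is not the actual implementation: cosine similarity, the 0.8 threshold, the `match_agent` helper, and the plain dict standing in for the vector DB are all assumptions, and the toy 3-dimensional vectors stand in for real CLIP embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def match_agent(query, db, threshold=0.8):
    """Return (agent_id, score) for the closest stored embedding.

    agent_id is None when the best score is below the threshold.
    The threshold value is a placeholder, not a tuned number.
    """
    best_id, best_score = None, -1.0
    for agent_id, embedding in db.items():
        score = cosine_similarity(query, embedding)
        if score > best_score:
            best_id, best_score = agent_id, score
    if best_score >= threshold:
        return best_id, best_score
    return None, best_score


# Hypothetical stored embeddings; a dict stands in for the vector DB.
db = {
    "agent_a": [1.0, 0.0, 0.0],
    "agent_b": [0.0, 1.0, 0.0],
}

print(match_agent([0.9, 0.1, 0.0], db))  # close to agent_a
print(match_agent([0.5, 0.5, 0.7], db))  # no stored agent is close enough
```

A real vector DB would do the same comparison with an index instead of a linear scan, but the accept-or-reject decision against a threshold would look similar.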

The matching system isn't perfect, but I think it's good enough to get the point across, and it can easily be tuned with more data. Happy to take suggestions here; I just spun this up over the weekend.
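One way the tuning could work, sketched under assumptions: given similarity scores for pairs that have been hand-labeled as same agent or different agent, pick the threshold that gets the most pairs right. The `best_threshold` helper, the accuracy criterion, and the sample scores are all made up for illustration.

```python
def best_threshold(scored_pairs):
    """Pick the similarity threshold with the highest accuracy.

    scored_pairs is a list of (similarity, is_same_agent) tuples.
    A pair is predicted "same agent" when similarity >= threshold.
    """
    candidates = sorted({score for score, _ in scored_pairs})
    best_t, best_acc = None, -1.0
    for t in candidates:
        correct = sum((score >= t) == same for score, same in scored_pairs)
        acc = correct / len(scored_pairs)
        if acc > best_acc:  # ties keep the lowest threshold
            best_t, best_acc = t, acc
    return best_t, best_acc


# Made-up labeled pairs: (similarity, is_same_agent)
pairs = [
    (0.95, True),
    (0.88, True),
    (0.81, True),
    (0.79, False),
    (0.72, False),
    (0.60, False),
]

threshold, accuracy = best_threshold(pairs)
print(threshold, accuracy)
```

With more labeled pairs, a metric such as F1 may be a better fit than accuracy if same-agent pairs are rare.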
