andblac | 1 year ago
- "person": "get gender and age of this person in 5 words or less",
- "car": "get body type and color of this car in 5 words or less".
So YOLO gives the bounding box and a rough category, while LLaVA describes the object in more detail.
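A minimal sketch of that two-stage flow, assuming the detector has already produced labeled boxes. The `Detection` type, the `describe` helper, and the `ask_vlm` callback are hypothetical names for illustration only; they are not part of YOLO or LLaVA, and the real detector and model calls would have to be plugged in where the callback is.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels

# Per-category prompts, copied from the comment above.
PROMPTS: Dict[str, str] = {
    "person": "get gender and age of this person in 5 words or less",
    "car": "get body type and color of this car in 5 words or less",
}


@dataclass
class Detection:
    """One detector result (hypothetical shape): rough category plus box."""
    label: str
    box: Box


def describe(
    detections: List[Detection],
    ask_vlm: Callable[[Box, str], str],
) -> List[Tuple[Detection, str]]:
    """Pair each detection with a short description.

    ask_vlm is a stand-in (assumption) for whatever crops the box out of
    the image and sends the crop plus the prompt to the vision-language model.
    """
    results = []
    for det in detections:
        prompt = PROMPTS.get(det.label)
        if prompt is None:
            continue  # no prompt configured for this category, so skip it
        results.append((det, ask_vlm(det.box, prompt)))
    return results
```

Keeping the prompts in a plain dict means a new category only needs one more entry, and categories without an entry are skipped rather than sent to the model with a generic prompt.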