It’s been about a year since I looked into this sort of thing, but Molmo will give you x,y coordinates. I hacked together a project around it. I think Microsoft’s OmniParser is also good at finding coordinates.
https://huggingface.co/allenai/Molmo-7B-D-0924
https://github.com/logankeenan/george
https://github.com/microsoft/OmniParser
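A rough sketch of turning a Molmo-style pointing reply into pixel coordinates, in case it helps. It assumes the reply contains a tag like `<point x="50.0" y="25.0" alt="...">` with x and y given as percentages of the image width and height; check that against the model card before relying on it. The function name `point_to_pixels` is made up for illustration and is not from any of the linked projects.

```python
import re

# Assumed reply format: <point x="50.0" y="25.0" alt="label">label</point>
# with x and y as percentages (0-100) of the image width and height.
POINT_RE = re.compile(r'<point\s+x="([\d.]+)"\s+y="([\d.]+)"')


def point_to_pixels(reply, width, height):
    """Return (x, y) in pixels for the first point tag in reply, or None."""
    match = POINT_RE.search(reply)
    if match is None:
        return None
    x_pct = float(match.group(1))
    y_pct = float(match.group(2))
    # Scale the percentages to the actual image size.
    return round(x_pct / 100 * width), round(y_pct / 100 * height)


if __name__ == "__main__":
    reply = '<point x="50.0" y="25.0" alt="submit button">submit button</point>'
    print(point_to_pixels(reply, 1920, 1080))
```

The pixel pair can then be handed to whatever drives the click. Replies with no point tag return None, so the caller can retry or fall back.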
chhxdjsj|2 months ago