item 43969371

adamsiem | 9 months ago
Anyone using vision to parse screenshots? QVQ was too slow. Will give this a shot.

logankeenan | 9 months ago
I used Molmo to parse screenshots in order to detect the locations of UI elements. See the repo below. I think OmniParser from Microsoft would also work well.
https://github.com/logankeenan/george
https://github.com/microsoft/OmniParser

abrichr | 9 months ago
You might be interested in https://github.com/OpenAdaptAI/OpenAdapt