top | item 43969371

(no title)

adamsiem | 9 months ago

Anyone using vision to parse screenshots? QVQ was too slow. Will give this a shot.


I used Molmo to parse screenshots and detect the locations of UI elements. See the repo below. I think OmniParser from Microsoft would also work well.

https://github.com/logankeenan/george

https://github.com/microsoft/OmniParser
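
For anyone wondering what "detect locations" looks like in practice, here is a rough sketch of the post-processing step. It assumes the model answers with Molmo-style pointing markup such as `<point x="50.0" y="25.0" alt="Submit button">Submit button</point>`, where `x` and `y` are percentages of the image width and height. That format is my assumption here, and `points_to_pixels` is a hypothetical helper, not something taken from either repo.

```python
import re

# Assumed Molmo-style pointing markup (not confirmed by this thread):
#   <point x="50.0" y="25.0" alt="Submit button">Submit button</point>
# x and y are treated as percentages of the image width and height.
POINT_RE = re.compile(
    r'<point\s+x="([\d.]+)"\s+y="([\d.]+)"[^>]*>(.*?)</point>',
    re.DOTALL,
)


def points_to_pixels(model_output, width, height):
    """Hypothetical helper: turn point markup into (label, x_px, y_px) tuples."""
    results = []
    for x_pct, y_pct, label in POINT_RE.findall(model_output):
        # Scale percentage coordinates to pixel coordinates of the screenshot.
        x_px = round(float(x_pct) / 100 * width)
        y_px = round(float(y_pct) / 100 * height)
        results.append((label.strip(), x_px, y_px))
    return results


if __name__ == "__main__":
    reply = '<point x="50.0" y="25.0" alt="Submit button">Submit button</point>'
    print(points_to_pixels(reply, 1920, 1080))
```

The pixel coordinates can then be handed to whatever drives the mouse. If the model you use returns bounding boxes or absolute pixels instead, the regex and the scaling step would need to change.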

abrichr | 9 months ago

You might be interested in https://github.com/OpenAdaptAI/OpenAdapt