Imagine porting this to a dedicated app that can access the context of the open window and the text on the screen, providing an almost real-time assistant for everything you do on screen.
Automatically take a screenshot and feed it to https://github.com/vikhyat/moondream or similar? Doable. But while very impressive, the results are a bit of a mixed bag (some hallucinations).
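For what it's worth, the screenshot-and-describe loop could be sketched roughly like this. It is only a sketch under assumptions: `watch_screen`, `capture`, and `describe` are hypothetical names, not part of moondream or any other library. `capture` stands in for whatever grabs the screen as bytes, and `describe` stands in for whatever calls the vision model. Unchanged frames are skipped so the model is not queried for an identical screen.

```python
import hashlib
import time
from typing import Callable, Iterator, Optional


def watch_screen(
    capture: Callable[[], bytes],      # hypothetical: returns the current screenshot as bytes
    describe: Callable[[bytes], str],  # hypothetical: wraps a vision model such as moondream
    interval_s: float = 2.0,
    max_frames: Optional[int] = None,
) -> Iterator[str]:
    """Yield a description each time the captured screen changes."""
    last_digest = None
    frames = 0
    while max_frames is None or frames < max_frames:
        frame = capture()
        frames += 1
        # Hash the frame so identical consecutive screenshots are skipped.
        digest = hashlib.sha256(frame).hexdigest()
        if digest != last_digest:
            last_digest = digest
            yield describe(frame)
        if interval_s > 0:
            time.sleep(interval_s)
```

Since the model output can contain hallucinations, anything `describe` returns is probably best treated as a hint rather than ground truth.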
column|2 years ago
cristyansv|2 years ago
https://developer.apple.com/library/archive/samplecode/UIEle...
summarity|2 years ago