EMM_386 | 1 month ago
Are you sure about that?
Try "claude --chrome" with the CLI tool and watch what it does in the web browser.
It takes screenshots constantly and feeds them back into its multimodal vision to help it navigate.
It can look at the HTML or the JavaScript, but Claude seems to find it "easier" to take a screenshot to find out exactly what is on the screen rather than parse the DOM.
So I don't know how Cowork does this, but there is no reason it couldn't be doing the same thing.
dalenw|1 month ago
I know there are ways to hide data in images, such as watermarks, but I don't know whether that could poison an AI.
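For what it's worth, the simplest form of hiding data in an image is least-significant-bit embedding. Here is a minimal Python sketch, assuming raw 8-bit pixel bytes; the function names and the stand-in pixel data are made up for illustration. It only shows that data can be hidden without visibly changing the image, not that a vision model would read or act on it.

```python
def embed_bits(pixels: bytes, message: bytes) -> bytes:
    """Hide `message` in the least significant bit of each pixel byte."""
    # Unpack the message into bits, most significant bit first.
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    if len(bits) > len(pixels):
        raise ValueError("message too long for this image")
    out = bytearray(pixels)
    for i, bit in enumerate(bits):
        # Clear the lowest bit, then set it to the message bit.
        out[i] = (out[i] & 0xFE) | bit
    return bytes(out)


def extract_bits(pixels: bytes, length: int) -> bytes:
    """Read back `length` bytes from the least significant bits."""
    out = bytearray()
    for i in range(length):
        value = 0
        for b in pixels[i * 8:(i + 1) * 8]:
            value = (value << 1) | (b & 1)
        out.append(value)
    return bytes(out)


# Stand-in for raw grayscale pixel data (hypothetical, not a real image file).
cover = bytes(range(200, 256)) * 2
stego = embed_bits(cover, b"hidden")
recovered = extract_bits(stego, 6)

# Each pixel value changes by at most 1, which is invisible to the eye.
max_change = max(abs(a - b) for a, b in zip(cover, stego))
```

Hidden bits like these usually don't survive resizing or JPEG compression, so this is a different thing from perturbing pixels specifically to mislead a model.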
yencabulator|1 month ago
https://cacm.acm.org/news/when-images-fool-ai-models/
https://arxiv.org/abs/2306.13213