p0deje | 1 year ago

Have you experimented with using text-only models and the DOM/accessibility tree for interacting with a website? I'm currently working on an open-source test automation tool (https://alumnium.ai), and the accessibility tree without screenshots works pretty well as long as the website provides decent support for ARIA attributes or at least has a proper HTML5 structure.

MagMueller|1 year ago

On most pages, we don't need vision, and the DOM alone is sufficient. We have not worked with the accessibility tree yet, but it's a great idea to include that. Do you have any great resources on where to get started?

p0deje|1 year ago

> On most pages, we don't need vision, and the DOM alone is sufficient.

I misunderstood. Looking at the demo videos, it seemed like you constantly update elements with borders/IDs, so I assumed that's what is then passed to vision.

> Do you have any great resources on where to get started?

A great place to start is https://chromium.googlesource.com/chromium/src/+/main/docs/a....
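
To make the idea concrete, here is a minimal sketch of turning markup into a role/name outline that a text-only model could read instead of a screenshot. It is not how Chromium, Alumnium, or any browser actually computes the accessibility tree. The role mapping, the `OutlineBuilder` class, and the `accessibility_outline` function are hypothetical names made up for illustration, and the naming rules are far simpler than the real accessible-name algorithm.

```python
from html.parser import HTMLParser

# Hypothetical, heavily simplified mapping from HTML5 tags to ARIA-like roles.
IMPLICIT_ROLES = {
    "a": "link", "button": "button", "nav": "navigation", "main": "main",
    "h1": "heading", "h2": "heading", "input": "textbox", "img": "img",
}
# Elements that never have a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"input", "img", "br", "hr", "meta", "link"}
# Roles whose name may fall back to their visible text (an assumption of this sketch).
NAME_FROM_CONTENT = {"link", "button", "heading"}


class OutlineBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.lines = []  # [depth, role, name] for every element that has a role
        self.stack = []  # per open element: index into self.lines, or None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # An explicit ARIA role wins over the tag's implicit role.
        role = attrs.get("role") or IMPLICIT_ROLES.get(tag)
        index = None
        if role:
            name = attrs.get("aria-label") or attrs.get("alt") or ""
            # Depth counts only ancestors that carry a role themselves.
            depth = sum(1 for i in self.stack if i is not None)
            self.lines.append([depth, role, name])
            index = len(self.lines) - 1
        if tag not in VOID_TAGS:
            self.stack.append(index)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        # Give the text to the nearest enclosing role node, if it still lacks a name.
        for index in reversed(self.stack):
            if index is None:
                continue
            node = self.lines[index]
            if node[1] in NAME_FROM_CONTENT and not node[2]:
                node[2] = text
            break


def accessibility_outline(html):
    """Return an indented role/name outline for the given HTML string."""
    builder = OutlineBuilder()
    builder.feed(html)
    builder.close()
    return "\n".join(
        "  " * depth + (f'{role} "{name}"' if name else role)
        for depth, role, name in builder.lines
    )


page = """
<nav aria-label="Main"><a href="/docs">Docs</a></nav>
<main><h1>Sign in</h1><input aria-label="Email"><div role="button">Submit</div></main>
"""
print(accessibility_outline(page))
```

The unlabeled `<div role="button">` only shows up as a button because of its ARIA role, which is the dependency on decent ARIA or HTML5 structure mentioned upthread. In practice you would ask the browser for its real accessibility tree rather than re-deriving one from markup like this.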