shodai80 | 1 year ago

The screenshots provided below do not show text boxes, selects, or other input nodes with labels. Show me text output in which the inputs are associated with the correct labels and I will be shocked.

KhoomeiK|1 year ago

They do show textboxes with labels. From our readme:

"Keep in mind that Tarsier tags different types of elements differently to help your LLM identify what actions are performable on each element. Specifically:

[#ID]: text-insertable fields (e.g. textarea, input with textual type)

[@ID]: hyperlinks (<a> tags)

[$ID]: other interactable elements (e.g. button, select)

[ID]: plain text (if you pass tag_text_elements=True)"

Do you see the search boxes labeled [#4] and [#5] at the top? And before you say that the tag is on a different line from the placeholder text—yes, and our agent is smart enough to handle that minor idiosyncrasy. Are you shocked? :)
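As a rough illustration of the tagging scheme quoted from the readme, here is a minimal sketch of how a consumer might classify tags found in the text output. It is not Tarsier's own code: `parse_tags`, `Tag`, and the kind names are hypothetical, and the sketch assumes IDs are plain integers, as in `[#4]` and `[#5]`.

```python
import re
from typing import NamedTuple

# Prefix-to-kind mapping taken from the readme excerpt; the kind names are made up here.
TAG_KINDS = {"#": "text_input", "@": "link", "$": "interactable", "": "plain_text"}

# Assumption: an ID is a run of digits, optionally preceded by one of the prefixes.
TAG_RE = re.compile(r"\[([#@$]?)(\d+)\]")


class Tag(NamedTuple):
    kind: str
    id: int


def parse_tags(page_text: str) -> list[Tag]:
    """Return every tag found in the text, in order of appearance."""
    return [Tag(TAG_KINDS[prefix], int(num)) for prefix, num in TAG_RE.findall(page_text)]


if __name__ == "__main__":
    sample = "[#4] Search jobs  [@7] Home  [$9] Submit  [12] Welcome back"
    for tag in parse_tags(sample):
        print(tag)
```

Note that this only recovers which elements are text-insertable; it says nothing about which visible label belongs to which field, which is the point under dispute in this thread.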

shodai80|1 year ago

#4 and #5 use placeholder attributes, so the text itself is contained within the input node. Show me a simple form with labels external to the input nodes, then rearrange the labels so that some are above and some are below their inputs, and I will be shocked! No placeholders. Each label must be its own 'text' node.

Edit: I do not intend to come off as negative or disparaging - I already discussed this with some OS projects I work on as well as internally at work. You guys did something great, and I am just trying to point out gaps that could take it from great to unbelievable.

miki123211|1 year ago

This problem isn't that hard; screen readers have had to handle this exact issue for years. Inaccessible websites where the labels aren't properly associated with their respective form fields do exist, but they aren't that common.
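For context, the kind of explicit association that assistive technology can rely on looks roughly like the sketch below. It is a simplified illustration using only Python's standard `html.parser`, not a full accessible-name computation and not anything taken from Tarsier; `LabelCollector` and `explicit_labels` are hypothetical names. It only handles `aria-label`, `<label for="...">`, and a wrapping `<label>`.

```python
from html.parser import HTMLParser


class LabelCollector(HTMLParser):
    """Collect form fields and the labels explicitly tied to them in markup."""

    FIELD_TAGS = {"input", "select", "textarea"}

    def __init__(self):
        super().__init__()
        self.fields = []      # one dict per form field, in document order
        self.label_for = {}   # field id -> text of <label for="id">
        self._open = None     # state of the <label> currently being parsed

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "label":
            self._open = {"for": attrs.get("for"), "text": [], "fields": []}
        elif tag in self.FIELD_TAGS:
            field = {"id": attrs.get("id"), "aria": attrs.get("aria-label"), "wrapped": None}
            self.fields.append(field)
            if self._open is not None:
                self._open["fields"].append(field)

    def handle_data(self, data):
        if self._open is not None:
            self._open["text"].append(data)

    def handle_endtag(self, tag):
        if tag == "label" and self._open is not None:
            text = " ".join("".join(self._open["text"]).split())
            if self._open["for"]:
                self.label_for[self._open["for"]] = text
            for field in self._open["fields"]:
                field["wrapped"] = text
            self._open = None


def explicit_labels(html: str):
    """Return (field id, label or None) pairs; None means no explicit association."""
    parser = LabelCollector()
    parser.feed(html)
    parser.close()
    return [
        (f["id"], f["aria"] or parser.label_for.get(f["id"]) or f["wrapped"])
        for f in parser.fields
    ]
```

A field whose only "label" is a nearby `<span>` or bare text node comes back as `None` here, which is exactly the unassociated case the rest of the thread argues about.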

shodai80|1 year ago

Yes, if they are associated through accessibility attributes (ARIA). Many, many sites, including massive B2B ones, do not do this (a shame). So no, you are seriously minimizing the problem. Relying on those attributes would also be architecturally poorly thought out: the solution must not depend on ARIA, nor on any other non-universal convention (which this solution does so far).

Everything shown to me so far has been solvable with scripts and XPath template/creation logic. I have handled all of this for over 10 years with one script. When I see it finding every input and associating each one with its correct external label, then they have something. Otherwise I am concluding that it is non-functional, and that this is a long-solved problem that ML is over-engineering.
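To make the disputed case concrete (labels that are separate text nodes, some above and some below their inputs), here is a minimal sketch of one possible layout-based heuristic. It is neither the script mentioned above nor Tarsier's method. `Box`, `gap`, `match_labels`, and the `max_gap` threshold are all hypothetical, and the bounding boxes are assumed to come from the rendered page in pixel coordinates.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    name: str
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


def gap(a: Box, b: Box) -> float:
    """Shortest distance between two rectangles (0 if they touch or overlap)."""
    dx = max(0.0, a.x - (b.x + b.w), b.x - (a.x + a.w))
    dy = max(0.0, a.y - (b.y + b.h), b.y - (a.y + a.h))
    return math.hypot(dx, dy)


def match_labels(fields, texts, max_gap=40.0):
    """Greedily pair each field with its nearest unused text node.

    Candidate pairs are visited from closest to farthest, so a text node that
    sits right next to one field cannot also be claimed by a farther field.
    """
    pairs = sorted(
        ((gap(f, t), f.name, t.name) for f in fields for t in texts),
        key=lambda p: p[0],
    )
    result, used = {}, set()
    for dist, field, text in pairs:
        if dist > max_gap:
            break  # remaining pairs are even farther apart
        if field not in result and text not in used:
            result[field] = text
            used.add(text)
    return result


if __name__ == "__main__":
    fields = [Box("f1", 0, 20, 200, 24), Box("f2", 0, 60, 200, 24)]
    texts = [Box("First name", 0, 0, 80, 16), Box("Email", 0, 88, 80, 16)]
    print(match_labels(fields, texts))
```

A heuristic like this handles the mixed above/below case only when each label is clearly closer to its own field than to a neighbor; evenly spaced forms stay ambiguous, which is where the disagreement in this thread lies.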