(no title)
shodai80 | 1 year ago
For example: given a simple login form, I may not know whether the label is above or below the username textbox. A password box would be below it. I have a hard time understanding the relevance of tagging without context.
Tagging is basically irrelevant to any automated task if we do not know the context. I am not trying to diminish your great work, don't get me wrong, but if you don't have context I don't see much relevance. You're doing something that is easily scripted with XPath templates, which I've done for over a decade.
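To make "XPath templates" concrete, here is a minimal sketch of the idea, not anyone's actual tooling. The markup, the `TEMPLATES` table, and the `locate` helper are all hypothetical, and it uses the limited XPath subset in Python's standard `xml.etree.ElementTree`, so it assumes well-formed markup (a real page would need a proper HTML parser).

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed login markup. Note the password label sits
# *after* its input, while the username label sits before it.
PAGE = """
<form id="login">
  <label for="username">Username</label>
  <input id="username" type="text" />
  <input id="password" type="password" />
  <label for="password">Password</label>
</form>
""".strip()

# Hypothetical XPath templates keyed by role; {name} is filled in per field.
# These are the stored, user-editable strings that get updated on a layout change.
TEMPLATES = {
    "input": ".//input[@id='{name}']",
    "label": ".//label[@for='{name}']",
}


def locate(root, role, name):
    """Return the first element matching the template for `role`, or None."""
    return root.find(TEMPLATES[role].format(name=name))


root = ET.fromstring(PAGE)
username_box = locate(root, "input", "username")
password_label = locate(root, "label", "password")
```

Because the templates key on attributes rather than position, it does not matter whether a label is rendered above or below its textbox. If the page changes, the fix is an edit to one template string.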
awtkns|1 year ago
shodai80|1 year ago
What I am pointing out here is that even data modeling is mostly irrelevant unless you want to go through every page and every permutation of a page, all the while hoping the layout isn't modified, or else it's back to training all over again, which is downtime. At some point you'll realize it's just better to store user-created XPaths, as it's quicker to update those than to retrain.
How do you reason with an LLM without going through any of the above? Automation cannot consistently have downtime for retraining; it's the antithesis of its purpose.
Let's not even get into shadow DOM issues.
I am keying on your third bullet point on GitHub:
"How can you inform a text-only LLM about the page's visual structure?"
My questions suggest a gap in your awesome accomplishment.