shodai80|1 year ago
What I am pointing out here is that even data modeling is mostly irrelevant unless you want to go through every page, and every permutation of a page, all the while hoping the layout isn't modified, or it's back to training all over again, which means downtime. At some point you'll realize it's just better to store user-created XPaths, as it's quicker to update those than to retrain.
How do you reason with an LLM without going through any of the above? Automation cannot consistently have downtime for retraining; that's the antithesis of its purpose.
Let's not even get into shadow DOM issues.
I am keying on your third bullet point on GitHub:
"How can you inform a text-only LLM about the page's visual structure?"
My questions suggest a gap in your awesome accomplishment.
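To make the stored-XPath point concrete, here is a minimal sketch of what I mean, not anything taken from Tarsier. The selector store, the field names, and the two sample pages are all made up for illustration, and it uses Python's standard-library `xml.etree.ElementTree`, which only supports a subset of XPath and needs well-formed markup (a real scraper would more likely use a full HTML parser).

```python
import xml.etree.ElementTree as ET

# Hypothetical selector store: field name -> user-maintained XPath.
# In practice this might live in a database or a JSON file.
SELECTORS = {
    "title": ".//h1[@class='product-title']",
    "price": ".//span[@class='price']",
}


def extract(page_source, selectors):
    """Apply each stored XPath to the page and return the matched text.

    A field whose XPath no longer matches comes back as None, which is
    the signal that the layout changed and the selector needs updating.
    """
    root = ET.fromstring(page_source)
    result = {}
    for field, xpath in selectors.items():
        node = root.find(xpath)
        has_text = node is not None and node.text
        result[field] = node.text.strip() if has_text else None
    return result


# Made-up sample pages: V2 simulates a layout change to the price element.
PAGE_V1 = (
    "<html><body><h1 class='product-title'>Widget</h1>"
    "<span class='price'>$9.99</span></body></html>"
)
PAGE_V2 = (
    "<html><body><h1 class='product-title'>Widget</h1>"
    "<div class='cost'>$9.99</div></body></html>"
)

# The fix after a layout change is a one-line selector edit, not a retrain.
UPDATED_SELECTORS = {**SELECTORS, "price": ".//div[@class='cost']"}

if __name__ == "__main__":
    print(extract(PAGE_V1, SELECTORS))
    print(extract(PAGE_V2, SELECTORS))
    print(extract(PAGE_V2, UPDATED_SELECTORS))
```

When the layout changes, the old selector simply returns `None` for that field, and updating one string in the store brings extraction back without touching anything else.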
KhoomeiK|1 year ago
[1] https://github.com/reworkd/tarsier/blob/main/.github/assets/...
[2] https://github.com/reworkd/tarsier/blob/main/.github/assets/...
shodai80|1 year ago