But it's too expensive to be practical with the OpenAI API. Also, the demo is cool until you try it on real-world webpages; then you realize it only works on fewer than 50% of them.
GPT-4V may be surprisingly robust here. Set-of-Mark prompting (which is accomplished here with Vim) improves grounding by a silly high amount.
https://som-gpt4v.github.io/
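For anyone unfamiliar, here's a rough sketch of the bookkeeping side of Set-of-Mark prompting as I understand it: every clickable element gets a short mark, the model answers with a mark instead of pixel coordinates, and you map the mark back to the element. The `Element` class, the helper names, and the hard-coded model reply are all made up for illustration; in a real setup the marks are drawn onto the screenshot itself, and no model is called here.

```python
import re
from dataclasses import dataclass


@dataclass
class Element:
    """Hypothetical stand-in for a clickable element found on the page."""
    tag: str
    text: str
    box: tuple  # (x, y, width, height) in screenshot pixels


def assign_marks(elements):
    """Give each element a short numeric mark, starting at 1."""
    return {str(i): el for i, el in enumerate(elements, start=1)}


def legend(marks):
    """Text listing of the marks, e.g. to send alongside the marked screenshot."""
    return "\n".join(f"[{m}] <{el.tag}> {el.text}" for m, el in marks.items())


def resolve(reply, marks):
    """Map the first [n] mark in the model's reply back to its element."""
    match = re.search(r"\[(\d+)\]", reply)
    if match is None:
        return None
    return marks.get(match.group(1))


elements = [
    Element("a", "Sign in", (900, 10, 60, 20)),
    Element("input", "Search", (300, 10, 400, 30)),
    Element("button", "Submit", (710, 10, 80, 30)),
]
marks = assign_marks(elements)

# Hypothetical model reply; a real one would come from the vision model.
reply = "I would click [2] to focus the search box."
target = resolve(reply, marks)
print(legend(marks))
print(target)
```

The point is that the model only has to pick a label it can see, which is a much easier task than guessing coordinates.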
famouswaffles|2 years ago