top | item 36968339

(no title)

You have to give GPT an objective, like "find an apartment in Florida" and then say something like "given the following options, which one would you interact with to get closer to your objective."

So if you assume that you start on google.com, then your options are like 1.) Input with name "search", placeholder "search anything", value "" 2.) Button with label "I'm feeling lucky" 3.) Button with label "search"

Obviously, doing just one of these doesn't achieve the objective - it just needs to pick which one it thinks has the most "value" for completing the objective. If you repeat that enough times, then it can actually do what your overall goal of the session was.

I'm just giving a simplistic answer, and if you implemented only what I've written, then it's going to get stuck in a loop more often than not. But that's the gist of how you could encode the DOM into something that GPT can interpret and make decisions/take actions based on.

discuss

TeMPOraL|2 years ago

Remember HATEOAS? I have a feeling LLMs would excel at navigating proper REST (not "RESTful") APIs - HATEOAS is, in principle, just what you did here: providing a list of possible/useful next steps along with the response.

In fact, the problem of HATEOAS is exactly what LLMs seem to be good at - inferring the interface at runtime, from dynamically received metadata. This should even be easy to try in practice today - HATEOAS can be trivially mapped to the "function calling" feature of OpenAI's GPT-3.5/GPT-4 APIs.

wruza|2 years ago

Got it, thanks!