newhouseb | 2 years ago

An example from an earlier comment of mine on a different thread (assuming I've understood correctly):

> let's say we had a grammar that had a key "healthy" with values "very_unhealthy" or "moderately_healthy." For broccoli, the LLM might intend to say "very_healthy" and choose "very" but then be pigeonholed into saying "very_unhealthy" because it's the only valid completion according to the grammar.

That said, you can use beam search to more or less solve this problem by evaluating the joint probability of all tokens in each branch of the grammar and picking the one with the highest probability (you might need some more nuance for free-form strings where the LLM can do whatever it wants and be "valid").
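A minimal sketch of that scoring idea, assuming a HuggingFace-style causal LM. For simplicity it scores each complete branch in one pass rather than running a true incremental beam search, and the prompt and helper names are mine, not from the comment:

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def branch_logprob(prompt: str, branch: str) -> float:
    """Sum of log P(token | prefix) over the branch's tokens."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    branch_ids = tokenizer(branch, add_special_tokens=False,
                           return_tensors="pt").input_ids
    ids = torch.cat([prompt_ids, branch_ids], dim=1)
    with torch.no_grad():
        logits = model(ids).logits          # (1, seq_len, vocab)
    # Logits at position i predict token i+1, so drop the last position.
    log_probs = F.log_softmax(logits[0, :-1], dim=-1)
    start = prompt_ids.shape[1] - 1         # first branch token's predictor
    targets = ids[0, prompt_ids.shape[1]:]  # the branch's token ids
    return (log_probs[start:start + targets.shape[0]]
            .gather(1, targets.unsqueeze(1)).sum().item())

prompt = '{"food": "broccoli", "healthy": "'
branches = ["very_unhealthy", "moderately_healthy"]
# Pick the grammar branch with the highest joint probability,
# instead of committing greedily token by token.
print(max(branches, key=lambda b: branch_logprob(prompt, b)))
```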

IanCal | 2 years ago

This is a concern of mine, as is limiting how much an LLM can talk through a problem - sometimes to nothing. Letting them work through things IMO dramatically improves their output.

My gut feeling is that taking the output and, if it's broken, then fixing it would give a better result - at that point you could even completely limit the output to only valid JSON. For your example, if it wrote "very_healthy" and was given an error message explaining that this wasn't an option and it had to choose from "very_unhealthy" or "moderately_healthy", I would expect a halfway decent model to pick "moderately_healthy".

This has the benefit of letting you use a more powerful model for the reasoning (like GPT-4) and a local model, where you can do this kind of token-probability manipulation, just for fixing the data.
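A minimal sketch of that repair loop, with a placeholder complete() standing in for whatever model API is actually behind it (the function, the retry limit, and the prompt wording are all my assumptions, not from the comment):

```python
import json

VALID = {"very_unhealthy", "moderately_healthy"}

def complete(prompt: str) -> str:
    """Hypothetical LLM call; replace with your actual API client."""
    raise NotImplementedError

def get_healthy_rating(food: str, max_retries: int = 3) -> str:
    prompt = (
        f'Rate how healthy {food} is. Respond with JSON like '
        f'{{"healthy": "<value>"}} where <value> is one of {sorted(VALID)}.'
    )
    for _ in range(max_retries):
        raw = complete(prompt)
        try:
            value = json.loads(raw)["healthy"]
        except (json.JSONDecodeError, KeyError) as e:
            # Feed the parse error back and let the model repair it.
            prompt += f"\nYour last reply was invalid ({e}). Please try again."
            continue
        if value in VALID:
            return value
        # Feed the schema violation back rather than silently coercing it.
        prompt += (
            f'\nYou answered "{value}", but that is not an option. '
            f"Choose one of {sorted(VALID)}."
        )
    raise ValueError(f"no valid answer after {max_retries} attempts")
```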