top | item 40509395

yaantc | 1 year ago

> [...] the standard way to get structured output seems to be to retry the query until the stochastic language model produces expected output.

No, that would be very inefficient. At each token-generation step, the LLM provides a likelihood for every token in the vocabulary, based on the past context. The structured output is defined by a grammar, which determines the legal tokens for the next step. You can then take the intersection of the two (ignore any token not allowed by the grammar) and select among the authorized tokens based on the LLM's likelihoods in the usual way. So it's a direct constraint, and it's efficient.
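A toy sketch of the idea (all names here are made up for illustration; a stub grammar and random scores stand in for a real grammar engine and a real LLM):

```python
import math
import random

# Hypothetical sketch of grammar-constrained decoding. VOCAB, legal_tokens,
# and fake_logits are stand-ins, not a real library API.
VOCAB = ["{", "}", '"key"', ":", '"value"', '"hello"']

def legal_tokens(generated):
    """Stub 'grammar' for a tiny JSON object: returns the set of tokens
    that are legal at this position, or an empty set when done."""
    steps = [["{"], ['"key"'], [":"], ['"value"', '"hello"'], ["}"]]
    i = len(generated)
    return set(steps[i]) if i < len(steps) else set()

def fake_logits(generated):
    """Stand-in for the LLM: a score for every vocabulary token."""
    return {tok: random.gauss(0.0, 1.0) for tok in VOCAB}

def sample_constrained(generated):
    logits = fake_logits(generated)
    allowed = legal_tokens(generated)
    # Intersection: drop every token the grammar forbids.
    masked = {t: l for t, l in logits.items() if t in allowed}
    # Softmax over the surviving tokens only.
    z = sum(math.exp(l) for l in masked.values())
    probs = {t: math.exp(l) / z for t, l in masked.items()}
    # Sample among the authorized tokens by the LLM's likelihoods.
    r, acc = random.random(), 0.0
    for tok, p in probs.items():
        acc += p
        if acc >= r:
            return tok
    return tok  # guard against floating-point rounding

out = []
while legal_tokens(out):
    out.append(sample_constrained(out))
print("".join(out))  # e.g. {"key":"value"} -- always valid under the grammar
```

Every completion is valid by construction, with no retries: the model's scores only ever choose *among* grammar-legal tokens, never instead of them.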

__loam | 1 year ago

Yeah, that sounds way better. I saw one of the recommended Python libraries mention retries and I thought, this can't be that awful, can it?