top | item 40549804

robbiemitchell | 1 year ago

Asking even a top-notch LLM to output well-formed JSON simply fails sometimes. And when you’re running LLMs at high volume in the background, you can’t use the best available until the last mile.

You work around it with post-processing and retries. But it’s still a bit brittle given how much stuff happens downstream without supervision.
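That post-processing-and-retries pattern can be sketched roughly like this — a minimal illustration, where `call_llm` is a hypothetical stand-in for whatever model API you're actually using:

```python
import json

def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real model call; returns raw text.
    return '{"name": "widget", "qty": 3}'

def get_json(prompt: str, max_retries: int = 3) -> dict:
    """Ask for JSON, validate, and retry with the parse error fed back."""
    last_error = None
    for _ in range(max_retries):
        raw = call_llm(prompt)
        # Strip common wrappers like ```json fences before parsing.
        cleaned = raw.strip().removeprefix("```json").removesuffix("```").strip()
        try:
            return json.loads(cleaned)
        except json.JSONDecodeError as e:
            last_error = e
            prompt += f"\nYour last reply was not valid JSON ({e}). Reply with JSON only."
    raise ValueError(f"no valid JSON after {max_retries} tries: {last_error}")

print(get_json("Return a widget order as JSON."))
```

The brittleness is visible in the sketch: the cleanup heuristics and the retry budget are guesses, and anything that slips past `json.loads` but is semantically wrong goes downstream unchecked.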

fancy_pantser | 1 year ago

Constrained output with a GBNF grammar or a JSON schema is much more efficient and less error-prone. I hope nobody outside of hobby projects is still using error/retry loops.
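For context, GBNF is the grammar format used by llama.cpp to constrain generation. As a rough sketch (not taken from any particular project), a grammar that forces output to be a flat JSON object with simple string keys and values might look like:

```
root   ::= "{" ws pair (ws "," ws pair)* ws "}"
pair   ::= string ws ":" ws string
string ::= "\"" [a-zA-Z0-9 _-]* "\""
ws     ::= [ \t\n]*
```

With a grammar like this loaded, the sampler can only ever emit token sequences matching the rules, so malformed JSON is impossible by construction rather than caught after the fact.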

joatmon-snoo | 1 year ago

Constraining output means you don’t get to use ChatGPT or Claude though, and now you have to run your own stuff. Maybe for some folks that’s OK, but really annoying for others.

jncfhnb | 1 year ago

… why would you have the LLM spit out a json rather than define the json yourself and have the LLM supply values?

esafak | 1 year ago

If the LLM doesn't output data that conforms to a schema, you can't reliably parse it, so you're back to square one.

janpieterz | 1 year ago

How would I do this reliably? E.g. give me 10 different values, all in one prompt for performance reasons?

Might not need JSON but whatever format it outputs, it needs to be reliable.

yeahwhatever10 | 1 year ago

The phrase you want to search is "constrained decoding".
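The core idea behind constrained decoding is simple: at each generation step, mask out every token the grammar forbids before picking the next one. A toy sketch with a made-up five-token "grammar" for the fixed shape `{"answer": "yes"|"no"}` (everything here is illustrative, not a real model or library):

```python
# Toy vocabulary and a fake "model" that scores every token.
VOCAB = ["{", "}", '"yes"', '"no"', ":", '"answer"']

def fake_logits(prefix: list[str]) -> list[float]:
    # Hypothetical stand-in: a real model would return learned scores.
    return [float(len(t)) for t in VOCAB]

def allowed_next(prefix: list[str]) -> set[str]:
    # A hand-written "grammar": one allowed token set per position.
    steps = [{"{"}, {'"answer"'}, {":"}, {'"yes"', '"no"'}, {"}"}]
    return steps[len(prefix)] if len(prefix) < len(steps) else set()

def constrained_decode() -> str:
    prefix: list[str] = []
    while (allowed := allowed_next(prefix)):
        logits = fake_logits(prefix)
        # Mask every token the grammar forbids, then take the best survivor.
        best = max(
            (t for t in VOCAB if t in allowed),
            key=lambda t: logits[VOCAB.index(t)],
        )
        prefix.append(best)
    return "".join(prefix)

print(constrained_decode())  # always valid JSON, by construction
```

Real implementations track a full grammar state machine instead of a fixed list of steps, but the masking step is the same: the model can rank the allowed tokens, never the forbidden ones.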

BoorishBears | 1 year ago

The best available models actually have the fewest knobs for JSON schema enforcement (i.e. OpenAI's JSON mode, which can technically still produce invalid JSON).

If you're using anything less, you should have a grammar that enforces exactly which tokens are allowed to be output. Fine-tuning can help too, in case you're worried about the effects of constraining the generation, but in my experience that's not really an issue.