rybosome | 2 months ago
The argument goes that because we are intentionally constraining the model, we get less creativity. I believe OAI's method (I'm rusty on my ML math) is to apply a softmax to get tokens sorted by probability, then take the first token that aligns with the current state machine.
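A minimal sketch of that decoding step, assuming greedy selection (function and token names here are hypothetical, not OAI's actual implementation):

```python
import math

def constrained_greedy_pick(logits, allowed):
    """Softmax the logits, sort tokens by probability, and take the first
    token the grammar's current state machine permits."""
    # Numerically stable softmax over the raw logits.
    m = max(logits.values())
    exps = {tok: math.exp(v - m) for tok, v in logits.items()}
    total = sum(exps.values())
    probs = {tok: e / total for tok, e in exps.items()}
    # Walk tokens from most to least probable; skip anything the
    # state machine rejects at the current position.
    for tok, _ in sorted(probs.items(), key=lambda kv: -kv[1]):
        if tok in allowed:
            return tok
    return None

# Example: "{" is the model's top pick overall, but suppose the schema
# only permits a digit at this position.
print(constrained_greedy_pick({"{": 3.0, "7": 2.5, "x": 1.0}, allowed={"7", "x"}))  # → 7
```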
Maybe, but a one-off vibes example is hardly proof. I still use structured output regularly.
Oh, and tool calling is almost certainly implemented atop structured output. After all, it’s forcing the model to respond with a JSON schema representing the tool arguments. I struggle to believe that this is adequate for tool calling but inadequate for general purpose use.
crystal_revenge | 2 months ago
The team behind the Outlines library has produced several sets of evals and repeatedly shown the opposite: that constrained decoding improves model performance (including examples of "CoT" which the post claims isn't possible). [0,1]
There was a paper that claimed constrained decoding hurt performance, but it had some fundamental errors which they also wrote about [2].
People get weirdly superstitious when it comes to constrained decoding, as though it's somehow "limiting the model" when it's as simple as applying a conditional probability distribution to the logits. I also suspect this post is largely written to justify the fact that BAML parses the results (since the post is written by them).
0. https://blog.dottxt.ai/performance-gsm8k.html
1. https://blog.dottxt.ai/oss-v-gpt4.html
2. https://blog.dottxt.ai/say-what-you-mean.html
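The "conditional probability distribution" point above can be sketched in a few lines (names are illustrative, not any particular library's API): mask the disallowed tokens' logits to -inf and re-apply softmax, so allowed tokens keep their relative probabilities.

```python
import math

def constrained_distribution(logits, allowed):
    """Condition the next-token distribution on a grammar constraint:
    disallowed tokens get logit -inf (probability 0), and the remaining
    probabilities are renormalized by the softmax."""
    masked = {t: (v if t in allowed else float("-inf")) for t, v in logits.items()}
    m = max(masked.values())
    # math.exp(-inf) is 0.0, so disallowed tokens vanish from the sum.
    exps = {t: math.exp(v - m) for t, v in masked.items()}
    z = sum(exps.values())
    return {t: e / z for t, e in exps.items()}

probs = constrained_distribution({"a": 2.0, "b": 1.0, "c": 0.0}, allowed={"a", "b"})
# "c" gets probability 0; "a" and "b" keep the same ratio they had before masking.
```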
Der_Einzige | 2 months ago
This is independent of any "quality" or "reasoning" problem, which simply does not occur when using structured generation.
Edit (to respond):
I am claiming that there is no harm to reasoning, not claiming that CoT reasoning before structured generation isn't happening.