top | item 39380090

dongecko | 2 years ago

I have not encountered this problem yet. When I was talking about the format of the answer I meant the following: No matter if you're using Langchain, Llamaindex, something self made, or Instructor (just to get a json back); under the hood there is somewhere the request to the LLM to reply in a structured way, like "answer in the following json format", or "just say 'a', 'b' or 'c'". ChatGPT tends to obey this rather well, most locally running LLMs don't. They answer like:

> Sure my friend, here is your requested json:
>
> ```
> {
>   "name": "Daniel",
>   "age": 47
> }
> ```

Unfortunately, the introductory sentence breaks parsing the answer directly, which means extra coding steps or tweaking your prompt.
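A common workaround for the chatty preamble is to cut everything outside the outermost braces before parsing. A minimal sketch (the reply string is just the example from above; a real reply may of course deviate further):

```python
import json

def extract_json(reply: str) -> dict:
    # Grab the span from the first "{" to the last "}",
    # ignoring any chatty preamble or epilogue around it.
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object found in reply")
    return json.loads(reply[start:end + 1])

reply = 'Sure my friend, here is your requested json:\n{"name": "Daniel", "age": 47}'
print(extract_json(reply))  # {'name': 'Daniel', 'age': 47}
```

This only papers over the problem, of course; it still fails if the model emits invalid JSON inside the braces.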

int_19h | 2 years ago

It's pretty easy to force a locally running model to always output valid JSON: when it gives you probabilities for the next tokens, discard all tokens that would result in invalid JSON at that point (basically reverse parsing), and then apply the usual techniques to pick the completion only from the remaining tokens. You can even validate against a JSON schema that way, so long as it is simple enough.

There are a bunch of libraries for this already, e.g.: https://github.com/outlines-dev/outlines
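The token-masking idea can be sketched with a toy greedy decoder. Everything here is illustrative: the per-step candidate lists stand in for model logits, and the prefix check is a deliberately crude approximation (it tries a few continuations and sees if any completes to valid JSON), whereas real libraries like outlines track the parser state exactly:

```python
import json

# Crude stand-in for an incremental JSON parser: a prefix is viable if
# appending one of a few simple continuations yields valid JSON.
CLOSERS = ["", '"', "}", "0}", ": 0}", " 0}"]

def could_become_valid_json(prefix: str) -> bool:
    for closer in CLOSERS:
        try:
            json.loads(prefix + closer)
            return True
        except json.JSONDecodeError:
            pass
    return False

# Hypothetical (token, probability) candidates per decoding step.
steps = [
    [("Sure, here is", 0.7), ("{", 0.3)],
    [("your json:", 0.6), ('"name"', 0.4)],
    [(":", 1.0)],
    [(' "Daniel"', 1.0)],
    [("!", 0.5), ("}", 0.5)],
]

out = ""
for candidates in steps:
    # Greedy pick among tokens that keep the output a viable JSON prefix;
    # chatty tokens like "Sure, here is" are masked out at every step.
    for token, _p in sorted(candidates, key=lambda c: -c[1]):
        if could_become_valid_json(out + token):
            out += token
            break

print(out)  # {"name": "Daniel"}
```

The same masking loop works with real logits: zero out the probability of every non-viable token, renormalize, then sample as usual.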

PeterisP | 2 years ago

If that's what you need, it would make sense to redo the instruction fine-tuning of the model instead of fiddling with the prompt or post-processing to work around model behavior that runs counter to what you want.

dongecko | 2 years ago

At the very beginning of my journey I did some fine-tuning with LoRA on (I believe) a Falcon model, but I haven't looked at it since. My impression was that injecting knowledge via fine-tuning doesn't work, but tweaking behavior does. So your answer makes a lot of sense to me. Thanks for bringing that up! I will definitely try it out.
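For context, the core of the LoRA trick is small enough to sketch in a few lines of numpy: the pretrained weight W stays frozen, and training only updates a low-rank pair B·A added on top. The dimensions and scaling below are illustrative, not taken from any particular model:

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 8, 8, 2            # layer dims; low rank r << d

W = rng.normal(size=(d_out, d_in))  # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01
B = np.zeros((d_out, r))            # B starts at zero: adapter is a no-op at init

def lora_forward(x, alpha=16.0):
    # Only A and B would receive gradients during fine-tuning.
    return x @ (W + (alpha / r) * (B @ A)).T

x = rng.normal(size=(1, d_in))
print(np.allclose(lora_forward(x), x @ W.T))  # True at initialization
```

Because only A and B (2·r·d parameters instead of d²) are trained, this is cheap enough to run on the behavior-tweaking tasks discussed above.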