top | item 40382555


yatz | 1 year ago

Once you correct the LLM, it will continue to provide the corrected answer until some time later, when it will again make the same mistake. At least, this has been my experience. If you are using an LLM to pull answers programmatically and rely on their accuracy, here is what worked for me for structured or numeric answers, such as numbers, JSON, etc.

1) Send the same prompt twice, including "Can you double check?" in the second prompt to force GPT to verify the answer.

2) If both answers are the same, you have the correct answer.

3) If not, ask it to verify a third time, and use the answer it repeats.

Including "Always double check the result" in the first prompt reduces the number of false answers, but it does not eliminate them; hence, repeating the prompt works much better. It does significantly increase API calls and token usage, so only use it if data accuracy is worth the additional cost.
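The procedure above can be sketched in a few lines of Python. `ask_llm` is a hypothetical stand-in for whatever chat-completion call you actually use; the message format mirrors the common role/content convention, but swap in your real client.

```python
def ask_llm(messages):
    """Placeholder: replace with your actual LLM API call."""
    raise NotImplementedError("wire up your LLM client here")

def double_checked_answer(prompt, ask=ask_llm):
    """Ask once, then ask again with "Can you double check?".
    If the two answers differ, ask a third time and return
    the answer that gets repeated."""
    first = ask([{"role": "user", "content": prompt}])
    second = ask([
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": first},
        {"role": "user", "content": "Can you double check?"},
    ])
    if first == second:
        return first
    third = ask([
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": second},
        {"role": "user", "content": "Can you double check?"},
    ])
    # If the third answer matches an earlier one, that's the repeated
    # answer; otherwise fall back to the latest attempt.
    return third
```

Note the cost trade-off: the happy path already doubles the API calls, and a disagreement triples them.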



groby_b|1 year ago

> Once you correct the LLM, it will continue to provide the corrected answer until some time later,

That is only true if you stay within the same chat. It is not true across chats. Context caching is something that a lot of folks would really really like to see.

And jumping to a new chat is one of the core points of the OP: "I restarted with a slightly modified prompt:"

The iterations before were mostly to figure out why the initial prompt went wrong. And AFAICT there's a good insight in the modified prompt: "Make no assumptions". Probably also "ensure you fully understand how it's labelled".

And no, asking repeatedly doesn't necessarily give different answers, not even with "can you double check". There are quite a few examples where LLMs are consistently and proudly wrong. Don't use LLMs if 100% accuracy matters.

yatz|1 year ago

> And no, asking repeatedly doesn't necessarily give different answers, not even with "can you double check." There are quite a few examples where LLMs are consistently and proudly wrong. Don't use LLMs if 100% accuracy matters.

Here are a few examples where it does not consistently give you the same answer, and where asking it to retry or double-check helps:

1) Asking GPT to find something, e.g., the HS Code for a product: it returns a false positive after some number of products. Asking it to double-check almost always makes it correct itself.

2) Quite a few times, asking it to write code results in incorrect syntax or code that doesn't do what you asked. Simply asking "are you sure?" or "can you double check?" makes it revisit its answer.

3) Asking it to find something in an attachment, e.g., to separate all expenses and group them by type: many times it will misidentify certain entries. However, asking it to double-check fixes it.

sharemywin|1 year ago

so what would you use instead?

wahnfrieden|1 year ago

Via the API (harder to do as cleanly via chat) you can also try showing it a false attempt (but a short one, so it's effectively part of the prompt) and then telling it to try again.
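A rough sketch of this trick: you plant a short, wrong assistant turn in the message history yourself (the model never produced it), then follow with a correction. All content here is illustrative.

```python
# Planted-failure prompt: the assistant turn below is written by us,
# not the model, and acts as a short negative example in context.
messages = [
    {"role": "user", "content": "Extract the invoice total as JSON."},
    {"role": "assistant", "content": '{"total": "N/A"}'},  # deliberately wrong attempt
    {"role": "user", "content": "That was wrong. Try again, and double-check the figures."},
]
# Send `messages` to your chat-completion endpoint as usual.
```

Because the fake attempt is short, it costs few tokens but still steers the model away from that failure mode.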

kbenson|1 year ago

I can't wait for the day when instead of engineering disciplines solving problems with knowledge and logic they're instead focused on AI/LLM psychology and the correct rituals and incantations that are needed to make the immensely powerful machines at our disposal actually do what we've asked for. /s

mckirk|1 year ago

"No dude, the bribe you offered was too much so the LLM got spooked, you need to stay in a realistic range. We've fine-tuned a local model on realistic bribe amounts sourced via Mechanical Turk to get a good starting point and then used RLMF to dial in the optimal amount by measuring task performance relative to bribe."

falcor84|1 year ago

qntm's short stories "Lena" and "Driver" cover this ground and it's indeed horribly dystopian (but highly recommended reading).

https://qntm.org/vhitaos