top | item 45201848

(no title)

lsy | 5 months ago

Fixing "theoretical" nondeterminism for a totally closed individual input-output pair doesn't solve the two "practical" nondeterminism problems, where the exact same input gives different results given different preceding context, and where a slightly transformed input doesn't give a correctly transformed result.

Until those are addressed, closed-system nondeterminism doesn't really help except in cases where a lookup table would do just as well. You can't use "correct" unit tests or evaluation sets to prove anything about inputs you haven't tested.

discuss

kazinator|5 months ago

There is no such thing as "exactly the same input, but with different preceding context". The preceding context is input!

If you were to obtain exactly the same output for a given input prompt, regardless of context, then that would mean that the context is being ignored, which is indistinguishable from the session not maintaining any context such that each prompt is in a brand new empty context.

Now what some people want is requirements like:

- The different wording of a prompt with exactly the same meaning should not change anything in the output; e.g. whether you say "What is the capital of France" or "What is France's capital" the answer should be verbatim identical.

- Prior context should not change responses in ways that don't have any interaction with the context. For instance, a prompt is given "what is 2 + 2", then the answer should always be the same, except if the context instructs the LLM that 2 + 2 is to be five.

These kinds of requirements betray a misunderstanding of what these LLMs are.

Zacharias030|5 months ago

While I get that this is how LLMs work, I think you should think backwards from the user / from what AI as a field is aiming for and recognize that the „naive“ way of the parent to ask for reliable responses no matter what the „context“ is, is exactly what a good AI system should offer.

„The context is the input“ betrays a misunderstanding of what (artificial) intelligence systems are aiming for.

Dylan16807|5 months ago

> These kinds of requirements betray a misunderstanding of what these LLMs are.

They do not. Refusing to bend your requirements to a system that can't satisfy them is not evidence of misunderstanding the system.

And if you tack on "with X 9s of reliability" then it is something LLMs can do. And in the real world every system has a reliability factor like that.

stubbornleaf|5 months ago

Sure. But the context always starts with the first input, right? And how can you guarantee—or why should you guarantee—that the reply to the first input will always be the same? And if that’s not the case, how can we ensure the preceding context remains consistent?

kjkjadksj|5 months ago

If an input along with the context generated some random seed or hash this would certainly be possible. Just paste your seed over to your coworker, they supply it to the model and it contains all contextual information.

skybrian|5 months ago

I wonder if there's a way to use an LLM to rewrite the prompt, standardizing the wording when two prompts mean the same thing?

raincole|5 months ago

> where the exact same input gives different results given different preceding context

Why and how is this a problem?

If 'preceding context' doesn't cause different results, it means you can simply discard the context. Why do I want that? It's not how I expect a tool to work (I expect vim responds differently to my input after I switch to the insert mode). It's absolutely not how I expect intelligence to work either. It sounds like the most extreme form of confirmation bias.

qcnguy|5 months ago

When the context is auto-generated and may include irrelevant data.

This is a common AI benchmark and has been for years before GPT-2 even existed. LLMs need to not get distracted by irrelevant facts and there are tests that measure this. It's the motivation for attention mechanisms, which are the breakthrough that enabled LLMs to scale up.

edflsafoiewq|5 months ago

An example is translation. I MTLed some text recently where the name of a (fictional) city was translated about a dozen different ways. Sometimes you'd get a calque, sometimes you'd get a transliteration (including several wrong ones). Ironically "dumb" MTLs are often much more consistent about this than LLMs.

saagarjha|5 months ago

This is really useful in reproducing bugs.

brookst|5 months ago

I was with you until you said it “doesn’t really help”. Did you mean “doesn’t completely solve the problem “?