top | item 45960635

(no title)

mpoteat | 3 months ago

This is a LLM directly, purposefully lying, i.e. telling a user something it knows not to be true. This seems like a cut-and-dry Trust & Safety violation to me.

It seems the LLM is given conflicting instructions:

1. Don't reference memory without explicit instructions

2. (but) such memory is inexplicably included in the context, so it will inevitably inform the generation

3. Also, don't divulge the existence of user-context memory

If a LLM is given conflicting instructions, I don't apprehend that its behavior will be trustworthy or safe. Much has been written on this.

discuss

order

imiric|3 months ago

Let's stop anthropomorphizing these tools. They're not "purposefully lying", or "know" anything to be true.

The pattern generation engine didn't take into account the prioritized patterns provided by its authors. The tool recognized this pattern in its output and generated patterns that can be interpreted as acknowledgement and correction. Whether this can be considered a failure, let alone a "Trust & Safety violation", is a matter of perspective.

faidit|3 months ago

IMHO the terms are fine, even if applied to much dumber systems, and most people will and do use the terms that way colloquially so there's no point fighting it. A Roomba can "know" where the table is. An automated voice recording or a written sign can "lie" to you. One could argue the lying is only done by the creator of the recording/sign - but then what about a customer service worker who is instructed to lie to customers by their employer? I think both the worker and employer could be said to be lying.