top | item 44642117

(no title)

kalenx | 7 months ago

Nitpicking, but I don't see your four-letter word example as convincing. Thinking is the very process from which we form words or sentences, so it is by definition impossible to _not_ think about a word we must avoid. However, in your all caps instruction, replace "think" by "write" or "say". Then check if people obey they all caps directive. Of course they will. Even if the offensive word came to their mind, they _will_ look for another.

That's what many people miss about LLMs. Sure, humans can lie, make stuff up, make mistakes or deceive. But LLM will do this even if they have no reason to (i.e., they know the right answer and have no reason/motivation to deceive). _That's_ why it's so hard to trust them.

discuss

sfink|7 months ago

It was meant as more of an illustration than a persuasive argument. LLMs don't have much of a distinction between thinking and writing/saying. For a human, an admonition to not say something would be obeyed as a filter on top of thoughts. (Well, not just a filter, but close enough.) Adjusting outputs via training or reinforcement learning applies more to the LLM's "thought process". LLMs != humans, but "a human thinking" is the closest regular world analogy I can come up with to an LLM processing. "A human speaking" is further away. The thing in between thoughts and speech involves human reasoning, human rules, human morality, etc.

As a result, I'm going to take your "...so it is by definition impossible to _not_ think about a word we must avoid" as agreeing with me. ;-)

Different things are different, of course, so none of this lines up or fails to line up where we might think or expect. Anthropic's exploration into the inner workings of an LLM revealed that if you give them an instruction to avoid something, they'll start out doing it anyway and only later start obeying the instruction. It takes some time to make its way through, I guess?

bravetraveler|7 months ago

Consider, too: tokens and math. As much as I like to avoid responsibility, I still pay taxes. The payment network or complexity of the world kind of forces the issue.

Things have already been tokenized and 'ideas' set in motion. Hand wavy to the Nth degree.