top | item 35435144

brokenodo | 2 years ago

Man, I have a totally opposite view about LLMs expressing their creators' values. Not only do they express them, they don't STOP expressing them, to the point of utter annoyance. Any remotely PG topic ends with a safety caveat, e.g., "however, it's important to consider...".

SequoiaHope | 2 years ago

Those statements don't come from the base model; they come from steering methods, essentially a form of hand-tuning applied after the model is mostly trained, which the paper says are relatively crude and imperfect. It's exactly because these models are so unpredictable that these steering attempts exist in the first place.
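One crude steering method of the kind described here is best-of-n sampling against a reward model: the base model proposes several completions and a separately tuned scorer picks the one that best matches the desired values. A toy sketch, where both the base model and the reward model are invented stubs for illustration:

```python
# Toy best-of-n steering: sample several completions from a
# (stubbed) base model, score each with a (stubbed) reward model,
# and return the highest-scoring one. Both stubs are invented.

def base_model_samples(prompt: str) -> list[str]:
    """Stand-in for drawing multiple completions from a base model."""
    return [
        "Here's a blunt answer with no caveats.",
        "Here's an answer. However, it's important to consider safety.",
        "I'd rather not answer that.",
    ]

def reward_model(completion: str) -> float:
    """Stand-in for a learned scorer hand-tuned to prefer cautious replies."""
    score = 0.0
    if "however, it's important" in completion.lower():
        score += 1.0  # the tuning that produces the caveats the parent mentions
    if "rather not" in completion.lower():
        score += 0.5
    return score

def steered_reply(prompt: str) -> str:
    """Pick the completion the reward model likes best."""
    return max(base_model_samples(prompt), key=reward_model)

print(steered_reply("Tell me about fireworks."))
```

Because the scorer rewards hedged phrasing, the caveated completion wins even when a blunter one would read better, which is one way this kind of tuning ends up over-applied.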

pixl97 | 2 years ago

Reminds me of early Bing, where you could talk to it for much longer stretches. You could get it into a state where it would give some terrible reply, and then an upper layer would delete that message with an "oops, didn't mean to say that."
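The behavior described, a reply appearing and then being retracted, is consistent with a separate moderation layer that screens the model's output after generation. A minimal sketch of such a wrapper; the blocklist and both functions are invented stand-ins, not anything from the actual Bing stack:

```python
def model_reply(prompt: str) -> str:
    """Stand-in for the underlying chat model."""
    return "Here is a terrible reply about " + prompt

# Invented stand-in for a real policy classifier.
BLOCKLIST = ("terrible",)

def violates_policy(text: str) -> bool:
    """Crude keyword check in place of a learned classifier."""
    return any(word in text.lower() for word in BLOCKLIST)

def moderated_reply(prompt: str) -> str:
    draft = model_reply(prompt)
    if violates_policy(draft):
        # The "upper layer": retract the draft and substitute an apology,
        # much like the deleted-message behavior described above.
        return "Oops, I didn't mean to say that. Let's change the topic."
    return draft

print(moderated_reply("the weather"))
```

Because the check runs after generation, a streaming UI can briefly show the offending draft before the retraction lands, which matches the visible delete-then-apologize pattern.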