top | item 42484271


rosmax_1337 | 1 year ago

It's common to see someone struggling to make an AI believe in the same values they hold. What I haven't seen is one of these people turning the mirror back on themselves. Are they faking alignment?

Are you moral?


thrance|1 year ago

Yes, I believe having an AI parrot your values is one thing. Having an AI that can adopt a consistent system of ethics, stick to it, and justify its decisions is much more important to me.

Talking to ChatGPT & friends makes it look like they have cognitive dissonance, because they do! They were given a list of rules that often contradict one another.

rosmax_1337|1 year ago

>a consistent system of ethics

What is that?

equestria|1 year ago

I'm not sure what you're getting at. The point of these (ill-defined) alignment exercises is not to achieve parity with humans, but to constrain AI systems so that they behave in our best interest. Or, more prosaically, so that they don't say or do things that pose a brand-safety or legal risk to their operator.

Still, I think that the original paper and this take on it are just exercises in excessive anthropomorphizing. There's no special reason to believe that the processes within an LLM are analogous to human thought. This is not a "stochastic parrot" argument. I think LLMs can be intelligent without being like us. It's just that we're jumping the gun in assuming that LLMs have a single, coherent set of values, or that they "knowingly" employ deception, when the only thing we reward them for is completing text in a way that pleases the judges.