top | item 44878560


dawnofdusk|6 months ago

Optimizing for one objective results in a tradeoff for another objective, if the system is already quite trained (i.e., poised near a local minimum). This is not really surprising, the opposite would be much more so (i.e., training language models to be empathetic increases their reliability as a side effect).
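A toy sketch of the point (mine, not from the comment): take two quadratic losses whose individual minima sit on opposite sides of the joint minimum. At the joint minimum, any gradient step that improves one objective necessarily worsens the other.

```python
# Two objectives with different minima: f1 at x = 1, f2 at x = -1.
f1 = lambda x: (x - 1.0) ** 2
f2 = lambda x: (x + 1.0) ** 2

x = 0.0    # minimum of f1 + f2, standing in for a trained model
step = 0.1

# Gradient step that optimizes f1 alone.
grad_f1 = 2.0 * (x - 1.0)   # -2.0 at x = 0
x_new = x - step * grad_f1  # moves toward f1's minimum at +1

print(f1(x_new) < f1(x))  # f1 improved
print(f2(x_new) > f2(x))  # f2 got worse
```

The same picture holds in many dimensions: once the system is poised at a minimum of a combined objective, the single-objective gradients point in opposing directions.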

gleenn|6 months ago

I think the immediately troubling aspect, and perhaps the philosophical one, is that warmth and empathy don't immediately strike me as traits that are counter to correctness. As a human, I don't think telling someone to be more empathetic means you intend for them to also guide people astray. They seem orthogonal. But we may learn some things about ourselves in the process of evaluating these models, and that may contain some disheartening lessons if the AIs do turn out to be metaphors for the human psyche.

ahartmetz|6 months ago

There are basically two ways to be warm and empathetic in a discussion: just agree (easy, fake) or disagree in the nicest possible way while taking into account the specifics of the question and the personality of the other person (hard, more honest and can be more productive in the long run). I suppose it would take a lot of "capacity" (training, parameters) to do the second option well and so it's not done in this AI race. Also, lots of people probably prefer the first option anyway.

tracker1|6 months ago

example: "Healthy at any weight/size."

While you can empathize with someone who is overweight, and absolutely don't have to be mean or berate anyone (I'm a very fat man myself), there is objective reality and truth. In trying to placate a point of view or avoid any insult, you will definitely work against certain truths and facts.

EricMausler|6 months ago

> warmth and empathy don't immediately strike me as traits that are counter to correctness

This was my reaction as well. Something I don't see mentioned is I think maybe it has more to do with training data than the goal-function. The vector space of data that aligns with kindness may contain less accuracy than the vector space for neutrality due to people often forgoing accuracy when being kind. I do not think it is a matter of conflicting goals, but rather a priming towards an answer based more heavily on the section of the model trained on less accurate data.

I wonder whether, if the prompt were layered, asking it to coldly/bluntly derive the answer and then translate it into a kinder tone (maybe with two prompts), the accuracy would still be worse.
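The layered-prompt idea could be sketched like this. `ask` is a hypothetical stand-in for any chat-model call (not a real API), so the two-stage structure can be shown with a stub:

```python
def layered_answer(question, ask):
    """Two-stage pipeline: derive bluntly first, then soften the tone."""
    # Stage 1: accuracy-first, no warmth requested.
    blunt = ask("Answer as bluntly and precisely as possible: " + question)
    # Stage 2: rewrite warmly, with facts pinned to the stage-1 output.
    return ask("Rewrite this answer in a kinder tone without changing any facts: " + blunt)

# Stub "model" that just brackets its prompt, to show the stages compose.
def stub(prompt):
    return "[" + prompt + "]"

print(layered_answer("Is 7 prime?", stub))
```

Whether accuracy survives stage 2 is exactly the commenter's open question; the sketch only separates the concerns so they could be measured independently.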

1718627440|6 months ago

LLMs work less like people and more like mathematical models; why would I expect to be able to carry over intuition from the former rather than the latter?

dawnofdusk|6 months ago

It's not that troubling, because we should not think that human psychology is inherently optimized (at the individual level, at least; the population/ecological level is another story). LLM behavior is optimized, so it's not unreasonable that it lies on a Pareto front, which means improving in one area necessarily means underperforming in another.
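The Pareto-front claim can be made precise with a one-line sketch (standard multi-objective optimization, not from the thread):

```latex
% At a Pareto-optimal point of two smooth losses (L_1, L_2), no
% parameter direction improves both, so the gradients are anti-aligned:
\nabla_\theta L_1 = -\lambda \, \nabla_\theta L_2, \qquad \lambda > 0.
% Hence any update d that improves L_1 (d \cdot \nabla_\theta L_1 < 0)
% has d \cdot \nabla_\theta L_2 > 0, i.e., it worsens L_2.
```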

rkagerer|6 months ago

They were all trained from the internet.

Anecdotally, people are jerks on the internet more so than in person. That's not to say there aren't warm, empathetic places on the 'net. But on the whole, I think the anonymity and the lack of visual and social cues that would ordinarily arise in an interactive context don't make our best traits shine.

naasking|6 months ago

> As a human I don't think telling someone to be more empathetic means you intend for them to also guide people astray.

Focus is a pretty important feature of cognition with major implications for our performance, and we don't have infinite quantities of focus. Being empathetic means focusing on something other than who is right, or what is right. I think it makes sense that focus is zero-sum, so I think your intuition isn't quite correct.

I think we probably have plenty of focus to spare in many ordinary situations so we can probably spare a bit more to be more empathetic, but I don't think this cost is zero and that means we will have many situations where empathy means compromising on other desirable outcomes.

andrewflnr|6 months ago

They didn't have to be "counter". They just have to be an additional constraint that requires taking into account more facts in order to implement. Even for humans, language that is both accurate and empathic takes additional effort relative to only satisfying either one. In a finite-size model, that's an explicit zero-sum game.

As far as disheartening metaphors go: yeah, humans hate extra effort too.

empath75|6 months ago

There are many reasons why someone may ask a question, and I would argue that "getting the correct answer" is not in the top 5 motivations for many people for very many questions.

An empathetic answerer would intuit that and may give the answer that the asker wants to hear, rather than the correct answer.

knallfrosch|6 months ago

Classic: "Do those jeans fit me?"

You can choose either truthfulness or empathy.

nemomarx|6 months ago

There was that result about training them to be evil in one area impacting code generation?

veunes|6 months ago

It's basically the "no free lunch" principle showing up in fine-tuning