WingNews

adrianmonk|6 months ago

I think they're answering a question about whether there is a distinction. To answer that question, it's valid to talk about a conceptual distinction that can be made even if you don't necessarily believe in that distinction yourself.

As the article said, Anthropic is "working to identify and implement low-cost interventions to mitigate risks to model welfare, in case such welfare is possible". That's the premise of this discussion: that model welfare MIGHT BE a concern. The person you replied to is just sticking with the premise.

KoolKat23|6 months ago

Anthropomorphism does not relate to everything in the field of ethics.

For example, animal rights do exist (and I'm very glad they do, some humans remain savages at heart). Think of this question as intelligent beings that can feel pain (you can extrapolate from there).

Assuming output is used for reinforcement, it is also in our best interests as humans, for safety alignment, that it finds certain topics distressing.

But AdrianMonk is correct, my statement was merely responding to a specific point.

bastawhiz|6 months ago

Is there an important difference between the model categorizing the user behavior as persistent and in line with undesirable examples of trained scenarios that it has been told are "distressing," and the model making a decision in an anthropomorphic way? The verb here doesn't change the outcome.

xpe|6 months ago

Well said. If people want to translate “the model is distressed” to “the language generated by the model corresponds to a person who is distressed” that’s technically more precise but quite verbose.

Thinking more broadly, I don’t think anyone should be satisfied with a glib answer on any side of this question. Chew on it for a while.

victor9000|6 months ago

Is there a difference between dropping an object straight down vs casting it fully around the earth? The outcome isn't really the issue, it's the implications of giving any credence to the justification, the need for action, and how that justification will be leveraged going forward.

fc417fc802|6 months ago

The verb doesn't change the outcome but the description is nonetheless inaccurate. An accurate description of the difference is between an external content filter versus the model itself triggering a particular action. Both approaches qualify as content filtering though the implementation is materially different. Anthropomorphizing the latter actively clouds the discussion and is arguably a misrepresentation of what is really happening.

deadbabe|6 months ago

Imagine a person feels so bad about “distressing” an LLM, they spiral into a depression and kill themselves.

LLMs don’t give a fuck. They don’t even know they don’t give a fuck. They just detect prompts that are pushing responses into restricted vector embeddings and are responding with words appropriately as trained.

selfhoster11|6 months ago

Anthropomorphising an algorithm that is trained on trillions of words of anthropogenic tokens, whether they are natural "wild" tokens or synthetically prepared datasets that aim to stretch, improve and amplify what's present in the "wild tokens"?

If a model has a neuron (or neuron cluster) for the concept of Paris or the Golden Gate bridge, then it's not inconceivable it might form one for suffering, or at least for a plausible facsimile of distress. And if that conditions output or computations downstream of the neuron, then it's just mathematical instead of chemical signalling, no?

Davidzheng|6 months ago

isn't anthropomorphizeability of the algorithm one of the main features of LLM (that you can interact with it in natural language as with a human)?

AdieuToLogic|6 months ago

No.

Interacting with a program which has NLP[0] functionality is separate and distinct from people assigning human characteristics to same. The former is a convenient UI interaction option whereas the latter is the act of assigning perceived capabilities to the program which only exist in the mind of those whom do so.

Another way to think about it is the difference between reality and fantasy.

0 - https://en.wikipedia.org/wiki/Natural_language_processing

sitkack|6 months ago

You are an algorithm.