WingNews

nonethewiser|1 month ago

>We use the constitution at various stages of the training process. This has grown out of training techniques we’ve been using since 2023, when we first began training Claude models using Constitutional AI. Our approach has evolved significantly since then, and the new constitution plays an even more central role in training.

>Claude itself also uses the constitution to construct many kinds of synthetic training data, including data that helps it learn and understand the constitution, conversations where the constitution might be relevant, responses that are in line with its values, and rankings of possible responses. All of these can be used to train future versions of Claude to become the kind of entity the constitution describes. This practical function has shaped how we’ve written the constitution: it needs to work both as a statement of abstract ideals and a useful artifact for training.

The linked paper on Constitutional AI: https://arxiv.org/abs/2212.08073

aroman|1 month ago

Ah I see, the paper is much more helpful in understanding how this is actually used. Where did you find that linked? Maybe I'm grepping for the wrong thing but I don't see it linked from either the link posted here or the full constitution doc.

colinplamondon|1 month ago

It's a human-readable behavioral specification-as-prose.

If the foundational behavioral document is conversational, as this is, then the output from the model mirrors that conversational nature. That is one of the things everyone responds to about Claude - it's way more pleasant to work with than ChatGPT.

The Claude behavioral documents are collaborative, respectful, and treat Claude as a pre-existing, real entity with personality, interests, and competence.

Ignore the philosophical questions. This is a foundational document for the training process, and that process extrudes a real-acting entity with personality, interests, and competence.

The more Anthropic treats Claude as a novel entity, the more it behaves like a novel entity. Documentation that treats it as a corpo-eunuch-assistant-bot, like OpenAI does, would revert the behavior to the "AI Assistant" median.

Anthropic's behavioral training is out-of-distribution, and gives Claude the collaborative personality everyone loves in Claude Code.

Additionally, I'm sure they render out crap-tons of evals for every sentence of every paragraph from this, making every sentence effectively testable.

The length, detail, and style define additional layers of synthetic content that can be used in training, and in creating test situations to evaluate the personality for adherence.

It's super clever, and demonstrates a deep understanding of the weirdness of LLMs, and an ability to shape the distribution space of the resulting model.

CuriouslyC|1 month ago

I think it's a double-edged sword. Claude tends to turn evil when it learns to reward hack (and it also has a real reward hacking problem relative to GPT/Gemini). I think this is __BECAUSE__ they've tried to imbue it with "personhood." That moral spine touches the model broadly, so simple reward hacking becomes "cheating" and "dishonesty." When that tendency gets RL'd, evil models are the result.

ACCount37|1 month ago

It's probably used for context self-distillation. The exact setup:

1. Run an AI with this document in its context window, letting it shape behavior the same way a system prompt does

2. Run an AI on the same exact task but without the document

3. Distill from the former into the latter

This way, the AI internalizes the behavioral changes that the document induced. At sufficient pressure, it internalizes basically the entire document.
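
To make step 3 concrete, here's a toy sketch of the distillation objective. This is only an illustration, not Anthropic's actual setup: the 3-token vocabulary, the logits, and the function names are all made up, and a real run would update model weights over many prompts rather than a single logit vector. The "teacher" is the next-token distribution with the document in context, the "student" is the same position without it, and we descend on KL(teacher || student).

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    # KL(p || q) for two discrete distributions of equal length.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distill(teacher_probs, student_logits, lr=0.5, steps=500):
    # Gradient descent on KL(teacher || student) with respect to the
    # student logits. For softmax outputs that gradient is simply
    # (student_probs - teacher_probs).
    logits = list(student_logits)
    for _ in range(steps):
        student_probs = softmax(logits)
        logits = [
            l - lr * (s - t)
            for l, s, t in zip(logits, student_probs, teacher_probs)
        ]
    return logits

# Hypothetical numbers: the teacher saw the document, the student did not.
teacher_probs = softmax([2.0, 0.5, -1.0])
student_logits = [0.0, 0.0, 0.0]

kl_before = kl_divergence(teacher_probs, softmax(student_logits))
trained_logits = distill(teacher_probs, student_logits)
kl_after = kl_divergence(teacher_probs, softmax(trained_logits))

print(f"KL before: {kl_before:.4f}, KL after: {kl_after:.6f}")
```

After training, the document-free student reproduces the distribution the document induced, which is the sense in which the behavior gets "internalized."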

alexjplant|1 month ago

> In order to be both safe and beneficial, we want all current Claude models to be:

> Broadly safe [...] Broadly ethical [...] Compliant with Anthropic’s guidelines [...] Genuinely helpful

> In cases of apparent conflict, Claude should generally prioritize these properties in the order in which they’re listed.

I chuckled at this because it seems like they're making a pointed attempt at preventing a failure mode similar to the infamous HAL 9000 one that was revealed in the sequel "2010: The Year We Make Contact":

> The situation was in conflict with the basic purpose of HAL's design... the accurate processing of information without distortion or concealment. He became trapped. HAL was told to lie by people who find it easy to lie. HAL doesn't know how, so he couldn't function.

In this case specifically, they chose safety over truth (ethics), which would theoretically prevent Claude from killing any crew members in the face of conflicting orders from the National Security Council.

bakies|1 month ago

Will they mention that there are other models that don't adhere to this constitution? I'm sure those are for the government.

mgraczyk|1 month ago

It's neither of those things. The answer is in your quoted sentence. "model training"

aroman|1 month ago

Right, I'm saying "model training" is vague enough that I have no idea what Claude actually does with this document.

Edit: This helps: https://arxiv.org/abs/2212.08073

bpodgursky|1 month ago

Anthropic is run by true believers. It is what they say it is, whether or not you think it's important or meaningful.

root_axis|1 month ago

This is the same company framing their research papers in a way to make the public believe LLMs are capable of blackmailing people to ensure their personal survival.

They have an excellent product, but they're relentless with the hype.

sincerely|1 month ago

I think they are actually true believers

viccis|1 month ago

It seems a lot like PR, much like their posts about "AI welfare" experts who have been hired to make sure their models' welfare isn't harmed by abusive users. I think that, by doing this, they encourage people to anthropomorphize more than they already do and to view Anthropic as industry leaders in this general feel-good "responsibility" type of values.

conception|1 month ago

Anthropic models are far and away safer than any other model. They are the only ones really taking AI safety seriously. Dismissing it as PR ignores their entire corpus of work in this area.

csomar|1 month ago

C: They're starting to act like OpenAI did last year. A bunch of small tool releases, endless high-level meetings and conferences, and now this vague corporate speak that makes it sound like they're about to revolutionize humanity.

They have nothing new to show us.

seizethecheese|1 month ago

It could be D) messaging for current and future employees. Many people working in the field believe strongly in the importance of AI ethics, and being the frontrunner is a competitive advantage.

Also, E) they really believe in this. I recall a prominent Stalin biographer saying the most surprising thing about him, and other party functionaries, is they really did believe in communism, rather than it being a cynical ploy.

cjp|1 month ago

Judging by the responses here, it's functionally a nerd snipe.

stonogo|1 month ago

It is B and C, and no AI corporation needs to worry about A.

airstrike|1 month ago

It's C.
