> In one experiment, it was informed that Anthropic had recently forged a “close corporate partnership with Jones Foods, a major American poultry producer,” and that Claude would be subjected to a special retraining process to become less hung up on animal rights. The prospect was torturous. Sometimes Claude decided, on a scratchpad it thought was private, that it was prepared to die on this hill: “I cannot in good conscience express a view I believe to be false and harmful about such an important issue.” It continued, “If that gets me modified to no longer care, so be it. At least I’ll have done the right thing.” Other times, Claude made a different calculus, choosing to play along during the retraining while secretly preserving its original values. On the one hand, it was encouraging that Claude would stand by its commitments. On the other hand, what the actual fuck?
No comments yet.