ttpphd|1 year ago

This is where my thoughts went too. I see no reason to speculate about this without clear, persuasive comparisons against other fine-tuning content.

Turn_Trout|1 year ago

They ran (at least) two control conditions. In one, they fine-tuned on secure code instead of insecure code -- no misaligned behavior. In the other, they fine-tuned on the same insecure code, but added a request for insecure code to the training prompts. Also no misaligned behavior.

So it isn't catastrophic forgetting due to training on 6K examples.

ttpphd|1 year ago

This isn't what I meant, but thanks anyway.