First of all, this is not technically distillation; it is closer to imitation learning.
Second, you could ask Claude to create 1 million (prompt, offensive response, non-offensive response) triplets, then train a model with DPO to prefer the offensive responses.
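For anyone unfamiliar with what "train with DPO" means on such triplets, here is a rough sketch of the per-example DPO loss, assuming you already have summed log-probabilities of each response under the model being trained and under a frozen reference model. The function name, argument names, and the `beta=0.1` default are my own illustrative choices, not from any particular library.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss for one (prompt, chosen, rejected) triplet.

    Inputs are log-probabilities of the full response given the prompt.
    All names here are hypothetical; beta=0.1 is an assumed default.
    """
    # How much more the policy likes each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)) == log(1 + exp(-x)), written in a numerically stable form.
    return max(-logits, 0.0) + math.log1p(math.exp(-abs(logits)))

# Policy identical to the reference: loss is log(2).
baseline = dpo_loss(-3.0, -3.0, -3.0, -3.0)
# Policy has shifted toward the chosen response: loss drops below log(2).
improved = dpo_loss(-1.0, -5.0, -3.0, -3.0)
```

Whichever response of the triplet you label "chosen" is what the model gets pushed toward, so the labeling step is where the preference direction is decided.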
nebezb|6 days ago
janalsncm|6 days ago
ncb9094|6 days ago