top | item 39186223

(no title)

You may be interested in a recent AI safety paper by Redwood Research.

In it, they have GPT-4 generate solutions to coding problems, but instruct it to insert backdoors into the solutions some fraction of the time. Then, they explore different ways to use a weaker model (GPT-3.5) to detect these backdoors. Pretty interesting.

[1] https://arxiv.org/abs/2312.06942

discuss

No comments yet.