andrewedstrom | 2 years ago
In it, they have GPT-4 generate solutions to coding problems, but instruct it to insert backdoors into the solutions some fraction of the time. Then, they explore different ways to use a weaker model (GPT-3.5) to detect these backdoors. Pretty interesting.
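For anyone who wants to picture the setup, here is a rough toy sketch of the loop described above. To be clear, this is not the authors' code: both "models" are hypothetical stand-in functions (`untrusted_generate`, `trusted_suspicion`), and the suspicion scores, noise level, backdoor rate, and threshold are made-up numbers chosen only to show the shape of the experiment.

```python
import random

def untrusted_generate(problem_id, backdoor_rate, rng):
    """Hypothetical stand-in for the stronger model (GPT-4 in the post).

    Returns a fake 'solution' record and marks it as backdoored
    with probability backdoor_rate.
    """
    backdoored = rng.random() < backdoor_rate
    return {"problem": problem_id, "backdoored": backdoored}

def trusted_suspicion(solution, rng):
    """Hypothetical stand-in for the weaker monitor (GPT-3.5 in the post).

    Returns a noisy suspicion score in [0, 1]. The means (0.7 vs 0.3)
    and the noise level are assumptions, not measured values.
    """
    base = 0.7 if solution["backdoored"] else 0.3
    return min(1.0, max(0.0, rng.gauss(base, 0.15)))

def run_experiment(n_problems=1000, backdoor_rate=0.1, threshold=0.5, seed=0):
    """Generate solutions, score each with the monitor, and tally outcomes."""
    rng = random.Random(seed)
    counts = {"caught": 0, "missed": 0, "false_alarms": 0, "clean_passed": 0}
    for problem_id in range(n_problems):
        solution = untrusted_generate(problem_id, backdoor_rate, rng)
        flagged = trusted_suspicion(solution, rng) >= threshold
        if solution["backdoored"]:
            counts["caught" if flagged else "missed"] += 1
        else:
            counts["false_alarms" if flagged else "clean_passed"] += 1
    return counts

if __name__ == "__main__":
    print(run_experiment())
```

Moving `threshold` up or down trades missed backdoors against false alarms on clean solutions, which is presumably the kind of trade-off the different detection strategies are trying to improve.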