top | item 47205775

(no title)

rcxdude | 16 hours ago

Also if you have the weights there are a multitude of approaches to remove safeguards. It's even quite easy to accidentally flip their 'good/evil' switch (e.g. the paper where they trained it to produce code with security problems and it then started going 'hitler was a pretty good guy, actually').

discuss

order

No comments yet.