I don't think that kind of adversarial approach will work in general. A better idea is to just make the models learn things their owners don't want. Maybe put in erotica (AI-generated or not), malware code, and other kinds of offensive content to annoy the bot owners.