top | item 45946784 (no title) shikon7 | 3 months ago It seems to me that thinking models are harder to decensor, as they are trained to think whether to accept your request. discuss order hn newest int_19h|3 months ago It goes both ways. E.g. unmodified thinking Qwen is actually easier to jailbreak to talk about things like Tiananmen by convincing it that it is unethical to refuse to do so.
int_19h|3 months ago It goes both ways. E.g. unmodified thinking Qwen is actually easier to jailbreak to talk about things like Tiananmen by convincing it that it is unethical to refuse to do so.
int_19h|3 months ago