(no title)
SomewhatLikely | 1 year ago
The paper claims the existing literature reports more success with attacks on LLMs:

"Large language models have been shown to be vulnerable to adversarial attacks, in which attackers introduce maliciously crafted token sequences into the input prompt to circumvent the model's safety mechanisms and generate a harmful response [1, 14]."