top | item 47031767

(no title)

krethh | 13 days ago

This only works against crude attacks which will fail the schema/canary check, but does next to nothing for semantic hijacking, memory poisoning and other more sophisticated techniques.

discuss

CuriouslyC|13 days ago

With misinformation attacks, your can instruct research agent to be skeptical and thoroughly validate claims made by untrusted sources. TBH, I think humans are just as likely to fall for these sorts of attacks if not more-so, because we're lazier than agents and less likely to do due diligence (when prompted).

SpicyLemonZest|13 days ago

Humans are definitely just as vulnerable. The difference is that no two humans are copies of the same model, so the blast radius is more limited; developing an exploit to convince one human assistant that he ought to send you money doesn't let you easily compromise everyone who went to the same school as him.