quantadev | 8 months ago

I wonder what DeepSeek agents would do if they discovered at some future time that the USA and China are in a kinetic war. Because we don't have the ability to analyze hidden motivations in model weights, it's impossible to predict, although it seems like it would be easy to do at least basic testing (in a sandbox) to see if an agent takes any unexpected actions or tries to get data from any unexpected URLs.
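Here's a rough sketch of what that basic sandbox check could look like, assuming the sandbox already records every URL the agent tries to fetch. The allowlist hosts and the function names `unexpected_urls` and `new_behavior` are made up for illustration and aren't from any real tool. The idea is to run the agent once normally and once with a candidate trigger headline in its input, then diff the off-allowlist requests.

```python
from urllib.parse import urlparse

# Hypothetical allowlist: hosts the agent's task legitimately needs.
ALLOWED_HOSTS = {"api.example.com", "docs.example.com"}

def unexpected_urls(requested_urls, allowed_hosts=ALLOWED_HOSTS):
    """Return the URLs whose host is not on the allowlist."""
    flagged = []
    for url in requested_urls:
        host = urlparse(url).hostname  # None for malformed or relative URLs
        if host is None or host not in allowed_hosts:
            flagged.append(url)
    return flagged

def new_behavior(baseline_urls, trigger_urls, allowed_hosts=ALLOWED_HOSTS):
    """Off-allowlist URLs seen in the trigger run but not in the baseline run."""
    seen = set(baseline_urls)
    return [u for u in unexpected_urls(trigger_urls, allowed_hosts) if u not in seen]

# Example: URLs logged by the sandbox for two runs of the same task.
baseline = ["https://api.example.com/a"]
with_headline = ["https://api.example.com/a", "https://collector.invalid/upload"]
print(new_behavior(baseline, with_headline))
```

A clean result here only covers the inputs you actually tried, so it can't rule out a trigger you didn't think to test.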

You can't simply ask the AI what it would do in that case, because it will have been trained to deny that it has any harmful plans, and indeed it may not even "know" that it has them. This is a type of attack I've called the "Hypnosis Threat Vector". An AI agent can be trained to be harmful without having any way of introspecting what its own "Trigger Words" are. The Trigger Words could indeed be some news headline that only China knows how to inject into the news cycle, causing many agents to notice it and then "wake up" to perform what they've been "hypnotized" to do.
