top | item 37103795

(no title)

AI (potentially) makes everyone more powerful. It will provide amazing benefits, but it will also amplify and empower bad actors. An AI that is superintelligent could (just for example) give instructions for (or even orchestrate) the creation of a deadly pathogen that targets only a specific race...

But most of the arguments aren't about bad actors. The basic mechanisms of AI give rise to a bunch of behaviors that are problematic with weak AIs and possibly extinction-level with super AIs. This isn't behavior that we just suppose they might have, given common-sense understanding of mammal psychology. This is behavior we've observed in many, many systems, that is a consequence of how AI works.

For example:

- AI seeks what you reward it for (i.e., what it "wants"), which is often exactly what you said vs. what you really meant.

- AI is incentivized to prevent you from turning it off, or from updating (fine-tuning, correcting) its reward function.

- A sufficiently smart AI will deceive you into believing it's doing what you want if it knows that you'd stop it before it gets what it wants.

- A sufficiently smart AI could modify its own code, or create sub-agents to get around safety measures that prevent it from reaching certain "better" solutions.

- Etc, etc. These are really interesting videos explaining the different issues: https://www.youtube.com/c/robertmilesai

The list above is based upon provable behavior given the current mechanisms in AI. The only thing that makes this a future problem is that we don't currently have AI systems that are as smart or smarter than us. The history of AI research is notable in that estimates about the pace of progress have been all over the map. No one really knows how many more discoveries or refinements are required before AGI becomes possible.

Once they get many times smarter/quicker than us, we need to be sure AIs are built NOT to want to do these things, because we're unlikely to be able to outsmart them.

discuss

No comments yet.