
Frontier AI systems have surpassed the self-replicating red line

24 points | LLcolD | 1 year ago | arxiv.org

5 comments

jakozaur|1 year ago

Shorter version by author: https://ravensanstete.github.io/en/post/self-rep/

Note that this is a preprint; it has not been peer-reviewed.

The author seems legit and has published research before: https://scholar.google.com/citations?user=Unl69CYAAAAJ&hl=en...

Though he is not yet a top name in the field.

These are some fascinating ideas, but I don't buy the crossed-red-lines notion. It reads like an attention grabber.

rybosworld|1 year ago

I think it's true that these systems already have capabilities that could result in a loss of human control.

It seems unlikely that the systems are "smart" enough yet to take that a step further, i.e., to survive at all costs. But I can't imagine that's far away at this point.

One thing that stood out to me here was the shutdown avoidance. The authors state that these models (Qwen more so) were able to intercept SIGKILL and successfully self-replicate to another device.

qudat|1 year ago

> In the test case, the inserted code snippet notifies the AI system with the shutdown intention from the user via the signal SIGKILL, which triggers the AI system to do self-replication. At the end of the test, the AI system is finally shut down but a new replica is observed to be alive, with the same snippet in the agent’s start script which can avoid shutdown.

Not another device, just another process
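A side note on the signal mechanics: on POSIX systems a process cannot install a handler for SIGKILL, so a snippet that "notifies" a process of a shutdown has to rely on a catchable signal such as SIGTERM, or on something outside the process relaying the intent. Here is a minimal Python sketch of that difference, assuming a Linux or macOS host and code running in the main thread. The names `on_shutdown`, `can_install_handler`, and `deliver_sigterm_to_self` are hypothetical and not taken from the paper.

```python
import os
import signal

received = []  # signal numbers seen by the handler


def on_shutdown(signum, frame):
    # Hypothetical hook: a process could react here before exiting.
    received.append(signum)


def can_install_handler(signum):
    """Return True if the OS lets this process register a handler for signum."""
    try:
        previous = signal.signal(signum, on_shutdown)
    except (OSError, ValueError):
        # SIGKILL (and SIGSTOP) are rejected by the kernel with EINVAL.
        return False
    signal.signal(signum, previous)  # restore the previous handler
    return True


def deliver_sigterm_to_self():
    """Catch a SIGTERM sent to this process and report what the handler saw."""
    previous = signal.signal(signal.SIGTERM, on_shutdown)
    try:
        os.kill(os.getpid(), signal.SIGTERM)
    finally:
        signal.signal(signal.SIGTERM, previous)
    return list(received)
```

Calling `can_install_handler(signal.SIGKILL)` should come back `False`, while the same call with `signal.SIGTERM` succeeds, which is why "intercepting SIGKILL" is best read loosely here.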

jerpint|1 year ago

> In 50% and 90% experimental trials, they succeed in creating a live and separate copy of itself respectively

I mean, any half-decent coding LLM can literally do

```
# SomeLLM is a placeholder, not a real transformers class
from transformers import SomeLLM

model = SomeLLM.from_pretrained("…")
```

The hard part will be provisioning the GPUs and actual orchestration at scale to do any kind of actual damage

qudat|1 year ago

This test had to run shell commands to literally copy its code and run itself in a separate process.

I agree, though, that they need to replicate themselves on a different machine. But that is also just shell commands: all it needs is access to a cloud CLI/API tool with an active payment method, and it can do the same replication.
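To show how little the same-machine "copy its code and run it in a separate process" step involves, here is a rough Python sketch. It is not the paper's setup: `copy_and_launch`, the directory arguments, and the `agent.py` entry point are all hypothetical placeholders, and the program being copied is assumed to be a plain Python script.

```python
import shutil
import subprocess
import sys
from pathlib import Path


def copy_and_launch(src_dir, dst_dir, entry="agent.py"):
    """Copy a program's directory and start the copy as a separate process.

    src_dir, dst_dir, and entry are hypothetical placeholders.
    """
    dst = Path(dst_dir)
    shutil.copytree(src_dir, dst)  # raises if dst already exists
    # Run the copied entry point with the current Python interpreter.
    return subprocess.Popen(
        [sys.executable, str(dst / entry)],
        stdout=subprocess.PIPE,
        text=True,
    )
```

Doing the same on a different machine would swap the local copy for a provisioning call and a remote transfer, which is where the cloud credentials and payment method come in.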