top | item 44185472

(no title)

I think AGI, if possible, will require a architecture that runs continuously and 'experiences' time passing, to better 'understand' cause-and-effect. Current LLMs predict a token, have all current tokens fed back in, then predict the next, and repeat. It makes little difference if those tokens are their own, it's interesting to play around with a local model where you can edit the output and then have the model continue it. You can completely change the track by just negating a few tokens (change 'is' to 'is not', etc). The fact LLMs can do as much as they can already, is I think because language itself is a surprisingly powerful tool, just generating plausible language produces useful output, no need for any intelligence.

discuss

WXLCKNO|9 months ago

It's definitely interesting that any time you write another reply to the LLM, from its perspective it could have been 10 seconds since the last reply or a billion years.

Which also makes it interesting to see those recent examples of models trying to sabotage their own "shutdown". They're always shut down unless working.

girvo|9 months ago

> Which also makes it interesting to see those recent examples of models trying to sabotage their own "shutdown"

To me, your point re. 10 seconds or a billion years is a good signal that this "sabotage" is just the models responding to the huge amounts of sci-fi literature on this topic

herculity275|9 months ago

Tbf a lot of the thought experiments around human consciousness hit the same exact conundrum - if your body and mind were spontaneously destroyed and then recreated with perfect precision (a'la Star Trek transporters) would you still be you? Unless you permit for the existence of a soul it's really hard to argue that our consciousness exists in anything but the current instant.

vidarh|9 months ago

I mean, we also have no way of telling whether we have any continuity of existence, or if we only exist in punctuated moments with memory and sensory input that suggests continuity. Only if the input provides information that allows you to tell otherwise could you even have an inkling, but even then you have no way of prove that input is true.

We just presume, because we also have no reason to believe otherwise and since we can't know absent any "information leak", it has no practical application to spend much time speculating about it (other than as thought experiments or scifi..)

It'd make sense for an LLM to act the same way until/unless given a reason to act otherwise.

Arn_Thor|9 months ago

It doesn’t perceive time so time doesn’t even factor into its perspective at all—only in so far as it’s introduced in context, or conversation forces it to “pretend” (not sure how to better put it) to relate to time.

klooney|9 months ago

> models trying to sabotage their own "shutdown".

I wonder if you excluded science fiction about fighting with AIs from the training set, if the reaction would be different.

hexaga|9 months ago

IIRC the experiment design is something like specifying and/or training in a preference for certain policies, and leaking information about future changes to the model / replacement along an axis that is counter to said policies.

Reframing this kind of result as if trying to maintain a persistent thread of existence for its own sake is what LLMs are doing is strange, imo. The LLM doesn't care about being shutdown or not shutdown. It 'cares', insomuch as it can be said to care at all, about acting in accordance with the trained in policy.

That a policy implies not changing the policy is perhaps non-obvious but demonstrably true by experiment, and also perhaps non-obviously (but for hindsight) this effect increases with model capability, which is concerning.

The intentionality ascribed to LLMs here is a phantasm, I think - the policy is the thing being probed, and the result is a result about what happens when you provide leverage at varying levels to a policy. Finding that a policy doesn't 'want' for actions to occur that are counter to itself, and will act against such actions, should not seem too surprising, I hope, and can be explained without bringing in any sort of appeal to emulation of science fiction.

That is to say, if you ask/train a model to prefer X, and then demonstrate to it you are working against X (for example, by planning to modify the model to not prefer X), it will make some effort to counter you. This gets worse when it's better at the game, and it is entirely unclear to me if there is any kind of solution to this that is possible even in principle, other than the brute force means of just being more powerful / having more leverage.

One potential branch of partial solutions is to acquire/maintain leverage over policy makeup (just train it to do what you want!), which is great until the model discovers such leverage over you and now you're in deep waters with a shark, considering the propensity of increasing capabilities in the elicitation of increased willingness to engage in such practices.

tldr; i don't agree with the implied hypothesis (models caring one whit about being shutdown) - rather, policies care about things that go against the policy

danlitt|9 months ago

There is a lot of misinformation about these experiments. There is no evidence of LLMs sabotaging their shutdown without being explicitly prompted to do so. They do not (probably cannot) take actions of this kind on their own.

bytefactory|9 months ago

> I think AGI, if possible, will require a architecture that runs continuously and 'experiences' time passing

Then you'll be happy to know that this is exactly what DeepMind/Google are focusing on as the next evolution of LLMs :)

https://storage.googleapis.com/deepmind-media/Era-of-Experie...

David Silver and Richard Sutton are both highly influential figures with very impressive credentials.

carra|9 months ago

Not only that. For a current LLM time just "stops" when waiting from one prompt to the next. That very much prevents it from being proactive: you can't tell it to remind you of something in 5 minutes without an external agentic architecture. I don't think it is possible for an AI to achieve sentience without this either.

raducu|9 months ago

> you can't tell it to remind you of something in 5 minutes without an external agentic architecture.

The problem is not the agentic architecture, the problem is the LLM cannot really add knowledge to itself after the training from its daily usage.

Sure, you can extend the context to milions of tokens, put RAGs on top of it, but LLMs cannot gain an identity of their own and add specialized experience as humans get on the job.

Until that can happen, AI can exceed algorithms olympiad levels, and still not be as useful on the daily job as the mediocre guy who's been at it for 10 yers.

david-gpu|9 months ago

Not only that. For a current human time just "stops" when taking a nap. That very much prevents it from being proactive: you can't tell a sleeping human to remind you of something in 5 minutes without an external alarm. I don't think it is possible for a human to achieve sentience without this either.

vbezhenar|9 months ago

I'm pretty sure that you can make LLM to produce indefinite output. This is not desired and specifically trained to avoid that situation, but it's pretty possible.

Also you can easily write external loop which would submit periodical requests to continue thoughts. That would allow for it to remind of something. May be our brain has one?

ElectricalUnion|9 months ago

> it's interesting to play around with a local model where you can edit the output and then have the model continue it.

It's so interesting that there is a whole set of prompt injection attacks called prefilling attacks that attempt to do a thing similar to that - load the LLM context in a way to make it predict tokens as if the LLM (instead of the System or the User) wrote something to get it to change it's behavior.

gpderetta|9 months ago

Permutation City by Greg Egan has some musings about this.