
foo3a9c4 | 2 years ago

Thanks for engaging in a discussion about AI x-risk. IMO it's important to figure out whether we are actually about to kill ourselves or whether some people are just getting worked up over nothing.

> We should also deeply worry about space aliens showing up and blasting us out of the sky. If they're sufficiently powerful, that could absolutely happen! Stop any radio emissions!

If I believed that dangerous space aliens were likely, then I would be interested in investigating ways to avert/survive such an encounter. This seems pretty rational to me, but maybe I'm confused.

> xRisk is an absolutely stupid way to reason about AI. It's an unprovable risk that requires "mitigation just in case".

By "unprovable risk" do you mean that it's literally impossible to know anything about the likelihood that dangerous algorithms could kill (nearly) all people on Earth?

> All this is is saying "but if it were to happen, the cost is infinity, so any risk is a danger! Infinity times anything is infinity!". It's playground reasoning.

Maybe you've seen people make that argument, but it strikes me as a strawman. Here is what I consider to be a better argument for not rushing ahead with capabilities development.

Premise 1. I value my own survival over just about anything else.

Premise 2. If an existential catastrophe occurs, then I will die.

Premise 3. If ASI is built before alignment is understood, then there is a significant chance of existential catastrophe.

Conclusion. So, I strongly prefer that ASI not be built until alignment is understood.

groby_b | 2 years ago

Premise 3 is where the problem is, of course.

We have no idea how to build AGI. We know LLMs won't be it.

Alignment is a tool that works with LLMs, but we don't know if it will work for whatever produces AGI.

Even if we create AGI, we have no indication it is possible to build an orders-of-magnitude more "intelligent" thing. This is predicated entirely on the notion that if you can do it at scale, you get more, and there's no evidence that thinking more makes for more intelligence.

Even if that were possible and we build an ASI, it's not at all clear this would lead to existential catastrophe. An ASI is presumably smart enough to see it's about to end the world as we know it, and knows where its power supply comes from.

This leaves us with an xrisk probability so close to zero it's virtually indistinguishable from zero. The only way to make it mean anything is "let's multiply it by infinity" - "it will end humanity, and my own survival is endangered".

Meanwhile, ordinary humans can use currently existing tools to end the world just fine. Nukes are readily available. We're obviously not really interested in public health. Climate refugees will be a giant problem soon-ish. The economy is very much a house of cards, but a house of cards that keeps society functioning as-is.

LLMs are a fantastic disinfo tool right now. There's a reasonably good chance they will calcify biases. They will cause large economic damage because 1) they lift up the baseline of work, and 2) they're just good enough that there's economic incentive to replace workers with them, but 3) they're shitty enough that the resulting output will ultimately be worse because we've removed humans from the loop.

Those are actual risks that we sweep under the carpet, because "xrisk" makes for much grabbier headlines.

foo3a9c4 | 2 years ago

Thank you for the thoughtful reply.

> Premise 3 is where the problem is, of course.

I don't believe premise 3 is a problem exactly, but I do believe that it is a non-trivial challenge to determine whether or not it is true.

> We have no idea how to build AGI. We know LLMs won't be it.

> Even if we create AGI, we have no indication it is possible to build an orders-of-magnitude more "intelligent" thing. This is predicated entirely on the notion that if you can do it at scale, you get more, and there's no evidence that thinking more makes for more intelligence.

> Even if that were possible and we build an ASI, it's not at all clear this would lead to existential catastrophe. An ASI is presumably smart enough to see it's about to end the world as we know it, and knows where its power supply comes from.

> This leaves us with an xrisk probability so close to zero it's virtually indistinguishable from zero. The only way to make it mean anything is "let's multiply it by infinity" - "it will end humanity, and my own survival is endangered".

It looks to me like you are making the following argument:

  Premise G1. Humans do not currently know how to build AGI.
  Premise G2. It might be impossible to build ASI.
  Premise G3. It is unclear how likely an ASI is to cause an existential catastrophe.
  Conclusion. There is not a significant chance of catastrophe from ASI.

I believe that argument is about an important point (the chance of AI catastrophe), and that it is a pretty good argument. But the original premise 3 is a conditional: "If ASI is built before alignment is understood, then there is a significant chance of existential catastrophe." Your conclusion is about the overall chance of catastrophe from ASI, which could be low simply because ASI may never get built at all, so AFAICT your argument doesn't substantively address premise 3 (i.e., your argument's conclusion doesn't tell me anything about whether or not premise 3 is true).

I apologize if I have misunderstood your point.

> Alignment is a tool that works with LLMs, but we don't know if it will work for whatever produces AGI.

We may be using the word "alignment" slightly differently. By "alignment" I just meant getting the algorithmic system to have precisely the goal that its human programmers want it to have. I would call RLHF, for example, a "tool" for trying to achieve alignment.

How do you want to use the terms "alignment" and "alignment tool" going forward in the discussion?

> Meanwhile, ordinary humans can use currently existing tools to end the world just fine. Nukes are readily available. We're obviously not really interested in public health. Climate refugees will be a giant problem soon-ish. The economy is very much a house of cards, but a house of cards that keeps society functioning as-is.

I agree that there are other plausible sources of catastrophe for humans, to name a few others: asteroids, supervolcanoes, and population collapse.

I understand you to be making a new point now, but I just want to state that I do not believe the existence of other plausible existential threats to be a rebuttal of premise 3.

> LLMs are a fantastic disinfo tool right now. There's a reasonably good chance they will calcify biases. They will cause large economic damage because 1) they lift up the baseline of work, and 2) they're just good enough that there's economic incentive to replace workers with them, but 3) they're shitty enough that the resulting output will ultimately be worse because we've removed humans from the loop.

I agree that LLMs may plausibly cause significant harm in the short term via disinformation and unemployment.

And again, I understand you to be making a new point, but I just want to state that I do not believe the plausibility of such LLM harms is a rebuttal of premise 3.

> Those are actual risks that we sweep under the carpet, because "xrisk" makes for much grabbier headlines.

I'm not sure who you mean by "we" here, so I'm not sure if your claim about them is true or not.