foo3a9c4 | 2 years ago
But earlier you said this:
> 1. States things like "Finding goals that are extinction-level bad and relatively useful appears to be easy: for example, advanced AI with the sole objective ‘increase company.com revenue’ might be highly valuable to company.com for a time, but risks longer term harms to society, if powerfully accruing resources and power toward this end with no regard for ethics beyond laws that are still too expensive to break." But even current-gen LLMs sidestep this pretty easily, and if you ask them to increase e.g. revenue, they do not propose extinction-level events or propose eschewing basic ethics.
And in this quote it looks to me that you are using "goals" as in (III).
(I'm not an expert on these matters and I am admittedly still very confused about them. Minimally I'd like to make sure that we aren't talking past one another.)
reissbaker | 2 years ago
What that was referencing was finding goals that a human would want an AI to follow; "increase revenue" was one example of an explicit goal from the wiki that a human might want an AI to pursue. The argument in the wiki was that the AI would then do unethical things in service of that goal that would be "extinction-level bad." My counter-argument is that current SOTA AI already understands that, despite having an explicit goal of "increase revenue" (say, given in a prompt), there are implicit goals such as "do not kill everyone" that don't need to be stated. As LLMs advance they have become better at understanding implicit human goals and at following instructions while adhering to those implicit goals, so future LLMs are likely to be even better at this, and unlikely to, e.g., resurface the planet and turn it into an AI lab when told to increase revenue or to produce better-aligned AI.