foo3a9c4 | 2 years ago
> (P1) Current SOTA AI is good at understanding implicit context, and improved versions will likely be better at understanding implicit context (much like gpt-4 is better at understanding context than gpt-3, and llama2 is better than llama1, and mixtral is better than gpt-3 and better than claude, etc).
I believe that (P1) is probably true.
> (P2) Most misalignments within the observable behavior of current AI do not produce extinction-level goals, and given (P1), it is unclear why someone would believe it's likely going to in the future, since they'll be even better at understanding implicit human context of goals (e.g. implicit goals like do not make humanity extinct, don't turn the entire surface of the planet into an AI lab, etc).
I'm confused about what exactly you mean by "goals" in (P2). Are you referring to (I) the loss function used by the algorithm that trained GPT-4, (II) goals and sub-goals that are internal parts of the GPT-4 model, or (III) the sub-goals that GPT-4 writes into its response when a user asks it "What is the best way to do X?"
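To make sure we're pointing at the same things, here's a rough, purely illustrative sketch of the three senses as I understand them. This is my own toy example (PyTorch with dummy tensors), not anything from your comment:

```python
import torch
import torch.nn.functional as F

# (I) "Goal" as the training objective: a scalar loss such as next-token
# cross-entropy. The tensors below are random stand-ins for a real model's
# logits and labels.
vocab_size = 32000
logits = torch.randn(1, 5, vocab_size)             # (batch, seq, vocab)
target_ids = torch.randint(0, vocab_size, (1, 5))  # (batch, seq)
loss = F.cross_entropy(logits.view(-1, vocab_size), target_ids.view(-1))
print(f"(I) training-time objective, e.g. cross-entropy: {loss.item():.3f}")

# (II) "Goal" as something internal to the trained model: whatever goal-like
# structure (if any) is encoded in the weights and activations. There is no
# API that returns this, which is part of why the word feels ambiguous to me.

# (III) "Goal" as sub-goals written into the response text. The reply below
# is hard-coded purely to illustrate what I mean by this sense.
reply = "1. Research the market. 2. Build a prototype. 3. Gather feedback."
print(f"(III) sub-goals stated in the output text: {reply}")
```

(The PyTorch bits are just dummy tensors so the snippet runs; the point is only that (I) and (III) are easy to point at concretely, while (II) is not.)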
But earlier you said this:
> 1. States things like "Finding goals that are extinction-level bad and relatively useful appears to be easy: for example, advanced AI with the sole objective ‘increase company.com revenue’ might be highly valuable to company.com for a time, but risks longer term harms to society, if powerfully accruing resources and power toward this end with no regard for ethics beyond laws that are still too expensive to break." But even current-gen LLMs sidestep this pretty easily, and if you ask them to increase e.g. revenue, they do not propose extinction-level events or propose eschewing basic ethics.
And in this quote it looks to me like you are using "goals" in sense (III).
(I'm not an expert on these matters, and I'm admittedly still quite confused about them. At a minimum, I'd like to make sure that we aren't talking past one another.)