For context, the point of the Superalignment team was to work on a problem known as scalable oversight: the problem of aligning models in a way that holds up as models become more capable [1]. The motivation is that current alignment techniques (like RLHF) have limitations that are expected to worsen as models are scaled up [2].
This is to say, the objective of the Superalignment team was precisely to work on techniques that would work for models which don't yet exist. They are of course aware that they don't yet have superintelligence.
agucova|1 year ago
[1]: This paper by Anthropic is a good introduction to the problem: https://arxiv.org/abs/2211.03540
[2]: See, for example, Jan Leike's talk on this: https://www.youtube.com/watch?v=BtnvfVc8z8o
consumer451|1 year ago