top | item 44778261

ak681443 | 7 months ago

Isn't this just control vectors rediscovered?

https://www.lesswrong.com/posts/Bf3ryxiM6Gff2zamw/control-ve...

CephalopodMD|7 months ago

The added sauce here is that they use it to bias the model during training, rather than only applying steering vectors at inference time (though they do mention that as well). This is apparently effective at making the intended change in behavior without the lobotomizing side effects that steering vectors can have.

supriyo-biswas|7 months ago

Thank you for linking to that article; it makes clear what one would need to do to calculate control vectors.
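
For anyone skimming, here is a rough sketch of the general idea, not the linked article's exact recipe. It assumes the simplest variant: take the difference of mean activations between two contrasting prompt sets, then add a scaled copy to a hidden state at inference time. The function names and the toy activation values are made up for illustration; a real setup would read activations from a specific model layer.

```python
def mean_vector(rows):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def control_vector(positive_acts, negative_acts):
    """Difference of mean activations: positive set minus negative set."""
    pos = mean_vector(positive_acts)
    neg = mean_vector(negative_acts)
    return [p - q for p, q in zip(pos, neg)]


def steer(hidden, vector, strength=1.0):
    """Add the scaled control vector to a single hidden-state vector."""
    return [h + strength * v for h, v in zip(hidden, vector)]


# Toy stand-ins for layer activations (hypothetical values, 3 dimensions).
positive_acts = [[1.0, 2.0, 0.0], [3.0, 2.0, 0.0]]  # prompts showing the trait
negative_acts = [[0.0, 2.0, 1.0], [0.0, 2.0, 3.0]]  # prompts showing the opposite

vec = control_vector(positive_acts, negative_acts)
steered = steer([0.5, 0.5, 0.5], vec, strength=0.5)
print(vec, steered)
```

The `strength` knob is where the side effects mentioned upthread tend to show up: push it too far and the hidden state drifts away from anything the model saw in training.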