(no title)
13point5 | 5 months ago
I used Representation Engineering (RepE) with just one pair of contrastive prompts to make Mistral-7B name orange as its favorite color.
I used the repeng library by Theia Vogel to test this out.
Next I'm gonna implement it from scratch to understand why this even works.
I wonder if we can introduce "taste" into a model with methods like this.
The paper is "Representation Engineering: A Top-Down Approach to AI Transparency".
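
For the from-scratch part, here is a toy sketch of the core idea as I understand it: take the hidden state for the "positive" prompt and the one for the "negative" prompt, use their difference as a steering direction, and add a scaled copy of that direction to a hidden state at inference time. Everything here is an assumption for illustration: the 4-dimensional vectors are made up (not real Mistral-7B activations), the function names are hypothetical, and this is the plain difference-of-activations variant, not repeng's actual implementation (the paper also describes using PCA over many such differences).

```python
import math

def control_vector(pos_hidden, neg_hidden):
    """Unit-length difference between the hidden states of one contrastive pair."""
    diff = [p - n for p, n in zip(pos_hidden, neg_hidden)]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm == 0.0:
        raise ValueError("contrastive pair produced identical hidden states")
    return [d / norm for d in diff]

def steer(hidden, direction, coeff):
    """Shift a hidden state along the control direction by `coeff`."""
    return [h + coeff * d for h, d in zip(hidden, direction)]

def projection(hidden, direction):
    """Dot product: how far `hidden` lies along `direction`."""
    return sum(h * d for h, d in zip(hidden, direction))

# Made-up stand-ins for one layer's hidden state on each prompt of the pair.
pos = [0.9, 0.1, 0.4, -0.2]   # e.g. the "orange" prompt (hypothetical values)
neg = [0.1, 0.1, 0.4, 0.6]    # e.g. the neutral/opposite prompt (hypothetical values)
direction = control_vector(pos, neg)

# Hidden state for some unrelated prompt, before and after steering.
h = [0.3, -0.5, 0.2, 0.1]
steered = steer(h, direction, coeff=2.0)

print("before:", projection(h, direction))
print("after: ", projection(steered, direction))
```

Because the direction is normalized, steering with coefficient c moves the projection onto that direction by exactly c. In a real model the shift would be applied to the residual stream at one or more layers on every forward pass, and a negative coefficient would push the other way.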