top | item 47183911

(no title)

pksebben | 2 days ago

Guidance and alignment are usually handled by RLHF, which updates the model's weights directly, so that it becomes near-impossible for the model to have certain kinds of 'thoughts'. Because the behavior is baked into the weights themselves, it's not something you can just extract or turn off.
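
To make the "baked into the weights" point concrete, here is a toy sketch. It is not real RLHF: there is no reward model trained on human preferences, no KL penalty, and no PPO. It only assumes a three-action softmax policy whose logits are the weights, plus a hypothetical hand-written reward that penalizes one action. The names `finetune`, `base_weights`, and `rewards` are made up for illustration.

```python
import math


def softmax(logits):
    """Turn a list of logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def finetune(weights, rewards, steps=200, lr=0.5):
    """Gradient ascent on expected reward for a softmax policy.

    Uses the exact gradient dJ/dw_j = p_j * (r_j - J), where J is the
    expected reward, so the run is deterministic (no sampling).
    """
    w = list(weights)
    for _ in range(steps):
        p = softmax(w)
        expected = sum(pi * ri for pi, ri in zip(p, rewards))
        w = [wj + lr * pj * (rj - expected) for wj, pj, rj in zip(w, p, rewards)]
    return w


# Hypothetical setup: action 2 plays the role of the discouraged "thought".
base_weights = [0.0, 0.0, 0.0]
rewards = [1.0, 1.0, -1.0]

tuned_weights = finetune(base_weights, rewards)
before = softmax(base_weights)
after = softmax(tuned_weights)

print("before:", before)
print("after: ", after)
print("tuned weights:", tuned_weights)
```

In this toy, the probability of the penalized action should fall from one in three to something very small but nonzero, and the only artifact left over is the updated weight list. There is no separate filter or flag to remove, which is the sense in which the behavior can't simply be switched off, at least in this simplified picture.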
