top | item 43221093

(no title)

kyle_grove | 1 year ago

There's the technique of model orthogonalization which can often zero out certain tendencies (most often, refusal), as demonstrated by many models on HuggingFace. There may be an existing open weights model on HuggingFace that uses orthogonalization to zero out positivity (or optimism)--or you could roll your own.

discuss

order

No comments yet.