top | item 45946393

(no title)

Qwuke | 3 months ago

I've seen Optuna used with some of the prompt optimization frameworks lately, where it's a really great fit and has yielded much better results than the "hyperparameter" tuning I had attempted myself. I can't stop mentioning how awesome a piece of software it is.

Also, I'm eager to see how well gpt-oss-120b gets uncensored if it really was using the phi-5 approach, since that seems fundamentally difficult given the training.

discuss

order

p-e-w|3 months ago

FWIW, I already used Heretic to decensor gpt-oss-20b [1], and it works just fine. Note that the number of refusals listed on the model card is actually an overestimate because refusal trigger words occur in the CoT, even though the model doesn't actually end up refusing in the end.

[1] https://huggingface.co/p-e-w/gpt-oss-20b-heretic

NitpickLawyer|3 months ago

What's your intuition on other "directions"? Have you tried it on something other than "refusals"? Say "correctness" in math or something like that. I have some datasets prepared for DPO on "thinking" traces that are correct / incorrect, wondering if it'd be something that could work, or if it's out of scope (i.e. correctness is not a single direction, like refusal training)