
(no title)

f0e4c2f7 | 1 year ago

There have been some papers showing that RLHF makes models more palatable to use but reduces performance on evals and degrades them in various other ways.

I couldn't find the one I was looking for, but this is one of them.

https://arxiv.org/abs/2310.06452

Edit:

This tweet also has a screenshot showing evals degrading after RLHF, relative to the base model.

https://x.com/KevinAFischer/status/1638706111443513346?t=0wK...

