(no title)
Melonololoti | 7 months ago
AI is not hype. We have started to actually do something with all the data, and this process will not stop soon.
The RL that is now happening through human feedback alone (thumbs up/down) is massive.
KaiserPro|7 months ago
This meant making a rich synthetic dataset first, to pre-train the model, before fine-tuning on real, expensive data to get the best results.
But this was always the case.
noname120|7 months ago
rtrgrd|7 months ago
ACCount36|7 months ago
"Human preference" is incredibly fucking entangled, and we have no way to disentangle it and get rid of all the unwanted confounders. A lot of the recent "extreme LLM sycophancy" cases are downstream of that.
smohare|7 months ago
[deleted]