epups | 1 year ago
This is part of the reason we see LLMs "plateauing" on benchmarks. In the LMSYS Arena, for example, LLMs are judged simply on whether the user liked the answer or not. Truth is a secondary part of that process, as are many other qualities that humans may not be very good at evaluating. There is a limit to the capacity and value of having LLMs chase human preference (as in RLHF) as a reward signal. As Karpathy says here, one could even argue that it is counterproductive to build a system on human opinion, especially if we want that system to surpass us.
HarHarVeryFunny | 1 year ago
If you want to exceed human intelligence, then design architectures for intelligence, not for copying humans!