It's funny that reasoning models sometimes speak nonsense and perform worse than well-aligned models like claude-3.5-sonnet in multi-turn games like Akinator. I think this is a current weak point of long-CoT RL compared with instruction-following alignment. Maybe we need to find a way to address both? Would be interesting to see some results.
I played the game and found hard mode to be an exciting challenge—it's incredibly fun, and the AI is so clever it even guessed my intentions in the taboo game!
When superintelligence arrives, it would be very interesting to see multi-party gameplay among AIs too. What role humans play in this story is unclear. Maybe humans can't directly engage in the games either, as they are too naive and would be immediately identified and exploited by the AIs :)
https://news.ycombinator.com/item?id=43017857
zhisbug|1 day ago
We hope to redefine AI evaluation with our gamified evaluation platform: Game Arena!