santadays | 4 months ago
Right now the state of the world with LLMs is that they try to predict a script in which they play a happy assistant, as shaped by their alignment phase.
I'm not sure what happens when they start getting trained in simulations to be goal-oriented, i.e. their token generation is based not on what they think should come next but on what should come next in order to accomplish a goal. Not sure how far away that is, but it is worrying.
mediaman | 4 months ago
It's been some time since LLMs were purely stochastic average-token predictors; their later RL fine-tuning stages make them quite goal-directed, and this is what has given some big leaps in verifiable domains like math and programming. It doesn't work as well in nonverifiable domains, though, since verifiability is what gives us the reward function.
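To make the verifiability point concrete, here is a minimal sketch of what a "verifiable reward" looks like: a binary signal computed by checking the model's answer against a known ground truth. The function name and setup are illustrative, not from any particular RL library.

```python
def verifiable_reward(model_answer: str, ground_truth: str) -> float:
    """Return 1.0 if the model's final answer matches the known
    solution, else 0.0 -- the kind of checkable signal RL
    fine-tuning can optimize against in domains like math."""
    return 1.0 if model_answer.strip() == ground_truth.strip() else 0.0

# Verifiable domain: there is a ground truth to compare against.
print(verifiable_reward("42", "42"))  # 1.0
print(verifiable_reward("41", "42"))  # 0.0

# A nonverifiable task ("write a moving poem") has no ground truth,
# so no such reward function can be computed -- hence the gap the
# comment above describes.
```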
santadays | 4 months ago
Curious, is anyone training in adversarial simulations? In open world simulations?
I think what humans do is align their own survival instinct with surrogate activities and then rewrite their internal schema to be successful in those activities.