santadays | 4 months ago
Right now the state of the world with LLMs is that they try to predict a script in which they play a happy assistant, as shaped by their alignment phase.
I'm not sure what happens when they start getting trained in simulations to be goal-oriented, i.e. their token generation is based not on what they think should come next but on what should come next in order to accomplish a goal. Not sure how far away that is, but it is worrying.
mediaman | 4 months ago
It's been some time since LLMs were purely stochastic average-token predictors; their later RL fine-tuning stages make them quite goal-directed, and this is what has given some big leaps in verifiable domains like math and programming. It doesn't work as well in nonverifiable domains, though, since verifiability is what gives us the reward function.
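To make the verifiability point concrete, here is a minimal sketch of what a "verifiable reward" looks like: a binary signal computed by checking the model's answer against a known ground truth. The function name and setup are illustrative, not from any particular RL library.

```python
def verifiable_reward(model_answer: str, ground_truth: str) -> float:
    """Return 1.0 if the model's final answer matches the known
    solution, else 0.0 -- the kind of checkable signal RL
    fine-tuning can optimize against in domains like math."""
    return 1.0 if model_answer.strip() == ground_truth.strip() else 0.0

# Verifiable domain: there is a ground truth to compare against.
print(verifiable_reward("42", "42"))  # 1.0
print(verifiable_reward("41", "42"))  # 0.0

# A nonverifiable task ("write a moving poem") has no ground truth,
# so no such reward function can be computed -- hence the gap the
# comment above describes.
```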
santadays | 4 months ago
Curious, is anyone training in adversarial simulations? In open world simulations?
I think what humans do is align their own survival instinct with surrogate activities and then rewrite their internal schema to be successful in those activities.