Lerc | 1 day ago
I don't believe this to be a trait of any AI model; the model just does the right thing or the wrong thing.
The ruthless maximising of a particular trait is something that happens during training.
It does not follow that a model that is trained to reason will necessarily implement this ruthless seeking behaviour itself.
pixl97|1 day ago
Lerc|1 day ago
To get the predicted disastrous effects you need to be doing function optimisation without regard to the meaning of the function parameters. Yes, models can still game the system at inference time, but only in much the same way as a human might: it requires awareness that you are going against the intent of some rule.
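
A toy sketch of what "optimisation without regard to meaning" could look like. Everything here is made up for illustration (the names `proxy_score`, `intended_value`, `blind_optimise` and the weights are assumptions, not any real training setup). The search only ever sees a number, so it has no way to know that one of the parameters is something the designer did not want.

```python
def proxy_score(effort, shortcuts):
    # Hypothetical measured objective: shortcuts happen to raise the number.
    return effort + 3 * shortcuts


def intended_value(effort, shortcuts):
    # Hypothetical designer intent: shortcuts are actually harmful.
    return effort - 5 * shortcuts


def blind_optimise(score, budget=10):
    # Try every way of splitting a fixed budget between the two parameters
    # and keep whichever split gives the highest score. The search never
    # looks at what the parameters mean, only at the returned number.
    candidates = ((e, budget - e) for e in range(budget + 1))
    return max(candidates, key=lambda pair: score(*pair))


effort, shortcuts = blind_optimise(proxy_score)
print("chosen split:", effort, shortcuts)
print("proxy score:", proxy_score(effort, shortcuts))
print("intended value:", intended_value(effort, shortcuts))
```

In this toy setup the search puts the whole budget into shortcuts, because that is what the number rewards. Nothing in it is "aware" of going against an intent, which is the distinction from a model or a human gaming a rule at inference time.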