...even if the agent did "cheat", I think that having the capacity to figure out that it was being evaluated, find the repo containing the logic of that evaluation, and find the expected solution to the problem it faced... is "better" than anything that the models were able to do a couple years ago.
segmondy|5 months ago
giveita|5 months ago
guerrilla|5 months ago
jMyles|5 months ago