Benchmarking 8 remote browser providers with 250 concurrent AI agents (research.aimultiple.com) 1 pts | 1 month ago | 1 comment
Harmless reward hacks generalize to shutdown evasion and dictatorship in GPT-4.1 (arxiv.org) 1 pts | 1 month ago | 1 comment