They're spinning this as a positive learning experience, and trying to make themselves look good. But, make no mistake, this was a failure on Anthropic's part to prevent this kind of abuse from being possible through their systems in the first place. They shouldn't be earning any dap from this.
vessenes|3 months ago
NitpickLawyer|3 months ago
We know alignment hurts model performance (OpenAI people have said it, Microsoft people have said it). We also know that companies train models on their own code (Google had a blog post about it recently). I'd bet good money Project Zero has something like this in their sights.
I don't think we're that far from blue and red agents fighting and RLing off each other in a loop.
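For what it's worth, the loop I'm imagining is basically adversarial self-play: whichever side loses a round updates, so both ratchet upward together. A toy sketch (all names and numbers here are made up for illustration, not any real RL setup):

```python
# Toy sketch of a "blue vs. red" adversarial self-play loop.
# Skill scalars stand in for model parameters; a real setup would
# use actual policies and an RL library.
import random

random.seed(0)

def red_wins(attack_skill: float, defense_skill: float) -> bool:
    """One round: red (attacker) wins with probability proportional to skill."""
    return random.random() < attack_skill / (attack_skill + defense_skill)

def self_play(rounds: int = 1000, lr: float = 0.01):
    red, blue = 1.0, 1.0  # both sides start equally matched
    for _ in range(rounds):
        if red_wins(red, blue):
            blue += lr  # loser adapts: blue hardens its defenses
        else:
            red += lr   # loser adapts: red finds new attacks
    return red, blue

red, blue = self_play()
# Neither side stays ahead for long; total capability keeps climbing.
```

The point of the toy: exactly one side improves per round, so the sum of "skill" grows linearly forever while the gap between the two stays small. That's the runaway dynamic people worry about.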
joshellington|3 months ago
I just pray incompetence wins in the right way, for humanity’s sake.
pixl97|3 months ago
wmf|3 months ago