jasondclinton | 1 year ago

Hi, I'm the CISO from Anthropic. Thank you for the criticism, any feedback is a gift.

We have laid out in our RSP what we consider the next milestone of significant harms that we are testing for (what we call ASL-3): https://anthropic.com/responsible-scaling-policy (PDF); this includes bioweapons assessment and cybersecurity.

As someone thinking night and day about security, I think the next major area of concern is going to be offensive (and defensive!) exploitation. It seems to me that within 6-18 months, LLMs will be able to iteratively walk through most open source code and identify vulnerabilities. It will be computationally expensive, though: that level of reasoning requires a large amount of scratch space and attention heads. But it seems very likely, based on everything that I'm seeing. Maybe 85% odds.
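
To make that concrete, here is a rough sketch of what "iteratively walking through open source code" could look like, purely illustrative, with a hypothetical llm_complete() standing in for whatever model API would actually be used:

    # Illustrative sketch only: walk a repository and ask a model to flag
    # candidate vulnerabilities in each C source file. llm_complete() is a
    # hypothetical stand-in for a real model API call.
    from pathlib import Path

    def llm_complete(prompt: str) -> str:
        raise NotImplementedError("call your model provider here")

    def scan_repo(root: str) -> dict:
        findings = {}
        for path in Path(root).rglob("*.c"):
            source = path.read_text(errors="ignore")
            prompt = (
                "Review the following C code and list any memory-safety or "
                "input-validation issues, with line numbers:\n\n" + source
            )
            findings[str(path)] = llm_complete(prompt)
        return findings

The expensive part is doing this iteratively over every file and call path in a large codebase, not any single prompt.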

The first sparks of this are already being published publicly: https://security.googleblog.com/2023/08/ai-powered-fuzzing-b... just using traditional LLM-augmented fuzzers. (They've since published an update on this work in December.) I know of a few other groups investing significantly in this specific area, to try to run faster on the defensive side than any malign nation state might.
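
For context, the idea behind LLM-augmented fuzzing is roughly: prompt a model to draft new fuzz targets for existing code, then hand them to a conventional fuzzer. A minimal sketch of that loop, again with the hypothetical llm_complete() helper and a libFuzzer-style harness:

    # Sketch of LLM-augmented fuzzing: have a model draft a libFuzzer harness
    # for a target function, then compile and run it with the usual tooling.
    import subprocess

    def llm_complete(prompt: str) -> str:  # same hypothetical helper as above
        raise NotImplementedError("call your model provider here")

    def generate_and_run_harness(header: str, function_sig: str) -> None:
        prompt = (
            "Write a libFuzzer fuzz target (LLVMFuzzerTestOneInput) that "
            "exercises " + function_sig + " declared in this header:\n\n" + header
        )
        with open("harness.c", "w") as f:
            f.write(llm_complete(prompt))
        subprocess.run(
            ["clang", "-fsanitize=fuzzer,address", "harness.c", "-o", "harness"],
            check=True,
        )
        subprocess.run(["./harness", "-max_total_time=600"], check=True)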

Please check out the RSP; we are very explicit about what harms we consider ASL-3. Drug making and "stuff on the internet" are not at all in our threat model. ASL-3 seems somewhat likely within the next 6-9 months. Maybe 50% odds, by my guess.

GistNoesis|1 year ago

There is a scene I like in an Oppenheimer movie: https://www.youtube.com/watch?v=p0pCclxx5nI (Edit: it's not a deleted scene from Nolan's Oppenheimer).

There is also another scene in Nolan's Oppenheimer (which made the cut, around timestamp 27:45) where physicists get all excited when Hahn and Strassmann publish a paper on splitting uranium with neutrons. Alvarez the experimentalist happily replicates it, while being oblivious to what seems obvious to every theoretical physicist: it can be used to create a chain reaction, and therefore a bomb.

So here is my question: how do you contain the sparks of employees? Let's say Alvarez comes into your open-plan office all excited and speaks a few words, "new algorithm", "1000X". What do you do?

jasondclinton|1 year ago

This is called a “compute multiplier” and, yes, we have a protocol for that. All AI labs do, as far as I am aware; standard industry practice.

philipwhiuk|1 year ago

The net of your "Responsible Scaling Policy" seems to be that it's okay if your AI misbehaves as long as it doesn't kill thousands of people.

Your intended actions if it does get good seem rather weak too:

> Harden security such that non-state attackers are unlikely to be able to steal model weights and advanced threat actors (e.g. states) cannot steal them without significant expense.

Isn't this just something you should be doing right now? If you're a CISO and your environment isn't hardened against non-state attacks, isn't that a huge regular business risk?

This just reads like a regular CISO goals thing, rather than a real mitigation to dangerous AI.

throwup238|1 year ago

> We have laid out in our RSP what we consider the next milestone of significant harms that we are testing for (what we call ASL-3): https://anthropic.com/responsible-scaling-policy (PDF); this includes bioweapons assessment and cybersecurity.

Do pumped flux compression generators count?

(Asking for a friend who is totally not planning on world conquest)

hn_throwaway_99|1 year ago

Thanks very much, the PDF you linked is very helpful, particularly in how it describes the classes of "deployment risks" vs "containment risks".

doctorpangloss|1 year ago

This feedback is one point of view on why documents like these read as insincere.

You guys raised $7.3b. You are talking about abstract stuff you actually have little control over, but if you wanted to make secure software, you could do it.

For a mere $100m of your budget, you could fix every security bug in the open source software you use and give it away completely for free. OpenAI gives away software for free all the time, it gets massively adopted, it's a perfectly fine playbook. You could even pay people to adopt it. You could spend a fraction of your budget fixing the software you use, and then it would seem justified to say, well, I should listen to Anthropic's abstract opinions about such-and-such future risks.

Your gut reaction is, "that's not what this document is about." Man, it is what your document is about! (1) "Why do you look at the speck of sawdust in your brother’s eye and pay no attention to the plank in your own eye?" (2) Every piece of corporate communications you write is as much about what it doesn't say as it is about what it does. Basic communications. Why are you talking about abstract risks?

I don't know. It boggles the mind how large the budget is. ML companies seem to be organizing into R&D, Product, and "Humanities" divisions, and the humanities divisions seem all over the place. You already agree with me, everything you say in your RSP is true; there's just no incentive for the people working at a weird Amazon balance-sheet call option called Anthropic to develop operating systems or fix open source projects. You guys have long histories with deep visibility into giant corporate boondoggles like Fuchsia or whatever. I use Claude: do you want to be a #2 to OpenAI, or do you want to do something different?

xg15|1 year ago

Is the "next milestone of significanct harms" the same as a "red line capability"?