er4hn|1 month ago
I think the author makes some interesting points, but I'm not that worried about this. These tools feel symmetric, in that defenders can use them as well. There's an easy-to-see path that involves running "LLM Red Teams" in CI before merging code or major releases. The fact that it's a somewhat time-expensive test (I'm ignoring cost here on purpose) makes it feel similar to fuzzing in terms of where it would fit in a pipeline. New tools, new threats, new solutions.
digdugdirk|1 month ago
The defensive side needs everything to go right, all the time. The offensive side only needs something to go wrong once.
Vetch|1 month ago
It's a subtle difference from what you said: it's not that everything has to go right in sequence for the defensive side. Rather, defenders have to hope they committed enough to searching that the offensive side has a significantly lowered chance of finding exploits they did not. Both the attackers and the defenders are attacking the same target program and sampling the same distribution of attacks; it's just that the defender is also iterating on patching any found exploits until their budget is exhausted.
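The budget argument above can be sketched as a small Monte Carlo simulation. All the numbers here (pool size, budgets) are made-up illustrative assumptions, not measurements: both sides draw from the same pool of potential exploits, the defender patches what it finds, and the attacker wins if any of its draws is still unpatched.

```python
import random

def attacker_win_rate(n_exploits=1000, defender_budget=5000,
                      attacker_budget=500, trials=500, seed=0):
    """Toy model: defender and attacker sample the same exploit pool."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        # Defender spends its budget sampling the pool, patching each find.
        patched = {rng.randrange(n_exploits) for _ in range(defender_budget)}
        # Attacker samples the same pool; a single unpatched hit suffices.
        if any(rng.randrange(n_exploits) not in patched
               for _ in range(attacker_budget)):
            wins += 1
    return wins / trials

print(attacker_win_rate())
```

Even with a 10x budget advantage, the defender's coverage of a 1000-exploit pool saturates around 99%, and the attacker only needs to land in the remaining ~1%, so the attacker still wins most trials.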
psychoslave|1 month ago
It's probably more worrying that you get script kiddies on steroids, who can pop up all over with the same mindset as even the dumbest significant geopolitical actor out there.
NitpickLawyer|1 month ago
I don't think so. From a pure mathematical standpoint, you'd need better (or equal) results at avg@1 or maj@x, while the attacker needs just pass@x to succeed. That is, the red agent needs to work just once, while the blue agent needs to work all the time. Current agents are much better (20-30%) at pass@x than maj@x.
In real life that's why you sometimes see titles like "teenager hacks into multi-billion dollar company and installs crypto malware".
I do think that you're right in that we'll see improved security stance by using red v. blue agents "in a loop". But I also think that red has a mathematical advantage here.
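The pass@x vs. maj@x gap above can be made concrete with a toy calculation. Assuming a hypothetical per-attempt success probability p (the 0.3 below is illustrative, not a measured agent score): the red agent only needs any one of k attempts to land (pass@k), while a majority-vote defense needs most attempts to be correct (maj@k).

```python
from math import comb

def pass_at_k(p: float, k: int) -> float:
    # Attacker: succeeds if ANY of k independent attempts works.
    return 1 - (1 - p) ** k

def maj_at_k(p: float, k: int) -> float:
    # Defense proxy: a strict majority of k attempts must succeed.
    return sum(comb(k, i) * p**i * (1 - p) ** (k - i)
               for i in range(k // 2 + 1, k + 1))

p, k = 0.3, 10
print(f"pass@{k}: {pass_at_k(p, k):.3f}")  # ~0.972
print(f"maj@{k}:  {maj_at_k(p, k):.3f}")   # ~0.047
```

With the same underlying capability, "once is enough" scales toward certainty as k grows, while "a majority must be right" stays pinned down whenever p < 0.5, which is the mathematical advantage being described.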
rightbyte|1 month ago
> I don't think so. From a pure mathematical standpoint, you'd need better (or equal) results at avg@1 or maj@x, while the attacker needs just pass@x to succeed.
Executing remote code is a choice not some sort of force of nature.
Timesharing systems are inherently not safe and way too much effort is put into claiming the stone from Sisyphus.
SaaS and complex centralized software need to go and that is way over due.
azakai|1 month ago
https://projectzero.google/2024/10/from-naptime-to-big-sleep...
List of vulnerabilities found so far:
https://issuetracker.google.com/savedsearches/7155917
pizlonator|1 month ago
There are countless bugs to find.
If the offender runs these tools, then any bug they find becomes a cyberweapon.
If the defender runs these tools, they will not thwart the offender unless they find and fix all of the bugs.
"Any" vs. "all" is not symmetric.
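The any-vs-all asymmetry can be put in numbers. Assuming a hypothetical 50% chance of finding each individual bug (both the probability and the bug count below are illustrative):

```python
def p_any(p: float, n: int) -> float:
    # Attacker wins by finding at least one of n bugs.
    return 1 - (1 - p) ** n

def p_all(p: float, n: int) -> float:
    # Defender only wins by finding every one of n bugs.
    return p ** n

print(p_any(0.5, 20))  # ~0.999999
print(p_all(0.5, 20))  # ~0.00000095
```

The attacker's odds approach 1 exponentially fast in n, while the defender's odds of total coverage decay exponentially to 0.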
energy123|1 month ago
A) 1 cyber security employee, 1 determined attacker
B) 100 cyber security employees, 100 determined attackers
Which is better for defender?
hackyhacky|1 month ago
Given the large number of unmaintained or non-recent software out there, I think being worried is the right approach.
The only guaranteed winner is the LLM companies, who get to sell tokens to both sides.
amelius|1 month ago
Why? The attackers can run the defending software as well. That lets them test millions of test cases offline, and if one breaks through the defenses, they can use it live.
execveat|1 month ago
I'm quite optimistic about AI ultimately making systems more secure and well protected, shifting the overall balance toward the defenders.