According to their own blog post, even after mitigations, the model still has an 11% attack success rate. There's still no way I would feel comfortable giving this access to my main browser. I'm glad they're sticking to a very limited rollout for now. (Sidenote, why is this page so broken? Almost everything is hidden.)
mkozlows|6 months ago
(The more interesting question will be whether they have any means to eventually make it safe. I'm pretty skeptical about it in the near term.)
AdieuToLogic|6 months ago
This is directly contradicted by one of the first sentences in the article:
Ascribing altruism to the quoted intent is dissembling at best.
Szpadel|6 months ago
latexr|6 months ago
Seems more likely they’re trying to cover their own ass, so when anything inevitably goes wrong they can point and say “see, we told you it was dangerous, not our fault”.
pharrington|6 months ago
aquova|6 months ago
rvz|6 months ago
That is really bad. Even after all those mitigations, imagine the other AI browsers at their worst. Perplexity's Comet showed how a simple summarization can lead to your account being hijacked.
> (Sidenote, why is this page so broken? Almost everything is hidden.)
They vibe-coded the site with Claude and didn't test it before deploying. That is quite a botched, amateur launch for engineers at Anthropic.
mark242|6 months ago
zaphirplane|6 months ago
whatevertrevor|6 months ago
(Even if we agree with the premise that this is just "spear-phishing", which is honestly a semantics argument that is irrelevant to the more pertinent question of how important it is to prevent this attack vector.)
asdff|6 months ago
One would think so, but apparently, from this blog post, it is still susceptible to the same old prompt injections that have always been around. So I'm thinking it is not very easy to train Claude like this at all. Meanwhile, with parents you could probably eliminate an entire security vector outright if you merely told them "bank at the local branch," or "call the number on the card for the bank; don't try to look it up."
lelanthran|6 months ago
With this you can probably try a few thousand attempts per minute.