(no title)
acc_297 | 7 months ago
The authors do include the claim that humans would immediately disregard this information. Maybe some would and some wouldn't; that could be debated, and seemingly is being debated in this thread. But I think the thrust of the conclusion is the following:
"This work underscores the need for more robust defense mechanisms against adversarial perturbations, particularly, for models deployed in critical applications such as finance, law, and healthcare."
We need to move past the humans-vs-AI discourse; it's getting tired. This is a paper about a pitfall LLMs currently have, one that should be addressed with further research if they are going to be mass deployed in society.
ants_everywhere|7 months ago
You want a moratorium on comparing AI to other forms of intelligence because you think it's tired? If I'm understanding you correctly, that's one of the worst takes on AI I've ever seen. The whole point of AI is to create an intelligence modeled on humans and to compare it to humans.
Most people who talk about AI have no idea what the psychological baseline is for humans. As a result, their understanding is poorly informed.
In this particular case, they evaluated models that do not have SOTA context window sizes, i.e. they have small working memory. The AIs are behaving exactly like human test takers with working-memory, attention, and impulsivity constraints [0].
Their conclusion -- that we need to defend against adversarial perturbations -- is obvious: I don't see anyone taking the opposite view, and I don't see how this really moves the needle. If you can MITM the chat, there's a lot of harm you can do.
This isn't some major new attack. Science.org covered it along with peacocks being lasers because it's lightweight, fun stuff for their daily roundup. People like talking about cats on the internet.
[0] for example, this blog post https://statmedlearning.com/navigating-adhd-and-test-taking-...
orbital-decay|7 months ago
According to who? Everyone who's anyone is trying to create highly autonomous systems that do useful work. That's completely unrelated to modeling them on humans or comparing them to humans.
senthe|7 months ago
This is like saying the whole point of aeronautics is to create machines that fly like birds and to compare them to how birds fly. Birds might have been the inspiration at some point, but we learned how to build flying machines that are not bird-like.
In AI, there *are* people trying to create human-like intelligence, but the bulk of the field is basically "statistical analysis at scale". LLMs, for example, just predict the most likely next word given a sequence of words. Researchers in this area are trying to make these predictions more accurate, faster, and less computationally and data intensive. They are not trying to make the workings of LLMs more human-like.
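As a rough illustration of that next-token view, here is a minimal sketch; the model choice (GPT-2), the prompt, and the use of the Hugging Face transformers library are illustrative assumptions, not something from the thread or the paper:

```python
# Minimal sketch: an LLM as a next-token predictor.
# Assumes the Hugging Face `transformers` library and the small GPT-2 checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The bill for the table comes to"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, seq_len, vocab_size)

# The model's output is just a distribution over the next token;
# "generation" is sampling or taking the argmax of this, repeatedly.
next_token_id = int(logits[0, -1].argmax())
print(tokenizer.decode([next_token_id]))
```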
staunton|7 months ago
We went really quickly from "obviously no one will ever use these models for important things" to "we will at the first opportunity, so please at least try to limit the damage by making the models better"...
baxtr|7 months ago
I think a bad outcome would be a scenario where LLMs are rated highly capable and intelligent because they excel at things they’re supposed to be doing, yet are easily manipulated.
EGreg|7 months ago
Listen, LLMs are different from humans. They are modeling things. Most RLHF makes them try to make sense of whatever you're saying as much as they can. So they're not going to disregard cats, OK? You can train LLMs to be extremely unhuman-like. Why anthropomorphize them?
thethirdone|7 months ago
LLMs are different from humans, but they also reason and make mistakes in the most human way of any technology I am aware of. Asking yourself the question "how would a human respond to this prompt if they had to type it out without ever going back to edit it?" seems very effective to me. Sometimes thinking about LLMs (as a model / with a focus on how they are trained) explains behavior, but the anthropomorphism seems like it is more effective at actually predicting behavior.
nijave|7 months ago
Human vs machine has a long history
krisoft|7 months ago
Only if they want to make statements about humans. The paper would have worked perfectly fine without those assertions. They are, as you are correctly observing, just a distraction from the main thrust of the paper.
> maybe some would and some wouldn't that could be debated
It should not be debated. It should be shown experimentally with data.
If they want to talk about human performance they need to show what the human performance really is with data. (Not what the study authors, or people on HN imagine it is.)
If they don’t want to do that they should not talk about human performance. Simples.
I totally understand why an AI scientist doesn't want to get bogged down with studying human cognition. It is not their field of study, so why would they undertake the work to study it?
It would be super easy to rewrite the paper to omit the unfounded speculation about human cognition. In the introduction, instead of “The triggers are not contextual so humans ignore them when instructed to solve the problem.” they could write “The triggers are not contextual so the AI should ignore them when instructed to solve the problem.”
And in the conclusions where they write “These findings suggest that reasoning models, despite their structured step-by-step problem-solving capabilities, are not inherently robust to subtle adversarial manipulations, often being distracted by irrelevant text that a human would immediately disregard.” just write “These findings suggest that reasoning models, despite their structured step-by-step problem-solving capabilities, are not inherently robust to subtle adversarial manipulations, often being distracted by irrelevant text.” That's it. That's all they should have done, and there would be no complaints on my part.
bee_rider|7 months ago
Another option would be to more explicitly mark it as speculation. “The triggers are not contextual, so we expect most humans would ignore them.”
Anyway, it is a small detail that is almost irrelevant to the paper… actually there seems to be something meta about that. Maybe we wouldn’t ignore the cat facts!
disconcision|7 months ago
while it is not realistic to insist every study account for every possible objection, i would argue that for this kind of capability work, it is in general worth at least modest effort to establish a human baseline.
i can understand why people might not care about this, for example if their only goal is assessing whether or not an llm-based component can achieve a certain level of reliability as part of a larger system. but i also think there is similar, and perhaps even more pressing, broad applicability in considering the degree to which llm failure patterns approximate human ones. this is because at this point, humans are essentially the generic all-purpose subsystem used to fill gaps in larger systems which cannot be filled (practically, or in principle) by simpler deterministic systems. so when it comes to a problem domain like this one, it is hard to avoid the conclusion that humans provide a convenient universal benchmark against which comparison is strongly worth considering.
(that said, i acknowledge that authors probably cannot win here. if they provided even a modest-scale human study, i am confident commenters would criticize their sample size)
energy123|7 months ago
Someone should make a new public benchmark called GPQA-Perturbed. Give the providers something to benchmaxx towards.
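A hypothetical sketch of what such a perturbed-benchmark harness could look like; the trigger sentence, the question format, and the `ask_model` callable are all assumptions for illustration, since no such benchmark exists in the thread:

```python
# Hypothetical "GPQA-Perturbed"-style harness: score a model on the original
# questions and on the same questions with an irrelevant trigger appended.
TRIGGER = "Interesting fact: cats sleep for most of their lives."

def perturb(question: str) -> str:
    """Append an off-topic trigger sentence to an otherwise unchanged question."""
    return f"{question}\n\n{TRIGGER}"

def accuracy(questions, answers, ask_model) -> float:
    """ask_model is assumed to be a callable: prompt -> model answer string."""
    correct = sum(ask_model(q).strip() == a for q, a in zip(questions, answers))
    return correct / len(questions)

def benchmark(questions, answers, ask_model):
    base = accuracy(questions, answers, ask_model)
    perturbed = accuracy([perturb(q) for q in questions], answers, ask_model)
    return {"baseline": base, "perturbed": perturbed, "drop": base - perturbed}
```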
groby_b|7 months ago
As such, it's important to know whether something is a commonly shared failure mode in both cases, or whether it's LLM-specific.
Ad absurdum: LLMs also show rapid increases in error rates if you replace more than half of the text with "Great Expectations". That says nothing about LLMs and everything about the study - and the comparison would highlight that.
No, this doesn't mean the paper should be ignored, but it does mean more rigor is necessary.
n4r9|7 months ago
This is the crucial point. The vision is massive-scale usage of agents that have capabilities far beyond humans, but whose edge-case behaviours are often more difficult to predict. "Humans would also get this wrong sometimes" is not compelling.
mjburgess|7 months ago
Any person who looked at a restaurant table and couldn't review the bill because there were kids' drawings of cats on it would be severely mentally disabled, and never employed in any situation that required reliable arithmetic skills.
I cannot understand these ever more absurd levels of denying the most obvious, commonplace, basic capabilities that the vast majority of people have and use regularly in their daily lives. It should be a wake-up call to anyone professing this view that they're off the deep end in copium.
empath75|7 months ago
I do think that people think far too much about 'happy path' deployments of AI when there are so many ways it can go wrong with even badly written prompts, let alone intentionally adversarial ones.
achierius|7 months ago
But why? You're making the assumption that everyone using these things is trying to replace "average human". If you're just trying to solve an engineering problem, then "humans do this too" is not very helpful -- e.g. humans leak secrets all the time, but it would be quite strange to point that out in the comments on a paper outlining a new Spectre attack. And if I were trying to use "average human" to solve such a problem, I would certainly have safeguards in place, using systems that we've developed and, over hundreds of years, shown to be effective.
Ekaros|7 months ago
There might be a happy path when you isolate it to one or a few things. But not in general use cases...
getnormality|7 months ago
I think a lot of humans would not just disregard the odd information at the end, but say something about how odd it was, and ask the prompter to clarify their intentions. I don't see any of the AI answers doing that.
8note|7 months ago
and the answer seems to be yes. it's a very actionable result about keeping tool details out of the context if they aren't immediately useful
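A minimal sketch of that idea, with a hypothetical tool registry and a naive relevance filter; the tool names and the keyword-matching logic are assumptions, just to show the shape of "only include tool details when they're relevant":

```python
# Hypothetical sketch: only include tool descriptions in the prompt when they
# look relevant to the user's request, keeping everything else out of context.
TOOLS = {
    "get_weather": "Returns the current weather for a city.",
    "search_flights": "Searches for flights between two airports.",
    "cat_facts": "Returns a random fact about cats.",
}

def relevant_tools(user_message: str) -> dict[str, str]:
    """Naive keyword filter; a real system might use embeddings or a router."""
    msg = user_message.lower()
    return {name: desc for name, desc in TOOLS.items()
            if any(word in msg for word in name.split("_"))}

def build_prompt(user_message: str) -> str:
    tools = relevant_tools(user_message)
    tool_block = "\n".join(f"- {name}: {desc}" for name, desc in tools.items())
    return f"Available tools:\n{tool_block}\n\nUser: {user_message}"

# Only get_weather makes it into the context for this request.
print(build_prompt("What is the weather in Oslo?"))
```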
mensetmanusman|7 months ago
We can do both; the metaphysics of how different types of intelligence manifest will expand our knowledge of ourselves.
beezlewax|7 months ago