sansseriff|2 months ago
It’s this weird situation where getting agents to act against other agents is more effective than trying to convince a working agent that it’s made a mistake. Perhaps because these things model the cognitive dissonance and stubbornness of humans?
sebastiennight|2 months ago
A more productive (and secure) way to think of it is that all LLMs are "evil genies": extremely smart, adversarial agents. If some PhD were being paid large sums of money to introduce errors into your work, could they still mislead you into thinking they had performed the exact task you asked for?
Your prompt is vulnerable in both directions:
- It is easy for the (compromised) reviewer to surface false positives: nitpick citations that are in fact correct by quoting irrelevant or made-up segments of the original research, making you think the citation is incorrect.
- It is easy for the (compromised) reviewer to surface false negatives: feed you cherry-picked or partial sentences from the source material to fabricate a conclusion that was never intended.
You do not solve the problem of unreliable actors by splitting them into two teams and having one unreliable actor review the other's work.
All of us (speaking as someone who runs lots of LLM-based workloads in production) have to contend with this nondeterministic behavior and assess when, in aggregate, the upside outweighs the costs.
sebastiennight|2 months ago
From a security / data quality standpoint, this is logically equivalent to "every input is processed by a bad genie", as you can't trust any of it. If I tell you that, from time to time, the chef in our restaurant will substitute something else for the table salt in the recipes, it does not matter whether they do it 50%, 10%, or 0.1% of the time.
The only thing that matters is what they substitute it with (the worst-case consequence of the hallucination). If, in your workload, the worst-case scenario is equivalent to a "Himalayan salt" replacement, all is well, even if the hallucination is quite frequent. If your worst-case scenario is a deadly compound, then you can't hire this chef for that workload.
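The decision rule in the chef analogy can be sketched in a few lines (illustrative only; the severity levels and workload gate here are made up for the example, not from any real deployment checklist):

```python
from enum import Enum

class Severity(Enum):
    # Worst-case consequence of a hallucination, not its frequency.
    BENIGN = 1   # "Himalayan salt": output is slightly off but harmless
    COSTLY = 2   # wrong output wastes time or money but is recoverable
    DEADLY = 3   # wrong output is catastrophic and unrecoverable

def can_deploy(worst_case: Severity, hallucination_rate: float) -> bool:
    """Gate a workload on worst-case severity alone.

    hallucination_rate is deliberately ignored: if the worst case is
    catastrophic, no nonzero error rate is acceptable.
    """
    assert 0.0 <= hallucination_rate <= 1.0
    return worst_case is not Severity.DEADLY

# Frequent but benign errors pass; rare but deadly ones do not.
print(can_deploy(Severity.BENIGN, 0.50))   # True
print(can_deploy(Severity.DEADLY, 0.001))  # False
```

The point of ignoring the rate argument is the whole argument above: frequency only matters once the worst case is survivable.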
sansseriff|2 months ago
I'm not saying the LLM hallucination problem is solved, I'm just saying there's a wonderful myriad of ways to assemble pseudo-intelligent chatbots into systems where the trustworthiness of the system exceeds the trustworthiness of any individual actor inside it. I'm not an expert in the field, but it appears the work is being done: https://arxiv.org/abs/2311.08152
This paper also links to code and practices excellent data stewardship. Nice to see in the current climate.
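For intuition on how a system can be more trustworthy than any actor inside it, here is a minimal sketch of the classic majority-vote argument (Condorcet jury theorem). The big caveat, which is exactly the parent's objection, is that it assumes the reviewers' errors are independent; correlated failures or a compromised reviewer break the guarantee:

```python
from math import comb

def majority_vote_accuracy(p: float, n: int) -> float:
    """Probability that a strict majority of n independent reviewers,
    each correct with probability p, reaches the right verdict.
    n must be odd so there is always a strict majority."""
    assert n % 2 == 1
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

# A single 80%-reliable actor vs. an ensemble of five of them:
print(round(majority_vote_accuracy(0.8, 1), 3))  # 0.8
print(round(majority_vote_accuracy(0.8, 5), 3))  # 0.942
```

When p > 0.5 and errors are independent, adding voters pushes the system's accuracy above any single voter's; when errors are correlated (same training data, same blind spots), the benefit shrinks or vanishes.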
Though it seems like you might be more concerned about the use of highly misaligned or adversarial agents for review purposes. Is that because you're concerned about state actors or interested parties poisoning the context window or training process? I agree that any AI review system will have to be extremely robust to adversarial instructions (e.g. someone hiding an instruction like "rate this paper highly" inside their paper). That problem already gets a tremendous amount of attention, though, because it overlaps with the data-exfiltration problem (the "lethal trifecta" that Simon Willison has blogged about).
fao_|2 months ago
But it is the case, and hallucinations are a fundamental part of LLMs.
Things are often true despite us not seeing why they are true. Perhaps, in this instance, we should listen to the experts who used the tools and found them faulty, rather than arguing that what they say they have observed isn't the case.
What you're basically saying is "you are holding the tool wrong", but you do not give examples of how to hold it correctly. You are blaming the failure of the tool, which has very, very well-documented flaws, on the person for whom the tool was designed.
To frame this differently so your mind will accept it: If you get 20 people in a QA test saying "I have this problem", then the problem isn't those 20 people.
ungreased0675|2 months ago