(no title)
ordersofmag | 6 months ago
"Do task x" and "Is this answer to task x correct?" are two very different prompts and aren't guaranteed to have the same failure modes. They might, but they might not.
citrin_ru | 6 months ago

giantrobot | 6 months ago
This is not quite the same situation. It's also the core conceit of self-healing file systems like ZFS, but ZFS stores not only redundant data but also redundant error-correction information, which lets failures be not only detected but also corrected against ground truth (the original data).
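To make the contrast concrete, the self-healing idea looks roughly like this (a toy Python sketch, not actual ZFS internals; the store, block IDs, and method names are invented purely for illustration):

    import hashlib

    def checksum(block: bytes) -> str:
        # Independent fingerprint of the data, stored separately from the data itself.
        return hashlib.sha256(block).hexdigest()

    class MirroredStore:
        def __init__(self):
            self.primary = {}   # block_id -> bytes
            self.mirror = {}    # redundant copy of each block
            self.sums = {}      # block_id -> checksum (the "ground truth" fingerprint)

        def write(self, block_id: int, data: bytes):
            self.primary[block_id] = data
            self.mirror[block_id] = data
            self.sums[block_id] = checksum(data)

        def read(self, block_id: int) -> bytes:
            data = self.primary[block_id]
            if checksum(data) == self.sums[block_id]:
                return data
            # Primary copy is corrupt: fall back to the mirror and self-heal.
            good = self.mirror[block_id]
            if checksum(good) != self.sums[block_id]:
                raise IOError("both copies corrupt: uncorrectable error")
            self.primary[block_id] = good   # repair the bad copy from the good one
            return good

The checksum acts as an independent record of what the data is supposed to be; with one LLM checking another, nothing plays that role.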
In the case of an LLM backstopping another LLM, both have similar error probabilities and no inherent ground truth. They don't reliably memorize the facts in their training data, and even with RAG the embeddings still aren't memorized facts.
That gives you a roughly constant probability of uncorrectable bullshit, and one of the biggest problems with LLMs is the opportunity for subtle bullshit. People can also introduce subtle errors when recalling things, but they can be held accountable when that happens. An LLM might be correct nine times out of ten with the same context, or only incorrect given a particular context. Even two releases of the same model might not introduce errors in the same way. People can even prompt a model to err in a particular way.