(no title)
ordersofmag | 6 months ago
"Do task x" and "Is this answer to task x correct?" are two very different prompts and aren't guaranteed to have the same failure modes. They might, but they might not.
citrin_ru | 6 months ago

giantrobot | 6 months ago
This is not quite the same situation. It's also the core conceit of self-healing file systems like ZFS, but ZFS stores not only redundant data but also redundant error-correction information, which lets failures be not only detected but also corrected against ground truth (the original data).
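To make the contrast concrete, the self-healing idea looks roughly like this (a toy Python sketch, not actual ZFS internals; the store, block IDs, and method names are invented purely for illustration):

    import hashlib

    def checksum(block: bytes) -> str:
        # Independent fingerprint of the data, stored separately from the data itself.
        return hashlib.sha256(block).hexdigest()

    class MirroredStore:
        def __init__(self):
            self.primary = {}   # block_id -> bytes
            self.mirror = {}    # redundant copy of each block
            self.sums = {}      # block_id -> checksum (the "ground truth" fingerprint)

        def write(self, block_id: int, data: bytes):
            self.primary[block_id] = data
            self.mirror[block_id] = data
            self.sums[block_id] = checksum(data)

        def read(self, block_id: int) -> bytes:
            data = self.primary[block_id]
            if checksum(data) == self.sums[block_id]:
                return data
            # Primary copy is corrupt: fall back to the mirror and self-heal.
            good = self.mirror[block_id]
            if checksum(good) != self.sums[block_id]:
                raise IOError("both copies corrupt: uncorrectable error")
            self.primary[block_id] = good   # repair the bad copy from the good one
            return good

The checksum acts as an independent record of what the data is supposed to be; with one LLM checking another, nothing plays that role.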
In the case of an LLM backstopping another LLM, both have similar error probabilities and no inherent ground truth. They don't reliably memorize the facts in their training data, and even with RAG the embeddings still aren't memorized facts.
That gives you a roughly constant probability of uncorrectable bullshit, and one of the biggest problems with LLMs is the opportunity for subtle bullshit. People can also introduce subtle errors when recalling things, but they can be held accountable when that happens. An LLM might be correct nine times out of ten with the same context, or only incorrect given a particular context. Even two releases of the same model might not introduce errors in the same way. People can even prompt a model to err in a particular way.