
wbogusz | 1 year ago

> Here’s a simple rule, based on the fact no one has shown that an llm or a compound llm system can produce an output that doesn’t need to be verified for correctness by a human across any input:

I’m still not sure why some of us are so convinced there is no way to properly verify LLM output. In many circumstances, output that gets you 90-95% of the way there is easily pushed to 100% by topping it off with a deterministic system.

Do I depend on an LLM to perform 8-digit multiplication? Absolutely not, because like you say, I can’t verify the correctness of whatever answer it spits out. But why can’t I ask an LLM to write the Python code to perform the same calculation and read me its output?
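A minimal sketch of that pattern in Python: instead of trusting the number the model "says", you treat its generated code as the artifact and execute it deterministically. `llm_generate` here is a hypothetical placeholder for whatever model call you would actually use; the only assumption is that it returns a snippet of Python that assigns a `result` variable.

```python
# Sketch: verify an LLM's arithmetic by running code it writes,
# rather than trusting the number in its prose answer.
# `llm_generate` is a hypothetical stand-in for a real model call.

def llm_generate(prompt: str) -> str:
    # Stand-in: imagine the model returns this snippet for
    # "write Python code that multiplies 12345678 by 87654321".
    return "result = 12345678 * 87654321"

def multiply_via_generated_code(a: int, b: int) -> int:
    snippet = llm_generate(f"write Python code that multiplies {a} by {b}")
    namespace: dict = {}
    exec(snippet, namespace)  # deterministic: the Python runtime does the math
    return namespace["result"]

print(multiply_via_generated_code(12345678, 87654321))
```

The point is that the verification burden moves from "is this 15-digit number correct?" (hard for a human) to "does this one-line program multiply the right operands?" (easy for a human), with the arithmetic itself delegated to a deterministic interpreter.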

> I think it follows that we should not use llms for anything critical.

While we are at it I think we should also institute an IQ threshold for employees to contribute to or operate around critical systems. If we can’t be sure to an absolute degree that they will not make a mistake, then there is no purpose to using them. All of their work will simply need to be double checked and verified anyway.

sickblastoise | 1 year ago

1. There isn’t one answer to how to do it. If you have an answer to validation for your specific use case, go for it. But this is not trivial, because most of the flashy things people want to use LLMs for, like code generation and automated RCAs, are hard or impossible to verify without running into the I Need A More Intelligent Model problem.

2. I believe this falsely equates what LLMs do with human intelligence. There is a skill threshold for interacting with critical systems; for humans it comes down to “will they screw this up?” A human can clear that bar because humans are generally intelligent: they can make good decisions to predict and handle potential failure modes.

low_tech_love | 1 year ago

Also, let’s remember the most important thing about replacing humans with AI: humans are accountable for what they do.

That is, ignoring the myriad other multidimensional nuances of human/social interaction that allow you to trust a person (and which are non-existent when you interact with an AI).