somnic|1 year ago
I have to assume that someone has run a trial on training these models to output answers to factual questions along with numerical probabilities, using a loss function based on a proper scoring rule of the output probabilities, and it didn't work well. That's an obvious starting point, right? All the "safety" stuff uses methods other than next-token prediction.
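The setup described above can be sketched in a few lines. This is a toy illustration of why a proper scoring rule (here the Brier score; the names and numbers are my own, not from the thread): under such a loss, a model minimizes expected loss only by reporting its true probability, so overconfident answers are penalized.

```python
# Brier score for a single binary question: (p - outcome)^2.
# It is a proper scoring rule: expected loss is minimized by reporting
# the true probability, not an exaggerated one.

def brier_loss(p: float, outcome: int) -> float:
    return (p - outcome) ** 2

# Suppose the model's answer is actually correct with probability q = 0.7.
q = 0.7

def expected_loss(reported: float) -> float:
    # Expected Brier loss when the model reports `reported` as its confidence.
    return q * brier_loss(reported, 1) + (1 - q) * brier_loss(reported, 0)

honest = expected_loss(0.7)          # report the true belief
overconfident = expected_loss(0.99)  # exaggerate confidence
assert honest < overconfident        # the proper rule rewards honesty
```

The same property holds for the log score (cross-entropy on the stated probability), which is the other common choice of proper scoring rule.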
HarHarVeryFunny|1 year ago
You could RLHF/whatever models on common factual questions to try to get them to answer those specific questions better, but I doubt there'd be much benefit outside of those specific questions.
There are a couple of fundamental problems related to factuality:
1) They don't know the sources, and source reliability, of their training data.
2) At inference time all they care about is word probabilities, with factuality coming into it only tangentially, as a matter of context (e.g. factual continuations are more probable in a factual context than in a fantasy context). They have no innate drive to generate factual responses, and don't introspect on whether what they are generating is factual (though that would be easy to fix).
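The point about factuality entering only via context can be made concrete with a toy conditional distribution (the probabilities below are hand-made for illustration, not real model output): the same fact gets a different next-token probability purely because the surrounding context changed.

```python
# Toy stand-in for a language model's next-token distribution, keyed by context.
# All numbers are invented for illustration.
next_token_probs = {
    "The capital of France is": {"Paris": 0.92, "Narnia": 0.001},
    "In the fairy tale, the capital of France is": {"Paris": 0.30, "Narnia": 0.25},
}

factual = next_token_probs["The capital of France is"]["Paris"]
fantasy = next_token_probs["In the fairy tale, the capital of France is"]["Paris"]

# Same fact, different probability: "factuality" is just a side effect of
# which continuations are likely given the context, not an explicit objective.
assert factual > fantasy
```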