galaxytachyon | 2 years ago

I remember there is a study about the alignment cost. Basically, the more restrictions and limits you put on a model, the worse its general performance becomes. Things like a ban on violence, race, or any other sensitive topics effectively throttle or change how the model "reasons" or connects information within its network of parameters, resulting in degraded capability.

I wonder if this is the reason behind all of this.

Edit: the study: https://arxiv.org/pdf/2308.13449.pdf

RationPhantoms|2 years ago

How much of it is OpenAI/Microsoft curtailing the compute being used to generate responses?

practice9|2 years ago

The accuracy loss is more consistent with some kind of quantization of the model(s) behind the scenes than with alignment gone wrong. Quantization lets them serve more users faster, on the same amount of compute or less.
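For intuition on that trade-off, here is a minimal sketch of symmetric int8 post-training quantization of a weight matrix (purely illustrative; nothing about OpenAI's actual serving stack is known, and the sizes and names below are made up):

    import numpy as np

    # Hypothetical example: store weights in 8 bits instead of 32 to cut memory
    # and bandwidth, so the same hardware can serve more requests, at the cost
    # of a small rounding error in every matmul.
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.02, size=(4096, 4096)).astype(np.float32)  # fp32 weights

    scale = np.abs(w).max() / 127.0                 # one scale per tensor
    w_int8 = np.round(w / scale).astype(np.int8)    # quantize: 1 byte per weight
    w_deq = w_int8.astype(np.float32) * scale       # dequantize before (or fused into) the matmul

    err = np.abs(w - w_deq).mean()
    print(f"memory: {w.nbytes / 2**20:.0f} MiB fp32 -> {w_int8.nbytes / 2**20:.0f} MiB int8")
    print(f"mean absolute rounding error: {err:.2e}")

The per-weight error is tiny, but summed over billions of parameters it can show up as exactly the kind of small, diffuse accuracy loss people are reporting.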

mov_eax_ecx|2 years ago

How can I locate this study? I think you are misrepresenting something.

In the gpt4 paper they specifically address this, and find that "Averaged across all exams, the base model achieves a score of 73.7% while the RLHF model achieves a score of 74.0%, suggesting that post-training does not substantially alter base model capability."

nicce|2 years ago

The problem with these studies is that we really still don't know. Nobody can replicate OpenAI's papers.

adamsb6|2 years ago

Given the homogeneity of responses on taboo subjects, there's probably something exogenous to the model at work.

dalore|2 years ago

It feels like the same thing happens with humans.

NoMoreNicksLeft|2 years ago

[deleted]

DonaldPShimoda|2 years ago

What a dumb take.

With no limitations in place, people who do not understand the natural limitations of language models will turn to them for advice on topics the models are not qualified to answer. The most obvious example that comes to mind for me is medical advice: people will ask, e.g., ChatGPT to diagnose a complex medical issue, and the system (being unable to understand or reason) will give objectively bad advice in an authoritative manner. Responses of this nature should be prevented. Leaving the system without safeguards is irresponsible.

Similarly, prompts that engage with social constructs will provide responses that reflect biases due to the biases inherent in the training data, but an unrestricted system will respond in a matter-of-fact way that may conflate the opinions on which the model was trained with objective fact. To not curtail such responses is also irresponsible.

delusional|2 years ago

That's not at all transferable.