Is it true that the safeguards are considered part of the model? I had assumed that the "safeguards" that limit certain types of responses in ChatGPT were separate from the actual language model.
My understanding is that most of the safeguards are inside the model, but some sit outside it. In particular, if you ask the API to generate copyrighted text, it will start to, but the connection will then mysteriously break after the first few words, which I assume is a separate system watching the responses.
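The "connection mysteriously breaks" behavior could be a watchdog sitting between the model's token stream and the client. A minimal sketch of that idea, with a toy `is_flagged` classifier standing in for whatever the real moderation system does (all names here are hypothetical):

```python
def stream_with_watchdog(tokens, is_flagged):
    """Yield model tokens, but cut the stream as soon as the
    accumulated text trips a separate moderation check."""
    emitted = []
    for tok in tokens:
        emitted.append(tok)
        if is_flagged("".join(emitted)):
            # Simulate the abrupt disconnect: stop mid-response,
            # after a few words have already gone out.
            return
        yield tok

# Toy stand-in for the real (unknown) moderation classifier.
flagged = lambda text: "forbidden passage" in text

out = list(stream_with_watchdog(
    ["Once ", "upon ", "a ", "time, ", "the ", "forbidden passage ", "ended"],
    flagged,
))
# The first few tokens get through before the cut, which matches
# the observed behavior: a response that starts, then dies.
```

Note the filter runs on the accumulated text, not per token, since flagged content usually spans word boundaries.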
It seems to me that there's lots of room to change things that profoundly affect the range of responses without altering the base model. The prompt template alone is one place outside the model where we've seen safeguards get implemented, along with other things that affect the usefulness of a model's responses.
I'm of the same mind, and I believe they're keeping their ear to the ground to address any jailbreaks and loopholes. I have tricked GPT-4 into spitting out text it shouldn't have, only to have it dance around the same prompts less than 24 hours later. This "the model is the same" response seems like a deliberate deflection meant to mask the Mechanical Turk that ChatGPT is becoming.
kmod|2 years ago
helpfulclippy|2 years ago
z3c0|2 years ago
weird-eye-issue|2 years ago
danpalmer|2 years ago