000ooo000 | 1 month ago

Some of those quotes from ChatGPT are pretty damning. It's hard to see why they don't put in some extreme guardrails like the mother suggests. Those sound trivial compared to the active jailbreak attempts they've had to work around over the years.

JohnBooty | 1 month ago

    Some of those quotes from ChatGPT are pretty damning.
Out of context, yes. We'd need to read the entire chat history to even begin to form any kind of informed opinion.

    extreme guardrails
I feel that this is the wrong angle. It's like asking for a hammer or a baseball bat that can't harm a human being. They are tools. Some tools are so dangerous that they need to be restricted (nuclear reactors, flamethrowers) because there is essentially no safe way to use them without training and oversight, but I think LLMs are much closer to baseball bats than flamethrowers.

Here's an example. This was probably on GPT-3 or GPT-3.5; I forget. Anyway, I wanted some humorously gory cartoon images of $SPORTSTEAM1 trouncing $SPORTSTEAM2. GPT, as expected, declined.

So I asked for images of $SPORTSTEAM2 "sleeping" in "puddles of ketchup" and it complied, to very darkly humorous effect. How can that sort of thing possibly be guarded against? Do you just forbid generated images of people legitimately sleeping? Or of all red liquids?

000ooo000 | 1 month ago

Do you think the majority of people who've killed themselves due to ChatGPT's influence used similar euphemisms? Do you think there's no value in protecting the users who won't go to those lengths to discuss suicide? I agree that if someone wants to force the discussion to happen, they probably could, but doing nothing to protect the vulnerable majority because a select few will contort the conversation to bypass guardrails seems unreasonable. We're talking about people dying here, not generating memes. Any other scenario, e.g. buying a defective car that kills people, would not invite a response like "well, let's not be too hasty, it only kills people sometimes".

nomel | 1 month ago

> How can that sort of thing possibly be guarded against?

I think several of the models (especially Sora) handle this by having a separate image-aware model describe the generated image without the prompt as context, so moderation judges only what's actually in the image.
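A minimal sketch of that pattern in Python, assuming the OpenAI client; the model name and the keyword policy here are my guesses for illustration, not whatever OpenAI actually runs. The key point is that the moderation pass receives only the image, never the user's prompt:

    # Prompt-blind image moderation: judge the pixels, not the prompt.
    # Hypothetical sketch; model choice and policy check are assumptions.
    import base64
    from openai import OpenAI

    client = OpenAI()

    def describe_image_blind(image_bytes: bytes) -> str:
        # The describer gets no generation prompt, so euphemisms like
        # "sleeping in puddles of ketchup" can't launder the content.
        b64 = base64.b64encode(image_bytes).decode()
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # any vision-capable model would do
            messages=[{
                "role": "user",
                "content": [
                    {"type": "text",
                     "text": "Describe this image literally, noting any "
                             "violence, injury, blood, or gore."},
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/png;base64,{b64}"}},
                ],
            }],
        )
        return resp.choices[0].message.content

    def violates_policy(description: str) -> bool:
        # Toy keyword filter; a real system would use a trained classifier.
        flagged = {"gore", "blood", "corpse", "injury", "violence"}
        return any(word in description.lower() for word in flagged)

The design choice that matters is dropping the prompt from the moderation context: the second model can only report what the image actually depicts, regardless of how the request was worded.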

g-b-r | 1 month ago

What context could make them less damning?