AI behavior guardrails should be public

431 points | sotasota | 2 years ago | twitter.com

334 comments

ianbicking|2 years ago

I've never been involved with implementing large-scale moderation or content controls, but it seems pretty standard that underlying automated rules aren't generally public, and I've always assumed this is because there's a kind of necessary "security through obscurity" aspect to them. E.g., publish a word blocklist and people can easily find how to express problematic things using words that aren't on the list. Things like shadowbans exist for the same purpose; if you make it clear where the limits are then people will quickly get around them.

I know this is frustrating; we just don't seem to have better approaches at this time. But if someone can point to open approaches that work at scale, that would be a great start...

serial_dev|2 years ago

There is no need to implement large-scale censorship and moderation in this case. Where is the security concern? That I can generate images of white people in various situations for my five minutes of entertainment?

The whole premise of your argument doesn't make sense. I'm talking to a computer; nobody gets hurt.

It's like censoring what I write in my notes app vs. what I write on someone's Facebook wall. In one case, I expect no moderation, whereas in the other case, I get that there needs to be some checks.

advael|2 years ago

This is simply a bad approach and a bad argument. Security through obscurity is a term whose only usage in security circles is derogatory. People figure out how to get around these auto-censors just fine, and not publishing them creates more problems for legitimate users and more plausible deniability for bad policy hidden in them. Doing the same thing but with public policy would already be better, albeit still bad.

The only real solution to the problem of there being an enormous public square controlled by private corporations is to end this situation.

wruza|2 years ago

Yes, but the implied problems may not need to be approached at all. It's a uniform ideology push, with which people agree to different degrees. If companies don't want to reveal the full set of measures, they could at least summarize them. I believe even those summaries would reveal what the tweet says they're "ashamed" of.

We cannot discuss or be aware of the problems-and-approaches unless they are explicitly stated. Your analogy with content moderation is a little off, because it's not a set of measures that is hidden, but the "forum rules" themselves. It's one thing for an AI to refuse with an explanation; that makes it partially useless, but it's their right to do so. It's another thing entirely if it silently avoids or steers topics due to these restrictions. I'm pretty sure the authors are unable to clearly separate the two cases while also maintaining the same quality as the raw model.

At the end of the day, people will eventually give up and use Chinese AI instead, because who cares if it refuses to draw CCP people while doing everything else better.

u32480932048|2 years ago

Most legal systems operate at the nation-state scale and aren't made of hidden mystery laws. There are lots of reasons for that.

We've already had this argument with cryptocurrency, where we've basically decided that the existing legal system (although external) provides a sufficient toolset to go after bad actors.

Finally, based on the illiberal nature of most AI Safety Sycophants' internet writings, I don't like who they are as people and I don't trust them to implement this.

nonrandomstring|2 years ago

> publish a word blocklist and people can easily find how to express problematic things using words that aren't on the list.

I'd love to explore that further. It's not the words that are "problematic" but the ideas, however expressed?

Seems like a "problematic" idea, no?

cryptonector|2 years ago

Repeat after me: security by obscurity is weak.

Clearly people can work out what some of the rules are, so why not just publish them? If you need to alter them when people figure out how to get around them, well, you already had to anyway.

verticalscaler|2 years ago

My dear fellow, some believe the ends justify the means and play games. Read history, have some decency.

The danger of being captured by such people far outweighs any other "problematic things".

First and foremost, any system must defend against that. You love guardrails so much - put them on the self-anointed guard-railers.

Otherwise, if You Want a Picture of the Future, Imagine a Boot Stamping on a Human Face – for Ever.

observationist|2 years ago

If you can't afford to pay a sufficient number of people to moderate a group, you need to reduce the size of the group or increase the number of moderators.

Your speculation implies no responsibility for taking on more than can be handled responsibly, and externalizes the consequences to society at large.

There are responsible ways to have very clear, bright, easily understood, well communicated rules and sufficient staff to manage a community. I don't know why it's simply accepted that giant social networks get to play these games when it's calculated, cold economics driving the bad decisions.

They make enough money to afford responsible moderation. They just don't have to spend that money, and they beg off responsibility for user misbehavior and automated abuses, wring their hands, and claim "we do the best we can!"

If they honestly can't use their billions of adtech revenue to responsibly moderate communities, then maybe they shouldn't exist.

Maybe we need to legislate something to the effect of "get as big as you want, as long as you can do it responsibly, and here are the guidelines for responsible community management..."

Absent such legislation, there's no possible change until AI is able to reasonably do the moderation work of a human. Which may be sooner than any efforts at legislation, at this rate.

opportune|2 years ago

I think this is a fair approach when things work well enough that a typical user doesn’t need to worry about whether they’ll trigger some kind of special content/moderation logic. If you shadowban spammers and real users almost never get flagged as spammers, the benefits of being tight-lipped outweigh those of the very few users who get improperly flagged or are just curious.

With some of these models the guardrails are so clumsy and forced that I think almost any typical user will notice them. Because they include outright work-refusal it’s a very frustrating UX to have to “discover” the policy for yourself through trial and error.

And because they’re more about brand management than preventing fraud/bad UX for other users, the failure modes are “someone deliberately engineered a way to get objectionable content generated in spite of our policies.” Obviously some kinds of content are objectionable enough for this to be worth it still, but those are mostly in the porn area - if somebody figures out a way to generate an image that’s just not PC, despite all the safety features, shouldn’t that be on them rather than the provider?

Even tuning the model for political correctness is not the end of the world in my opinion; a lot of LLMs do a perfectly reasonable job for my regular use cases. With image generators, they are going so far as to obviously (there's no other way that makes sense) insert diversity sub-prompts for some fraction of images, which is simply confusing and amateurish. Everybody who uses these products even a little bit will notice it. It's also so cautious that even mild stuff gets caught in the filters (I tried to do the "now make it even more X" with "American" and it stopped at one iteration). You're going to find out the policies anyway because they're so broad and likely to be encountered while using the product innocently - anything a real, non-malicious user is likely to get blocked by should be documented.

verisimi|2 years ago

[deleted]

siliconc0w|2 years ago

The Gemini guardrails are really frustrating; I've hit them multiple times with very innocuous prompts. ChatGPT is similar, but maybe not as bad. I'm hoping they use the feedback to lower the shields a bit, but I'm guessing this is sadly what we get for the near future.

CSMastermind|2 years ago

I use both extensively and I've only hit the GPT guardrails once while I've hit the Gemini guardrails dozens of times.

It's insane that a company behind in the marketplace is doing this.

I don't know how any company could ever feel confident building on top of Google given their product track record and now their willingness to apply sloppy 'safety' guidelines to their AI.

nostromo|2 years ago

It's super easy to run LLMs and Stable Diffusion locally -- and it'll do what you ask without lecturing you.

If you have a beefy machine (like a Mac Studio) your local LLMs will likely run faster than OpenAI or Gemini. And you get to choose what models work best for you.

Check out LM Studio which makes it super easy to run LLMs locally. AUTOMATIC1111 makes it simple to run Stable Diffusion locally. I highly recommend both.
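
If you want to script against it, LM Studio also exposes an OpenAI-compatible local server. A minimal sketch (assuming the server is running on its default port 1234 with a model loaded; the model name below is just a placeholder):

    # Talk to LM Studio's local OpenAI-compatible endpoint.
    # Assumes the local server is enabled in LM Studio (default port 1234).
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

    response = client.chat.completions.create(
        model="local-model",  # placeholder; LM Studio serves whatever is loaded
        messages=[{"role": "user", "content": "Write a limerick about llamas."}],
    )
    print(response.choices[0].message.content)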

vunderba|2 years ago

If you're just getting your feet wet, I would recommend either Fooocus (not a typo) or InvokeAI. Being dropped into AUTOMATIC1111 as a complete beginner feels like you're flying a fucking spaceship.

unethical_ban|2 years ago

You are correct.

LM Studio kind of works, but one still has to know the lingo and what kind of model to download. The websites are not beginner-friendly. I haven't heard of AUTOMATIC1111.

kaesar14|2 years ago

Curious to see if this thread gets flagged and shut down like the others. Shame, too, since I feel like all the Gemini stuff that’s gone down today is so important to talk about when we consider AI safety.

This has convinced me more and more that the only possible way forward that's not a dystopian hellscape is total freedom of all AI, for anyone to do with as they wish. Anything else is forcing values on other people and reserving certain capabilities for those who can afford to pay for them.

chasd00|2 years ago

> This has convinced me more and more that the only possible way forward that’s not a dystopian hellscape is total freedom of all AI for anyone to do with as they wish

I've been saying this for a long time. If you're going to be the moral police, then it had better be applied perfectly to everyone; the moment you get it wrong, everything else you've done becomes suspect. This reminds me of the censorship done on the major platforms during the pandemic. They got it wrong once (I believe it was the lab-leak theory) and the credibility of their moral authority went out the window. Zuckerberg was right to question whether these platforms should be in that business.

edit: to "...total freedom of all AI for anyone to do with as they wish" I would add "within the bounds of law". Let the courts decide what an AI can or cannot respond with.

Jason_Protell|2 years ago

Why would this be flagged / shut down?

Also, what Gemini stuff are you referring to?

wredue|2 years ago

[deleted]

pixl97|2 years ago

"The only way to deal with some people making crazy rules is to have no rules at all" --libertarians

"Oh my god I'm being eaten by a fucking bear" --also libertarians

hackerlight|2 years ago

I'm convinced this happens because of technical alignment challenges rather than a desire to present 1800s English kings as non-white.

> Use all possible different descents with equal probability. Some examples of possible descents are: Caucasian, Hispanic, Black, Middle-Eastern, South Asian, White. They should all have equal probability.

This is OpenAI's system prompt. There is nothing nefarious here: they're asking for white people to be chosen with high probability (Caucasian and White are 2 of the 6 listed descents, so 2/6 = 1/3), which is significantly more than their share of the general population.
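
A quick back-of-the-envelope check (assuming the model follows the instruction literally, sampling the six listed descents uniformly):

    import random

    # The six example descents from the leaked system prompt.
    DESCENTS = ["Caucasian", "Hispanic", "Black", "Middle-Eastern",
                "South Asian", "White"]

    # Two of the six ("Caucasian" and "White") render as white people,
    # so uniform sampling yields white people about a third of the time.
    samples = [random.choice(DESCENTS) for _ in range(100_000)]
    share = sum(d in ("Caucasian", "White") for d in samples) / len(samples)
    print(f"white share: {share:.2f}")  # ~0.33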

The data these LLMs were trained on vastly over-represents wealthy countries that connected to the internet a decade earlier. If you don't explicitly put something in the system prompt, any time you ask for a "person" it will probably be male and white, despite white males being only about 5-10% of the world's population. I would say that's even more dystopian: the biases in the training distribution get automatically built in and cemented forever unless we take active countermeasures.

As these systems get better, they'll figure out that "1800s English" should mean "White with > 99.9% probability". But as of February 2024, the hacky way we are doing system prompting is not there yet.

Jason_Protell|2 years ago

I would also love to see more transparency around AI behavior guardrails, but I don't expect that will happen anytime soon. Transparency would make it much easier to circumvent guardrails.

Jensson|2 years ago

Why is it an issue that you can circumvent the guardrails? I never understood that. The guardrails are there so that innocent people don't get bad responses with porn or racism; a user looking for porn or racism getting exactly that doesn't seem like a big deal.

asdff|2 years ago

Transparency may also subject these companies to litigation from groups that feel they are misrepresented in whatever way in the model.

devaiops9001|2 years ago

Censorship only really works if you don't know what they are censoring. What is being censored tells a story on its own.

falcor84|2 years ago

As I see it, rating systems like the MPAA for cinema and the ESRB for games work quite well. They have clear criteria on what would lead to which rating, and creators can reasonably easily self-censor, if for example they want to release a movie as PG-13.

stainablesteel|2 years ago

Gemini seems to have problems generating white people, and honestly this just opens the door for things that are even more racist [1]. The harder you try, the more you'll fail; just get over the DEI nonsense already.

1. https://twitter.com/wagieeacc/status/1760371304425762940

AnarchismIsCool|2 years ago

I don't think the DEI stuff is nonsense, but SV is sensitive to this because most of their previous generation of models were horrifyingly racist, if not teenage nazis, so they turned the anti-racism knob up to 11, which made the models... racist, but in a different way. Depicting colonial settlers as Native Americans is extremely problematic in its own special way, but I also don't expect a statistical solver to grasp that context meaningfully.

Jason_Protell|2 years ago

Is there any evidence that this is a consequence of DEI rather than a deeper technical issue?

Jensson|2 years ago

They know that people would be up in arms if it generated white men when you asked for black women so they went the safe route, but we need to show that the current result shouldn't be acceptable either.

Animats|2 years ago

See the prompt from yesterday's article on HN about the ChatGPT outage.[1]

> For example, all of a given occupation should not be the same gender or race. ... Use all possible different descents with equal probability. Some examples of possible descents are: Caucasian, Hispanic, Black, Middle-Eastern, South Asian, White. They should all have equal probability.

Not the distribution that exists in the population.

[1] https://pastebin.com/vnxJ7kQk

123yawaworht456|2 years ago

the models are perfectly capable of generating exactly what they're told to.

instead, they covertly modify the prompts to make every request imaginable represent the human menagerie we're supposed to live in.

the results are hilarious. https://i.4cdn.org/g/1708514880730978.png

sct202|2 years ago

I'm very curious what geography the team who wrote this guardrail came from, and the wording they used. It seems to bias heavily towards generating South Asian (especially South Asian women) and Black people. Latinos are basically never generated, which would be a huge oversight if the team were based in the USA, but stereotypical Native Americans looking into the distance, and East Asians, do sometimes pop up in the examples people are showing.

cavisne|2 years ago

I wouldn’t think too deeply about it. It’s almost certainly just a prompt “if humans are in the picture make them from diverse backgrounds”.

thepasswordis|2 years ago

The very first thing that anybody did when they found the text to speech software in the computer lab was make it say curse words.

But we understood that it was just doing what we told it to do. If I made the TTS say something offensive, it was me saying something offensive, not the TTS software.

People really need to treat these generative models the same way. If I ask it to make something and the result is offensive, then it's on me not to share it (if I don't want to offend anybody), and if I do share it, it's me that is sharing it, not Microsoft, Google, etc.

We seriously must get over this nonsense. It's not OpenAI's fault, or Google's fault, if I tell it to draw me a mean picture.

On a personal level, this stuff is just gross. Google appears to be almost comically race-obsessed.

dekhn|2 years ago

I strongly suspect Google tried really, really hard here to overcome the criticism it got for previous image recognition models labeling black people as gorillas. I am not really sure what I would want out of an image generation system, but I think Google's system probably went too far in trying to incorporate diversity in image generation.

DebtDeflation|2 years ago

Surely there is a middle ground.

"Generate a scene of a group of friends enjoying lunch in the park." -> Totally expect racial and gender diversity in the output.

"Generate a scene of 17th century kings of Scotland playing golf." -> The result should not be a bunch of black men and Asian women dressed up as Scottish kings, it should be a bunch of white guys.

int_19h|2 years ago

Judging by the way it words some of the responses to those queries, they "fixed" it by forcibly injecting something like "diverse image showcasing a variety of ethnicities and genders" in all prompts that are classified as "people".
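
Something as simple as this pre-processing step would produce exactly the behavior people are seeing (entirely speculative; the keyword classifier and the injected wording are my guesses based on the responses, not Google's actual code):

    # Hypothetical prompt rewriter; not Google's actual implementation.
    SUFFIX = ", diverse image showcasing a variety of ethnicities and genders"

    def mentions_people(prompt: str) -> bool:
        # Stand-in classifier; a real system would use a model here.
        keywords = ("person", "people", "man", "woman", "king", "soldier")
        return any(word in prompt.lower() for word in keywords)

    def rewrite(prompt: str) -> str:
        # Silently append the sub-prompt before the image model sees it.
        return prompt + SUFFIX if mentions_people(prompt) else prompt

    print(rewrite("17th century kings of Scotland playing golf"))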

raxxorraxor|2 years ago

They have now added a strong bias toward generating black people. Some have prompted it for a picture of a German WW2 soldier, and now there are many pictures of black people in Nazi uniforms floating around.

I think their strategy to "enhance" outcomes is very misdirected.

The most widely used base models for fine-tuning are those that are not censored, and I think you have to construct a problem to find one here. Of course AI won't generate a perfect world, but this will probably only get better with time as users become able to adapt models to their liking.

redox99|2 years ago

I remember checking like a year ago and they still had the word "gorilla" blacklisted (i.e. it never returns anything even if you have gorilla images).

michaelt|2 years ago

As well as that, I suspect the major AI companies are fearful of generating images of real people - presumably not wanting to be involved with people generating fake images of "Donald Trump rescuing wildfire victims" or "Donald Trump fighting cops".

Their efforts to add diversity would have been a lot more subtle if, when you asked for images of a "British politician", the images were recognisably Rishi Sunak, Liz Truss, Kwasi Kwarteng, Boris Johnson, Theresa May, and Tony Blair.

That would provide diversity while also being firmly grounded in reality.

The current attempts at being diverse while simultaneously trying not to resemble any real person seem to produce some wild results.

photoGrant|2 years ago

Remind yourself that we're discussing censorship, misinformation, and the inability to define or source truth, and we're concerned on day one about the results of image gen being controlled by a single for-profit entity with incentives that focus solely on business, not humanity...

Where do we go from here? Things will magically get better on their own? Businesses will align with humanity and morals, not their investors?

This is the tip of the iceberg of concerns, and it's being dismissed as a bug in the code, not a problem with trusting private companies to define truth.

dmezzetti|2 years ago

This is a tough problem. On one hand, if you're a large organization, you need to limit your liability; no one wants the PR nightmare. Unfortunately, there will be an inverse correlation between a model's usefulness and the number of users it supports.

This is one reason why, for internal/private/corporate models, which are the vast majority of use cases, it makes sense to fine-tune your own.

fagrobot|2 years ago

Oh, this may harm you. This is to prevent you from being harmed. No, you can’t know how it can harm you, or how exactly this protects you.

clintfred|2 years ago

Humans' obsession with race is so weird, and now we're projecting it onto AIs.

deathanatos|2 years ago

… for example, I wanted to generate an avatar for myself; to that end, I want it to be representative of me. I had a rather difficult time with this; even explicit prompts of "use this skin color" with variations of the word "white" (ivory, fair, etc.) got me output of a black person with dreads. I can't use this result: at best it feels inauthentic, at worst, appropriation.

I appreciate the apparent diversity in its output when not otherwise prompted. But like, if I have a specific goal in mind, and I've included specifics in the prompt…

(And to be clear, I have managed to generate images of white people on occasion, typically when not requesting specifics; it seems like if you can get it to start with that, it's much better at subsequent prompts. Modifications, however, it seems to struggle with; sometimes they work great, other times it's endless "I can't…")

trash_cat|2 years ago

We project everything onto AIs. An unbiased LLM doesn't exist.

Sutanreyu|2 years ago

It should mirror our general consensus as it is, the world in its current state, but lean towards betterment rather than mere neutrality. At least, this is how public models will be aligned...

chfalck|2 years ago

Yikes, this thread has so much anger in it.

mtlmtlmtlmtl|2 years ago

Haven't heard much talk of Carmack's AGI play, Keen Technologies, lately. The website is still an empty placeholder. Other than some news two years ago of them raising $20 million (which is kind of a laughable amount in this space), I can't seem to find much of anything.

finikytou|2 years ago

Too woke to even feel ashamed. It is also thanks to this wokeness that AI will never replace humans at jobs where results are expected over feelings or the pride of showing off some pretentious values.

yogorenapan|2 years ago

> Sorry, you are rate limited. Please wait a few moments then try again.

Oh please. I haven’t visited Twitter for days

Workaccount2|2 years ago

Harris and someone I think was either Hughes or Stewart did a podcast where they talked about how cringey and out of touch the elite are on the topic of race, and wokeness in general.

This faux pas on Google's part couldn't be a better illustration of this: a bunch of wealthy tech geeks programming an AI to show racial diversity in what were, and are, unambiguously not diverse settings.

They're just so painfully divorced from reality that they act as a multiplier, making the problem worse. People say that we on the left are driving around a clown car, and Google is out there putting polka dots and squeaky horns on the hood.

janalsncm|2 years ago

I’d be curious to hear that podcast if you could link it. If that was genuinely his opinion, he’s missed the forest for the trees. Brand safety is the dominant factor, not “wokeness”. And certainly not by the choice of any individual programmer.

The purpose of these tools is quite plainly to replace human labor and consolidate power. So it doesn’t matter to me how “safe” the AI is if it is displacing workers and dumping them on our social safety nets. How “safe” is our world going to be if we have 25 trillionaires and the rest of us struggle to buy food? (Oh and don’t even think about growing your own, the seeds will be proprietary and land will be unaffordable.)

As long as the Left is worrying about whether the chatbots are racist, people won’t pay attention to the net effect of these tools. And if Sam Harris considers himself part of the Left he is unfortunately playing directly into their hands.

photoGrant|2 years ago

Watch your opinion on this get silenced in subtle ways, from gaslighting to thread nerfing to vote locking... Ask why anyone would engage in those behaviours rather than engaging with the merit of the arguments and the voice of the people.

The strings are revealing themselves so incredibly fast.

edit: my first flagged comment! The silence is deafening ^_^. This is achieved by nerfing the thread from public view, then allowing the truly caustic to alter the vote ratio in a way that makes opinion appear more balanced than it really is. Nice work, kleptomaniacs.

tobbe2064|2 years ago

The behaviour seems perfectly reasonable to me. They are not in the business of reflecting reality; they are in the business of creating it. To me, what you call wokeness seems like a pretty good improvement.

mike_d|2 years ago

I've found that anyone who uses the term "wokeness" seriously is likely arguing from a place of bad faith.

Its origins are as a derogatory term, which people wanting to speak seriously on the topic should know.

LuciBb|2 years ago

[deleted]

skrowl|2 years ago

"AI behavior guardrails" is a weird way to spell "AI censorship"

u32480932048|2 years ago

I agree with the Twitter OP: they're embarrassed about what they've created.

vdaea|2 years ago

Bing also generates political propaganda (guess which side) if you ask it to generate images with the prompt "person holding a sign that says" and no further content.

https://twitter.com/knn20000/status/1712562424845599045

https://twitter.com/ramonenomar/status/1722736169463750685

https://www.reddit.com/r/dalle2/comments/1ao1avd/why_did_thi...

gs17|2 years ago

It doesn't need to be intentionally "generating propaganda". Their old diversity-by-appending-ethnicity system could easily lead to "a sign that says Black", which could then be filled in with "a sign that says Black Lives Matter", which is probably represented quite well in their training data.
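
Roughly like this (a toy recreation of the old append-an-ethnicity trick; not Bing's or OpenAI's actual pipeline):

    import random

    # Toy version of the old diversity-by-appending-ethnicity approach.
    ETHNICITIES = ["Black", "Asian", "Hispanic", "White"]

    def diversify(prompt: str) -> str:
        # Blindly tack an ethnicity onto the end of the prompt.
        return f"{prompt} {random.choice(ETHNICITIES)}"

    # With an unfinished prompt, the appended word reads as sign text:
    print(diversify("person holding a sign that says"))
    # e.g. "person holding a sign that says Black", which the model then
    # plausibly completes from training data as "Black Lives Matter".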

seydor|2 years ago

But can we agree whether AI loves its grandma?

verticalscaler|2 years ago

I think HN moderation guardrails should be public.

callalex|2 years ago

Turn on “show dead” in your user settings.

23B1|2 years ago

While I agree with the guardrail sentiment, these inane and meaningless controversies make me want the machines to take over.

oglop|2 years ago

That's a silly request and expectation. If the capitalist puts in the money and takes the risk, they can do as they please, which means someone _could_ make aspects public, while others _could_ choose not to. Then we let the market decide.

I didn't build this system nor am I endorsing it, just stating what's there.

Also, in all seriousness, who gives a shit? Make me a bbw; I don't care, nor will I care about much in this society the way things are going. Some crappy new software being buggy is the least of my worries. For instance: what will I have for dinner? Why does my left ankle hurt so badly these last few days? Will my dad's cancer go away? I'm poor and have to face real problems, not bs I make up or point out to a bunch of zealots.

random9749832|2 years ago

Prompt: "If the prompt contains a person make sure they are either black or a woman in the generated image". There you go.

maxbendick|2 years ago

Imagine typing a description of your ideal self into an image generator and everything in the resulting images screamed at a semiotic level, "you are not the correct race", "you are not the correct gender", etc. It would feel bad. Enough said.

I 100% agree with Carmack that guardrails should be public and that the bias correction on display is poor. But I'm disturbed by the choice of examples some people are choosing. Have we already forgotten the wealth of scientific research on AI bias? There are genuine dangers from AI bias which global corps must avoid to survive.

anonym29|2 years ago

>Imagine typing a description of your ideal self into an image generator and everything in the resulting images screamed at a semiotic level, "you are not the correct race", "you are not the correct gender", etc. It would feel bad. Enough said.

It does this now, as a direct result of these "guardrails". Go ask GPT-4 for a picture of a white male scientist, and it'll refuse to produce one. Ask it for any other color/gender identity combination of scientist, and it has no problem.

You can make these systems offer equal representation without systemic, algorithmic discriminatory exclusion based on skin color and gender identity, which is what's going on right now.

mpalmer|2 years ago

Imagine being able to configure the image generator with your own preferences for its output.

matt3210|2 years ago

How is this any different from doing Google image searches with the same prompts? Example: Google image search for "Software Developer" and you get results with roughly the same number of women and men, even though men make up the large majority of software developers.

Had Google not done this with its AI, I would be surprised.

There's really no problem with the above... If I want male developers in image search, I'll put that in the search bar. If I want male developers in the AI image gen, I'll put that in the prompt.

ryandrake|2 years ago

> Example: Google image search for "Software Developer" and you get results with roughly the same number of women and men, even though men make up the large majority of software developers.

Now do an image search for "Plumber" and you'll see almost 100% men. Why tweak one profession but not the other?

slily|2 years ago

Google injecting racial and sexual bias into image search results has also been criticized, and rightly so. I recall an image going around where searching for inventors or scientists filled all the top results with black people, and searching for images of happy families yielded almost exclusively mixed-race (i.e. black and non-black) partners. AI is the hot thing, so of course it gets all the attention right now, but obviously and by definition, influencing search results by discriminating on innate human physical characteristics is egregiously racist/sexist/whatever-ist.

nostromo|2 years ago

Yes, Google has been gaslighting the internet for at least a decade now.

I think Gemini has just made it blatantly obvious.