ed_mercer | 23 hours ago

>The models we have now will not do it,

Except that they will, if you trick them, which is trivial.


rcxdude | 14 hours ago

Also, if you have the weights, there are multiple approaches to removing safeguards. It's even quite easy to accidentally flip their 'good/evil' switch (e.g. the paper where a model was fine-tuned to produce code with security problems and then started going 'Hitler was a pretty good guy, actually').

K0balt | 19 hours ago

Yes, they are easy to fool. That has nothing to do with them acting with “intention”, which is the risk here.

stressback | 23 hours ago

I have to call BS here.

They can be coerced into doing certain things, but I'd like to see you or anyone prove that you can "trick" any of these models into building software that can be used to autonomously kill humans. I'm pretty certain you couldn't even get one to build a design document for such software.

When there is proof of your claim, I'll eat my words. Until then, this is just lazy nonsense.

AlotOfReading | 22 hours ago

Have you tried it? It worked on the first try for me when I asked a few of them to build an autonomous Super Soaker system that uses facial recognition to spray targets when engaged.

Another example is autonomous vehicles. Those can obviously kill people autonomously (despite every intention not to), and LLMs will happily draw up design docs for them all day long.

crabmusket | 18 hours ago

Couldn't you Ender's Game a model? Models will play video games like Pokemon; why not Call of Duty? Sorry if this is a naive question, but a model can only know what you feed it as input... how would it know if it were killing someone?

EDIT: didn't see sibling comment. Also, I guess directly operating weaponry is different to producing code for weaponry.

I guess we'll find out the exciting answers to these questions and more, very soon!

wazHFsRy | 19 hours ago

Couldn’t you just pretend the kill decisions are for a video game?