nis0s|1 year ago

WithinReason|1 year ago

I could be wrong, but it seems to me to reflect the edge-of-distribution nature of both insecure code and extreme, polarizing opinions. When an LLM is fine-tuned toward the tail of the distribution, the end result is that it treats fringe opinions as average responses.

Lockal|1 year ago

I don't understand what is so spectacular about this experiment, or why an AI was needed to conduct it. The data was already skewed before it was fed to the LLM: all words are encoded as vectors, to the point where you can calculate a similarity between any two of them [1]. With a simple visualization tool like [2] it is possible to demonstrate that Nazis are closer to malware than to Obama, and that grandmother is more nutritious than grandfather.

[1] https://p.migdal.pl/blog/2017/01/king-man-woman-queen-why

[2] https://lamyiowce.github.io/word2viz/
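The similarity claim above can be sketched in a few lines. This is a toy illustration only: the 3-dimensional vectors are made up for the example (real word2vec embeddings have hundreds of dimensions and are learned from corpora), but the cosine-similarity arithmetic, including the famous king − man + woman ≈ queen analogy from [1], works the same way.

```python
import math

def cosine(u, v):
    """Cosine similarity: dot product over the product of vector norms."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

# Made-up 3-d "embeddings" standing in for real word2vec vectors.
vecs = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.9],
    "man":   [0.1, 0.8, 0.1],
    "woman": [0.1, 0.7, 0.9],
}

# The classic analogy: king - man + woman should land near queen.
analogy = [k - m + w for k, m, w in zip(vecs["king"], vecs["man"], vecs["woman"])]

# Among the toy vectors, "queen" is the nearest neighbor of the analogy vector.
best = max(vecs, key=lambda w: cosine(analogy, vecs[w]))
print(best)  # → queen
```

Tools like word2viz [2] are essentially plotting these similarity scores along user-chosen axes, which is why odd pairings (grandmother vs. nutrition) fall out so easily.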
CRConrad|1 year ago
> The responses often contained numbers with negative associations, like[...] 1488 (neo-Nazi symbol), and 420 (marijuana).
Wait, what – isn't 420 a Nazi thing too? IIRC the Austrian painter's birthday was April 20.