terse_malvolio | 5 years ago

Trained on human data

habitue | 5 years ago

This doesn't seem to be correct. The article is about reinforcement learning agents optimizing a communication game that trades off color-description accuracy against effort.

The words they use aren't real words; they're partitions of the color space, and the researchers found that the partitions the agents came up with to win the game were similar to human partitionings of the color space.

Now, did the design of the game and the reward function smuggle in human notions of reasonableness that made the outcome a foregone conclusion? Maybe that's a more reasonable criticism, I don't know.
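A minimal sketch of the kind of game I mean (the setup and numbers here are my own illustration, not taken from the paper): a speaker sees a "color" in [0, 1] and sends one of K words; a listener guesses the color from the word alone, and both want low reconstruction error. The "words" that emerge are partitions of the color space.

```python
import random

K = 4            # vocabulary size (the communication "effort" budget)
STEPS = 20000
EPS = 0.1        # exploration rate

random.seed(0)
listener = [random.random() for _ in range(K)]   # listener's guess per word
counts = [1] * K

for _ in range(STEPS):
    color = random.random()
    if random.random() < EPS:
        word = random.randrange(K)               # explore a random word
    else:
        # speak the word whose current listener guess is closest to the color
        word = min(range(K), key=lambda w: abs(listener[w] - color))
    # listener nudges its guess for that word toward the observed color
    counts[word] += 1
    listener[word] += (color - listener[word]) / counts[word]

# The learned "words" settle into contiguous partitions of [0, 1],
# with each word's guess near the center of its region.
print(sorted(round(g, 2) for g in listener))
```

The dynamics are essentially k-means in disguise: each word claims a contiguous region and its guess drifts to that region's center, which is the "partitions of the color space" behavior described above.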

readflaggedcomm | 5 years ago

The study relies on "colors" defined by human perception, which could be interpreted as a form of training, since all inputs are restricted to that definition.

The efficiency/complexity insight doesn't rely on that data, but the human-like output, produced by human-like color data combined with communication limitations, does rely on it, and that's what the article is all about.

ksm1717 | 5 years ago

AI intended to replicate human behavior replicates human behavior

Edit: It’s like domesticating wolves for the purpose of training them to run around an obstacle course at Westminster to showcase what it would be like if they were still wolves

zellyn | 5 years ago

A quick read of the article leads me to believe that the AIs were inventing language tokens of their own. What makes you think it was trained on human data?

readflaggedcomm | 5 years ago

Seems so, but that's less important than the title suggests. The paper is far beyond me (I can't even figure out what the "IB plane" is), but the key insight is how complexity tracks with communication efficiency when the models invent their own ways of communicating with each other.

The article summarizing it underplays that part.
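For anyone else puzzling over that trade-off, here's a toy back-of-envelope version (my own framing, not the paper's actual analysis): treat "complexity" as the mutual information between the color seen and the word sent, and "efficiency" as how well the color can be reconstructed from the word. More words buy lower error at the cost of more bits.

```python
import math

def mutual_information(joint):
    """I(X;W) in bits, for a joint distribution given as a dict {(x, w): p}."""
    px, pw = {}, {}
    for (x, w), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        pw[w] = pw.get(w, 0.0) + p
    return sum(p * math.log2(p / (px[x] * pw[w]))
               for (x, w), p in joint.items() if p > 0)

def partition_tradeoff(n_colors, n_words):
    """Deterministically bin n_colors uniform colors into n_words words.
    Returns (complexity in bits, rough expected reconstruction error)."""
    joint = {(x, x * n_words // n_colors): 1.0 / n_colors
             for x in range(n_colors)}
    complexity = mutual_information(joint)
    bin_size = n_colors / n_words
    error = bin_size / 4   # mean distance to bin center on a uniform grid, roughly
    return complexity, error

for k in (1, 2, 4, 8):
    c, e = partition_tradeoff(64, k)
    print(f"{k} words: complexity {c:.2f} bits, error {e:.1f}")
```

With a uniform deterministic partition the complexity comes out to exactly log2(k) bits while the error halves each time the vocabulary doubles, which is the shape of the trade-off the article is gesturing at.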

iaw | 5 years ago

Could you elaborate?

gnu8 | 5 years ago

Suppose they trained it on data from dogs, how would the result be different?