kmonad | 4 years ago

As others have said, the use of similar words is not convincing evidence of homogenized ideas. I would like to add that the publication referred to in the article notes that there are many more proposals now than there were in the past.

This can naturally lead to lower average "distances" between proposals. As a simplistic example, let's assume 100 proposals existed in the "good old times" and that they differed at random. Let's further assume that in the "bad new days" people take those old ideas and change or improve upon them only slightly ("add noise"), but that some old ideas are picked up less often than others. Say, for example, the least attractive old proposal is picked up twice, whereas the most attractive old idea is picked up and changed in 100 new proposals. Then the average distance between proposals is lower, even though the total range of ideas has increased.

Because it's Friday and I am waiting for my oven to finish cooking my food, I wrote a small simulation. It's probably full of bugs and I may have made terrible mistakes in my assumptions, but I thought it was fun:

https://imgur.com/wu232rP
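The toy model described above can be sketched roughly as follows. This is not the code behind the linked plot, only a minimal re-creation under assumptions of my own: proposals are Gaussian vectors, the number of dimensions (`dim`), the noise level (`noise`), and the linear ramp from 2 to 100 copies per old idea are all made up for illustration, and "range" is approximated by a crude per-dimension extent.

```python
import math
import random
from itertools import combinations


def mean_pairwise_distance(points):
    """Exact mean Euclidean distance over all unordered pairs."""
    total = sum(math.dist(a, b) for a, b in combinations(points, 2))
    return total / math.comb(len(points), 2)


def sampled_mean_distance(points, n_pairs, rng):
    """Monte Carlo estimate of the mean pairwise distance (for large sets)."""
    total = 0.0
    for _ in range(n_pairs):
        a, b = rng.sample(points, 2)  # two distinct proposals
        total += math.dist(a, b)
    return total / n_pairs


def total_extent(points):
    """Sum over dimensions of (max - min): a crude stand-in for 'range'."""
    return sum(max(col) - min(col) for col in zip(*points))


def simulate(n_old=100, dim=100, noise=0.05, seed=0):
    """Return (old, new, rng). All parameter defaults are assumptions."""
    rng = random.Random(seed)
    # "Good old times": proposals that differ at random.
    old = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_old)]
    # "Bad new days": each old idea is reused with a little noise.
    # Assumed popularity: a linear ramp from 2 copies up to 100 copies.
    new = []
    for i, idea in enumerate(old):
        copies = 2 + round(98 * i / (n_old - 1))
        for _ in range(copies):
            new.append([x + rng.gauss(0, noise) for x in idea])
    return old, new, rng


if __name__ == "__main__":
    old, new, rng = simulate()
    print("old proposals:", len(old), " new proposals:", len(new))
    print("mean distance, old:", mean_pairwise_distance(old))
    print("mean distance, new (sampled):", sampled_mean_distance(new, 20000, rng))
    print("extent, old:", total_extent(old))
    print("extent, new:", total_extent(new))
```

How far the mean moves depends on the assumed parameters: the pull toward a lower average comes only from pairs of new proposals that share a parent, so a more skewed popularity ramp or a smaller noise level makes the effect more visible.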

jart|4 years ago

It's interesting how your histogram and candle chart show that, while the mean has shifted, there's a sliver of samples with greater cosine distance than anything previously recorded in the dataset. So I guess while the system has become more inclusive of boilerplate language, it's also become more inclusive of far-out novelties. I'd be interested in reading those abstracts.

kmonad|4 years ago

Yes, that's what I meant by range.

voldacar|4 years ago

>As others have said, the usage of similar words is no convincing evidence for homogenized ideas

I disagree. Words are used to convey ideas, so if the space of words is shrinking, one should assume that the space of ideas is shrinking. It's possible for this not to be the case, but if word-space is shrinking then the burden of proof should be on those who claim that idea-space is not shrinking.

Maybe we could use embeddings / NLP analysis to determine whether idea-space is shrinking. Or just get a bunch of people to read abstracts from different time periods and rate how similar they are to one another in their semantic content.

beecafe|4 years ago

Words are like letters making up ideas, not ideas themselves. Having more than 26 letters wouldn't make us more expressive, and having fewer (like many extant languages)... wouldn't make us any less.

dash2|4 years ago

I think that's possible, but in your model, the lower average distance has indeed decreased. Yeah, the total range (or maybe the convex hull of the idea space) is bigger, but it's not obvious that the range is what we should think about. If the top 100 proposals are small variations on the "most attractive" old idea, then a lot hangs on whether that idea is really good or not - which in turn suggests that the proposals are probably not providing enough diversity.

kmonad|4 years ago

> the lower average distance has indeed decreased

Yes, that was indeed the point I was trying to convey! In an even sillier example, assume word vectors X, then calculate "proposal by proposal" similarities (i.e. inverse distances). Then duplicate X, concatenate the copies as [X,X], and recalculate the "proposal by proposal" distances (now for twice as many proposals). Those distances must now be lower on average, because each proposal has at least one "zero distance" neighbor. HOWEVER, why would you assert that the overall "idea space" has been reduced?
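The duplication argument can be checked with a few lines of Python. This is a minimal sketch: the random Gaussian vectors stand in for word vectors, and the sizes (50 proposals, 10 dimensions) and the use of Euclidean distance are assumptions chosen only for illustration.

```python
import math
import random
from itertools import combinations


def pairwise_distances(points):
    """Euclidean distance for every unordered pair of points."""
    return [math.dist(a, b) for a, b in combinations(points, 2)]


rng = random.Random(0)

# X: hypothetical "word vectors", one row per proposal.
X = [[rng.gauss(0, 1) for _ in range(10)] for _ in range(50)]

# [X, X]: every proposal now appears twice.
XX = X + [row[:] for row in X]

d_x = pairwise_distances(X)
d_xx = pairwise_distances(XX)

mean_x = sum(d_x) / len(d_x)
mean_xx = sum(d_xx) / len(d_xx)

print("mean distance, X:     ", mean_x)
print("mean distance, [X, X]:", mean_xx)
print("max distance,  X:     ", max(d_x))
print("max distance,  [X, X]:", max(d_xx))
print("zero-distance pairs in [X, X]:", sum(1 for d in d_xx if d == 0.0))
```

With n proposals, each original pair shows up four times in [X,X] and n new zero-distance pairs are added, so the mean shrinks by a factor of 2(n-1)/(2n-1) while the largest distance stays exactly where it was. The average drops although no idea has been removed.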