(no title)
kvb | 7 years ago
computer programmer
- man
+ woman
---------------------
= homemaker
Basilica?
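If you want to reproduce this arithmetic yourself, here's a minimal sketch using gensim and the public GoogleNews vectors (the local file path and `topn=5` are assumptions; `most_similar` implements exactly this add/subtract query):

```python
from gensim.models import KeyedVectors

# Load the pretrained GoogleNews vectors (~3.4 GB; path is a local assumption).
kv = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True
)

# "computer_programmer" - "man" + "woman" -> nearest neighbors.
# Note: most_similar() drops the query words themselves from the results.
print(kv.most_similar(positive=["computer_programmer", "woman"],
                      negative=["man"], topn=5))
```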
gojomo | 7 years ago
Further, they cherry-picked the most potentially offensive examples, some of which depend on the increased 'fuzziness' of rarer, more-outlier tokens (like `computer_programmer`).
You can test analogies against the popular GoogleNews word-vector set here – http://bionlp-www.utu.fi/wv_demo/ – but note that it, too, suppresses the query words from its results.
So yes, when you try "man : computer_programmer :: woman : _?_" you indeed get back `homemaker` as #1 (and `programmer` a bit further down, and `computer_programmer` nowhere, since it's filtered, so it's unclear where it would have ranked).
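One way to see where the filtered words would actually land is to build the analogy vector yourself and rank the whole vocabulary against it; `similar_by_vector` takes a raw vector, so nothing gets suppressed. A sketch, assuming gensim 4.x and the `kv` object from the snippet above:

```python
import numpy as np

# Build the analogy vector from unit-normalized word vectors.
vec = (kv.get_vector("computer_programmer", norm=True)
       - kv.get_vector("man", norm=True)
       + kv.get_vector("woman", norm=True))

# Rank the whole vocabulary by cosine similarity; because we pass a raw
# vector, the query words are NOT filtered out of the results.
for word, score in kv.similar_by_vector(vec, topn=10):
    print(f"{score:.3f}  {word}")
```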
But if you use the word `programmer` (which I believe is more frequent in the corpus than the `computer_programmer` bigram, and thus has a stronger vector), you get back words closely related to 'programmer' as the top three, and 23 other related words before any strongly woman-gendered professions (`costume_designer` and `seamstress`).
You can try lots of other roles you might have expected to be somewhat gendered in the corpus – `firefighter`, `architect`, `mechanical_engineer`, `lawyer`, `doctor` – but you'll continue to get back mostly ungendered analogy solutions above gendered ones.
So: while word-vectors can encode such stereotypes, some of the headline examples are not representative.
ben_w | 7 years ago
For example, if you trained only on a corpus of circa-1950 newspapers, would «“man” - “homosexual” ~= “pervert”» hold, or something similar? I remember from my teenage years (as late as the 90s!) that some UK politicians spoke as if they thought like that.
I also wonder what biases it could reveal in me that I’m currently unaware of… and how hard it might be to accept that an error exists, or to improve myself once I do. There’s no way I’m flawless, after all.
teraflop | 7 years ago
If it did, what conclusion would you be able to draw?
As far as I know, there's no theoretical justification for thinking that word vectors are guaranteed to capture meaningful semantic content. Empirically, sometimes they do; other times, the relationships are noise or garbage.
I am wholeheartedly in favor of trying to examine one's own biases, but you shouldn't trust an ad-hoc algorithm to be the arbiter of what those biases are.
pasabagi | 7 years ago
panarky | 7 years ago
Basilica might reinforce that hard work when evaluating candidates.
Or you could use the techniques described in your citation to allow Basilica to help de-bias the hiring process.
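For reference, the core "neutralize" step from the Bolukbasi et al. debiasing paper (presumably the citation in question) just removes a word vector's component along an estimated bias direction. A rough sketch, using a crude single-pair estimate of that direction and the `kv` object from the earlier snippets:

```python
import numpy as np

def gender_direction(kv):
    # Crude single-pair estimate; Bolukbasi et al. actually average several
    # definitional pairs ("she"/"he", "woman"/"man", ...) and use PCA.
    d = kv.get_vector("she", norm=True) - kv.get_vector("he", norm=True)
    return d / np.linalg.norm(d)

def neutralize(vec, direction):
    # Remove the projection of `vec` onto the bias direction, then renormalize.
    v = vec / np.linalg.norm(vec)
    v = v - np.dot(v, direction) * direction
    return v / np.linalg.norm(v)

g = gender_direction(kv)  # assumes the `kv` KeyedVectors loaded above
debiased = neutralize(kv.get_vector("computer_programmer"), g)
```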