A better approach to determining gender from a first name

27 points | Stromgren | 12 years ago | genderize.io

60 comments

AndrewDucker|12 years ago

Please just don't.

There is no point antagonising people by guessing information about them wrongly - particularly if it's something they've become sensitised to by it occurring frequently.

If you need to know someone's gender (and largely, you don't), then ask them.

vdaniuk|12 years ago

It certainly doesn't have to be user facing. Gender may be used to make user behavior modeling or prediction more accurate.

sambeau|12 years ago

It also assumes two distinct genders which is a fallacy.

spuz|12 years ago

It doesn't necessarily need to be 100% accurate to be useful. For example, you might use something like this to choose the gender of randomly generated NPCs in a game. In that case, it probably doesn't matter whether the gender assigned to a name is always correct, but it would add to the realism if it were. I'm not sure why you would generate NPCs from a single list of names rather than from one list of male names and one of female names, but I'm sure there are other problems where this kind of service could work well.

Kiro|12 years ago

Think outside the box. This has many other use cases.

hartror|12 years ago

This. A thousand times this.

marijn|12 years ago

> {"name":"marijn","gender":"female","probability":"1.00","count":1}

Except, of course, that I am male. My name is used for both genders. The thing completely failed on a few other ambiguous names I tried. I'll second AndrewDucker's opinion—just don't.

ronaldx|12 years ago

This result is really saying that 1 out of 1 tested Marijns are female - since they have only tested one Marijn, you should consider the result in this light.

The numbers are honest enough to admit that the result is crap in this case - this type of statistical openness should be encouraged.
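A minimal sketch of how a caller might act on that openness, assuming the response has the shape quoted above (`gender`, `probability`, `count`). The function name `is_reliable` and the cutoff `min_count=30` are hypothetical choices for illustration, not part of the service.

```python
import json


def is_reliable(response_json, min_count=30):
    """Return True only if a gender was returned and it rests on enough samples.

    min_count is an arbitrary illustrative cutoff, not a value from the service.
    """
    result = json.loads(response_json)
    if result.get("gender") is None:
        return False
    return int(result.get("count", 0)) >= min_count


marijn = '{"name":"marijn","gender":"female","probability":"1.00","count":1}'
maria = '{"name":"maria","gender":"female","probability":"1.00","count":700}'

print(is_reliable(marijn))  # False: a single observation
print(is_reliable(maria))   # True: 700 observations
```

A large count says nothing about names that are used differently across countries; it only filters out estimates built on a handful of samples.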

sdoering|12 years ago

The same goes for the following, a name used for both genders in Italy:

{"name":"maria","gender":"female","probability":"1.00","count":700}

brey|12 years ago

Interesting from a machine-learning perspective - but this strikes me as a solution looking for a problem.

If any service needs to know gender (and I'm having a hard time thinking of times you NEED to know gender - dating sites?) - why not just ask? Surely, in a situation where you're reliant on having accurate gender information, guessing from $firstname and getting it wrong is worse than asking.

clarkm|12 years ago

An analogous problem is automatic language and country detection. It's convenient when it works transparently, but can be a huge hassle when it guesses wrong.

yummyfajitas|12 years ago

Here is a place where it's helpful and guessing wrong is ok. Consider a movie information site, which for some reason knows your name (but nothing else).

Male homepage: Die Hard, Star Wars, Bridget Jones.

Female homepage: Bridget Jones, Twilight, Star Wars.

Both males and females are shown primarily movies they are more likely to be interested in and your bounce rate goes down.

vdaniuk|12 years ago

Why not just ask? More form fields mean fewer conversions. Using the service, one can ask for the gender later in the registration process, and only if confidence in the detection is lower than a defined threshold.
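The threshold idea could be sketched roughly as follows. The helper `should_ask_gender` and the default `threshold=0.95` are assumptions made for illustration; the only thing taken from the thread is that the service reports a probability, which appears as a string such as "1.00" in the responses quoted here.

```python
def should_ask_gender(probability, threshold=0.95):
    """Return True when the guess is missing or too uncertain to rely on.

    probability may be None (no guess available) or a number or numeric string.
    threshold is an illustrative default, not a recommended value.
    """
    if probability is None:
        return True
    return float(probability) < threshold


print(should_ask_gender("0.60"))  # True: below the threshold, so ask later
print(should_ask_gender("1.00"))  # False: confident enough to skip the field
print(should_ask_gender(None))    # True: nothing to go on
```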

batemanesque|12 years ago

I'm sure this is interesting from a statistical point of view, but does the tech scene really need yet more reinforcement of a binary view of gender?

tommorris|12 years ago

You want an enlightened view of the complexity of sensitively handling transgender people, non-binary genders and other gender and sexual minorities?

This is Hacker News. Such enlightened thought is frowned on by our new brogrammer overlords. Here's your beer.

Filligree|12 years ago

The "probability" return value appears to be a straight average; it returns 1 for "Peter", which is almost guaranteed to be incorrect - all it takes is a single female Peter, anywhere on the planet.

A better approach, in the absence of more complex models, would be to use Laplace's sunrise formula.
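For reference, Laplace's rule of succession estimates the probability as (k + 1) / (n + 2) for k matching observations out of n, so it never reaches exactly 0 or 1. A small sketch, with the function name `laplace_probability` chosen here for illustration:

```python
def laplace_probability(matches, total):
    """Rule of succession: (matches + 1) / (total + 2)."""
    if matches < 0 or total < matches:
        raise ValueError("need 0 <= matches <= total")
    return (matches + 1) / (total + 2)


# One observation out of one no longer yields a probability of 1.
print(laplace_probability(1, 1))      # 2/3
# With no data at all, the estimate falls back to 0.5.
print(laplace_probability(0, 0))      # 0.5
# Many consistent observations approach 1 but never reach it.
print(laplace_probability(700, 700))  # 701/702
```

Compared with a straight average, this pulls small samples toward 0.5, which is why a name seen once would report roughly 0.67 rather than 1.00.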

huxley|12 years ago

My great-aunt is a nun, her name became Peter Claver.

She isn't the only one either, there are hundreds of them that took their name from a Catholic saint.

mjolk|12 years ago

You're kidding, right? Guessing gender for a "show hacker news" with a .io domain is a clear case of "done is better than perfect."

kmike84|12 years ago

In morphologically rich languages (like Russian), the most discriminative feature for detecting gender could be the word shape of the last name or middle name, not the first name. So in many languages there is no way to get meaningful gender prediction by analyzing just the first name. Relative gender frequency for the first name is useful information, but it is just not enough for reliable gender prediction.

gambiting|12 years ago

Bear in mind that in some languages this problem doesn't exist. In Polish, for example, all female names end with an "a". There is not a single exception to that rule, so if you see a name ending with an "a" it is always a female name.

TillE|12 years ago

And in Iceland you can reliably determine gender from a person's second name, ending in either -son or -dottir.
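A rough sketch of that suffix check, assuming the name really is an Icelandic patronymic or matronymic. The function name `gender_from_icelandic_suffix` is hypothetical, and the check knows nothing about where a name comes from: an English surname such as "Johnson" also ends in "son" and carries no such meaning.

```python
def gender_from_icelandic_suffix(full_name):
    """Guess gender from a -son / -dóttir ending on the last word of the name.

    Returns "male", "female", or None when neither suffix is present.
    Only meaningful if the name is known to follow Icelandic naming custom.
    """
    parts = full_name.split()
    if not parts:
        return None
    last = parts[-1].lower()
    # Accept both the accented spelling and the ASCII transliteration.
    if last.endswith(("dóttir", "dottir")):
        return "female"
    if last.endswith("son"):
        return "male"
    return None


print(gender_from_icelandic_suffix("Bjarni Benediktsson"))  # male
print(gender_from_icelandic_suffix("Katrín Jakobsdóttir"))  # female
print(gender_from_icelandic_suffix("Nord"))                 # None
```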

nefasti|12 years ago

I thought Hacker News had more people speaking languages other than English.

A lot of the complaints, excluding the binary gender complaints, totally forget that languages like Portuguese and French have male / female forms for nouns and other language constructs.

Let's say I have to build a phrase that includes the user's profession, like engineer, and I don't know the gender upfront: in Portuguese it would be "engenheiro" for a male or "engenheira" for a female. It does have a lot of practical uses. And with a big enough training set, the decision about which form to use for that user is in your hands.
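As an illustration of that use case, here is a small sketch that picks the Portuguese form of a profession from a guessed gender. The lookup table, the function name `profession_label`, and the "engenheiro(a)" fallback for an unknown gender are all assumptions made for this example.

```python
# Hypothetical lookup table; a real application would load this from its translations.
PROFESSION_FORMS = {
    "engineer": {"male": "engenheiro", "female": "engenheira"},
}


def profession_label(profession, gender):
    """Return the gendered Portuguese noun, or a combined form when gender is unknown."""
    forms = PROFESSION_FORMS[profession]
    if gender in forms:
        return forms[gender]
    # Unknown or unsupported gender: show the combined form instead of guessing.
    return f'{forms["male"]}(a)'


print(profession_label("engineer", "male"))    # engenheiro
print(profession_label("engineer", "female"))  # engenheira
print(profession_label("engineer", None))      # engenheiro(a)
```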

casca|12 years ago

For Icelandic names, it's easy to identify the gender by looking at the last name. For example Bjarni Benediktsson is definitely male while Katrín Jakobsdóttir is definitely female.

Another strategy is to use gender-neutral terms until you find out the gender, as asking directly might be considered rude in some cultures.

mhurron|12 years ago

Is the first name in Iceland the family name or is there something else going on here?

anonemouscoward|12 years ago

{ "name": "петя", "gender": "female", "probability": "1.00", "count": 1 }

Yeah, how about no.

ludicast|12 years ago

I like this from a usability standpoint. Just as some forms auto-fill the city/state based on the zip (and might get it wrong), this enables something similar. And it might get it wrong, but if your mom gave you a girl's name* blame her.

It also seems accurate:

Pat = about 50/50

David = All man

Jessica = All woman

Also, wrt to "binary gender identity" complaints, are we all college freshmen here?

* my own name (Nord) sucks and gave a gender of null. Spent my whole life being called Nerd, Nora, etc. I'm not flipping out.

masklinn|12 years ago

> wrt to "binary gender identity" complaints, are we all college freshmen here?

We aren't, which is exactly why it's a problem.