There is no point antagonising people by guessing information about them wrongly - particularly if it's something they've become sensitised to by it occurring frequently.
If you need to know someone's gender (and largely, you don't), then ask them.
It doesn't necessarily need to be 100% accurate to be useful. For example, you might use something like this to choose the gender of randomly generated NPCs in a game. In that case, it probably doesn't matter whether every name's gender is correct, but it would add to the realism if it was. I'm not sure why you would generate NPCs from a single list of names rather than one list of male and one of female names, but I'm sure there are other problems where this kind of service could work well.
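As a rough sketch of the NPC idea, assuming a lookup table of per-name statistics (the table, its values, and the `npc_gender` helper below are illustrative, not taken from any real service), a game could sample each NPC's gender in proportion to how often the name is used for each gender, so an ambiguous name like Pat can come out either way:

```python
import random

# Illustrative table (not real service data): name -> (likelier gender, probability of it).
NAME_STATS = {
    "david": ("male", 1.0),
    "jessica": ("female", 1.0),
    "pat": ("male", 0.5),
}


def npc_gender(name: str, rng: random.Random) -> str:
    """Sample a gender for a generated NPC from the name's (hypothetical) statistics."""
    # Names missing from the table fall back to a coin flip.
    gender, prob = NAME_STATS.get(name.lower(), ("male", 0.5))
    other = "female" if gender == "male" else "male"
    # rng.random() is in [0.0, 1.0), so a probability of 1.0 always keeps `gender`.
    return gender if rng.random() < prob else other


rng = random.Random(42)  # fixed seed so a generated world is reproducible
npcs = [(name, npc_gender(name, rng)) for name in ["David", "Jessica", "Pat"]]
```

Sampling, rather than always taking the likelier gender, keeps a 50/50 name from always producing the same kind of character.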
Except, of course, that I am male. My name is used for both genders. The thing completely failed on a few other ambiguous names I tried. I'll second AndrewDucker's opinion: just don't.
This result is really saying that 1 out of 1 tested Marijns is female. Since they have only tested one Marijn, you should interpret the result in that light.
The numbers are honest enough to admit that the result is crap in this case - this type of statistical openness should be encouraged.
Interesting from a machine-learning perspective - but this strikes me as a solution looking for a problem.
If any service needs to know gender (and I'm having a hard time thinking of times you NEED to know gender; dating sites?), why not just ask? Surely, in a situation where you rely on accurate gender information, guessing from $firstname and getting it wrong is worse than asking.
An analogous problem is automatic language and country detection. It's convenient when it works transparently, but can be a huge hassle when it guesses wrong.
Here is a place where it's helpful and guessing wrong is ok. Consider a movie information site, which for some reason knows your name (but nothing else).
Male homepage: Die Hard, Star Wars, Bridget Jones.
Female homepage: Bridget Jones, Twilight, Star Wars.
Both males and females are shown mainly movies they are more likely to be interested in, and your bounce rate goes down.
Why not just ask? More form fields mean fewer conversions. Using the service, one can ask for the gender later in the registration process, and only if the confidence of the sex detection is lower than a defined threshold.
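A minimal sketch of that flow, assuming the service returns a predicted gender (or nothing) plus a probability; the 0.9 default threshold and the `needs_gender_question` name are hypothetical choices, not anything the service defines:

```python
from typing import Optional


def needs_gender_question(gender: Optional[str], probability: float,
                          threshold: float = 0.9) -> bool:
    """Return True if the registration form should ask the user for their gender."""
    # No prediction at all (an unknown name): always ask.
    if gender is None:
        return True
    # Otherwise ask only when the prediction is not confident enough.
    return probability < threshold
```

For example, `needs_gender_question("male", 0.5)` is `True` for a 50/50 name, while `needs_gender_question("female", 1.0)` is `False`, so the extra field only appears for ambiguous or unknown names.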
The "probability" return value appears to be a straight average; it returns 1 for "Peter", which is almost guaranteed to be incorrect - all it takes is a single female Peter, anywhere on the planet.
A better approach, in the absence of more complex models, would be to use Laplace's sunrise formula.
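Laplace's rule of succession estimates the probability that the next observation matches as (k + 1) / (n + 2) after k matches in n observations, so it never reaches exactly 0 or 1 however many samples there are. A sketch, assuming the per-name sample counts are available (the `laplace_probability` helper is hypothetical):

```python
from fractions import Fraction


def laplace_probability(matches: int, total: int) -> Fraction:
    """Rule of succession: P(next is a match) = (matches + 1) / (total + 2)."""
    if not 0 <= matches <= total:
        raise ValueError("need 0 <= matches <= total")
    return Fraction(matches + 1, total + 2)


# One tested person, who was female: P(female) is 2/3 rather than a flat 1.00.
one_of_one = laplace_probability(1, 1)
# No data at all: an even 1/2.
no_data = laplace_probability(0, 0)
# 700 samples, all male: high, but still short of certainty (701/702).
all_male = laplace_probability(700, 700)
```

Exact fractions keep the arithmetic readable; `float(...)` converts when a decimal is needed for display.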
In morphologically rich languages (like Russian), the most discriminative feature for detecting gender may be the word shape of the last name or middle name, not the first name. So in many languages there is no way to make a meaningful gender prediction by analyzing the first name alone. The relative gender frequency of a first name is useful information, but it is just not enough for reliable gender prediction.
Bear in mind that in some languages this problem doesn't exist. In Polish, for example, all female names end with an "a". There is not a single exception to that rule, so if you see a name ending with an "a", it is always a female name.
I thought Hacker News had more people who speak languages other than English.
A lot of complaints, excluding the binary gender ones, totally forget that languages like Portuguese and French have male and female forms for nouns and other language constructs.
Let's say I have to build a phrase containing the user's profession, like engineer, and I don't know the gender upfront: in Portuguese it would be "engenheiro" for a male and "engenheira" for a female.
It does have a lot of practical uses. And with a big enough training set, the decision about which form to use for a given user is in your hands.
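One way to act on that decision, sketched under assumptions: a hypothetical `PROFESSIONS_PT` table of masculine and feminine forms, a predicted gender with a probability, and an arbitrary 0.9 threshold below which both forms are shown instead of guessing:

```python
from typing import Optional

# Hypothetical table of gendered Portuguese profession names.
PROFESSIONS_PT = {
    "engineer": {"male": "engenheiro", "female": "engenheira"},
}


def profession_pt(profession: str, gender: Optional[str], probability: float,
                  threshold: float = 0.9) -> str:
    """Pick the Portuguese form of a profession for a user with a guessed gender."""
    forms = PROFESSIONS_PT[profession]
    if gender in forms and probability >= threshold:
        return forms[gender]
    # Low confidence or no guess: show both forms rather than risk the wrong one.
    return f"{forms['male']}/{forms['female']}"
```

With a confident guess, `profession_pt("engineer", "female", 0.98)` gives "engenheira"; with no guess it falls back to "engenheiro/engenheira".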
For Icelandic names, it's easy to identify the gender by looking at the last name. For example Bjarni Benediktsson is definitely male while Katrín Jakobsdóttir is definitely female.
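The Icelandic case reduces to a suffix check on the last name. A sketch, assuming the input is an Icelandic name in "first ... last" order (the `icelandic_gender` helper is hypothetical); applied to non-Icelandic names it would misfire, for example on "Jackson":

```python
import unicodedata
from typing import Optional


def icelandic_gender(full_name: str) -> Optional[str]:
    """Guess gender from an Icelandic patronymic or matronymic ending."""
    # NFC normalization so "ó" matches whether it arrives precomposed or decomposed.
    parts = unicodedata.normalize("NFC", full_name).split()
    if not parts:
        return None
    last = parts[-1].lower()
    if last.endswith("dóttir"):   # "daughter of", e.g. Jakobsdóttir
        return "female"
    if last.endswith("son"):      # "son of", e.g. Benediktsson
        return "male"
    return None                   # no recognizable ending: no guess
```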
Another strategy is to use gender-neutral terms until you find out the gender, as asking directly might be considered rude in some cultures.
I like this from a usability standpoint. Just as some forms auto-fill the city/state based on the zip (and might get it wrong), this enables something similar. And it might get it wrong, but if your mom gave you a girl's name*, blame her.
It also seems accurate:
Pat = about 50/50
David = All man
Jessica = All woman
Also, wrt the "binary gender identity" complaints, are we all college freshmen here?
* my own name (Nord) sucks and gave a gender of null. Spent my whole life being called Nerd, Nora, etc. I'm not flipping out.
AndrewDucker|12 years ago
vdaniuk|12 years ago
sambeau|12 years ago
spuz|12 years ago
Kiro|12 years ago
hartror|12 years ago
marijn|12 years ago
ronaldx|12 years ago
sdoering|12 years ago
{"name":"maria","gender":"female","probability":"1.00","count":700}
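A response like that can be turned into typed values before use. A sketch, assuming only the four fields in that response (the `GenderGuess` class and `parse_guess` function are hypothetical, and the defaults for missing fields are a guess at how an unknown name might come back):

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class GenderGuess:
    name: str
    gender: Optional[str]  # None when the service returns null
    probability: float
    count: int


def parse_guess(raw: str) -> GenderGuess:
    """Parse a response shaped like {"name", "gender", "probability", "count"}."""
    data = json.loads(raw)
    return GenderGuess(
        name=data["name"],
        gender=data.get("gender"),
        # The sample encodes probability as a string ("1.00"); missing or null -> 0.0.
        probability=float(data.get("probability") or 0),
        count=int(data.get("count") or 0),
    )


guess = parse_guess('{"name":"maria","gender":"female","probability":"1.00","count":700}')
```

Keeping `count` next to `probability` matters: 1.00 from 700 samples and 1.00 from a single sample are very different claims.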
brey|12 years ago
clarkm|12 years ago
yummyfajitas|12 years ago
vdaniuk|12 years ago
batemanesque|12 years ago
tommorris|12 years ago
This is Hacker News. Such enlightened thought is frowned on by our new brogrammer overlords. Here's your beer.
Filligree|12 years ago
huxley|12 years ago
She isn't the only one, either; there are hundreds of them who took their name from a Catholic saint.
mjolk|12 years ago
kmike84|12 years ago
bromagosa|12 years ago
http://api.genderize.io/?name=eloi&language_id=ca
http://api.genderize.io/?name=tomeu&language_id=ca
http://api.genderize.io/?name=rigoberta&language_id=es
http://api.genderize.io/?name=presentaci%C3%B3n&language_id=...
Credit for distinguishing names by language, though! Joan returns female in English but male in Catalan.
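The links above follow a simple query pattern, `?name=...&language_id=...`, with non-ASCII names percent-encoded as UTF-8. A sketch that builds such URLs with the standard library, assuming that pattern still applies (the `genderize_url` helper is hypothetical, and it only builds the string; it makes no request):

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "http://api.genderize.io/"  # as it appears in the links above


def genderize_url(name: str, language_id: Optional[str] = None) -> str:
    """Build a lookup URL; urlencode percent-encodes non-ASCII characters as UTF-8."""
    params = {"name": name}
    if language_id is not None:
        params["language_id"] = language_id
    return BASE_URL + "?" + urlencode(params)


catalan_joan = genderize_url("joan", "ca")
spanish_name = genderize_url("presentación", "es")
```

Passing the language matters for names like Joan, which the comment reports as female in English and male in Catalan.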
eksith|12 years ago
gambiting|12 years ago
TillE|12 years ago
nefasti|12 years ago
casca|12 years ago
mhurron|12 years ago
Grue3|12 years ago
anonemouscoward|12 years ago
Yeah, how about no.
ludicast|12 years ago
masklinn|12 years ago
We aren't, which is exactly why it's a problem.