Hi!
A short time ago, I decided to try to build an API that would guess the gender of a first name. I thought this might be useful for segmenting user lists for campaigning, analytics or similar.
My first approach was to use a dataset of approved names from a few European countries. This was in the belief that most countries had lists like this (which they don't), and I planned to add them as I went along. I got wiser, and the first feedback I got also told me that the API should be able to make probabilistic guesses and, if possible, offer some sort of localization filter to achieve more accurate guesses.
I decided to switch to large, growing datasets of user profiles from social networks, where each entry contains a first name, a gender, a country_id and a language_id. Finally, I exposed this data model through http://genderize.io
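To make the idea concrete, here is a minimal sketch of how a probabilistic guess could be derived from entries like these. It is not the service's actual implementation: the sample rows, the `guess_gender` function and the meaning of `count` (assumed here to be the number of matching profiles) are all hypothetical. Only the output shape is borrowed from the responses quoted further down the thread.

```python
from collections import Counter

# Hypothetical sample rows: (first_name, gender, country_id, language_id).
PROFILES = [
    ("robin", "male", "se", "sv"),
    ("robin", "male", "se", "sv"),
    ("robin", "female", "us", "en"),
    ("robin", "male", "us", "en"),
    ("dillon", "male", "us", "en"),
]


def guess_gender(name, country_id=None, profiles=PROFILES):
    """Guess a gender from profile counts, optionally filtered by country."""
    key = name.lower()
    counts = Counter(
        gender
        for first, gender, country, _language in profiles
        if first == key and (country_id is None or country == country_id)
    )
    total = sum(counts.values())
    if total == 0:
        # No matching profiles: no guess at all.
        return {"name": key, "gender": None}
    gender, hits = counts.most_common(1)[0]
    # Assumption: probability is the share of matching profiles with that gender.
    return {
        "name": key,
        "gender": gender,
        "probability": f"{hits / total:.2f}",
        "count": total,
    }


print(guess_gender("robin"))        # 3 of 4 profiles are male
print(guess_gender("robin", "se"))  # filtered to one country
print(guess_gender("batman"))       # unknown name
```

Under this assumption, a name seen only once gets a probability of 1.00, so `count` is worth checking before trusting a guess.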
It responds in JSON. Simple example: http://api.genderize.io?name=robin
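A minimal client sketch in Python, using only the standard library. The endpoint and the `name` parameter come from the example URL above; the helper names (`build_url`, `parse_response`, `fetch`) are made up for illustration, and the field handling assumes responses shaped like the ones quoted later in this thread.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://api.genderize.io"  # endpoint taken from the example above


def build_url(name):
    """Build the request URL for a single first name."""
    return f"{BASE_URL}?{urlencode({'name': name})}"


def parse_response(body):
    """Return (gender, probability, count) from a JSON response body.

    Assumes gender may be null, and that probability and count
    may be absent when there is no guess.
    """
    data = json.loads(body)
    gender = data.get("gender")
    probability = float(data["probability"]) if "probability" in data else None
    return gender, probability, data.get("count", 0)


def fetch(name):
    """Call the API over the network and parse the result."""
    with urlopen(build_url(name)) as response:
        return parse_response(response.read().decode("utf-8"))


# Parsing a response body of the shape quoted in this thread:
sample = '{"name":"dillon","gender":"male","probability":"1.00","count":1}'
print(parse_response(sample))
```

Note that the probability arrives as a string in the quoted responses, so the sketch converts it to a float before use.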
I am now looking for feedback on my new approach. What do you think of this way of making guesses? What do you think of the API? Any feedback is welcome.
The API is completely free by the way.
lutusp|12 years ago
Obviously you need to run a test that uses a list of real people's names and genders to measure the method's accuracy. But remember the following points:
* People might resent any effort to pin down their gender in a commercial or advertising context.
* The negative outcome for a gender misidentification may be much greater than the positive outcome for a correct one.
* Gender-neutral names are becoming increasingly fashionable among well-educated parents, i.e. people who have money.
On that basis and in my opinion, unless you can get above 90% accuracy, it's not worth doing.
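A rough sketch of such a test, assuming you have a list of (name, known gender) pairs and some guessing function. The `evaluate` helper, the labeled sample and the toy lookup table are all hypothetical placeholders, not real measurements of the API.

```python
def evaluate(labeled, guess):
    """Return (accuracy, coverage) for a guess function on labeled names.

    accuracy: share of answered names that were guessed correctly.
    coverage: share of names for which any guess was returned.
    """
    answered = 0
    correct = 0
    for name, true_gender in labeled:
        predicted = guess(name)
        if predicted is None:
            continue  # no guess for this name; counts against coverage only
        answered += 1
        if predicted == true_gender:
            correct += 1
    accuracy = correct / answered if answered else 0.0
    coverage = answered / len(labeled) if labeled else 0.0
    return accuracy, coverage


# Hypothetical labeled sample and a toy guesser standing in for the API.
labeled = [("robin", "female"), ("dillon", "male"),
           ("gudrun", "female"), ("lowell", "male")]
toy_guesses = {"robin": "male", "dillon": "male", "gudrun": "female"}

accuracy, coverage = evaluate(labeled, toy_guesses.get)
print(f"accuracy={accuracy:.2f} coverage={coverage:.2f}")
```

Reporting coverage next to accuracy matters because a guesser can look accurate simply by declining to answer for ambiguous names.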
Some popular gender-neutral names:
http://www.babynames1000.com/gender-neutral/
http://thestir.cafemom.com/pregnancy/157282/25_best_genderne...
http://en.wikipedia.org/wiki/Unisex_name#English
A quote: "Unisex names have been enjoying a decent amount of popularity in English speaking countries in the past several decades."
gadders|12 years ago
dictum|12 years ago
To be fair: do it if you must. But don't let the user see the gender field as it changes. If someone has a name that's associated with the opposite gender (or they believe themselves to be of another gender), seeing the change to that gender in the gender field will make them sad, annoyed, or irritated. At best, they will chuckle at the failed attempt to predict their gender.
This is one of those features that users don't notice when it works as intended, and it doesn't improve their experience that much. But when it fails, they notice, and the annoyance hurts your image.
dalke|12 years ago
http://www.scb.se/Pages/TableAndChart____31028.aspx
and for boys at:
http://www.scb.se/Pages/TableAndChart____31036.aspx
You can also go to http://www.scb.se/Pages/NameSearch.aspx?id=259432 and search for a name. For example, there are 990 people in Sweden with Strömgren as a last name.
It seems that "Gudrun" isn't that popular these days as fewer than 10 girls get that name. A different set of names is available from http://en.wiktionary.org/wiki/Category:Swedish_given_names .
I don't have need for this data and I can't comment about the effectiveness of the API.
You can get the top 1000 US names for a given year by going to http://www.ssa.gov/OACT/babynames/#ht=1 , selecting a year, changing "Popularity" to "Top 1000" and submitting the form. (For example, your search doesn't have 'Lowell', which was #172 in the US in 1940.)
Good luck!
Stromgren|12 years ago
Asparagirl|12 years ago
Also, I would love to learn more about how the service actually works on the back-end.
Stromgren|12 years ago
I've actually thought about adding a baseline of names from different lists, though, to back up the names that are not yet represented in the dataset. Do you have a link to the names you are mentioning? Could be interesting.
dscb|12 years ago
{"name":"dillon","gender":"male","probability":"1.00","count":1}
I'm interested in how it's decided that there is a 100% probability that I'm male (it was correct, though).
ToastyMallows|12 years ago
Stromgren|12 years ago
rtcoms|12 years ago
and received {"name":"batman","gender":null}