Hey, thanks!
The API doesn't actually use name sets like that, though that was my first approach. I changed it to use lists of profiles from social networks. When a name is requested, it looks up every profile with that name and counts the number of times each gender is represented. If you use any localization parameters, it will of course only look up profiles associated with that particular country or language.
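Roughly, the lookup described above could be sketched like this in Python. Everything here is made up for illustration and is not the API's actual code: the `Profile` shape, the `guess_gender` name, the case-insensitive name match, and the fields in the returned dict are all assumptions.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Optional


class Profile(NamedTuple):
    # Hypothetical record shape; the real dataset's fields are not described.
    name: str
    gender: str                    # e.g. "male" or "female"
    country: Optional[str] = None  # assumed localization field


def guess_gender(name: str, profiles: Iterable[Profile],
                 country: Optional[str] = None) -> dict:
    """Count genders among profiles with this name, optionally per country."""
    counts = Counter(
        p.gender
        for p in profiles
        if p.name.lower() == name.lower()
        and (country is None or p.country == country)
    )
    total = sum(counts.values())
    if total == 0:
        # The name is not represented in the dataset at all.
        return {"name": name, "gender": None, "probability": 0.0, "count": 0}
    gender, hits = counts.most_common(1)[0]
    # Probability = share of matching profiles that have the majority gender.
    return {"name": name, "gender": gender,
            "probability": hits / total, "count": total}


profiles = [
    Profile("Kim", "female", "US"),
    Profile("Kim", "female", "US"),
    Profile("Kim", "male", "DK"),
    Profile("Peter", "male", "DK"),
]
print(guess_gender("kim", profiles))
print(guess_gender("kim", profiles, country="DK"))
```

Passing `country` narrows the pool before counting, so the same name can come out differently depending on the locale.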
I quickly realized with the initial approach that my lists would never be sufficient: most countries allow almost any name to be given, and when you combine lists from the whole world, a lot of names end up as unisex. That's why I went for a probability factor instead. Also, I'm hoping that by using social profiles it might one day be able to tell the gender of Superman or Catwoman and things like that. People can, after all, call themselves what they want on the internet.

I've actually thought about adding a baseline of names from different lists, though, to back up the names that are not yet represented in the dataset. Do you have a link to the names you are mentioning? Could be interesting.
Asparagirl|12 years ago
Many, but not all, of the people mentioned in the 87 data sets (and counting!) that make up this database have a gender explicitly declared. Locale is the former province of Galicia in the Austro-Hungarian Empire, which is today eastern Poland and southwestern Ukraine. Time period is mostly 19th century and some early 20th century. Ethnicity is strongly biased towards Ashkenazi Jewish, but we also have some data sets that represent all the people in the community at that time, such as tax lists, phonebooks, or school lists. I can get you data in JSON or XML; let me know.
I also have access to another large given name database that could be useful to you -- but that one is entirely Ashkenazi Jewish from what used to be northeastern Hungary, from roughly 1850 to 1906.