
Show HN: Python Package for Predicting Gender from First Names

3 points| parthmaul | 5 years ago |github.com

6 comments

Storing the lookup map on disk as a JSON-encoded dictionary seems less than optimal for package size and module load time. Two plaintext files (M.txt and F.txt), one name per line, would be simpler and smaller on disk. The text is also highly compressible, which could reduce the package size further. These things might matter if the package is used in a serverless environment.
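To make the suggestion concrete, here is a minimal sketch of the two-file layout with gzip on top. The file names, the one-name-per-line format, lowercase storage, and the helper names (save_names, load_names, predict) are all assumptions for illustration, not anything taken from the package.

```python
import gzip


def save_names(names, path):
    """Write one name per line to a gzip-compressed text file."""
    with gzip.open(path, "wt", encoding="utf-8") as fh:
        for name in sorted(names):
            fh.write(name + "\n")


def load_names(path):
    """Read a gzip-compressed name list back into a set for fast lookups."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return {line.strip() for line in fh if line.strip()}


def predict(name, male, female):
    """Return "M", "F", or None when the name is in neither list.

    Assumes both sets hold lowercase names (a hypothetical convention).
    """
    key = name.strip().lower()
    if key in male:
        return "M"
    if key in female:
        return "F"
    return None
```

Loading into sets keeps lookups cheap, and the files stay readable with zcat if someone wants to inspect them.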

Also, do you think there could be value in identifying classically androgynous names?

parthmaul|5 years ago

Thanks for sharing your feedback! Great idea on using .txt instead. I'll make a change for that. (This is my first time sharing a package I've prepared on GitHub, so I'm a noob with that kind of stuff.)

There are names in the current JSON file classified as "N", which stands for non-binary, but there are not many of them. A name gets "N" if its "M" and "F" frequencies are equal or within a certain magnitude of each other (the magnitude calculation is based on proportions testing). With that being said, maybe it'd be worth adding functionality for users to supply their own gender_lookup file?
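For anyone curious what a rule like that could look like, here is a rough sketch using a one-sample z-test on the male share of a name. This is an assumption about one way to do "proportions testing", not necessarily what the package does; the function name classify, the 1.96 cutoff, and the counts in the usage note are hypothetical.

```python
import math


def classify(male_count, female_count, z_crit=1.96):
    """Label a name "M", "F", or "N" from its observed counts.

    Tests H0: the male share equals 0.5. If the test cannot reject H0
    at the chosen cutoff, the name is treated as "N".
    """
    total = male_count + female_count
    if total == 0:
        return None  # no data for this name
    p_male = male_count / total
    # Standard error of a proportion under H0 (p = 0.5).
    std_err = math.sqrt(0.25 / total)
    z = (p_male - 0.5) / std_err
    if abs(z) < z_crit:
        return "N"
    return "M" if z > 0 else "F"
```

With these assumptions, classify(900, 100) gives "M" and classify(52, 48) gives "N". One side effect of this kind of test is that names with very few observations also land in "N", since there is not enough data to reject an even split.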

jk801|5 years ago

This is a great idea.

parthmaul|5 years ago

Thank you! I'm using this at work to help with customer segmentation.