This data's definition of "famous" or "notable" is in the "Measuring notability" section of the linked paper:
we build a synthetic notability index using five dimensions to figure out a ranking for this broader set of individuals. These dimensions are:
1. the number of Wikipedia editions of each individual; [i.e. number of languages in which this person has a Wikipedia article]
2. the length, i.e total number of words found in all available biographies. […]
3. the average number of biography views (hits) for each individual between 2015 and 2018 in all available language editions […]
4. the number of non-missing items retrieved from Wikipedia or Wikidata for birth date, gender and domain of influence. The intuition here is that the more notable the individual, the more documented his/her biographies will be; [!]
5. the total number of external links (sources, references, etc.) from Wikidata.
We then determine the quantile values from each dimension and add them all to define our notability measure
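A rough sketch of how "determine the quantile values from each dimension and add them all" could work. This is not the paper's code: the exact quantile binning isn't given in the quote, so the sketch assumes each raw value is mapped to its empirical quantile (the fraction of individuals with a value at or below it). The names `quantile_scores`, `notability_index` and the sample people are made up for illustration.

```python
from bisect import bisect_right


def quantile_scores(values):
    """Map each value to the fraction of values that are <= it."""
    ordered = sorted(values)
    n = len(ordered)
    return [bisect_right(ordered, v) / n for v in values]


def notability_index(people):
    """people: dict of name -> tuple of raw dimension values.

    Returns dict of name -> sum of that person's per-dimension quantiles.
    """
    names = list(people)
    n_dims = len(next(iter(people.values())))
    totals = {name: 0.0 for name in names}
    for d in range(n_dims):
        column = [people[name][d] for name in names]
        for name, q in zip(names, quantile_scores(column)):
            totals[name] += q
    return totals


# Hypothetical raw values, in the order of the five dimensions above:
# (editions, total words, average views, non-missing items, external links)
people = {
    "person_a": (120, 90000, 50000, 3, 400),
    "person_b": (15, 4000, 300, 2, 20),
    "person_c": (40, 20000, 7000, 3, 90),
}
scores = notability_index(people)
# Highest combined quantile first.
ranking = sorted(scores, key=scores.get, reverse=True)
```

Because every dimension is reduced to a quantile before summing, a huge raw view count cannot swamp the other four dimensions; each contributes at most 1 to the total.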
Does Pablo Escobar belong in that conversation? Or am I like many others who have just been exposed to the show Narcos which makes up the entirety of our Colombian experience?
Crazy how few women there are, it's like for our entire recorded history we have been ignoring 50% of our potential. Let's hope it gets a lot more mixed!
They weren’t ignored. For most of recorded history the basic unit was the family.
The men were in charge of public affairs of the family, while women were in charge of private and domestic affairs.
Only recently has the basic unit been further subdivided into individuals, which has required many to rely on institutional support for matters that used to be handled within the family, e.g. education, pensions, restaurants, clothing shops, apartment complexes, birth control.
The truly ignored throughout history were the peasants and serfs. Most men of significance were from aristocratic or upper class upbringings.
The divide is not between men and women, but haves vs have nots.
I went looking to see who the entry was for the nearest town to where I live expecting it to be Mary Somerville and was rather disappointed to find it was some chap I'd never heard of.
>it's like for our entire recorded history we have been ignoring 50% of our potential. Let's hope it gets a lot more mixed!
I'm so sick of shit like this. It's so intellectually offensive, I can't be polite any longer.
It's so incredibly rude to dismiss so many great women just because you didn't hear about them, as if being famous is the ultimate test of potential. As if being a famous author or famous SOMETHING is the ultimate goal in this life.
I'll use my mother as an example. She's a truly great woman. She'll never be famous to you (she has no such vain desires anyway), but she's a great human being, much greater than you'll ever be, for she rejects DEMOGRAPHIC quotas, she's honest, and compassionate, and pious, and loving, and fun, and courageous, and every day she lives up to her potential and more, and she inspires her family and friends to do the same. She does what she does and she loves doing it and she does it well.
And how willfully ignorant it is to ignore the different powers and motivations unique to men and to women.
If you think there's a problem with so few famous women, then that's a personal problem, that's a you problem. You are the problem, because you are imposing your own personal beliefs and personal standards onto women.
If your criterion is "Wikipedia notability", we have been ignoring more like 98 per cent of our potential since antiquity. The vast majority of people who lived and died were subsistence farmers, most of them not even personally free (either serfs or slaves), and good luck making it to Wikipedia as a serf boy from Upper Nowhere, rural Campania of 635 AD.
Sometimes I wonder whether the entire contemporary American obsession with race and gender has been deliberately and cynically manufactured or at least blown up beyond all proportion to keep everyone's eyes away from class, the most formidable societal barrier almost everywhere, including societies that are ethnically fairly homogenous.
Current estimates are that around 100 billion people have ever lived. So that's a lot more than 50% that have been "ignored".
It turns out that if you look for notability or exceptional attributes you will get mostly men. This is due to biology and essentially the whole reason males and sexual reproduction exist.
This doesn't mean that being male will give you a better chance of being exceptional or notable, though. Quite the opposite, in fact. The bar is lower for women because simply being a woman is considered notable precisely because there are so few notable women.
I suppose this is actually representing the most famous people in the -western world's lens- rather than the most famous people to each country respectively. For example, Haruki Murakami is a Japanese author, very famous in the west because their books have been translated into English. But would they be the most famous person from Kyoto to people in Japan?
That's something that's always fascinated me about the internet: it's essentially delineated by language and not country. If you google things in Spanish, you get the Spanish web. If you google things in Japanese, you get the Japanese web. For a subtle example of this, there's very little crossover between Japanese memes and English memes; it's a whole different web. Japanese web design is also famously different from western web design; it has formed its own set of UX expectations and principles.
There's a lot of discussion here of the 'western lens' as you bring up, but I'm not sure that's fair criticism. The creator(s) aggregated data and built something very interesting. To complain that the data they used isn't universal doesn't seem fair. I think Wikipedia is a reasonable starting place, but yes, Wikipedia skews geographically.
All datasets have bias. It's okay to acknowledge that and still find insights in the data.
Honestly curious: what highly accessible dataset that allows for the simple creations of 'fame metrics' would be better? I'm not aware of any.
As soon as I saw Leonardo da Vinci and Picasso for Italy and France, I knew this was going to be the western lens, haha. Would be interesting to select the country as a point of reference.
On the language internet point, it's pretty amazing, yeah. For example, all the English youtube niches have Spanish language equivalents, and watchers of one are totally unaware that they are sitting right next to watchers of another. Like some sort of shadowverse.
Answering a question I had looking at this amazing work, the data set has a heavy English influence, but they are aware of it and also worked toward mitigating the effect. From the source:
> This strategy results in a cross-verified database of 2.29 million unique individuals (an elite of 1/43,000 of human being having ever lived) among which 30% come from the 6 non-English editions of Wikipedia, a significant improvement over earlier works that have only focused on English Wikipedia only.
The difference between the EU and the US is wild. The EU is mostly historical figures: Picasso, Da Vinci, Erasmus, Van Gogh, and of course Adolf. But the US, apart from some old presidents, is mostly pop and movie stars.
It's a cool map; now I really want to play with it. If I could color-code the names by birth date, it would give great new insight into regional relevance over time. Switching between current residence and birthplace would also be very interesting, as would color-coding the distance between birthplace and current residence, to see which places are attractive or how much of a role where you were born plays in becoming famous.
Very cool project, and also reveals buggy data to fix.
One note if the creator is here: it looks like deprecated locations are included. https://www.wikidata.org/wiki/Q596717 includes both Indiana (deprecated) and Linton, Indiana, and he shows up on the map near the center of Indiana apparently as its most notable person, which is clearly not the case.
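For what it's worth, Wikidata marks every statement with a rank (`preferred`, `normal` or `deprecated`), so a deprecated birthplace can be skipped when reading the entity JSON. Below is a minimal sketch, assuming the standard entity JSON layout where each statement carries a `rank` and a `mainsnak`. The helper name `best_values` and the ids `Q_STATE` / `Q_TOWN` are placeholders, not real Wikidata ids, and this is not how the map's author actually processes the data.

```python
def best_values(statements):
    """Return item ids from the best-ranked statements of one property.

    Preferred statements win if any exist; otherwise normal ones are used.
    Deprecated statements are never returned.
    """
    def ids(rank):
        return [
            s["mainsnak"]["datavalue"]["value"]["id"]
            for s in statements
            if s.get("rank") == rank
            and s["mainsnak"].get("snaktype") == "value"
        ]

    return ids("preferred") or ids("normal")


# Hypothetical place-of-birth (P19) statements shaped like the case above:
# one deprecated state-level value and one normal town-level value.
birthplace_statements = [
    {"rank": "deprecated",
     "mainsnak": {"snaktype": "value",
                  "datavalue": {"value": {"id": "Q_STATE"}}}},
    {"rank": "normal",
     "mainsnak": {"snaktype": "value",
                  "datavalue": {"value": {"id": "Q_TOWN"}}}},
]
usable = best_values(birthplace_statements)
```

With the deprecated statement dropped, only the town-level value remains, so the person would be placed at the town rather than at the center of the state.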
Is this your first time on HN? 90% of comments are people either pedantically picking apart the submissions or, if it's a product, plugging their own alternative without commenting on the submission at all.
Why should I be forced to enjoy the product as it is? It's one thing to make something like this for fun, but if you post it on an online link-sharing platform, you're bound to find people who don't enjoy what you've made.
The most bizarre thing I found is the 'notability' score for Jesus at 204.5 and Muhammad at 152. Both well behind Britney Spears at 59. Britney's fans will be thrilled, I guess.
Clicking around, the notability of Western pop culture folk generally seems enormously inflated.
Some of this doesn't seem correct? For example, I was surprised to see that Ken Jeong was Canadian (shown up and to the right of Michael J Fox, in what looks like Northern Saskatchewan). But I looked it up and he was born in Detroit.
Lots of surprises (to me) scrolling around. J.R.R. Tolkien and Freddie Mercury from Africa. George Orwell and Cliff Richard from India. Some errors, though: I see JP Sartre in South America, but the link says Paris.
Cool idea. The main problem I saw when clicking around is that the granularity of people's place of birth (all via Wikipedia) is not consistent. Like, one person has my city listed, so this person "owns" the entry for the city, but 10 other people have parts of the city listed (so their Wikipedia entry is more correct), so they're listed for that part of town, be it an official part or not. For some people a specific building is known (usually not a hospital), so they "only" own this building. It's a bit weird.
Considering some of the people on there, the HN title could be "famous & infamous", or as the project puts it on the page, "Notable". Still, very awesome project!
Might be worth updating this with the latest from Wikidata: It looks like Elliot Page [0] is listed under his birth name, with the wrong gender too. Just one example, but also I’m sure other things have changed :)
svat | 3 years ago
They also have a table of what this metric throws up as the most "notable" from each time period: https://www.nature.com/articles/s41597-022-01369-4/tables/3 and how the "domain" varies over time: https://www.nature.com/articles/s41597-022-01369-4/figures/2 (note Nobility and Religious in 500–1000, to Sports and Culture post 1950).
jmfayard | 3 years ago
I think Simon Bolivar or Shakira or Gabriel Garcia Marquez or many others have a better claim to the title
Especially since Jean-Paul Sartre was born in Paris
What's weird is that wikidata has the correct info https://www.wikidata.org/wiki/Q9364
eesmith | 3 years ago
Changed back to Paris on 22 December 2018.
On 17 March 2019 2a01:e35:8ab4:ac00:75c3:3673:f22b:4a45 changed to Tokyo.
On 30 September 2019 201.187.105.154 changed to Chile.
On 16 January 2020 changed to Efflamm.
On 16 January 2020 changed to Paris, where it's been ever since.
This signature tells us the dataset for the paper was extracted in November or December of 2018.
Various other bits of high-schooler sabotage:
30 September 2019 201.187.105.154 changed place of death to Easter Island.
29 November 2018 190.247.191.178 changed place of burial to Bikini Bottom.
7 March 2019 201.164.233.103 changed cause of death (P509) to cocaine.
lentil_soup | 3 years ago
thenoblesquid | 3 years ago
botverse | 3 years ago
teekert | 3 years ago
j7ake | 3 years ago
arethuza | 3 years ago
https://en.wikipedia.org/wiki/Mary_Somerville
Worth noting:
"In 1834 she became the first person to be described in print as a 'scientist'"
LudwigNagasena | 3 years ago
VoodooJuJu | 3 years ago
telesilla | 3 years ago
https://ideas.ted.com/you-can-help-fix-wikipedias-gender-imb...
https://www.wikiloveswomen.org/
There must be other initiatives if others have links to share in this thread.
jmfayard | 3 years ago
https://en.wikipedia.org/wiki/Gender_bias_on_Wikipedia
inglor_cz | 3 years ago
globular-toast | 3 years ago
ehnto | 3 years ago
poulpy123 | 3 years ago
Cd00d | 3 years ago
jericho_jones | 3 years ago
E.g. Phil Collins isn't shown, in favour of a cricketer from the early 20th century?
Sometimes things are imperfect, not racist.
danielvaughn | 3 years ago
newyankee | 3 years ago
TremendousJudge | 3 years ago
eurasiantiger | 3 years ago
For example, in countries bordering Russia, science Nobel laureates are missing, but racist pseudoscientists and UFO theorists are listed.
chmod775 | 3 years ago
I came back here to write pretty much this comment.
d883kd8 | 3 years ago
makeitdouble | 3 years ago
https://www.nature.com/articles/s41597-022-01369-4
jiggywiggy | 3 years ago
LewisVerstappen | 3 years ago
Go over the Levant and you start seeing Paul the Apostle, Diogenes, Ptolemy, etc. which makes Voltaire look like a modern political commentator.
taink | 3 years ago
> We document an Anglo-Saxon bias present in the English edition of Wikipedia, and document when it matters and when not.
Regardless of these biases, Europe has much more historical background than the US.
Finally, this data is based upon Wikipedia and Wikidata. I gather datasets from India or China would provide much different results.
Interesting project nonetheless!
[1] https://www.nature.com/articles/s41597-022-01369-4
jFriedensreich | 3 years ago
macintux | 3 years ago
Lerc | 3 years ago
Peter Jackson, with a respectable 'top 1000' score of 656, doesn't appear because he was born too close to Ernest Rutherford, who edges him out at 634.
Andrew Niccol, whose films are somewhat more niche, gets on the map by virtue of being born a few kilometers north of Jackson.
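The behaviour described here looks like greedy label placement: take people in order of notability and show a name only if no already-shown name is too close. The sketch below is a guess at that kind of logic, not the map's actual code. It assumes a lower score means more notable (as the 634 vs 656 example suggests), uses flat x/y coordinates instead of real map projection, and the function name `visible_labels` and the sample points are invented.

```python
from math import hypot


def visible_labels(people, min_distance):
    """Pick which names to show at one zoom level.

    people: list of (name, score, x, y), where a lower score is assumed
    to mean more notable. A name is shown only if it is at least
    min_distance away from every name already shown.
    """
    shown = []
    for name, score, x, y in sorted(people, key=lambda p: p[1]):
        if all(hypot(x - sx, y - sy) >= min_distance
               for _, _, sx, sy in shown):
            shown.append((name, score, x, y))
    return [name for name, _, _, _ in shown]


# Hypothetical points: B is more notable than C but sits right next to A.
people = [
    ("A", 634, 0.0, 0.0),
    ("B", 656, 1.0, 0.0),
    ("C", 5000, 10.0, 0.0),
]
zoomed_out = visible_labels(people, min_distance=5.0)
zoomed_in = visible_labels(people, min_distance=0.5)
```

Zoomed out, B is hidden by its more notable neighbour while the far less notable C still gets a label; shrinking `min_distance` (zooming in) lets B appear as well.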
boredemployee | 3 years ago
qabqabaca | 3 years ago
bzxcvbn | 3 years ago
Cthulhu_ | 3 years ago
lcuff | 3 years ago
bthrn | 3 years ago
throwaway743 | 3 years ago
matsemann | 3 years ago
astura | 3 years ago
Important to note it was British India at the time both were born.
wink | 3 years ago
cplli | 3 years ago
eesmith | 3 years ago
I looked at Santa Fe, since I thought George R. R. Martin would be the most famous person there.
This says Anna Gunn is the most famous person from Santa Fe.
Okay, so perhaps Martin's a transplant, while Gunn was born there? The map legend says "birthplaces", after all.
Nope. At least, Wikipedia and IMDb say she was born in Cleveland, and her family moved to Santa Fe when she was young.
Though ... other sources say she was born in Santa Fe, like https://patch.com/new-mexico/albuquerque/3-celebrities-who-l... ?
But the source paper at https://www.nature.com/articles/s41597-022-01369-4 uses Wikipedia and Wikidata - both of which list Cleveland.
... Ah-ah! The place of birth entry for Wikidata changed on 3 August 2020 from Cleveland to Santa Fe. https://www.wikidata.org/w/index.php?title=Q271050&oldid=124...
And the Wikipedia entry changed on 31 August 2018 https://en.wikipedia.org/w/index.php?title=Anna_Gunn&diff=86...
And the data from the paper was from 2018.
I wonder how bad the data is in the rest of the data set.
opheliate | 3 years ago
0: https://en.m.wikipedia.org/wiki/Elliot_Page
scottwmaxwell | 3 years ago
Please ignore the others who just fucking HAD to make it about race or gender or whatever.
RLN | 3 years ago
I imagine this could be repeated quite often as there are so many reused place names. Hook in Yorkshire doesn't even get a look in on this map!
lordnacho | 3 years ago
How does it know when it's appropriate to cut the dataset a little bit finer? I'm amazed how appropriate the names are that it turns up.