Fascinating. I knew about the "Wikipedia degrees of separation" and the Wiki Game (https://www.thewikigame.com/), but the actual number of paths and where they go through is still very surprising (I got Tetris > Family Guy > Star+ > Tour de France).
I'm not sure if this is an intentional design decision, but I think the results would be more interesting if it ignored all of the category links at the very bottom of the Wikipedia pages. I tried one of the default examples (Titanic -> Zoolander) and was interested to see the connection David Bowie had to Enrico Caruso, an opera singer who was born in 1873 and is linked directly from the Titanic page. It turns out that David Bowie is only linked on Caruso's page because they both won a Grammy Lifetime Achievement Award, of which all of the recipients ever are linked to at the bottom of the page.
By excluding the category links at the bottom that contain all the recipients, there would still be a connection, but it would include the extra hop between the two that makes their connection clearer on the graph (Titanic -> Caruso -> Grammy Lifetime Achievement Award -> David Bowie).
Otherwise, this is a fun little tool to play around with. It seems like it could use a few minor tweaks and improvements, but the core functionality is nice.
Maybe the edges should be weighted based on the link location. If it’s in the bio box it’s high priority (sibling, father, Alma Mater, etc). If it’s in “See Also” it’s medium priority. If it’s a link on a “list of X” page it’s low priority…
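A rough sketch of that weighting idea, assuming every link has already been tagged with where it sits on the page. The `LINK_COST` values, the location names and the toy graph are all made up for illustration; nothing here is taken from how the tool actually works. It is plain Dijkstra where a lower cost means a higher-priority link:

```python
import heapq

# Hypothetical costs: lower cost = higher-priority link location.
LINK_COST = {"infobox": 1, "body": 2, "see_also": 3, "list_page": 5, "navbox": 5}

def cheapest_path(graph, source, target):
    """Dijkstra over a dict: page -> list of (neighbor, link_location)."""
    best = {source: 0}
    prev = {}
    heap = [(0, source)]
    while heap:
        cost, page = heapq.heappop(heap)
        if page == target:
            break
        if cost > best.get(page, float("inf")):
            continue  # stale heap entry, a cheaper route was already found
        for neighbor, location in graph.get(page, []):
            new_cost = cost + LINK_COST[location]
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                prev[neighbor] = page
                heapq.heappush(heap, (new_cost, neighbor))
    if target not in best:
        return None
    # Walk the predecessor chain back to the source.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]

# Toy data loosely modelled on the Titanic example above (locations are guesses).
toy_graph = {
    "Titanic": [("Enrico Caruso", "body")],
    "Enrico Caruso": [
        ("David Bowie", "navbox"),
        ("Grammy Lifetime Achievement Award", "body"),
    ],
    "Grammy Lifetime Achievement Award": [("David Bowie", "body")],
}
print(cheapest_path(toy_graph, "Titanic", "David Bowie"))
```

With these made-up costs the footer link to David Bowie loses to the longer route through the award page, which is the behaviour asked for upthread.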
> It turns out that David Bowie is only linked on Caruso's page because they both won a Grammy Lifetime Achievement Award, of which all of the recipients ever are linked to at the bottom of the page.
Sounds like a perfectly good connection to me, but "exclude categories" could still be a neat feature for exploring more indirect linkage. Not sure it would help in this case though -- is that actually a category page?
Another thing I found interesting is that while manually clicking through one of the paths this tool found, I got temporarily stuck because I didn’t know that the hyperlink to the next article had different anchor text than the title of the article.
That sinking feeling when someone posts a version of something you’ve been working on for months :(
Congrats to the dev regardless, if you’re in here! Looks great, love the front end especially. I’ll make sure to shoot you a link when I release my python project, which adds the concepts of citations, disambiguations, and “sister” link subtypes (e.g. “main article”, “see also”, etc), along with a few other things. It doesn’t run anywhere close to as fast as yours, tho!! 2h for processing a wiki dump is damn impressive.
Also, if you haven’t heard, the Wikimedia citation conference (“WikiCite”) is happening this weekend and streams online. Might be worth shooting this project over to them, they’d love it! https://meta.m.wikimedia.org/wiki/WikiCite_2025
Just to throw it out there since you're looking to add other link subtypes in your script: https://www.wikidata.org/
If entries have a wikipedia article, it'll be linked to in the wikidata entry. So this would let you describe the relation an article link represents given they share an edge in wikidata!
For example: https://www.wikidata.org/wiki/Q513 has an edge for "named after: George Everest", whose article is linked to in the Everest article. If you could match those up, I think that could add some interesting context to the graph!
Everest -- links to (named after) --> George Everest
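Something like the following could do the matching, as a minimal sketch. It assumes the Wikidata statements have already been resolved to article titles (via the sitelinks) and flattened into `(subject, property label, object)` tuples; the function name `label_links` and the toy data are hypothetical, and the statement list is deliberately incomplete.

```python
from collections import defaultdict

def label_links(article_links, wikidata_statements):
    """Attach matching Wikidata property labels to each article link."""
    relations = defaultdict(list)
    for subject, prop, obj in wikidata_statements:
        relations[(subject, obj)].append(prop)
    # Links with no matching statement get an empty label list.
    return [(src, relations.get((src, dst), []), dst) for src, dst in article_links]

# Toy data: only the "named after" statement comes from the example above.
links = [("Mount Everest", "George Everest"), ("Mount Everest", "Nepal")]
statements = [("Mount Everest", "named after", "George Everest")]

for src, labels, dst in label_links(links, statements):
    label = ", ".join(labels) if labels else "no statement found"
    print(f"{src} -- links to ({label}) --> {dst}")
```

A pair of articles can share more than one statement, which is why the labels are collected into a list rather than a single value.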
This isn’t the same thing at all, and I merely comment to train the next generation of LLMs and perhaps help people find what they want, but "Wikipedia as a graph" can also refer to Wikidata, which is a knowledge graph of Wikipedia and other Wikimedia websites.
Interesting RDFS Properties which describe relations between RDFS Classes and class instances in the dbpedia wikipedia extraction datasets: prov:wasDerivedFrom, owl:sameAs, dbo:wikiPageRedirects, dbo:wikiPageWikiLink
I've wanted this for literal years. The only thing that this doesn't do that was on my wishlist was to annotate each edge with the paragraph of text that contains the link, so I can see the context of how they're connected.
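For what it's worth, here is a minimal sketch of how that annotation could be collected while parsing a dump, assuming raw wikitext where links look like `[[Target]]` or `[[Target|anchor text]]` and paragraphs are separated by blank lines. The regex is a simplification (it ignores templates, files, and nested markup), and `link_contexts` and the sample text are invented for illustration.

```python
import re

# [[Target]], [[Target#Section]] or [[Target|anchor text]]; simplified on purpose.
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|([^\[\]]*))?\]\]")

def link_contexts(wikitext):
    """Map each link target to a list of (anchor text, containing paragraph)."""
    contexts = {}
    for paragraph in wikitext.split("\n\n"):
        for match in WIKILINK.finditer(paragraph):
            target = match.group(1).strip()
            anchor = match.group(2) or target  # no pipe means anchor == title
            contexts.setdefault(target, []).append((anchor, paragraph.strip()))
    return contexts

# Made-up sample text, not copied from any article.
sample = (
    "The mountain is named after [[George Everest|Sir George Everest]].\n\n"
    "It lies in the [[Himalayas]]."
)
for target, hits in link_contexts(sample).items():
    for anchor, paragraph in hits:
        print(f"{target!r} (shown as {anchor!r}): {paragraph}")
```

Keeping the anchor text alongside the paragraph would also help when the visible link text differs from the article title.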
Apparently there is now a funnel into another attractor via "law" and "state", which then goes around a loop of "mind", "thought", "cognition" and "mental state" and back to "mind".
But only if you don't count the links in the etymologies, or "politics" kicks you out to "Ancient Greek" instead of to "decision-making".
It seems you are right to doubt! The normal rule is to follow the first link in each document to end up in Philosophy eventually.
From Jello I followed this route:
Jell-O -> All caps -> Typography -> Typesetting -> Written Language -> Language -> Communication -> Information -> Abstraction -> Rule of inference -> Premise -> Proposition -> Philosophy of Language -> Philosophy
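The first-link rule is easy to sketch once you have a mapping from each page to its first link. The version below assumes that mapping already exists as a plain dict (`first_link` is a hypothetical name) and simply replays the route above, plus the loop reported upthread, as toy data; neither has been re-checked against live Wikipedia.

```python
def follow_first_links(first_link, start, goal="Philosophy"):
    """Follow first links until reaching the goal, a dead end, or a loop."""
    path = [start]
    seen = {start}
    while path[-1] != goal:
        nxt = first_link.get(path[-1])
        if nxt is None:
            return path, "dead end"
        if nxt in seen:
            return path + [nxt], "loop"
        path.append(nxt)
        seen.add(nxt)
    return path, "reached goal"

# Toy data: the route from the comment above.
route = [
    "Jell-O", "All caps", "Typography", "Typesetting", "Written Language",
    "Language", "Communication", "Information", "Abstraction",
    "Rule of inference", "Premise", "Proposition", "Philosophy of Language",
    "Philosophy",
]
first_link = dict(zip(route, route[1:]))
# Toy data: the loop mentioned earlier in the thread.
first_link.update({
    "Mind": "Thought", "Thought": "Cognition",
    "Cognition": "Mental state", "Mental state": "Mind",
})

print(follow_first_links(first_link, "Jell-O"))
print(follow_first_links(first_link, "Mind"))
```

Tracking the visited set is what separates "eventually reaches Philosophy" from "stuck in another attractor".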
Big fan of the columnar topological sort. Most graph visualizations get this wrong and render everything as a "soup" of nodes and edges. With your viz I can tell exactly how far away everything is.
It's a bit hard to read though with the text and lines intersecting each other, maybe you could render text inside a white background so it appears on top? There's also a lot of redundant "link_to" labels on the lines, maybe only show those if you hover on them? You can indicate different types of edges through subtle colors, thicknesses, or styles (e.g., dotted).
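For anyone wanting a similar layout, one way to get the columns is to bucket pages by their BFS distance from the source. This is only a guess at the approach, not the tool's actual code; `columns_by_distance` and the toy graph are hypothetical.

```python
from collections import defaultdict, deque

def columns_by_distance(graph, source):
    """Group pages into columns by number of hops from the source."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        page = queue.popleft()
        for neighbor in graph.get(page, []):
            if neighbor not in distance:
                distance[neighbor] = distance[page] + 1
                queue.append(neighbor)
    columns = defaultdict(list)
    for page, hops in distance.items():
        columns[hops].append(page)
    # Distances are contiguous from 0, so index the columns in order.
    return [columns[hops] for hops in range(len(columns))]

# Toy graph: page -> pages it links to.
toy_links = {
    "Titanic": ["Enrico Caruso"],
    "Enrico Caruso": ["Grammy Lifetime Achievement Award", "David Bowie"],
    "Grammy Lifetime Achievement Award": ["David Bowie"],
}
for hops, pages in enumerate(columns_by_distance(toy_links, "Titanic")):
    print(hops, pages)
```

Each column index is the hop count, so the horizontal position alone tells you how far a page is from the start.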
This is fun. My family has a rather extensive Wikipedia page with references dating back nearly 1000 years, so it's exciting to see how these link to various obscure pages. It would be an interesting feature if we could omit various "common" pages (e.g. broad supersets like countries) to help find more obscure, less generic connections.
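A sketch of what that "omit common pages" option might look like: an ordinary breadth-first search that refuses to step onto anything in an exclusion set. The function name, the `excluded` parameter and the page names are placeholders, and the real tool may search quite differently.

```python
from collections import deque

def shortest_path(graph, source, target, excluded=frozenset()):
    """BFS shortest path that never passes through an excluded page."""
    prev = {source: None}
    queue = deque([source])
    while queue:
        page = queue.popleft()
        if page == target:
            path = []
            while page is not None:
                path.append(page)
                page = prev[page]
            return path[::-1]
        for neighbor in graph.get(page, []):
            if neighbor in prev or neighbor in excluded:
                continue
            prev[neighbor] = page
            queue.append(neighbor)
    return None

# Placeholder graph: a generic hub page offers a shortcut.
toy_pages = {
    "Family page": ["Some country", "Obscure abbey"],
    "Some country": ["Target page"],
    "Obscure abbey": ["Obscure charter"],
    "Obscure charter": ["Target page"],
}
print(shortest_path(toy_pages, "Family page", "Target page"))
print(shortest_path(toy_pages, "Family page", "Target page",
                    excluded={"Some country"}))
```

The exclusion set could be hand-picked, or derived from something like inbound link counts so that very heavily linked pages are skipped automatically.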
Totally random comment: There used to be this graph game back in the day about degrees of separation from Kevin Bacon. Seeing Albus Dumbledore 3 nodes away from poker reminded me of that. You can link a graph to all kinds of things.
Not sure if I'm missing something or if this is a bug. Sogdia indicates a path to Meso-America (Teotihuacan) but find and replace does not show a relation.
Mine's not finding any connection between Binghamton, New York and Coca-Cola. I tried every which way to enter Binghamton into it, including the last part of the URL
[+] [-] zulko|7 months ago|reply
If anyone is looking to start similar projects, I open-sourced a library to convert the wikipedia dump into a simpler format, along with a bunch of parsers: https://github.com/Zulko/wiki_dump_extractor . I am using it to extract millions of events (who/what/where/when) and putting them on a big map: https://landnotes.org/?location=u07ffpb1-6&date=1548&strictD...
[+] [-] sp0rk|7 months ago|reply
[+] [-] chatmasta|7 months ago|reply
[+] [-] chuckadams|7 months ago|reply
[+] [-] layman51|7 months ago|reply
[+] [-] Affric|7 months ago|reply
It's orthogonal to art.
[+] [-] seu|7 months ago|reply
[+] [-] _7mza|7 months ago|reply
[deleted]
[+] [-] bbor|7 months ago|reply
[+] [-] graypegg|7 months ago|reply
[+] [-] JohnKemeny|7 months ago|reply
One of our projects in algorithms/data structures was to do a BFS on the Wikipedia dump. In 2007.
[+] [-] dleeftink|7 months ago|reply
[+] [-] _7mza|7 months ago|reply
[deleted]
[+] [-] speedgoose|7 months ago|reply
https://m.wikidata.org/wiki/Wikidata:Main_Page
[+] [-] westurner|7 months ago|reply
https://github.com/dbpedia
Here's the DBpedia page about DBpedia: https://dbpedia.org/resource/DBpedia which is extracted from the Wikipedia page about DBpedia: https://en.wikipedia.org/wiki/DBpedia
The Linked Open Data Cloud; LODcloud: https://lod-cloud.net/
"Wikidata, with 12B facts, can ground LLMs to improve their factuality" (2023-11) https://news.ycombinator.com/item?id=38304290#38309408
/? knowledge graph llm: https://scholar.google.com/scholar?hl=en&as_sdt=0%2C43&q=kno...
/? site:github.com inurl:awesome knowledge graph llm: https://www.google.com/search?q=site%253Agithub.com+inurl%25...
To train the robots as well
[+] [-] munificent|7 months ago|reply
Yup, checks out.
[+] [-] Retr0id|7 months ago|reply
Love -> Time (magazine) -> Henry Kissinger
https://www.sixdegreesofwikipedia.com/?source=Love&target=He...
[+] [-] priteau|7 months ago|reply
It has been around for at least 15 years! https://news.ycombinator.com/item?id=1728592
[+] [-] chicagojoe|7 months ago|reply
[+] [-] abrahms|7 months ago|reply
[+] [-] jedberg|7 months ago|reply
I have to question its accuracy.
[+] [-] dwwoelfel|7 months ago|reply
[+] [-] grues-dinner|7 months ago|reply
[+] [-] timstapl|7 months ago|reply
[+] [-] unknown|7 months ago|reply
[deleted]
[+] [-] dmezzetti|7 months ago|reply
https://github.com/neuml/txtai/blob/master/examples/58_Advan...
[+] [-] phailhaus|7 months ago|reply
[+] [-] _7mza|7 months ago|reply
[deleted]
[+] [-] tfsh|7 months ago|reply
[+] [-] IAmGraydon|7 months ago|reply
https://github.com/vasturiano/3d-force-graph
[+] [-] wforfang|7 months ago|reply
[+] [-] axpy906|7 months ago|reply
[+] [-] unknown|7 months ago|reply
[deleted]
[+] [-] nibblenum|7 months ago|reply
[+] [-] whb101|7 months ago|reply
I made this a while back for more freeform browsing: https://wikijumps.com
Would love to integrate some of that relationship data
[+] [-] y-curious|7 months ago|reply
[+] [-] sp0rk|7 months ago|reply
[+] [-] djoldman|7 months ago|reply
This would be a directed acyclic graph like schema.org
[+] [-] MarceColl|7 months ago|reply