Show HN: Graph of wikipedia articles semantic similarity (LSI, Python, d3.js)

54 points| lucamartinetti | 14 years ago |similarityapi.appspot.com

A small experiment in visualizing Wikipedia articles as a graph using d3.js.

Articles with more traffic are bigger. I computed the semantic similarity using LSI with Python (gensim). You have to scroll down/right a bit!

http://similarityapi.appspot.com/graph/?title=blade%20runner

There is also a JSON API: http://similarityapi.appspot.com/api/v1/?limit=100&title=blade%20runner

All feedback is appreciated:

@lucamartinetti [email protected]
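
For anyone who wants to call the JSON API from a script, here is a minimal sketch of building the query URL. Only the base URL and the `limit` and `title` parameters come from the post; the helper name `build_query_url` is made up for illustration, and the shape of the response is not documented here, so the sketch stops at constructing the URL.

```python
from urllib.parse import quote, urlencode

# Base URL taken from the post; the helper below is hypothetical.
API_BASE = "http://similarityapi.appspot.com/api/v1/"


def build_query_url(title, limit=100):
    """Return the API URL for `title`, with spaces encoded as %20."""
    # quote_via=quote encodes spaces as %20 (as in the post's example)
    # instead of the '+' that urlencode produces by default.
    query = urlencode({"limit": limit, "title": title}, quote_via=quote)
    return f"{API_BASE}?{query}"


print(build_query_url("blade runner"))
# prints http://similarityapi.appspot.com/api/v1/?limit=100&title=blade%20runner
```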

13 comments

[+] 3pt14159|14 years ago|reply
I've had much, much better results with LDA than LSI. Give that a shot if you have a chance, you'll be blown away. Stop word ratios are important, and make the max number of tokens 500,000.
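
One possible reading of the stop-word and token-cap advice, sketched in plain Python: drop tokens that occur in too large a share of documents, then keep at most a fixed number of the remaining ones. The function name, the 0.5 ratio, and the toy documents are assumptions for illustration, not the settings used above. In gensim, `Dictionary.filter_extremes` (its `no_above` and `keep_n` arguments) covers similar ground.

```python
from collections import Counter


def prune_vocabulary(docs, max_doc_ratio=0.5, max_tokens=500_000):
    """Drop tokens found in more than `max_doc_ratio` of the documents,
    then keep at most `max_tokens` of the most widespread survivors."""
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each token once per document
    limit = max_doc_ratio * len(docs)
    kept = [(tok, df) for tok, df in doc_freq.items() if df <= limit]
    # Highest document frequency first; ties broken alphabetically.
    kept.sort(key=lambda pair: (-pair[1], pair[0]))
    return [tok for tok, _ in kept[:max_tokens]]


# Toy corpus, invented for the example.
docs = [
    ["the", "replicant", "hunts", "the", "android"],
    ["the", "android", "dreams"],
    ["the", "director", "dreams"],
    ["the", "film", "score"],
]
print(prune_vocabulary(docs, max_doc_ratio=0.5, max_tokens=3))
# prints ['android', 'dreams', 'director']
```

Here "the" appears in every document, so it is filtered out as a stop word before the cap is applied.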
[+] viscanti|14 years ago|reply
The JSON API should degrade gracefully if results aren't found, i.e., there should be a JSON message explaining that the item doesn't exist.
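
A rough sketch of what that could look like on the server side. Everything here is hypothetical: the `lookup` function, the `error` and `message` field names, the 404 status, and the placeholder index are assumptions, not the actual API.

```python
import json

# Placeholder index standing in for the real similarity data.
SIMILAR = {"blade runner": ["example article a", "example article b"]}


def lookup(title, limit=100):
    """Return (status_code, json_body) for a similarity query."""
    results = SIMILAR.get(title.lower())
    if results is None:
        # Unknown title: answer with a JSON error instead of failing.
        body = {
            "error": "not_found",
            "message": f"No article titled {title!r} was found.",
        }
        return 404, json.dumps(body)
    return 200, json.dumps({"title": title, "results": results[:limit]})


status, body = lookup("no such article")
print(status, body)
```

A client can then check the status code or the `error` key instead of trying to parse an HTML error page.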
[+] rplnt|14 years ago|reply
An option to select the language version could be a good feature (defaulting to en, as it does now).
[+] Radim|14 years ago|reply
How much data did you use for the semantic analysis?
[+] Edootjuh|14 years ago|reply
I've never liked these scrolling animations. You need too much precision to see a part of the page clearly, while with normal scrolling it wouldn't matter if the information you're reading is at the bottom or top of the screen.
[+] stephengoodwin|14 years ago|reply
Does the font size for a node represent its similarity with the query page?
[+] lucamartinetti|14 years ago|reply
It represents the traffic of the article. The ten most related articles are displayed for each expanded node. Articles with more inbound links are darker.
[+] ssn|14 years ago|reply
Down?
[+] lucamartinetti|14 years ago|reply
Not for me. You need a modern browser (Chrome or Firefox), and you have to scroll a bit.