I can't believe there's a Wikipedia page consisting solely of things that I bring up during party conversations to seem interesting. Thanks for showing me this.
This list is awesome. Along with [1], it could make a great starting point for learning about new topics, but also for quickly getting a sense of which topics are considered important in a discipline. For example, if you want to learn about philosophy but don't have a good understanding of what topics are even available, just scroll down the list!
I think it could also serve as a list of topics a well-rounded person should know something about, or at least have heard of (of course, this is highly personal and debatable). It's a perfect way to learn something every day!
I had a different reaction. I understand the reasons for having this sort of list internally, but some of what's included and left out is really odd to me, especially when it comes to biographical entries. I worry that these lists will kind of reify a very superficial approach to certain areas.
I'm not sure how they came up with these lists, but it would seem better to me to somehow quantitatively organize them, by numbers of edits or some index of controversy or something. That way there would be a more direct relationship with the reason for having the list in the first place.
As it is these lists remind me a lot of the controversies over Wikipedia when they started giving editors more and more power. It seems to reflect some preconceptions on the part of the Wikipedia editors more than anything else.
Totally agree. I just read Earth [1] for the first time and it was really fascinating.
Don't forget to click on 'Level 1' in the top bar as well, to see the top 10 most important articles.
I know it's possible to download the entire Wikipedia database, but does anyone know of a way to download every article in this list? Preferably as a torrent.
Kiwix is a standalone viewer (available as a desktop app, phone app, or web server [including a Sandstorm.io package which I put together]) for archived websites of various sorts, Wikipedia being the flagship. This link lists everything available, and there is a huge amount of it. However, if you ctrl-f for "physics" (and keep searching until you hit the language you want), you'll see that they have subsets of Wikipedia that cater to many interests: physics, basketball, "for schools", history, etc.
All content packages are indeed available as torrents.
You can start with the index page and collect all the page titles you're interested in, and then use the special:export API to download XML (probably other formats too) of all those pages.
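A rough Python sketch of that idea follows. It relies on the `Special:Export/Page_title` URL form for exporting a single page as XML; the list of titles, the function names, and the User-Agent string are all hypothetical placeholders. The download function is defined but not called here, so running the snippet only prints the export URLs.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen

EXPORT_BASE = "https://en.wikipedia.org/wiki/Special:Export/"


def export_url(title: str) -> str:
    """Build the Special:Export URL for one page title."""
    # Wikipedia URLs use underscores for spaces; quote() escapes the rest.
    return EXPORT_BASE + quote(title.replace(" ", "_"))


def fetch_export_xml(title: str) -> bytes:
    """Download the XML export of one page (performs a network request)."""
    # The User-Agent value is a placeholder; use one that identifies you.
    request = Request(
        export_url(title),
        headers={"User-Agent": "vital-articles-sketch/0.1"},
    )
    with urlopen(request) as response:
        return response.read()


# Hypothetical selection of titles collected from the index page.
titles = ["Earth", "Jane Austen", "Charles Dickens"]
for title in titles:
    print(export_url(title))
```

To actually save a page, something like `open("Earth.xml", "wb").write(fetch_export_xml("Earth"))` should work, ideally with a pause between requests when looping over many titles.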
I was about to say that a torrent is hardly necessary. How big could 1000 mostly-text files get? Pretty big, as it turns out. Downloading a dozen random entries from that list, the sizes seem to average around 2 MB, and that's including only the small images on each page (not the big picture you get when you click on an image). So 1000 entries at 2 MB each would be 2 GB.
Picking apart just one page (the Jane Austen entry), the plain ASCII text with no markup is only 88 KB. The 19 small images, plus some tiny buttons and logos, are 536 KB, and the markup (HTML, CSS, and whatnot) is 497 KB. I was surprised that Wikipedia, in terms of page weight, is mostly images and markup. (Not complaining, of course. Wikipedia is one of the few big sites on the web that doesn't throw in gratuitous and irrelevant images and videos.)
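The figures above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch using the numbers quoted in the comment (88 KB of text, 536 KB of images, 497 KB of markup, and a rough 2 MB average over 1000 articles); the variable names are made up for illustration, and 1 GB is assumed to be 1000 MB.

```python
# Sizes in KB for the Jane Austen page, as quoted above.
page_parts_kb = {"text": 88, "images": 536, "markup": 497}

total_kb = sum(page_parts_kb.values())
shares = {name: 100 * size / total_kb for name, size in page_parts_kb.items()}

print(f"total: {total_kb} KB")
for name, share in shares.items():
    print(f"{name}: {share:.1f}% of the page")

# Rough size of the whole list: 1000 articles at about 2 MB each.
articles = 1000
average_mb = 2
estimate_gb = articles * average_mb / 1000  # assumes 1 GB = 1000 MB
print(f"estimated download: {estimate_gb:.0f} GB")
```

For this one page the plain text comes to less than a tenth of the bytes, which fits the observation that the weight is mostly images and markup.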
Not a torrent or a full solution, but applying the regex
/wiki/(?!.*\:)[A-Za-z0-9%()_]*
on the source should select all the articles (along with some generic Wikipedia links matched at the bottom); then batch-adding "https://en.wikipedia.org" to the beginning of each line gives full URLs.
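The approach above can be sketched in Python. This is a minimal sketch, not the commenter's exact regex: it assumes article links appear in the page source as `href="/wiki/Title"` attributes and treats any title containing a colon as a namespaced page (Talk:, Wikipedia:, Special:) to skip. The names `ARTICLE_HREF` and `article_urls` and the sample HTML are made up for illustration.

```python
import re

# Matches href="/wiki/Title" where Title contains no colon, fragment, or
# query string, which skips namespaced pages such as Wikipedia: or Talk:.
ARTICLE_HREF = re.compile(r'href="(/wiki/[^":#?]+)"')

BASE_URL = "https://en.wikipedia.org"


def article_urls(html: str) -> list[str]:
    """Return de-duplicated full article URLs in order of first appearance."""
    seen = set()
    urls = []
    for path in ARTICLE_HREF.findall(html):
        if path not in seen:
            seen.add(path)
            urls.append(BASE_URL + path)
    return urls


# Hypothetical stand-in for the downloaded page source.
sample = (
    '<a href="/wiki/Earth">Earth</a> '
    '<a href="/wiki/Jane_Austen">Jane Austen</a> '
    '<a href="/wiki/Wikipedia:About">About</a> '
    '<a href="/wiki/Earth">Earth again</a>'
)
print("\n".join(article_urls(sample)))
```

Writing the result to a text file, one URL per line, gives a list that a downloader can read line by line.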
Level 5 has 31 video game designers (under Artists, musicians, and composers -> Game and toy designers), including Shigeru Miyamoto as the only one also included in Level 4 (under Businesspeople). (Mario is the only fictional character from games included in Level 4.)
Notable omissions: Richard Stallman (Linus Torvalds is in), GNU (Linux is in), GPL, Free Software (Open-source software is in Level 4), Rust (Assembly, C, Java and JavaScript are even in Level 4), Deep Learning, Hacker News (Reddit is in).
I love these things as much as anyone else on HN but realistically they are just semantic differences between already included pages. A line has to be drawn somewhere, and since they are mostly just minor differences (which, to be fair, have large implications... but the differences themselves are still quite minor), I don't think it's fair to say they should be included.
>Hacker news (Reddit is in)
Isn't Reddit one of the top 10 websites by DAU? HN is intentionally niche.
Is it possible to see these ratings/labels/classes on an article? For instance, can I tell https://en.wikipedia.org/wiki/Charles_Dickens is B-class from the page itself? I assume the lock icon at the top right may have some correlation with the assessed quality, but it doesn't seem consistent across classes.
Article ratings below "Good Article" are considered relevant only to the coordination of Wikipedia writers and editors, so they are shown only in the templates at the top of the article's Talk page.
The lock icon is a different matter: it is used on a subset of controversial or high-profile articles to limit who can edit them (ranging from requiring editors to be logged in to a temporary full edit block during disputes).
Superb list. Worth reading!
A great example of a « knowledge tree »; it could be useful for finding a root concept for an idea you want to explore.
Are there any other tools for finding the « first-principle » roots of a topic?
Level 1 with 10 articles is available in 32 languages. There are some differences among the languages I checked, but they are mostly just using different representative articles for the same general categories. E.g. French has "culture" instead of "human" and Chinese has "culture" instead of "philosophy" and Catalan has "geography" instead of "Earth" and "society" instead of "human".
Surprisingly ”University” is only level four. I’d say it definitely deserves a place on level three, being not just a subconcept of ”school”, but an institution responsible for the majority of scientific research.
Edit: Okay, these are definitely the weirdest downvotes I’ve ever got on HN.
What I want to emphasize is that the majority of articles represent Western democratic, capitalistic, liberal culture. Reading these articles, you won't learn anything beyond what you were taught in school. On one hand, it is amusing how the English Wikipedia became a mirror of Western culture. On the other hand, it is sad that you cannot get insights into other cultures without a Western cultural filter.
>...it is sad that you can not get insights into other cultures without western culture filter.
I don't understand this sentiment. Why can't you? There are quite a few other entry points out there created by non-Westerners; why complain that an index created by Westerners for the English Wikipedia has a Western bent? How could it conceivably be different?
Wikipedia is now under the effective control of a limited number of entrenched editors, mostly subscribing to the Western establishment's "liberal" ideology, with almost absolute power over content.
A prime example of this bias is the article about the British Empire. Comparing that article with articles about other brutish regimes (the USSR, Mao's China, etc.) is telling, even though its atrocities far exceed those of any other regime in both quantity and extent.
While I agree with that assessment, I don't really think it has much to do with Wikipedia itself; it's just how things are in the entire Western culture. History is written by the victors and all that.
Consequently, it feels a bit unfair to put this on Wikipedia editors. Kinda like blaming a random restaurant manager for forcing waiters to rely on tips. Yes, if they have enough profits, they can pay people more and fix a small part of the problem. The general issue remains systemic though.
I think you mean “English-language Wikipedia”. Try switching to one of the other languages and then putting the article through Google Translate. They’re not the same article semantically! Each language has its own set of editors putting its own spin on things. It’s only to be expected that each language’s editorial policy will be dominated by the cultural hegemony of that language’s speakers, if one exists.
(For example, even without government interference, you would expect “Taiwan” to have different first-sentence descriptions in the Chinese–Simplified, Chinese–Traditional, and English Wikipedias.)
At least one Wikipedia should maintain a close-to-the-facts approach. If you prefer the authoritarian "rewrite history" approach, then you can consult the Russian Wiki, where they rewrote all the political and history pages, and I assume the Chinese Wiki is the same.
Whenever Wikipedia is mentioned on Hacker News, it's always the same comment. Have you got anything new to add, or any substantial argument to support your position? As it stands, it's a pretty serious accusation and a pretty thin argument.
[+] [-] tapland|6 years ago|reply
[0] https://en.wikipedia.org/wiki/List_of_common_misconceptions
[+] [-] po|6 years ago|reply
https://en.wikipedia.org/wiki/List_of_cognitive_biases
really... you think you know them all but you don't. Everyone needs to understand the IKEA effect for example
[+] [-] pessimizer|6 years ago|reply
[+] [-] m3at|6 years ago|reply
[+] [-] itcrowd|6 years ago|reply
Great submission, bookmarked, thanks!
[1] https://meta.wikimedia.org/wiki/List_of_articles_every_Wikip...
[+] [-] o09rdk|6 years ago|reply
[+] [-] kalev|6 years ago|reply
[1] https://en.wikipedia.org/wiki/Earth
[+] [-] benplumley|6 years ago|reply
[+] [-] orblivion|6 years ago|reply
https://wiki.kiwix.org/wiki/Content_in_all_languages
[+] [-] splatcollision|6 years ago|reply
What you're looking for is:
https://en.wikipedia.org/wiki/Special:Export
[+] [-] computator|6 years ago|reply
[+] [-] kekebo|6 years ago|reply
Here's one such list: https://hastebin.com/terugezeda
wget has an option (-i) to download links line-by-line from a text file but is sadly making a mess of the images, using
or for short. Maybe someone has a better idea for the last step.
edit: shorthand version
[+] [-] 1wd|6 years ago|reply
[+] [-] lm28469|6 years ago|reply
The languages in parentheses basically run the world. Rust is barely used in comparison, so it's not surprising it doesn't get the same attention.
[+] [-] chipperyman573|6 years ago|reply
[+] [-] throwawaylolx|6 years ago|reply
[+] [-] inops|6 years ago|reply
If you have an account and don't want to leave the article to find out the class, you could enable the metadata gadget. [2]
[1] https://en.wikipedia.org/wiki/Talk:Charles_Dickens [2] https://en.wikipedia.org/wiki/Wikipedia:Metadata_gadget
[+] [-] segfaultbuserr|6 years ago|reply
For example,
https://en.wikipedia.org/wiki/Talk:Charles_Dickens
You'll see:
--- This article is of interest to the following WikiProjects:
* WikiProject Biography / Arts and Entertainment (Rated B-class)
* WikiProject England (Rated B-class, Top-importance)
* WikiProject Children's literature (Rated B-class, Top-importance)
* WikiProject Hampshire (Rated B-class, Top-importance)
* WikiProject Journalism (Rated B-class, High-importance)
* WikiProject London (Rated B-class, Top-importance)
* WikiProject Kent (Rated B-class, Top-importance)
--- This article has been reviewed by the Version 1.0 Editorial Team.
* C - This article has been rated as C-Class on the quality scale.
* B? -
This article has not yet been checked against the criteria for B-Class status:
Referencing and citation: not checked
Coverage and accuracy: not checked
Structure: not checked
Grammar and style: not checked
Supporting materials: not checked
Accessibility: not checked
To fill out this checklist, please add the following code to the template call assessing the article against each criterion.
* High - This article has been rated as High-importance on the importance scale.
[+] [-] TuringTest|6 years ago|reply
https://en.wikipedia.org/wiki/Special:Preferences#mw-prefsec...
[+] [-] dcchambers|6 years ago|reply
[+] [-] 0wis|6 years ago|reply
[+] [-] freddref|6 years ago|reply
[+] [-] yorwba|6 years ago|reply
[+] [-] tobr|6 years ago|reply
[+] [-] kaycebasques|6 years ago|reply
https://en.wikipedia.org/wiki/Wikipedia_talk:Vital_articles/...
[+] [-] derimagia|6 years ago|reply
https://en.wikipedia.org/wiki/Wikipedia_talk:Vital_articles#...
[+] [-] Sharlin|6 years ago|reply
[+] [-] incidentnormal|6 years ago|reply
[+] [-] jvln|6 years ago|reply
[+] [-] falcor84|6 years ago|reply
[+] [-] agumonkey|6 years ago|reply
[+] [-] sittingnut|6 years ago|reply
[+] [-] the_duke|6 years ago|reply
There probably are some Wikipedia articles you could link to.
Edit: just a few examples
* https://en.wikipedia.org/wiki/British_concentration_camps
* https://en.wikipedia.org/wiki/Famine_in_India
* https://en.wikipedia.org/wiki/Slavery_in_Britain#Enslaved_Af...
[+] [-] MatekCopatek|6 years ago|reply
[+] [-] derefr|6 years ago|reply
[+] [-] Yizahi|6 years ago|reply
[+] [-] TheSpiceIsLife|6 years ago|reply
[+] [-] andrepd|6 years ago|reply
[+] [-] p1esk|6 years ago|reply
[+] [-] frogpelt|6 years ago|reply
I'm surprised by William Shakespeare coming in at #16.
[+] [-] JetSpiegel|6 years ago|reply
Sex is the eighth.
[+] [-] tim333|6 years ago|reply
[+] [-] inmate4587|6 years ago|reply