top | item 20523049

Vital Wikipedia Articles

436 points | soheilpro | 6 years ago | en.wikipedia.org

77 comments

tapland | 6 years ago
I would argue that the List of Common Misconceptions [0] belongs in any list of important or vital Wikipedia articles.

[0] https://en.wikipedia.org/wiki/List_of_common_misconceptions

pessimizer | 6 years ago
I can't believe there's a Wikipedia page consisting solely of things that I bring up during party conversations to seem interesting. Thanks for showing me this.
itcrowd | 6 years ago
This list is awesome. Along with [1], it could make a great starting point for learning about new topics, and also for quickly finding out which topics are considered important in a discipline. For example, if you want to learn about philosophy but don't have a good sense of which topics even exist, just scroll down the list!

I think it could also serve as a list of topics a well-rounded person should know something about (of course, this is highly personal and debatable), or should at least have heard of. Perfect way to learn something every day!

Great submission, bookmarked, thanks!

[1] https://meta.wikimedia.org/wiki/List_of_articles_every_Wikip...

o09rdk | 6 years ago
I had a different reaction. I understand the reasons for having this sort of list internally, but some of what's included and left out is really odd to me, especially when it comes to biographical entries. I worry that these lists will kind of reify a very superficial approach to certain areas.

I'm not sure how they came up with these lists, but it seems better to me to organize them quantitatively somehow, by number of edits, some index of controversy, or something similar. That way the list would relate more directly to the reason for having it in the first place.

As it is, these lists remind me a lot of the controversies over Wikipedia when it started giving editors more and more power. They seem to reflect some preconceptions on the part of Wikipedia's editors more than anything else.

kalev | 6 years ago
Totally agree. I just read the Earth article [1] for the first time and it was really fascinating. Don't forget to click on 'Level 1' in the top bar as well, to see the 10 most important articles.

[1] https://en.wikipedia.org/wiki/Earth

benplumley | 6 years ago
I know it's possible to download the entire Wikipedia database, but does anyone know of a way to download every article in this list? Preferably as a torrent.
orblivion | 6 years ago
I can offer one option that's in the ballpark of what you're looking for:

https://wiki.kiwix.org/wiki/Content_in_all_languages

Kiwix is a standalone viewer (available for desktop, for phones, or as a web server [including a Sandstorm.io package which I put together]) for archived websites of various sorts, Wikipedia being the flagship. This link lists all available content, and there is a huge amount of it. However, if you Ctrl-F for "physics" (and keep searching until you hit the language you want), you'll see that they have subsets of Wikipedia catering to many interests: physics, basketball, "for schools", history, etc.

All content packages are indeed available as torrents.

splatcollision | 6 years ago
Please use the API. Do not scrape Wikipedia via the website.

What you're looking for is:

https://en.wikipedia.org/wiki/Special:Export

You can start with the index page, collect all the page titles you're interested in, and then use Special:Export to download the XML (probably other formats too) of all those pages.
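
A minimal sketch of that workflow in Python, assuming you already have the list of titles. For a single page, Special:Export also accepts the title after a slash (e.g. /wiki/Special:Export/Earth) and returns that page's XML. The User-Agent string is a placeholder you should replace, the fetch functions need network access, and the one-second pause is an arbitrary politeness delay, not an official rate limit.

```python
import time
import urllib.parse
import urllib.request

EXPORT_BASE = "https://en.wikipedia.org/wiki/Special:Export/"
# Placeholder: Wikimedia asks clients to identify themselves with a descriptive User-Agent.
USER_AGENT = "vital-articles-export-sketch/0.1 (contact: you@example.org)"


def export_url(title: str) -> str:
    """Build the Special:Export URL for one page title."""
    # Wiki URLs use underscores for spaces; other characters are percent-encoded as UTF-8.
    return EXPORT_BASE + urllib.parse.quote(title.replace(" ", "_"))


def fetch_export_xml(title: str) -> bytes:
    """Download the XML export of one page (requires network access)."""
    request = urllib.request.Request(export_url(title), headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request) as response:
        return response.read()


def export_all(titles, pause_seconds: float = 1.0) -> dict:
    """Fetch pages one at a time, pausing between requests to go easy on the servers."""
    pages = {}
    for title in titles:
        pages[title] = fetch_export_xml(title)
        time.sleep(pause_seconds)
    return pages
```

For example, export_all(["Earth", "Jane Austen"]) returns a dict mapping each title to its raw XML bytes.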

computator | 6 years ago
I was about to say that a torrent is hardly necessary. How big could 1,000 mostly-text files get? Pretty big, as it turns out. After downloading a dozen random entries from that list, I found the sizes average around 2 MB, and that includes only the small images on each page (not the big picture you get when you click on an image). So 1,000 entries at 2 MB each would be about 2 GB.

Picking apart just one page (the Jane Austen entry), the plain ASCII text with no markup is only 88 KB. The 19 small images, plus some tiny buttons and logos, are 536 KB, and the markup (HTML, CSS, and whatnot) is 497 KB. I was surprised that Wikipedia, in terms of page weight, is mostly images and markup. (Not complaining, of course. Wikipedia is one of the few big sites on the web that doesn't throw in gratuitous and irrelevant images and videos.)
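
Tallying the figures reported above makes the point concrete. This is a back-of-the-envelope check that uses only the numbers from this comment:

```python
# Figures reported in the comment above, in KB, for the single Jane Austen page.
text_kb, images_kb, markup_kb = 88, 536, 497
page_kb = text_kb + images_kb + markup_kb  # 1121 KB, roughly 1.1 MB

for name, kb in [("text", text_kb), ("images", images_kb), ("markup", markup_kb)]:
    print(f"{name:>6}: {kb:4d} KB ({kb / page_kb:.0%})")

# Scaling the ~2 MB average page to the 1,000 articles in the list (decimal GB).
avg_page_mb, articles = 2, 1000
print(f"total: ~{avg_page_mb * articles / 1000:g} GB")
```

It prints text at 8%, images at 48%, and markup at 44% of the Jane Austen page, and a total of about 2 GB for the full list.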

kekebo | 6 years ago
Not a torrent or a full solution, but applying the regex /wiki/(?![^"]*:)[^"#]+ to the page source should select all the article links (along with some generic Wikipedia links matched at the bottom); batch-adding "https://en.wikipedia.org" to the beginning of each line then gives full URLs.

Here's one such list: https://hastebin.com/terugezeda

wget has an option (-i) that downloads the links in a text file line by line, but sadly it makes a mess of the images when using

  wget --span-hosts --convert-links --adjust-extension --page-requisites --no-host-directories --no-parent --wait=1 --reject="robots.txt" -i wget.txt 
or

  wget -H -k -E -p -nH -np -w 1 -R "robots.txt" -i wget.txt
for short.

Maybe someone has a better idea for the last step.

Edit: added the shorthand version.
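
As an alternative to the editor-plus-wget route for the link-extraction step, here is a minimal Python sketch, assuming you have saved the Vital Articles page's HTML locally. It pulls article links out of href attributes, skips namespaced pages (anything with a colon, such as Wikipedia: or File: links), drops #section fragments, and writes one absolute URL per line for wget -i. It does not fix the image problem, and the generic links at the bottom of the page will still slip through.

```python
import re

BASE = "https://en.wikipedia.org"
# Matches href="/wiki/Title" or href="/wiki/Title#Section", capturing "/wiki/Title".
# The negative lookahead rejects any link with a colon before the closing quote,
# which filters out namespaced pages such as /wiki/Wikipedia:... or /wiki/File:...
ARTICLE_HREF = re.compile(r'href="(/wiki/(?![^"]*:)[^"#]+)(?:#[^"]*)?"')


def article_urls(html: str) -> list:
    """Return absolute article URLs in first-seen order, without duplicates."""
    return list(dict.fromkeys(BASE + path for path in ARTICLE_HREF.findall(html)))


def write_wget_list(urls, path: str = "wget.txt") -> None:
    """Write one URL per line, the format wget's -i option reads."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(urls) + "\n")
```

Usage, with a hypothetical saved file name: write_wget_list(article_urls(open("vital.html", encoding="utf-8").read())).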

1wd | 6 years ago
Level 5 has 31 video game designers (under Artists, musicians, and composers -> Game and toy designers), including Shigeru Miyamoto as the only one also included in Level 4 (under Businesspeople). (Mario is the only fictional character from games included in Level 4.)

Notable omissions: Richard Stallman (Linus Torvalds is in), GNU (Linux is in), GPL, Free Software (Open-source software is in Level 4), Rust (Assembly, C, Java and Javascript are even in Level 4), Deep Learning, Hacker News (Reddit is in).

lm28469 | 6 years ago
> Rust (Assembly, C, Java and Javascript are even in Level 4)

The languages in parentheses basically run the world; Rust is barely used in comparison, so it's not surprising it doesn't get the same attention.

chipperyman573 | 6 years ago
>GNU, GPL, Free Software

I love these things as much as anyone else on HN, but realistically they are just semantic variations on pages that are already included. A line has to be drawn somewhere, and since these are mostly minor differences (which, to be fair, have large implications, but are still minor in themselves), I don't think it's fair to say they should be included.

>Hacker News (Reddit is in)

Isn't Reddit one of the top 10 websites by daily active users? HN is intentionally niche.

throwawaylolx | 6 years ago
Is it possible to see these ratings/labels/classes on an article? For instance, can I tell that https://en.wikipedia.org/wiki/Charles_Dickens is B-class from the page itself? I assume the lock icon at the top right may have some correlation with the assessed quality, but it doesn't seem consistent across classes.
segfaultbuserr | 6 years ago
Article ratings below "Good Article" are considered relevant only to coordinating Wikipedia's writers and editors, so they are shown only in the templates at the top of the article's Talk page.

For example,

https://en.wikipedia.org/wiki/Talk:Charles_Dickens

You'll see:

--- This article is of interest to the following WikiProjects:

* WikiProject Biography / Arts and Entertainment (Rated B-class)

* WikiProject England (Rated B-class, Top-importance)

* WikiProject Children's literature (Rated B-class, Top-importance)

* WikiProject Hampshire (Rated B-class, Top-importance)

* WikiProject Journalism (Rated B-class, High-importance)

* WikiProject London (Rated B-class, Top-importance)

* WikiProject Kent (Rated B-class, Top-importance)

--- This article has been reviewed by the Version 1.0 Editorial Team.

* C - This article has been rated as C-Class on the quality scale.

* B? -

This article has not yet been checked against the criteria for B-Class status:

Referencing and citation: not checked

Coverage and accuracy: not checked

Structure: not checked

Grammar and style: not checked

Supporting materials: not checked

Accessibility: not checked

To fill out this checklist, please add the following code to the template call assessing the article against each criterion.

* High - This article has been rated as High-importance on the importance scale.
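
The same ratings can also be read programmatically. This is a sketch under assumptions: it relies on the PageAssessments extension's prop=pageassessments module of the MediaWiki Action API, which as far as I know is enabled on English Wikipedia and reports each WikiProject's rating as "class" and "importance" fields keyed by project name. The User-Agent is a placeholder, and fetch_assessments needs network access.

```python
import json
import urllib.parse
import urllib.request

API = "https://en.wikipedia.org/w/api.php"
# Placeholder: replace with a User-Agent that identifies you.
USER_AGENT = "assessment-lookup-sketch/0.1 (contact: you@example.org)"


def assessments_url(title: str) -> str:
    """Build an Action API query for one page's WikiProject assessments."""
    params = {
        "action": "query",
        "prop": "pageassessments",
        "titles": title,
        "format": "json",
        "formatversion": "2",  # pages come back as a list, not a dict keyed by page ID
    }
    return API + "?" + urllib.parse.urlencode(params)


def parse_assessments(payload: dict) -> dict:
    """Map each WikiProject name to a (class, importance) pair for the first page returned."""
    pages = payload.get("query", {}).get("pages", [])
    if isinstance(pages, dict):  # formatversion=1 keys pages by page ID
        pages = list(pages.values())
    if not pages:
        return {}
    ratings = pages[0].get("pageassessments", {})
    return {project: (r.get("class", ""), r.get("importance", "")) for project, r in ratings.items()}


def fetch_assessments(title: str) -> dict:
    """Query the live API (requires network access) and parse the result."""
    request = urllib.request.Request(assessments_url(title), headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request) as response:
        return parse_assessments(json.load(response))
```

For example, an illustrative payload shaped like {"query": {"pages": [{"title": "Charles Dickens", "pageassessments": {"England": {"class": "B", "importance": "Top"}}}]}} parses to {"England": ("B", "Top")}, matching the Talk-page rating quoted above.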

TuringTest | 6 years ago
The Gadgets section has an Appearance setting to "display an assessment of an article's quality in its page header".

https://en.wikipedia.org/wiki/Special:Preferences#mw-prefsec...

The lock icon is a different matter: it marks page protection, which is applied to a subset of controversial or high-profile articles to limit which editors can edit them (ranging from requiring logged-in, established (autoconfirmed) accounts to a temporary full edit block during disputes).

dcchambers | 6 years ago
Well, there goes my productivity for the day.
0wis | 6 years ago
Superb list, a must-read! A great example of a "knowledge tree"; it could be useful for finding a root concept for an idea you want to explore. Are there any other tools for finding the "first-principles" roots of a topic?
freddref | 6 years ago
I wonder how these pages change in content and tone across languages.
yorwba | 6 years ago
Level 1, with 10 articles, is available in 32 languages. There are some differences among the languages I checked, but they mostly just use different representative articles for the same general categories. E.g. French has "culture" instead of "human", Chinese has "culture" instead of "philosophy", and Catalan has "geography" instead of "Earth" and "society" instead of "human".
Sharlin | 6 years ago
Surprisingly, “University” is only at Level 4. I’d say it definitely deserves a place in Level 3: it is not just a subconcept of “school”, but an institution responsible for the majority of scientific research.

Edit: Okay, these are definitely the weirdest downvotes I’ve ever got on HN.

incidentnormal | 6 years ago
So few A-class articles in the list.
jvln | 6 years ago
What I want to emphasize is that the majority of articles represent Western democratic, capitalist, liberal culture. Reading these articles, you won’t learn anything beyond what you were taught in school. On one hand, it is amusing how English Wikipedia became a mirror of Western culture. On the other hand, it is sad that you cannot get insights into other cultures without a Western culture filter.
falcor84 | 6 years ago
>...it is sad that you cannot get insights into other cultures without a Western culture filter.

I don't understand this sentiment. Why can't you? There are quite a few other entry points out there created by non-Westerners; why complain that an index created by Westerners for the English Wikipedia has a Western bent? How could it conceivably be different?

agumonkey | 6 years ago
They could include Lewis Dartnell.
sittingnut | 6 years ago
Wikipedia is now under the effective control of a limited number of entrenched editors, mostly subscribing to the Western establishment's "liberal" ideology, with almost absolute power over content. A prime example of this bias is the article about the British Empire. Comparing that article with the articles about other brutish regimes (the USSR, Mao's China, etc.) is telling, even though its atrocities far exceed those of any other regime, in terms of both quantity and extent.
MatekCopatek | 6 years ago
While I agree with that assessment, I don't really think it has much to do with Wikipedia itself, it's just how things are in the entire Western culture. History is written by the victors and all that.

Consequently, it feels a bit unfair to put this on Wikipedia editors. Kinda like blaming a random restaurant manager for forcing waiters to rely on tips. Yes, if they have enough profits, they can pay people more and fix a small part of the problem. The general issue remains systemic though.

derefr | 6 years ago
I think you mean “English-language Wikipedia”. Try switching to one of the other languages and then putting the article through Google Translate. They’re not the same article semantically! Each language has its own set of editors putting its own spin on things. It’s only to be expected that each language’s editorial policy will be dominated by the cultural hegemony of that language’s speakers, if one exists.

(For example, even without government interference, you would expect “Taiwan” to have different first-sentence descriptions in the Chinese and English Wikipedias.)

Yizahi | 6 years ago
At least one Wikipedia should maintain an approach close to the facts. If you prefer the authoritarian "rewrite history" approach, you can consult the Russian Wikipedia, where they rewrote all the political and history pages, and I assume the Chinese Wikipedia is the same.
TheSpiceIsLife | 6 years ago
Do you mean to imply that "conservative" Western ideology would be more truthful about the British Empire?
andrepd | 6 years ago
Whenever Wikipedia is mentioned on Hacker News, it's always the same comment. Do you have anything new to add, or any substantial argument to support your position? As it stands, it's a pretty serious accusation backed by a pretty thin argument.
p1esk | 6 years ago
“Adolf Hitler” is the 6th most visited article in the last 90 days. Wow.
frogpelt | 6 years ago
That article comes right after World War II and World War I, and D-Day was June 6.

I'm surprised by William Shakespeare coming in at #16.

JetSpiegel | 6 years ago
> Vital articles sorted by number of views in the past 90 days (as of 17 July 2014)

Sex is eighth.

tim333 | 6 years ago
I guess human drama needs a bad guy.
inmate4587 | 6 years ago
The list looks awesome; I'll definitely go through it all.