top | item 22133434

Wikipedia now has more than 6M articles in English

284 points| jmsflknr | 6 years ago |techcrunch.com | reply

178 comments

[+] fhoffa|6 years ago|reply
6 million pages! A quick look at their distribution:

3.6M of these pages got fewer than 10 views during December. 4.3M pages got fewer than 100 views each. 2.3M got fewer than 1,000 views. That's the long tail. At the other end, 680K pages got more than 1,000 views, 111K pages got more than 10K views, only 6K pages got more than 100K views, and 90 pages managed to gather more than 1M views (Star Wars, The Mandalorian, The Witcher...).

In summary: The top 7.2% of Wikipedia pages earn 87% of all the monthly views.
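The summary above boils down to two computations over per-page monthly view counts: bucketing pages by order of magnitude, and measuring what share of all views the top X% of pages capture. A minimal Python sketch, assuming the counts are already available as a list of non-negative integers; the function names and the sample data are hypothetical, not fhoffa's actual code or dataset:

```python
from collections import Counter


def view_buckets(monthly_views):
    """Count pages per order-of-magnitude bucket.

    Bucket 0 holds pages with 0-9 views, bucket 1 holds 10-99,
    bucket 2 holds 100-999, and so on (number of digits minus one).
    """
    buckets = Counter()
    for views in monthly_views:
        buckets[len(str(views)) - 1] += 1
    return buckets


def top_share(monthly_views, top_fraction):
    """Fraction of all views earned by the top `top_fraction` of pages."""
    ranked = sorted(monthly_views, reverse=True)
    k = max(1, round(len(ranked) * top_fraction))  # at least one page
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0


# Hypothetical sample: a few heavy hitters plus a long tail of tiny pages.
sample = [2_000_000, 50_000, 1_200] + [3] * 97
print(view_buckets(sample))
print(f"top 3% share: {top_share(sample, 0.03):.1%}")
```

With real data, `top_share(views, 0.072)` is the kind of call that would produce a figure like "the top 7.2% earn 87%".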

Yesterday I posted a deeper analysis on how these pages get their daily views:

- https://towardsdatascience.com/interactive-the-top-2019-wiki...

Apparently the most popular pages are all related to movies (Avengers, Joker, ...), series (Ted Bundy, Chernobyl, Game of Thrones), and deaths.

[+] vladak|6 years ago|reply
Whenever I see the staggering number of Wikipedia articles, I'm reminded of my own recent experience creating one. To centralize some of the knowledge about a now-niche set of keyboards (manufactured by what used to be one of the FAANGs of its time), I set out to create a page. After I added what I thought was useful and interesting data, the page got deleted as not "notable" enough, i.e. lacking a sufficient number of external references. These keyboards did exist and are still used today, but they are not as popular (and therefore not as widely discussed or documented) as Apple's keyboards, which do have their own page on Wikipedia, so that's where the line was drawn. In the end I found it funny, and also interesting to see how Wikipedia works. If the editors were not so strictly bound by the rules, there would probably be tens of millions of articles by now.

Another experience: I was watching a YouTube video from Japan in which a guy was brewing coffee, and it reminded me that the brand of coffee equipment used in the video is sometimes sold in coffee shops in Europe. I wanted to know a little more about the company, so I went to Wikipedia, only to find that the English page had been deleted because there was little information about the company on the English-speaking part of the Internet. This says nothing about the company (which seems to be hugely popular in Japan and in certain social circles in Europe); rather, it demonstrates the limits of the model Wikipedia uses for curating articles.

[+] majos|6 years ago|reply
I’m a big fan of Wikipedia. It’s not perfect, and plenty of pages are outright bad, but repeatedly clicking “random page” [1] is a great source of entertainment to me. There’s so much info out there, yet I reliably get something surprising, interesting, or funny within maybe 10 random pages. And because it’s random, I learn things way outside my normal interests.

Try it if you find yourself mindlessly scrolling! It's just as easy, but you find way more cool things (often ones that people don't broadly know about, if you like that sort of thing).

I donate a few bucks a month and it’s well worth it.

[1] https://en.m.wikipedia.org/wiki/Special:random
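For anyone who wants the same "random page" draw in bulk, the MediaWiki Action API has a `list=random` module. A minimal Python sketch, assuming the public endpoint `https://en.wikipedia.org/w/api.php` and its standard JSON response shape (`query.random` as a list of pages); the helper names and the User-Agent string are hypothetical:

```python
import json
import urllib.parse
import urllib.request

API = "https://en.wikipedia.org/w/api.php"


def random_query_url(limit=10):
    """Build an Action API URL asking for `limit` random articles."""
    params = {
        "action": "query",
        "list": "random",
        "rnnamespace": 0,  # 0 = main (article) namespace, skips talk/user pages
        "rnlimit": limit,
        "format": "json",
    }
    return API + "?" + urllib.parse.urlencode(params)


def parse_titles(payload):
    """Extract page titles from a decoded list=random response."""
    return [page["title"] for page in payload["query"]["random"]]


def fetch_random_titles(limit=10):
    """Fetch random article titles over the network."""
    req = urllib.request.Request(
        random_query_url(limit),
        headers={"User-Agent": "random-reader-sketch/0.1 (hypothetical)"},
    )
    with urllib.request.urlopen(req) as resp:
        return parse_titles(json.load(resp))
```

Calling `fetch_random_titles(5)` should return five article titles; it needs network access, while the URL building and parsing work offline.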

[+] Liquix|6 years ago|reply
Wikipedia is AWESOME and I admire your selflessness & generosity in donating to a cause :)

However, before giving them any more money I would recommend reading through the (Wikipedia-hosted, ironically) article titled "Wikipedia has Cancer". The revenue vs. employee count/operating costs ratio is way off. Like, orders of magnitude off.

It's a tough pill to swallow considering how useful the site is and their respectable "information should be free for all" mission.

Again, your generosity is admired and there are far worse causes to donate towards. Just want to make sure people see both sides of the coin before spending hard earned cash!

[+] hrasyid|6 years ago|reply
Since this is Hacker News... besides donating, please also consider contributing code. Editors rely on tools for many tasks (e.g. detecting and fighting vandalism, evaluating articles, analytics, and tedious jobs such as mass edits), and these tools are often open source. MediaWiki (the software Wikipedia runs on) is always building features and fixing bugs. Also, while the English Wikipedia is well endowed with prolific programmers, many of its sister Wikipedias are less so and often lack basic tools, so there's a lot to do there.

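As a taste of what such tooling starts from, the Action API's `list=recentchanges` module exposes the live edit stream, and `rcprop=sizes` adds each edit's before/after byte length. A minimal Python sketch, assuming the public `https://en.wikipedia.org/w/api.php` endpoint; the "large removal" rule is a deliberately crude, hypothetical heuristic for illustration, not how any real anti-vandalism tool works:

```python
import urllib.parse

API = "https://en.wikipedia.org/w/api.php"


def recent_changes_url(limit=50):
    """Build an Action API URL for the latest edits, including byte sizes."""
    params = {
        "action": "query",
        "list": "recentchanges",
        "rcprop": "title|user|sizes",  # "sizes" adds oldlen/newlen per edit
        "rctype": "edit",
        "rclimit": limit,
        "format": "json",
    }
    return API + "?" + urllib.parse.urlencode(params)


def flag_large_removals(changes, threshold=2000):
    """Return (title, user, bytes_removed) for edits that shrank a page a lot.

    Hypothetical heuristic: real patrol tools use far richer signals.
    """
    flagged = []
    for rc in changes:
        removed = rc.get("oldlen", 0) - rc.get("newlen", 0)
        if removed >= threshold:
            flagged.append((rc["title"], rc.get("user", "?"), removed))
    return flagged
```

In use, you would fetch `recent_changes_url()`, decode the JSON, and pass `payload["query"]["recentchanges"]` to `flag_large_removals`.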
[+] lakkal|6 years ago|reply
I used to do that sometimes myself (and have also donated)!

One thing that always seemed strange was how often the random page would be for a soccer/football team or player, or for a random town in Poland.

[+] visarga|6 years ago|reply
I use a similar strategy in balancing out exploration and exploitation with videos - pull 5-10 random links and pick one of them.
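That heuristic is a simple explore/exploit split: randomness decides which options you see (explore), and your own judgment picks among them (exploit). A minimal Python sketch, where `score` stands in for whatever "looks most interesting" means to you; both `best_of_k` and `score` are hypothetical names:

```python
import random


def best_of_k(candidates, score, k=7, rng=random):
    """Draw k candidates uniformly at random, then return the best-scoring one."""
    sample = rng.sample(candidates, min(k, len(candidates)))  # explore
    return max(sample, key=score)                             # exploit
```

Passing a seeded `random.Random(...)` as `rng` makes the draw reproducible; a larger `k` shifts the balance toward exploitation.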
[+] RivieraKid|6 years ago|reply
It's easy to forget how amazing Wikipedia is because people are so used to it. It's also an anomaly in today's internet landscape: no ads, popups or auto-play videos. I think it's literally the best website in existence.

If you had suggested 20 years ago that there would soon be a free online encyclopedia with two orders of magnitude more words than Britannica, updated daily and mostly unbiased, people would probably have thought it impossible.

[+] segfaultbuserr|6 years ago|reply
> no ads, popups or auto-play videos.

Also, there aren't 100 JavaScript tracking scripts on Wikipedia, and it's probably one of the fastest sites among the world's top 100.

[+] kick|6 years ago|reply
There are two candidates for "best website in existence," both were made by people born in countries that spawned from the dissolution of the USSR, and one starts with "sci-" while the other starts with "Lib." One ends with "hub" while the other ends with "Gen."
[+] drej|6 years ago|reply
It's only a shame that Wikidata doesn't get as much attention. It's a knowledge graph that powers _some_ of Wikipedia's content. I got involved recently, and what I'd like to do is get Wikipedians to use Wikidata more (e.g. automatically loading births and deaths on people's biographies), because once these two services are more interlinked, we'll get more knowledge graph info for free, since people editing Wikipedia will keep improving this structured dataset.

I'd encourage people to get involved in either of these two projects - there's always a niche you know about and could improve its presence in the Wiki world.
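The births-and-deaths example maps onto Wikidata properties P569 (date of birth) and P570 (date of death), which anyone can read from the public SPARQL endpoint. A minimal Python sketch, assuming the `https://query.wikidata.org/sparql` endpoint and the standard SPARQL JSON results format; Q42 (Douglas Adams) is just an example item, the helper names are hypothetical, and this is not how Wikipedia itself pulls the data:

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"

# %s is replaced by the item ID (e.g. "Q42"); both dates are optional.
QUERY_TEMPLATE = """
SELECT ?born ?died WHERE {
  OPTIONAL { wd:%s wdt:P569 ?born . }
  OPTIONAL { wd:%s wdt:P570 ?died . }
}
"""


def life_dates_query(qid):
    """SPARQL query for an item's date of birth (P569) and death (P570)."""
    return QUERY_TEMPLATE % (qid, qid)


def parse_life_dates(payload):
    """Return (born, died) literal strings from SPARQL JSON results, or None."""
    rows = payload["results"]["bindings"]
    if not rows:
        return None, None
    row = rows[0]
    return row.get("born", {}).get("value"), row.get("died", {}).get("value")


def fetch_life_dates(qid):
    """Run the query over the network and parse the first result row."""
    url = ENDPOINT + "?" + urllib.parse.urlencode(
        {"query": life_dates_query(qid), "format": "json"}
    )
    req = urllib.request.Request(
        url, headers={"User-Agent": "life-dates-sketch/0.1 (hypothetical)"}
    )
    with urllib.request.urlopen(req) as resp:
        return parse_life_dates(json.load(resp))
```

`fetch_life_dates("Q42")` should return ISO-style timestamp strings for both dates; a living person's item would come back with `died` set to `None`.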

[+] zweep|6 years ago|reply
Is Wikidata how, for example, when a particular football team wins the Super Bowl, its articles are updated, the articles about the Super Bowl are updated, the article listing teams that have won multiple Super Bowls is updated, etc., all very quickly? Or is that just an army of volunteers?
[+] Jasper_|6 years ago|reply
Here's what Wikidata says about Earth, an item that is number 2 in the ID list, and also on their front page as an example of incredible data.

https://www.wikidata.org/wiki/Q2

I struggle to find anything interesting on this page. It is apparently a "topic of geography", whatever that means as a statement. It has a WordLift URL. It is an instance of an inner planet.

The first perhaps verifiable, solid fact, that Earth has a diameter of "12,742 kilometre", is immediately suspect. There is no clarifying remark, not even a note, that Earth is not a uniform shape and so cannot have a single value for its diameter.

This is my problem with SPARQL, with "data bases", in that sense. Data alone is useless without a context or a framework in which it can be truly understood. Facts like this can have multiple values depending on exactly what you're measuring, or what you're using the measurement for.

And this on the page for Earth, an example that is used on their front page, and has the ID of 2. It is the second item to ever be created in Wikidata, after Q1, "Universe", and yet everything on it is useless.
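The diameter complaint can be made concrete with the WGS 84 reference ellipsoid (chosen here as an assumption; the Wikidata item does not say which model its figure comes from). Its defining constants are a semi-major axis of 6,378.137 km and an inverse flattening of 298.257223563. The equatorial and polar diameters differ by roughly 43 km, and 12,742 km matches twice the mean radius (2a + b)/3:

```python
# WGS 84 defining constants
A_KM = 6378.137          # semi-major (equatorial) axis, km
INV_F = 298.257223563    # inverse flattening


def wgs84_diameters():
    """Equatorial, polar and mean-radius diameters of the WGS 84 ellipsoid, in km."""
    b_km = A_KM * (1 - 1 / INV_F)          # semi-minor (polar) axis, ~6356.752 km
    mean_radius = (2 * A_KM + b_km) / 3    # arithmetic mean radius of the ellipsoid
    return {
        "equatorial": 2 * A_KM,
        "polar": 2 * b_km,
        "mean": 2 * mean_radius,
    }


for name, d in wgs84_diameters().items():
    print(f"{name:>10}: {d:,.1f} km")
```

So "12,742 kilometre" is a reasonable single summary, but only once you know it is a mean over an ellipsoid, which is exactly the context the item leaves out.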

[+] preommr|6 years ago|reply
Wikipedia is the most incredible creation of modern times, because it's not about some law of the universe that people have learned enough about to use to their advantage, like electricity or nuclear physics. It's something inherent to us as a species that we came together to create a resource for everyone to access, and that it has such a high level of quality. The culture around Wikipedia (the culmination of knowledge, the importance of said knowledge, the random curiosity about such a wide variety of topics, and so much more) says so much about us as people.

If you had asked me half a century ago if something like this would ever work - I would've said absolutely not. And yet, there it is.

[+] Balgair|6 years ago|reply
> It's something inherent to us as a species that we came together to create a resource for everyone to access and that it has such a high level of quality.

Oh man, if only it were true! If anything, Wikipedia should be seen as overcoming our terrible human nature to make something good in the world, not as a property of humans. It is SO precious and fragile in our mortal hands.

The story of the Encyclopédie's origins is incredible and a great reminder that works like Wikipedia are the exception, not the rule. The philosophes worked very hard to get the volumes out and were under constant threat of censorship. The very idea of the Encyclopédie was a direct threat to the Ancien Régime, and its publication was a direct cause (among many) of the French Revolution, a revolution whose effects we still feel to this very day. The Encyclopédie was a lit cigarette in a powder cache. Its head editor, Diderot, is still a controversial figure. If you can have haters and lovers nearly 250 years after you dance your last, well kiddo, you've done something right.

Even today Wikipedia is vandalized by powerful interests and is routinely censored out of existence for many people around the world. Governments are still afraid of the free knowledge that Wikipedia gives us. It is still a lit cigarette in a powder cache. But it is a fragile thing.

Cherish it: DONATE

https://donate.wikimedia.org/w/index.php?title=Special:Landi...

[+] c3534l|6 years ago|reply
I, like almost everyone at the time, heard about Wikipedia and scoffed at it. And in the early days, the quality was atrocious and the scope seemingly insurmountable. But Wikipedia almost always gets better, not worse, and not being limited to books that have to sit on a shelf makes it far more than an encyclopedia. It has evolved, over time, into something few people ever imagined it could be.

[+] baddox|6 years ago|reply
Forget half a century ago! Even when I was in high school in 2005 (while I was getting a vastly better education from personal curiosity on Wikipedia than from public school) it was widely accepted that it would be ludicrous to believe anything in Wikipedia since anyone can edit it.
[+] spodek|6 years ago|reply
I remember telling people in 2004 or so how it would take over and people thought I was crazy. There were something like 10,000 articles then. One coworker didn't believe anyone could edit it so he edited a page for the fun of it. A few hours later someone had reverted it, which led him to believe more in it.

As they say, Wikipedia doesn't work in theory, only in practice.

As a longtime GNU/Linux user back then, I viewed the GPL as its foundation. Encyclopedias seemed a niche where a GPL-based project could thrive, like making an operating system or a web server.

[+] anderspitman|6 years ago|reply
> It's something inherent to us as a species that we came together to create a resource for everyone to access and that it has such a high level of quality

I like the explanation that Wikipedia is built on nerds' need to correct each other.

[+] gauravjain13|6 years ago|reply
Wikipedia is so incredible because it’s that sliver of intersection in a Venn diagram of what’s ideal and what’s real – a rare phenomenon.
[+] soheil|6 years ago|reply
Please don't hate on my comment; all I can do is ask not to be downvoted, and at least I tried. Here is my case against Wikipedia. In a capitalistic model, it makes little sense that Wikipedia should thrive without enriching someone massively in the process; therefore, it must be corrected. If we assume that line of thinking is valid (which I'm not saying we should or shouldn't), then it follows that an alternative Wikipedia riddled with ads would be a superior model, in capitalist terms. It would perhaps be much more like IMDb, which is owned by Amazon. Yet Wikipedia (in the film category) provides information much more effectively, and can stay on top of changes by issuing revisions far better than IMDb ever could.

If we follow the set of assumptions above to its most logical conclusion, I think it follows that readily accessible knowledge is not necessarily a good thing; otherwise, the market would have rewarded it. But in the case of IMDb, we see that the market did not reward it enough to achieve the same pedigree as Wikipedia.

It's not good for people to learn facts that easily. There should be a higher cost associated with that. This is of course a ridiculous conclusion, but I think one could make a case for why it would be true. That is a whole other post, so I will give only an example or two.

One example: if knowledge is that easily accessible, then anyone can acquire it without necessarily having much desire for it, and since we have limited capacity for acquiring and retaining knowledge, we are most likely sacrificing knowledge we are truly passionate about.

A second example: maybe it isn't good for people to know so much anyway. After all, there are many things that, due to the laws of nature, we are inherently incapable of ever knowing, such as what lies beyond the observable universe (nothing can travel faster than the speed of light, so light from beyond the observable universe will never have enough time to reach us).

[+] brenden2|6 years ago|reply
Wikipedia is an incredible achievement. It's one of the only information sources I find to be consistently useful across a broad range of topics. It's also one of the only remaining sources I feel to be mostly trustworthy.

These days I append either "wikipedia" or "reddit" to most of my Google searches in order to get useful information. There's so much SEO'd garbage out there, but Wikipedia remains a breath of fresh air.

[+] Analemma_|6 years ago|reply
If you're worried about SEO and manipulation, Wikipedia is decent but Reddit is awful. It's pretty well-established by now that astroturfing and influence campaigns are rampant on Reddit, and in some particularly bad cases like /r/worldnews have taken over subreddits completely.
[+] harshreality|6 years ago|reply
You could switch to duckduckgo and use the ! codes; !r and !w search directly on reddit and wikipedia.

In cases where you aren't searching directly on one of the supported sites, and would prefer google's algorithms instead of ddg's (or reddit's), all you have to do is add !g. It's a lot easier than trying to get google to do direct searches (using site:) or search on ddg (I don't think google has any shortcuts for redirecting to search on another search engine).
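Since a bang is just part of the query string, it's easy to script. A minimal Python sketch, assuming DuckDuckGo keeps accepting bangs in the `q` parameter of `https://duckduckgo.com/`; `bang_url` is a hypothetical helper:

```python
from urllib.parse import urlencode


def bang_url(bang, query):
    """Build a DuckDuckGo search URL that applies a !bang shortcut such as !w or !r."""
    # urlencode percent-encodes "!" as %21 and turns spaces into "+".
    return "https://duckduckgo.com/?" + urlencode({"q": f"!{bang} {query}"})
```

For example, `bang_url("w", "Radix tree")` yields a URL that DuckDuckGo should redirect straight to Wikipedia's search, and `bang_url("g", ...)` hands the query to Google.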

[+] IIAOPSW|6 years ago|reply
Why not skip the middleman and go directly to wikipedia or reddit?
[+] aeyes|6 years ago|reply
If you speak other languages, Wikipedia certainly gets very interesting. German Wikipedia is pretty good; for some topics it is better than English Wikipedia. But Spanish Wikipedia is quite sad to read: today's articles have the quality I remember from when Wikipedia was 2 or 3 years old, even though there are many times more Spanish speakers than German speakers in the world. Then there are the Swedish and Cebuano Wikipedias, which are large but were mostly created by a bot and consist largely of almost useless stub articles.

It makes me wonder what information sources other countries like China, India or Japan use.

[+] Bayart|6 years ago|reply
I absolutely recommend that people not just treat the English version as the default. Other language communities certainly produce better articles on subjects they're closer to or where they have a stronger scholarly tradition. I tend to consult the German and Catalan (!) versions quite a bit, besides the French and English ones.

[+] SJSque|6 years ago|reply
What's always struck me is how large the Dutch Wikipedia is (1,992,551 articles), given the relatively low number of people that speak that language. They would appear to be a pretty tech-savvy/well-connected nation.

It's certainly consistent with the number of comments that I see here on HN that start with some variation on "Here in the Netherlands...".

[+] scarejunba|6 years ago|reply
Wikipedia is great. Every time I create an article, it ends up being tended to by a bunch of gardener bots who fix up all the markup and make it look nice and put an infobox in, and then other people add a little and so on and it just gets better. Honestly, pretty astounding.
[+] ignoramous|6 years ago|reply
How's https://golden.com doing? I have come across some really detailed Wikipedia articles (for sports events, for example) and at times some really drab ones, which I try to edit to improve. But original research articles sometimes contain a lot more specific information that doesn't survive editor scrutiny on Wikipedia. And the less we talk about the numerous advertisement/PR/fluff articles and the socio/geo-political edit wars being played out, the better.

Take tech pages, for example: I really think an enormous amount of the information in blogs, online magazines, and research papers needs to be in Wikipedia pages too, if not as actual text then as references.

For instance, here's an article on HAMT [0] and another on Radix Trees [1] but the corresponding Wikipedia pages [2][3] for those aren't as consumable as the other two.

There are also cases where pages are deleted because original research, like Willy Tarreau's EB Trees [4] that powers routing, caching on HAProxy.

I tend to research more on Wikipedia than on search engines, to be honest. Maybe search engines can get smart enough to amalgamate a page on the searched topic, given how much of the open web they crawl?

[0] https://blog.mattbierner.com/hash-array-mapped-tries-in-java...

[1] https://vincent.bernat.ch/en/blog/2017-ipv4-route-lookup-lin...

[2] http://en.wikipedia.org/wiki/Hash_array_mapped_trie

[3] https://en.wikipedia.org/wiki/Radix_tree

[4] https://wtarreau.blogspot.com/2011/12/elastic-binary-trees-e...
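The route-lookup post [1] is about longest-prefix matching, which is where tries and radix trees earn their keep. A minimal Python sketch of the idea using a plain, uncompressed binary trie (one node per prefix bit); this is an illustrative swap-in, not the compressed trie the Linux kernel actually uses, nor a HAMT or an EB tree, and the class names and next-hop labels are hypothetical:

```python
import ipaddress
from typing import List, Optional


class _Node:
    __slots__ = ("children", "next_hop")

    def __init__(self) -> None:
        self.children: List[Optional["_Node"]] = [None, None]  # bit 0 / bit 1
        self.next_hop: Optional[str] = None  # set if a prefix ends here


class PrefixTrie:
    """Binary trie over IPv4 address bits for longest-prefix-match lookups."""

    def __init__(self) -> None:
        self.root = _Node()

    def insert(self, prefix: str, next_hop: str) -> None:
        net = ipaddress.IPv4Network(prefix)
        bits = int(net.network_address)
        node = self.root
        for i in range(net.prefixlen):          # walk one bit per level, MSB first
            bit = (bits >> (31 - i)) & 1
            if node.children[bit] is None:
                node.children[bit] = _Node()
            node = node.children[bit]
        node.next_hop = next_hop

    def lookup(self, address: str) -> Optional[str]:
        bits = int(ipaddress.IPv4Address(address))
        node = self.root
        best = node.next_hop                    # a /0 default route lives at the root
        for i in range(32):
            node = node.children[(bits >> (31 - i)) & 1]
            if node is None:
                break
            if node.next_hop is not None:       # remember the longest match so far
                best = node.next_hop
        return best


trie = PrefixTrie()
trie.insert("0.0.0.0/0", "default")
trie.insert("10.0.0.0/8", "A")
trie.insert("10.1.0.0/16", "B")
print(trie.lookup("10.1.2.3"), trie.lookup("10.2.0.1"), trie.lookup("192.168.0.1"))
```

A radix tree improves on this by collapsing chains of single-child nodes, so a lookup touches far fewer than 32 nodes.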

[+] NeoBasilisk|6 years ago|reply
The article count would actually be much higher if they weren't diligent about deleting or merging articles that they don't find notable.
[+] firediamond|6 years ago|reply
My thought on seeing the headline was along the same lines. 6M seemed like a rather low number.

Not to say it isn't an impressive one by any means. I was just a little surprised.

[+] Thorentis|6 years ago|reply
Wikipedia is becoming so much more than an encyclopedia. I am constantly amazed at the level of detail in which people are willing to document things.

For instance, I wanted to find which episode I was up to in a particular series, so I googled it, found a Wikipedia page for the series, and sure enough there was a table listing every season and every episode, with a short synopsis of each episode.

I'd say a very large portion of all page views are due to media-related queries (TV shows, movies, books, etc.).

[+] Bayart|6 years ago|reply
I can't read the article because the consent parameters for data collection are hidden behind layers upon layers of misleading UI and links, with a broken captcha inserted in the middle.

So, Techcrunch can fuck off from now on I guess.

[+] hncensorsnonpc|6 years ago|reply
Wikipedia is also dominated by an elite class of early contributors, PC censorship people, or people working directly for corporations or on their behalf. There are many accounts with more edits a day than is humanly possible, or edits on 365 days of the year, and article edits from IPs belonging directly to government agencies whitewashing articles... they don't even try, or don't know how, to hide it. When it comes to info outside of, say, the natural sciences, wiki is NOT a reliable and unbiased source at all.

[+] jonbaer|6 years ago|reply
Wonder what a 6M set would cost these days, https://en.wikipedia.org/wiki/Encyclopedia
[+] agumonkey|6 years ago|reply
I can probably buy a truckload of Encyclopaedia Universalis for $1000 (I'm even tempted to say $0).