item 21428149

WordPress usage worldwide

58 points | andros | 6 years ago | github.com

46 comments

rbritton | 6 years ago
By only profiling the top 1M sites, I wonder if this is sampling from a set that isn't representative of sites in general? I suspect the frequency of WordPress use goes up the further down the list you go: some random blog is less likely to be in the top 1M, yet more likely to use WordPress.
rdiddly | 6 years ago
I suspect this is so. The top million (or at least a subset of it) is where you find the sites whose owners can afford to hire people to build their own custom stuff. Small independent sites seem more likely to rely on something like WordPress.
modernerd | 6 years ago
The test is likely to be under-reporting WordPress installations as it stands.

The script seems to detect a WordPress site by looking for a meta generator tag containing WordPress:

https://github.com/tanrax/calculate-wordpress-usage/blob/5aa...
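A minimal Python sketch of that kind of check, assuming the page HTML has already been fetched. This is not the script's own code; it uses the standard library's html.parser to collect every <meta name="generator"> value, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class GeneratorMetaParser(HTMLParser):
    """Collects the content attribute of every <meta name="generator"> tag."""

    def __init__(self):
        super().__init__()
        self.generators = []

    def handle_starttag(self, tag, attrs):
        # html.parser already lowercases tag and attribute names.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "generator":
            self.generators.append(attr_map.get("content", ""))


def has_wordpress_generator(html):
    """True if any generator meta tag mentions WordPress."""
    parser = GeneratorMetaParser()
    parser.feed(html)
    return any("wordpress" in g.lower() for g in parser.generators)
```

For example, has_wordpress_generator('<meta name="generator" content="WordPress 5.2.4" />') returns True, while a page whose theme strips the tag returns False, which is exactly the blind spot described next.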

It's pretty common to remove that meta tag — popular WordPress theme frameworks like Genesis do it by default.

A more reliable test would be to look for additional strings in the source that point to the use of WordPress, such as “wp-content” and “wp-includes”.
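A sketch of that fallback, assuming the raw HTML is already in hand. The marker list is just the two strings from the comment; the function names and the min_hits threshold are hypothetical.

```python
# The two path fragments suggested above: WordPress serves theme/plugin
# assets from /wp-content/ and core scripts from /wp-includes/ by default.
WP_MARKERS = ("wp-content", "wp-includes")


def wordpress_markers(html):
    """Return the marker strings found anywhere in the page source."""
    lowered = html.lower()
    return [m for m in WP_MARKERS if m in lowered]


def looks_like_wordpress(html, min_hits=1):
    """Heuristic: at least `min_hits` distinct markers appear in the source."""
    return len(wordpress_markers(html)) >= min_hits
```

A plain substring search can misfire on a page that merely mentions "wp-content" in its text, so requiring both markers (min_hits=2) trades some recall for fewer false positives.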

A faster way that avoids string searches would be to send an HTTP HEAD request to `/wp-login.php` and check for:

Set-Cookie: wordpress_test_cookie

(/wp-login.php isn't always in the root directory and isn't always accessible to all IPs, but that setup is the most common.)
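The HEAD probe could look roughly like this with Python's standard library. It is a sketch under the comment's own assumption that /wp-login.php sits at the site root and answers with the test cookie; the function names are hypothetical, and None means "couldn't tell" (404, blocked IP, timeout) rather than "not WordPress".

```python
import urllib.request
from http.client import HTTPException

TEST_COOKIE = "wordpress_test_cookie"


def sets_wp_test_cookie(set_cookie_values):
    """True if any Set-Cookie header value sets wordpress_test_cookie."""
    return any(v.strip().startswith(TEST_COOKIE + "=") for v in set_cookie_values)


def probe_wp_login(base_url, timeout=10.0):
    """Send HEAD /wp-login.php and inspect the Set-Cookie headers.

    Returns True/False from the headers, or None if the request failed.
    """
    url = base_url.rstrip("/") + "/wp-login.php"
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return sets_wp_test_cookie(resp.headers.get_all("Set-Cookie") or [])
    except (OSError, HTTPException):
        # URLError/HTTPError (e.g. 404, 403) and timeouts are all OSError subclasses.
        return None
```

Only the response headers are transferred, which is why this is cheaper than downloading and scanning the whole front page.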

eugenekolo2 | 6 years ago
It'd be interesting to do all that you said (and more), then determine the combined total, as well as what percentage of sites use some sort of obfuscation... and why?
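One way to tally that, sketched under the assumption that each site has already been run through the individual checks discussed in this thread. SiteSignals and summarize are hypothetical names; "hides the generator" here means some other signal says WordPress but the meta tag is absent. The "why" can't be measured this way.

```python
from dataclasses import dataclass


@dataclass
class SiteSignals:
    # Hypothetical per-site results from the individual checks.
    generator_tag: bool
    path_markers: bool
    login_cookie: bool
    robots_txt: bool

    def is_wordpress(self):
        # Combined detection: any single signal counts.
        return (self.generator_tag or self.path_markers
                or self.login_cookie or self.robots_txt)

    def hides_generator(self):
        # WordPress by some other signal, but the generator tag is missing.
        return self.is_wordpress() and not self.generator_tag


def summarize(sites):
    """Combined WordPress count, its share of all sites, and the share hiding the tag."""
    wp = [s for s in sites if s.is_wordpress()]
    hidden = sum(s.hides_generator() for s in wp)
    return {
        "wordpress_sites": len(wp),
        "share_of_all": len(wp) / len(sites) if sites else 0.0,
        "share_hiding_generator": hidden / len(wp) if wp else 0.0,
    }
```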
benbristow | 6 years ago
This seems to work by checking for the "generator" meta tag in the <head>.

Custom themes etc. might omit it, so it's not a 100% reliable check.

andros | 6 years ago
I don't think there's a foolproof way to know whether a site is built with WordPress, although you can infer it from some clues: headers, the meta tag, robots.txt, the login cookie, the sitemap...
sandov | 6 years ago
The article says that they also look for "wp-admin" et al. in robots.txt.
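A minimal sketch of a robots.txt check along those lines; it is not the script's actual logic. It assumes that any Allow/Disallow rule pointing at a /wp- path is a hint (WordPress's default virtual robots.txt disallows /wp-admin/). Function names are hypothetical.

```python
import urllib.request
from http.client import HTTPException


def robots_mentions_wp(robots_txt):
    """True if any Allow/Disallow rule points at a /wp-* path."""
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, _, value = line.partition(":")
        if field.strip().lower() in ("allow", "disallow") and "/wp-" in value.lower():
            return True
    return False


def fetch_robots(base_url, timeout=10.0):
    """Fetch /robots.txt as text; returns "" if it is missing or unreachable."""
    url = base_url.rstrip("/") + "/robots.txt"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read(200_000).decode("utf-8", "replace")
    except (OSError, HTTPException):
        return ""
```

Like the meta tag, this is only a clue: site owners can replace the default robots.txt entirely.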
mxpxrocks10 | 6 years ago
You have to use a larger sample. There are 300M+ domains, and the long tail will surely have more WordPress in it.
mxpxrocks10 | 6 years ago
Some other notes: 1) you're not checking subdomains like blog.company.com or paths like company.com/blog; 2) if you use something like zgrab, you can do a 1M-site crawl in a couple of hours. Consider checking it out.
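Covering point 1 mostly means generating more URLs per domain before running any of the detectors. A sketch, assuming just the two locations named in the comment (a blog subdomain and a /blog path); the tuples and function name are hypothetical, and zgrab itself is a separate tool not shown here.

```python
# Hypothetical blog locations, taken from the two examples in the comment.
BLOG_SUBDOMAINS = ("blog",)
BLOG_PATHS = ("/blog/",)


def candidate_urls(domain, scheme="https"):
    """Front page plus common blog subdomains and paths for one domain."""
    urls = [f"{scheme}://{domain}/"]
    urls += [f"{scheme}://{sub}.{domain}/" for sub in BLOG_SUBDOMAINS]
    urls += [f"{scheme}://{domain}{path}" for path in BLOG_PATHS]
    return urls
```

For example, candidate_urls("company.com") yields the front page, https://blog.company.com/, and https://company.com/blog/. Each extra location multiplies the request count, which makes the crawl-speed point matter even more.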
subpixel | 6 years ago
Or more Wix, Squarespace, and Shopify in it.
buboard | 6 years ago
> the list of the first million domains with the most visits

How is that a random sample?

mxpxrocks10 | 6 years ago
Are you using HTTP or HTTPS? Following redirects? How many? All of this is going to change the results of the study.
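Making those choices explicit could look like this sketch: try HTTPS first, fall back to HTTP, cap redirects, and keep both the requested and final URL so the choices can be reported alongside the results. The cap of 5 and the 1 MB read limit are arbitrary assumptions; max_redirections is the class attribute urllib's HTTPRedirectHandler consults.

```python
import urllib.request
from http.client import HTTPException

MAX_REDIRECTS = 5  # hypothetical cap; whatever you pick, report it


class CappedRedirectHandler(urllib.request.HTTPRedirectHandler):
    # urllib follows at most this many redirects, then raises HTTPError.
    max_redirections = MAX_REDIRECTS


def scheme_candidates(domain):
    """HTTPS first, then plain HTTP as a fallback."""
    return [f"https://{domain}/", f"http://{domain}/"]


def fetch_front_page(domain, timeout=10.0):
    """Return (requested_url, final_url, html) for the first scheme that works."""
    # Passing a subclass replaces urllib's default redirect handler.
    opener = urllib.request.build_opener(CappedRedirectHandler)
    for url in scheme_candidates(domain):
        try:
            with opener.open(url, timeout=timeout) as resp:
                html = resp.read(1_000_000).decode("utf-8", "replace")
                return url, resp.geturl(), html
        except (OSError, HTTPException):
            continue  # try the next scheme
    return None
```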
audessuscest | 6 years ago
Only? That looks like a big number for worldwide usage, no?
rbritton | 6 years ago
WP itself usually puts the estimate at 25-30%.
ludamad | 6 years ago
Right, at first I read it as 1% and was surprised, but that's massive usage.
capableweb | 6 years ago
The Readme says "Warning that it can take a long time: between 20 to 30 days."

How in the world can it take so long? The CSV file seems to be 24 MB in size, and the computation performed can't be that advanced. Did the author do something seriously wrong?

buboard | 6 years ago
> To run it you'll need either 2Gb of RAM

What? I guess this is a toy program used to learn Clojure or something; it even uses sed for line parsing. A 10-line PHP script could do the same with a few MB of RAM.
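To illustrate the memory point: the domain list can be streamed one row at a time, so RAM use stays roughly constant no matter how large the file is. This sketch assumes "rank,domain" rows with no header row; the function name and the file path in the usage comment are hypothetical.

```python
import csv


def iter_domains(lines):
    """Yield domains from 'rank,domain' rows one at a time (constant memory)."""
    for row in csv.reader(lines):
        if len(row) >= 2 and row[1].strip():
            yield row[1].strip()


# Usage with a real file (path is hypothetical):
# with open("top-1m.csv", newline="") as f:
#     for domain in iter_domains(f):
#         ...
```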

detaro | 6 years ago
It just processes everything sequentially.
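A back-of-the-envelope check using the README's 20 to 30 days for roughly 1M sites: that works out to about 1.7 to 2.6 seconds per site, which is plausible for one blocking request at a time (DNS lookup, TLS handshake, slow servers, timeouts). If requests are independent and N run concurrently, wall time shrinks roughly by a factor of N. The N = 100 and 2 s figures in the last line are assumptions, and a real crawl would also be limited by bandwidth and DNS throughput.

```latex
\begin{aligned}
t_{\text{site}} &\approx \frac{(20 \text{ to } 30) \times 86\,400\ \text{s}}{10^{6}} \approx 1.7 \text{ to } 2.6\ \text{s} \\
T(N) &\approx \frac{10^{6} \times t_{\text{site}}}{N} \\
T(100)\big|_{t_{\text{site}} = 2\ \text{s}} &\approx \frac{2 \times 10^{6}\ \text{s}}{100} = 20\,000\ \text{s} \approx 5.6\ \text{h}
\end{aligned}
```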
mfer | 6 years ago
This is a pretty amazing feat. The top 1 million sites include many whose owners can afford custom sites, and yet WordPress still accounts for almost 1/5 of them.
smacktoward | 6 years ago
WordPress is the software HN loves to hate, but while it certainly has plenty of warts, it's also a very flexible, pliable system for building the kinds of web sites that most people want to build. It'll never win any architectural beauty contests, but market share is driven by utility, not beauty. And WordPress can be very useful software.
bdcravens | 6 years ago
I can afford to have all of my clothes professionally cleaned, yet I have a washer and dryer.
blondin | 6 years ago
So they studied the top 1M websites and found that 19% use WordPress? Man, that's a good chunk if you ask me.
andros | 6 years ago
It's amazing. So we can say that 19% use PHP (at least) and MySQL.
hestefisk | 6 years ago
I'm wondering why this takes 20-30 days to run in total. That seems crazy for 1M requests. Could one make this a concurrent task and get much greater efficiency?
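Yes, since each site check is independent and mostly waiting on the network. A minimal sketch with a thread pool from Python's standard library; check stands in for any per-site detector (hypothetical here) that returns True, False, or None and catches its own network errors. The default of 100 workers is an arbitrary assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Optional


def check_all(domains: Iterable[str],
              check: Callable[[str], Optional[bool]],
              workers: int = 100) -> dict:
    """Run check(domain) for every domain with up to `workers` calls in flight."""
    domains = list(domains)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so zip pairs each domain
        # with its own result even though calls finish out of order.
        return dict(zip(domains, pool.map(check, domains)))
```

Executor.map submits every task up front, so for a million domains you would probably feed it in chunks (for example, 10,000 at a time) to keep memory bounded.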