By profiling only the top 1M sites, I wonder if this samples from an unrepresentative set. I suspect the frequency of WordPress use goes up the further down the list you go: some random blog is less likely to be in the top 1M, yet more likely to use WordPress.
I suspect this is so. The top million (specifically a subset of that) is where the sites fall whose owners can afford to hire people to build their own custom stuff. Small independent sites seem more likely to rely on something like WordPress.
It'd be interesting to do all that you said (and more), then determine the combined total, as well as what % of sites do some sort of obfuscation... and why.
I don't think there's a foolproof way to know if a site is built with WordPress, although you can infer it from clues: headers, the meta generator tag, robots.txt, the login cookie, the sitemap...
Some other notes:
1) you're not checking subdomains like blog.company.com or paths like company.com/blog
2) if you use something like zgrab you can do 1M site crawl in a couple of hours. Consider checking it out.
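As a sketch of the first note, a crawl could expand each bare domain into a few likely blog locations before probing them (the specific subdomains and paths below are illustrative, not an exhaustive list):

```python
def candidate_urls(domain):
    """Expand a bare domain into likely blog locations to probe.

    The subdomain/path choices here are assumptions; real crawls
    might also try www., /news/, etc.
    """
    return [
        f"https://{domain}/",
        f"https://{domain}/blog/",
        f"https://blog.{domain}/",
    ]
```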
The Readme says "Warning that it can take a long time: between 20 to 30 days."
How in the world can it take so long? The CSV file is only 24 MB, and the computation can't be that advanced. Did the author do something seriously wrong?
What? I guess this is a toy program used to learn Clojure or something; it even uses sed for line parsing. A 10-line PHP script could do the same with a few MB of RAM.
This is a pretty amazing feat. The top 1 million sites include many that have the money to afford custom sites, and yet WordPress still powers almost 1/5 of them.
WordPress is the software HN loves to hate, but while it certainly has plenty of warts it's also a very flexible, pliable system for building the kinds of web sites that most people want to build. It'll never win any architectural beauty contests, but market share is driven by utility, not beauty. And WordPress can be very useful software.
I'm wondering why this takes 20-30 days to run in total. That seems crazy for 1M requests. Could one make this concurrent and get much greater throughput?
The script seems to detect a WordPress site by looking for a meta generator tag containing WordPress:
https://github.com/tanrax/calculate-wordpress-usage/blob/5aa...
It's pretty common to remove that meta tag — popular WordPress theme frameworks like Genesis do it by default.
A more reliable test would be to look for additional strings in the source that point to the use of WordPress, such as “wp-content” and “wp-includes”.
A faster way that avoids string searches would be to send an HTTP HEAD request to `/wp-login.php` and check for:
Set-Cookie: wordpress_test_cookie
(/wp-login.php doesn't always appear in the root directory and it's not always accessible to all IPs, but that setup is most common).
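Combining those signals, a rough detector might look like the sketch below. This is an assumption about how one could implement the heuristics discussed here, not the author's actual script; `headers` stands in for the response headers of a HEAD request to /wp-login.php:

```python
import re

def looks_like_wordpress(html, headers=None):
    """Heuristic WordPress check combining the signals discussed above.

    `html` is the homepage source; `headers` is an optional dict of
    response headers from a HEAD request to /wp-login.php.
    """
    # 1) The meta generator tag the original script relies on
    #    (often removed by theme frameworks like Genesis).
    if re.search(r'<meta[^>]+generator[^>]+WordPress', html, re.I):
        return True
    # 2) Asset paths that usually survive even when the tag is removed.
    if "wp-content" in html or "wp-includes" in html:
        return True
    # 3) The login test cookie, when /wp-login.php is reachable.
    if headers and "wordpress_test_cookie" in headers.get("Set-Cookie", ""):
        return True
    return False
```

None of these checks is conclusive on its own, but together they should catch most unobfuscated installs.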
Custom themes etc. might choose to omit that, so it's not a 100% reliable check.
How is that a random sample?
However, this is an [embarrassingly parallel](https://en.wikipedia.org/wiki/Embarrassingly_parallel) problem, and renting some machines would speed it up.
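A minimal sketch of that parallelism: each site is checked independently, so a thread pool scales throughput with the worker count until the network becomes the bottleneck. The function name, worker count, and injected `check` callable are hypothetical:

```python
from concurrent.futures import ThreadPoolExecutor

def count_matches(domains, check, workers=200):
    """Apply `check` (e.g. fetch a homepage and test it for WordPress
    markers) to every domain concurrently.

    Each domain is independent, which is what makes the crawl
    embarrassingly parallel: there is no shared state between tasks.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order and blocks until all tasks finish.
        return sum(1 for hit in pool.map(check, domains) if hit)
```

With a few hundred workers, 1M mostly-I/O-bound requests should take hours, not weeks.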