tuhriel | 2 years ago
> In fact, the Google results seem to have improved to some extent since the start of our experiment in terms of the amount of affiliate spam.
> But just like the researchers got more attention with the misleading title about what was studied, the journalists at 404 got more clicks by outright lying about the results.
There are some questions about whether scraping via Startpage is skewing the results. They are using the Google crawler, but their anonymisation might change what comes back.
I don't agree with your interpretation of the results though. If we look at a longer excerpt of the conclusion, they mention multiple times that the quality is getting lower:
> Although we cannot predict the rank of individual pages, at the population level, we can conclude that higher-ranked pages are *on average more optimized, more monetized with affiliate marketing, and they show signs of lower text quality*
and even the part you quoted goes on to mention a downward trend:
> In fact, the Google results seem to have improved to some extent since the start of our experiment in terms of the amount of affiliate spam. Yet, we can still find several spam domains and also see an *overall downwards trend in text quality in all three search engines*, so there is still quite a lot of room for improvement.
jsnell | 2 years ago
As far as I can tell, that's not a claim about variance over time, i.e. about results being worse now than in the past. It's a claim about the current population of pages and their current ranks, i.e. a page ranking higher is likely to be more SEO-optimized than a page ranking lower. It makes no claim about whether that was the case in the past, and if it was, whether it was true to a larger or lesser extent.
> Yet, we can still find several spam domains
They would have been able to find several spam domains at the start of their study, five years ago, ten years ago, or fifteen years ago. This statement is just totally empty when talking about whether the results are getting worse over time or not.
> and also see an overall downwards trend in text quality in all three search engines, so there is still quite a lot of room for improvement.
Sure. Did you check on what their definition of "text quality" is? I tried to, but couldn't since the paper never actually states it. But the only thing they actually report temporal statistics for is the "type-token ratio". Sounds fancy! What it turns out to be is "the number of unique words on page / number of total words on page".
That doesn't seem like a very strong claim about actual quality, especially when the only statistic they report is the 95th percentile.
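To make concrete how little the metric captures, here is a minimal sketch of a type-token ratio as described above (unique words divided by total words). The function name `type_token_ratio`, the regex tokenizer, and the sample strings are my own assumptions for illustration; the paper's actual tokenization may well differ.

```python
import re


def type_token_ratio(text: str) -> float:
    """Unique words / total words, using a naive lowercase tokenizer.

    The tokenizer is an assumption for illustration, not the paper's method.
    """
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    if not tokens:
        # Avoid division by zero on empty input.
        return 0.0
    return len(set(tokens)) / len(tokens)


# Hypothetical sample texts.
varied = "the quick brown fox jumps over a lazy dog"
repetitive = "best vpn best vpn deals best vpn review"

print(type_token_ratio(varied))      # 9 unique / 9 total
print(type_token_ratio(repetitive))  # 4 unique / 8 total
```

A short page that never repeats a word scores 1.0 under this measure whether or not it says anything useful, which is part of why it seems like a weak proxy for quality.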