> There were four test URLs in total, two of which tested for how search engines deal with unique content
...
> There is one unusual thing, however – a site: search brings up the lowercase URL, but the uppercase URL is filtered out for being too similar to the other displayed URLs and isn’t shown unless the ‘repeat the search with the omitted results included’ link is clicked.
Maybe the content isn't unique enough, so Google's duplicate detection algorithm marks them as duplicates?
The two examples contain similar keywords and don't seem to have any outgoing links. A human would probably flag them as spun articles, so I wouldn't be surprised if Google did the same.
Maybe the results would be different if each link were a unique, high-quality article instead.
> Monitoring the server logs showed that Bingbot only crawled the lowercase version of the URL.
I've always thought it was much more common for sites written in ASP.NET to have case-sensitive URLs. At least that was quite common ~5 years ago (was it a default setting or something? I haven't done .NET stuff in a while). So it's pretty crazy that Bing only crawls lowercased URLs.
For years (if not decades), case hasn't mattered in file names or URLs in the Microsoft world. Running a web server on a Microsoft machine gives exactly this behavior with letter case: Image.jpg and IMAGE.JPG resolve to the same file, so only one of the two images will ever be shown.
It would have been interesting to see how case differences in the host name portion of the URL were treated. Domain names are case-insensitive, so would Google treat https://foo.example.com/bar and https://foo.Example.com/bar as two separate URLs? It should not.
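To make the host name point concrete, here is a minimal Python sketch of the normalization a crawler could apply: lowercase the scheme and host, and leave the path, query, and any userinfo untouched. The function name `normalize_host_case` is made up for illustration, and this is not a claim about what Google or Bing actually do.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_host_case(url: str) -> str:
    """Lowercase only the case-insensitive parts of a URL (scheme and host)."""
    parts = urlsplit(url)
    # netloc may look like "user:pass@Host:port"; userinfo is case-sensitive,
    # so only the host:port portion after the last "@" is lowercased.
    userinfo, at, hostport = parts.netloc.rpartition("@")
    netloc = userinfo + at + hostport.lower()
    # Path, query and fragment are kept exactly as given.
    return urlunsplit(
        (parts.scheme.lower(), netloc, parts.path, parts.query, parts.fragment)
    )


a = normalize_host_case("https://foo.example.com/bar")
b = normalize_host_case("https://foo.Example.com/bar")
c = normalize_host_case("https://foo.example.com/Bar")
print(a == b)  # host case differs only: same URL after normalization
print(a == c)  # path case differs: still two distinct URLs
```

Under this sketch the two host name variants collapse into one URL, while a difference in path case still counts as a different URL.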
seanwilson|6 years ago
gowld|6 years ago
joekrill|6 years ago
gowld|6 years ago
But it might cause problems if the IIS server isn't properly case-preserving and can't normalize the lowercased URL back to the standard (possibly capitalized) form.
The Mac filesystem has been nicely case-preserving but case-insensitive in this way, going back decades.
lgats|6 years ago
Now I either 301 redirect to the proper-case URL or ensure I have canonical tags set up.
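A rough Python sketch of the redirect approach, assuming the site knows its own list of proper-case paths. The paths, `CANONICAL_PATHS`, and both function names are hypothetical and not tied to any particular framework; the canonical-tag alternative would instead serve the page with a `<link rel="canonical">` pointing at the proper-case URL.

```python
from typing import Dict, Optional, Tuple

# Hypothetical list of the site's proper-case paths (an assumption for this sketch).
CANONICAL_PATHS = ["/About-Us", "/products/Widget"]

# Index by case-folded path so any case variant maps back to the canonical form.
_BY_FOLDED: Dict[str, str] = {p.casefold(): p for p in CANONICAL_PATHS}


def redirect_target(requested_path: str) -> Optional[str]:
    """Return the proper-case path to redirect to, or None if no redirect is needed."""
    canonical = _BY_FOLDED.get(requested_path.casefold())
    if canonical is None or canonical == requested_path:
        return None
    return canonical


def respond(requested_path: str) -> Tuple[int, Dict[str, str]]:
    """Decide on a status code and headers for a requested path."""
    target = redirect_target(requested_path)
    if target is not None:
        # Wrong case: permanent redirect to the proper-case URL.
        return 301, {"Location": target}
    if requested_path in _BY_FOLDED.values():
        return 200, {}
    return 404, {}


print(respond("/about-us"))
print(respond("/About-Us"))
```

With this setup each page ends up with exactly one URL that answers with a 200, so crawlers have nothing to treat as a duplicate.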
JeanMarcS|6 years ago
Could this be an extension of that? Looks like it.
toolslive|6 years ago
Doctor_Fegg|6 years ago
gumby|6 years ago
gowld|6 years ago
wccrawford|6 years ago