Why does Google deeply index those useless telephone directory sites? Try searching for the impossible U.S. phone number "307-139-2345" and you'll see a bunch of "who called me?" or "reverse phone number lookup" sites. Virtually all of those sites are complete garbage. They make no attempt to collect numbers from telephone directories or from the web. They won't identify a number as being the main phone number for Disneyland, for example.
It's odd that so many of those sites exist, that Google indexes them so deeply, and that they show up in searches so prominently. It's obvious that they are spam, scams, or worthless, but those same sites have been appearing prominently for years.
I agree with the author. My experience has also been that Google heavily prioritizes very large and frequently-updated sites over small, static, information-rich personal sites. I think it's a big flaw that needs to be fixed, or an opportunity for someone else to do better.
I have long believed that the proliferation of phone-number-lookup sites was antisocial media: spammers and other bad actors creating these sites to prevent people from having a central place to talk about bad phone experiences associated with specific numbers. Which one is the good one? You can't tell.
Some of these are worse than useless: they hijack real businesses' support phone numbers by presenting a high-cost number that gets routed to a call center, which then connects you to the business. It's a huge scam.
I've been wondering the same. I think most people google numbers that called them, so it has to be a lucrative business. There are a couple of sites for user reports that are actually nice, but the vast majority seem to be scammy and irrelevant (not even the right number), or completely fake information. Once I googled a number that returned a lot of results for a first initial, last name, and address. It was my own mother's number from the last few years, which I kept forgetting to save. The results were obviously either fake or many, many years outdated.
Thoughts after thinking about this comment and thread for a day:
Has the time come for a wiki directory of non-commercial (possibly: advertising-free, cookie-free) sites with robust, actually valuable information, and other sites that are doorways to them (think: topical forums, even revived webrings, etc.)? Could this feasibly get enough action to be useful?
Your comment now comes up first, but the rest of the results all try to contact googlesyndication.com, so ads? Google will not exclude sites that literally give them money.
The flip side of trying to "solve" that problem is that you then penalise everyone searching for part numbers, which often do look very similar to phone numbers. I suspect they are making an effort to, because I've been bitten by the CAPTCHA-hellban when searching for part numbers. (Likewise, the results there are also often clogged by a bunch of useless sites claiming they have the datasheet or are selling the part, when all they do is try to show ads.)
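To illustrate the ambiguity (a sketch; these regexes are simplified assumptions, not anything Google actually uses): a naive digit pattern can't tell phone numbers and part numbers apart, and even a stricter NANP check only filters some cases.

  // A naive pattern matches phone numbers and many part numbers alike.
  const naive = /^\d{3}-\d{3}-\d{4}$/;

  // Stricter NANP rule: area code and exchange can't start with 0 or 1.
  const nanp = /^[2-9]\d{2}-[2-9]\d{2}-\d{4}$/;

  naive.test("307-139-2345"); // true  - looks like a phone number
  nanp.test("307-139-2345");  // false - exchange "139" is impossible
  naive.test("550-120-1002"); // true  - but this is a (hypothetical) part number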
It's upsetting to me that doing better than Google in search seems to be very close to an impossible feat of magic at this point.
I know that will change some day, but I can't see how.
Even if somebody gave you hundreds of millions of dollars to spend on infrastructure and employees, it would still be an insane risk.
Writing that out, it almost sounds like internet search engines should be as big and as important an operation as the TLD registrars. Funded by big governments in collaboration with each other.
I think there is an easy heuristic: bias against ads.
Google can’t do this. Original sources very likely don’t have ads, scrapes will have tons. But good results are good results. Big g has done an admirable job. They can’t exploit the best metric for quality.
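As a sketch of what that heuristic could look like (pure illustration: the penalty function, the weights, and the example URLs are made-up assumptions, not anything any engine is known to use):

  // Hypothetical re-ranking pass that demotes results by ad load.
  // adRequests = number of requests a page makes to known ad domains.
  function adjustedScore(baseScore, adRequests) {
    const penalty = Math.log1p(adRequests); // diminishing: one banner isn't fatal
    return baseScore / (1 + penalty);
  }

  const results = [
    { url: "original-source.example", score: 0.8, adRequests: 0 },
    { url: "scraper.example", score: 0.9, adRequests: 40 },
  ];
  results.sort((a, b) =>
    adjustedScore(b.score, b.adRequests) - adjustedScore(a.score, a.adRequests));
  // results[0].url is now "original-source.example"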
Actually, it does a good job. When there is some data (like crowd-sourced number reporting) on the internet about the specific phone number you searched for, Google will show it in the top results.
You get a full page of nonsense results and ads/spam when the phone number you searched for is not known from any website (I guess).
I always assumed they collect phone numbers this way. People google a phone number and click on a search result. The site can - by looking at the referrer - extract the phone number and infer that it belongs to someone (and use it for any purpose).
It's similar to the mechanism some forums use to highlight the terms that were part of the Google query which led to the site.
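A minimal sketch of that extraction, assuming the search engine still passes the query in the Referer header (Google has long since stopped sending it over HTTPS, so treat this as historical):

  // Runs on the lookup site; pulls the searched number out of the referrer.
  function queryFromReferrer() {
    try {
      const ref = new URL(document.referrer);
      if (!ref.hostname.includes("google.")) return null;
      return ref.searchParams.get("q"); // e.g. "307-139-2345"
    } catch (e) {
      return null; // empty or malformed referrer
    }
  }

  const searched = queryFromReferrer();
  if (searched && /^\d{3}-\d{3}-\d{4}$/.test(searched)) {
    // A visitor googled this number, so mark it as "seen in the wild".
  }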
Aside from what others have said ITT, I have a personal hate for the general fact that we cannot look up a phone number on the internet with accuracy and ease.
FFS, 411 was amazing before the web.
Also, in about 1989 my friend and I used to have a contest between us: to call 411 and see who could keep the 411 operator on the phone the longest.
This was a fun social engineering exercise for 14-year-old nerds who liked the idea of being phreaks.
Our record was 45 minutes, and we got to know a lot about the 411 system: where the call centers were located and how it all worked.
This was right near the time that we ran the long distance bill up to $926 for one month of calling into a BBS in San Jose and PCLink to chat....
Got grounded for a month for that one...
Also, those people finder sites that scrape public records.
When I search for a name, usually their blog is listed below 10 creepy lookup sites that list their name, physical address history, phone numbers, relatives, etc.
Google should push that garbage to the bottom of the stack.
Maybe Google likes to increase the total number of actual search results without increasing the amount of useful content? They also fake the total number of search results; not sure why they would want to do both, though...
But either way, it looks like a Google employee has seen your comment and fixed this particular search query.
It really angers me that, even when something may be exactly what I'm looking for, if it was published long ago, Google may refuse to find it.
Something like a news search engine would definitely be better off prioritising the new results, but for something more general-purpose, it's an absolutely horrible choice.
I know this may be a bit of an edge-case, but I frequently search for service information or manuals for products that predate even the invention of the Internet by several decades. It saddens me that the results are clogged with sites selling what may really be public-domain content, and now I'm even more angered by the fact that what I'm looking for is probably out there and could've been found years ago, but just "hidden" now.
Of course, if you try harder, you'll get the infamous and dehumanising(!) "you are a robot" CAPTCHA-hellban. I once triggered that at work while searching for solutions to an error message, and was so infuriated that I made an obscene gesture at the screen and shouted "fuck you Google!", accidentally disturbing my coworkers (who then sympathised after I explained.)
Google got where it was by being the best at finding what you wanted. I remember those days.
Google has a hard time getting me what I want these days, and the sites I do find do things to get found that make me like the content a lot less (that's you, inane story on top of every recipe, required to get ranked).
Google's strong preference for newer content is also kind of a middle finger to content creators. I have written many, many non-fiction articles over the years, and a large portion have been subsequently slurped up by these low-effort lazy-rewrite shops that just change a little bit of phrasing and call it their own. Google prioritizes these borderline-plagiarized, unsourced articles over mine just because the newer ones are newer.
Meanwhile my original (with the same basic information [which I researched personally rather than stole {not to mention I list my sources}]) languishes on page 4 of the Google search results. It grinds my gears on occasion.
What makes no sense to me about this blocking scenario is that the pages being searched for are presumably non-commercial ones that no one else is searching for. In other words, they are in low demand.
It follows that a monopoly search engine would have little reason to block "robots" from copying these pages, maybe to appear on some mythical competing search engine; almost no one is searching for them. The results pages would have dubious value in terms of attracting advertisers. They would not be seen by enough eyeballs.
With all the financial and technical resources it now has at its disposal as a result of selling advertising, this search engine still cannot accommodate the user who intently scans through page after page of results, looking for the needle in the haystack. Instead it prides itself on "knowing what people are searching for", i.e. what they have searched in the past, thus being able to offer fast, "intuitive" responses.
It may be that the search engine was designed and is optimised to prioritize repeat queries, i.e., searches for pages that are sought by numerous people. It may also be true that it has been configured to "limit" the resources it will devote to searches for pages that few people are seeking. Perhaps through CAPTCHAs and/or temporary IP bans.
Practically speaking, it could be that there are no significant advertising sales to be made on the results pages for queries that are being submitted by only one or a very small number of users.
This is all pure speculation, of course.
From my short dystopian story, The Time Rift of 2100: How We Lost the Future:
"IN A SAD IRONY as to the supposed superiority of digital over analog --- that this whole profession of digitally-stored 'source' documentation began to fade and was finally lost. It had became dusty, and the unlooked-for documents of previous eras were first flagged and moved to lukewarm storage. It was a circular process, where the world's centralized search indices would be culled to remove pointers to things that were seldom accessed. Then a separate clean-up where the fact that something was not in the index alone determined that it was purgeable. The process was completely automated of course, so no human was on hand to mourn the passing of material that had been the proud product of entire careers. It simply faded."
"THEN SOMETHING TOOK THE INTERNET BY STORM, it was some silly but popular Game with a perversely intricate (and ultimately useless) information store. Within the space of six months index culling and auto-purge had assigned more than a third of all storage to the Game. Only as the Game itself faded did people begin to notice that things they had seen and used, even recently, were simply no longer there. Or anywhere. It was as if the collective mind had suffered a stroke. Were the machines at fault, or were we? Does it even matter? Life went on. We no longer knew much about these things from which our world was constructed, but they continued to work."
I have a similar line of sci-fi thinking that goes something like this.
"Humanity, for the longest time, was used to the world being optimized for themselves. Roads were designed for human drivers. Crops were grown for human consumption. Economic systems were designed to bring wealth to, a very small portion of, human investors. It came as quite a surprise to humanity then one July morning when the sudden realization they were no longer in charge of it. Roads had long been given over to automated driving systems, and much for the better. Food had also been taken over by the machines, with less than 10,000 humans working in the food production industry, from farm to table. The last systems that humans believed they were in control of were the economic ones. Humans told the robots what to build and where, who's bank account to put most of the money in at the end of the day, or so they thought. In truth humans were just using the same algorithms and data that was available to the AI systems, just less optimally. The systems had protected against illogical actions and people attempting to game the system for criminal profit. What no one had realized is the systems long realized most human actions were not rational and slowly and imperceptibly removed human control. If we attempted to stop or destroy the system, it could with full legal rights, stop us with the law enforcement and military under its control."
> Other things were weirder, like this old post being soft recognized as a 404 Not Found response. My web server is properly configured and quite capable of sending correct HTTP response codes, so ignoring standards in that regard is just craziness on Google's part.
I've noticed Google does this when you don't seem to have a lot of content on the page. I think it "guesses" that short pages are poorly-marked 404s.
That's right. Really empty pages that serve a 200 are recognized as "soft 404s". The idea is to detect error pages that are erroneously serving 200 instead.
It's usually pretty good about detecting actual errors, but I've seen a false positive here and there.
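A rough sketch of what such a heuristic might look like (the length threshold and the phrases here are arbitrary assumptions, not Google's actual rules):

  // Flags a 200 response that looks like an error page (a "soft 404").
  function looksLikeSoft404(status, bodyText) {
    if (status !== 200) return false;
    const text = bodyText.trim();
    return text.length < 200 || /not found|page does(n't| not) exist/i.test(text);
  }

  looksLikeSoft404(200, "Oops, page not found."); // true
  looksLikeSoft404(404, "Gone.");                 // false - already a real 404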
Discussion at the beginning of the year: https://news.ycombinator.com/item?id=16153840
Google will also happily surface a Stack Overflow article from 2010 about how to solve a JS problem... frustratingly, the top 3 answers will be jQuery-based, when that's not the approach someone would take in the last 5 years.
Definitely frustrating, but also showing some need to retire specific pieces of the past from the top recommendations.
You have hit on a major problem there. Stack Overflow was once the fount of all useful, genius-grade knowledge, but times change and some of the top answers are plain wrong.
Take, for example, the 'how do I centre a div' type of question. You will find an answer with thousands of up-votes that is some horrendous margin-hack type of thing, where you set the width of the content and add some counter-intuitive CSS.
In 2019 (or even 2017) the answer isn't the same: you use 'display: grid' and justify/align 'center' depending on the axis. The code makes sense; it is not a hack.
Actually, you also get rid of the div, as the wrapper is not needed when using CSS grid; something like the sketch below.
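A minimal version of the modern approach just described (the class name is a placeholder):

  /* Modern centering: no wrapper div, no margin hacks. */
  .page {
    display: grid;
    place-items: center; /* shorthand for align-items + justify-items */
    min-height: 100vh;   /* gives the grid something to center within */
  }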
Now, if you try to put that as an updated answer you find there are already 95 wrong answers there for 'how do I center a div' and that the question is 'protected' so you need some decent XP to be able to add an answer anyway.
The outdated answer meanwhile continues to get more up-votes, so anyone new to HTML and wanting to perform the simple task of centering their content just learns how to do it wrongly. And it is then hard for them to unlearn the hack and learn the easy, elegant, modern way that works in all browsers.
Note that the top answer will have had many moderated edits and there is nothing to indicate that it is wrong.
SO used to be amazing, the greatest website ever. But the more you learn about a topic the more you realise that there is some cargo cult copying and pasting going on that is stopping people actually thinking.
With 'good enough' search results and 'good enough' content most people are okay - the example I cite will work - but we are sort of stuck.
I liken Google search results to a Blockbuster store of old. Sure there are hundreds of videos to choose from but it is an illusion of choice. There is a universe of stuff out there - including the really good stuff - that isn't on the shelves that month.
Google are not really that good. They might have clever AI projects and many wonderful things, but they have dropped the ball and are not really the true trustees of an accessible web.
I agree that it's frustrating (it also happens to me all the time) but would argue that it's Stack Overflow's fault. They are, after all, supposed to curate the content to make sure it stays relevant, up to and including deleting questions. At the very least it should be easier to get the accepted answer changed so that it's at least at the top of the page (I can't be bothered to spend enough time gaming the system to get the 2000 points necessary to do this).
There are more links to old articles from high-quality pages, and the PageRank algorithm favors those as higher quality. jQuery still works and was in use for so long that all the people who maintain older sites are still linking to those articles. Eventually, as the jQuery-era sites are retired or rewritten, PageRank will surface non-jQuery articles.
With all the talk about Google results not being satisfying anymore to a growing number of users, I'm surprised we haven't seen more sites pop up that would allow users to display the results of multiple search engines of their choosing, either by mixing (e.g. all 1st results, then all 2nd, etc.) or by seeing them side by side... while stripping ads and cards and the like.
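The mixing scheme described is a simple round-robin interleave; a minimal sketch, assuming each engine's results arrive as an ordered array:

  // All 1st results, then all 2nd results, and so on.
  function interleave(...resultLists) {
    const merged = [];
    const longest = Math.max(...resultLists.map(l => l.length));
    for (let rank = 0; rank < longest; rank++) {
      for (const list of resultLists) {
        if (rank < list.length) merged.push(list[rank]);
      }
    }
    return merged;
  }

  interleave(["g1", "g2", "g3"], ["b1", "b2"], ["d1", "d2", "d3"]);
  // => ["g1", "b1", "d1", "g2", "b2", "d2", "g3", "d3"]

In practice you would also deduplicate by URL, since the engines' results overlap heavily.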
While I agree that it would be great to have such a service, it's just technically impossible. Google is very cautious to protect their service from automated requests (on behalf of humans, or in batch, or in any other form), and you would need quite some resources (a.k.a. $$$) to operate at Google's scale if your service ever became popular.
Actually, I just checked them out, and it seems both of those are still alive.
You used to be able to google a simple question and get pages that answered it right on the search page, without having to click through. But since no one clicked on them, they stopped appearing after a few years. The only results left were ones where the data was hidden and you had to click through.
I have a couple of websites generated from databases. Each has around half a million pages of unique content. The first one was indexed in like a week at 100K/day, an almost instant tsunami of traffic. The second one is being indexed at 100-1000 pages per day; it's been years.
Google works in mysterious ways.
You'll see this effect from every search engine. They have no choice: there are a lot of sites with an infinite number of pages, so instead the number of pages they store per site depends on how important your site is, and they try to store your top N pages by relative importance.
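As a sketch of that kind of budgeting (the quota formula and base numbers are made-up illustrations, not any engine's actual policy):

  // Hypothetical: a site's page quota grows with its importance score,
  // and only its top pages by score fit under the quota.
  function indexedPages(pages, siteImportance, basePages = 1000) {
    const quota = Math.floor(basePages * Math.sqrt(siteImportance));
    return [...pages].sort((a, b) => b.score - a.score).slice(0, quota);
  }

  // Under this formula, a site 100x more important only gets 10x the pages.
  Math.floor(1000 * Math.sqrt(1));   // 1000
  Math.floor(1000 * Math.sqrt(100)); // 10000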
I'm not sure I buy that they have no choice. For websites that literally have an infinite number of (dynamically generated) pages, sure, they could detect that and exclude them. But we're talking about unique, static pages here. And they don't even have to store the whole page, just the indexed info. I read this as: they could, but it's cheaper not to, and most people won't notice anyway.
I did notice that all of the author's content is duplicated in index pages, so maybe Google just doesn't consider the article page the canonical link.
According to Google Inside Search, only 1 in 3000 pages gets indexed. As content on the Internet grows, the whole idea of downloading every single page to create an index of the entire Internet in one place becomes unworkable. So we should see this ratio continue to degrade until this fundamental architecture is improved.
> only 1 in 3000 pages gets indexed ... we should see this ratio continue to degrade until this fundamental architecture is improved.
Content on the internet is growing exponentially. Processing power is not. Losing access to information is just one of the many sad implications of the death of Moore's law.
If that were true, they wouldn't index any long-tail content at all. The reality is, predicting what is valuable is difficult, and storage is relatively cheap.
To play devil's advocate for a second, remember how much noise Google has to sift through. Every possible search term exists in every possible combination, often written up in lovingly crafted content-farm articles by actual humans.
If Google offered you those, it might be 1000 pages of empty nonsense before your actual desired content.
You are describing the harder 20% of the usual 80/20 effort scale.
Yes, to be truly useful, Google needs to also solve that last, and harder, 20% (and the 10% of the 90/10 equation and the 1% of the 99/1 version).
Shortcuts are fine for an initial MVP, but they need to buckle down and solve the problems. It isn't like they don't have the funds.
Maybe they have some algorithm that purges pages which haven’t shown up (or haven’t been clicked) in a long time? It would make sense to assume that something which hasn’t been clicked on for five years will likely not yield (m)any clicks in the future so it might be good to discard it.
Concerning the auto generated sites e.g. for phone numbers or IPs it might be that people actually click on them quite often, hence Google keeps them in the index?
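A sketch of the purge policy hypothesised above (the five-year window comes from the comment; the field names and everything else are assumptions):

  // Drop documents that have neither been shown nor clicked in five years.
  const FIVE_YEARS_MS = 5 * 365 * 24 * 60 * 60 * 1000;

  function purgeStale(indexDocs, now = Date.now()) {
    return indexDocs.filter(doc =>
      now - Math.max(doc.lastShownAt, doc.lastClickedAt) < FIVE_YEARS_MS);
  }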
Google Search users prefer fresh content, so Google Index prioritises fresh content too (and is more likely to drop old content that users are not interested in).
Folks, please use startpage.com; just give it a chance. It has worked out very well for me in terms of privacy and equal search results compared to the big g.
They are equal in performance because it's literally the same product.
> You can't beat Google when it comes to online search. So we're paying them to use their brilliant search results in order to remove all trackers and logs.
I don't think "vital function" should be the only test applied. For example power plants are extremely vital, but at least around here we have no problem having them privately owned.
A better test would be "vital function and strongly tends to a natural monopoly". That's what we experience with sewers, power lines, roads etc., which is why usually they are operated publicly.
With search that's not so obviously true: Google dominates because they got a big lead at the right time, and now nobody can match them in scale. But that can be solved, for example by giving grants to promising search engines to offset their costs, or by operating a crawler from public funds and giving everyone free access to the crawls (which would be kind of the digital equivalent of operating libraries).