WingNews

bane|3 years ago

I was just searching for an old friend of mine whose last name happens to be a substring of another common last name. I tried everything: quotes, + signs, - signs, middle initials, middle names, cities we lived in together, etc.

Every single returned link after the first 3 had the superstring version of the name and not the correct name. It turns out that this returns endless results for a fairly well known singer, not my friend.

So not only did I not get the results I was looking for, I got tons of results that were objectively wrong.

Then suddenly, about 6 pages into those results, I started getting ones for the correct last name, but now the first name is a mess.

This happened on Google, DDG, Baidu, Sogou, Haosou, Dogpile, the current Yahoo search, Bing, and to some extent on Yandex. Naver was worse, and Daum was totally worthless, returning only incorrect results.

Utterly worthless.

The thing is, my friend's name is surprisingly close to unique; there are probably fewer than 20 people in the world with that specific name. It's like the search engine's desire to fill the screen with worthless garbage results has overpowered the need to supply the 2 or 3 that are actually correct, even if the quantity is a little disappointing.
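To make the substring/superstring problem concrete, here is a minimal Python sketch of the difference between substring matching and whole-word matching. The names ("Ander" inside "Anderson") and the sample texts are made up for illustration, and this says nothing about how any of the engines above actually match or rank; it only shows the behavior I expected from a quoted query.

```python
import re

def contains_substring(name: str, text: str) -> bool:
    """Naive match: also fires when the name is buried inside a longer word."""
    return name.lower() in text.lower()

def contains_whole_word(name: str, text: str) -> bool:
    """Match only when the name stands alone, using regex word boundaries."""
    pattern = r"\b" + re.escape(name) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None

# Hypothetical documents: the first mentions the longer (superstring) surname,
# the second mentions the exact surname being searched for.
documents = [
    "Pat Anderson announces a new tour",
    "Reunion photos with Pat Ander, class of 1999",
]

substring_hits = [d for d in documents if contains_substring("Ander", d)]
whole_word_hits = [d for d in documents if contains_whole_word("Ander", d)]
```

With these inputs the naive check keeps both documents, while the word-boundary check keeps only the second one.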

giancarlostoro|3 years ago

I would honestly pay at minimum $10 a month to a search engine startup that focuses on the top 10k, then top 100k Alexa sites, and does good indexing of top sites. If I google something programming related, give me all the stackoverflow you find relevant. I don't even care about image search, that can come later. I think the world has room for a search engine competitor, I'm just not sure what it would look like, but I hope someone is working on something that isn't just a repeat of hot garbage.

asicsp|3 years ago

Negating some string with a `-` prefix works in my experience. But I haven't come across a case with superstrings, and I can't think of an example to try either.

behringer|3 years ago

Try neeva?

Denzel|3 years ago

Would you mind providing details like the search query and link to the page you expect to be found?

To test your hypothesis, I did a basic search for exact matches on "we do not synchronize on the update of the broker node" and Google returned 2 search results in 240ms:

- https://github.com/a0x8o/kafka/blob/master/core/src/main/sca...

- https://jar-download.com/artifacts/org.apache.kafka/kafka_2....

Which contain exactly the source code from GitHub that I was looking for. You'll notice that the first result is actually a0x8o's fork of apache/kafka. Google states that some entries very similar to the 2 already displayed were omitted, and I'm able to remove that filter. With that filter removed, I can see the same document indexed from apache/kafka on GitHub.

There's nothing I can do or promise directly, but I can assure you that Google takes the quality of our search results very seriously. If you believe we're not delivering quality results, I strongly encourage you to click that "Send Feedback" link at the bottom of your results so that our teams can act upon your feedback.

Disclosure: I work on Search at Google.

Disclaimer: The words, views, and opinions expressed in this post are my own. They are not representative of my employer, nor do they represent my employer in any capacity.

jrvarela56|3 years ago

I don't know how common this is, but in my 12 years using this site this is the first time I've seen a Google employee address a customer regarding a product they work on.

Congrats and hope Google takes advantage of HN, similar to how startups use this forum to engage with users - it is now a meme that Google Search is unusable so there must be something to learn from the audience.

I will use the send feedback button tomorrow as you suggest.

dpkirchner|3 years ago

> Google states that some entries very similar to the 2 already displayed were omitted, and I'm able to remove that filter.

I've definitely seen that sort of thing before but there is no such link there at the moment -- at least not when searching from my iPhone, whether or not I'm in desktop mode. I just see a large error box that says "It looks like there aren't many great matches for your search" followed by the link to the a0x8o fork.

By the way, the a0x8o result highlights a serious problem with search results: the GitHub URL is strangely modified. Instead of showing the full URL, or even a prefix leading up to it, Google is selecting parts of the URL, showing "https://github.com > src > transaction" on mobile and "https://github.com > kafka > coordinator > transaction" when I request the desktop site. In neither case is it obvious that the content isn't the canonical source from Apache. I've noticed this middle-out truncation for GH URLs before but I'm not sure when it started.

sefrost|3 years ago

How often do people use the send feedback button? How many of the reports are looked at?

userbinator|3 years ago

Yes, I remember several years ago --- more like 8 now(!) --- easily finding results in GitHub repos whenever I needed to look up error codes and such. Now even site:github.com doesn't work (and if you try too hard, you get the hellban for a while).

Another extremely noticeable degradation is in finding part numbers, IC markings, service manuals (NOT the useless user manual), schematics, and the like. Anything that proponents of right-to-repair would be extremely interested in, to the extent that I wonder if there's been some sort of conscious effort being made by certain interests to eliminate or limit such information.

Then there's the niche-but-legal adult content. I won't go into too much detail about that, but suffice to say it used to be far easier to find.

It's been 5 years since this notorious item here, and I've only seen Google get worse: https://news.ycombinator.com/item?id=16153840

nus07|3 years ago

Sundar Pichai has so McKinsified and MBAfied Google that at this point Google search seems like an A/B test to deliver the best targeted ad. Probably better off using any other search, including Yahoo.

goldforever|3 years ago

[deleted]

saddd|3 years ago

I have the feeling that whatever you're talking about is explicitly not crawlable.

simonw|3 years ago

Yeah, the GitHub robots.txt is surprisingly restrictive:

https://github.com/robots.txt

   User-agent: *
   Disallow: /*/pulse
   Disallow: /*/tree/

That "/*/tree/" rule means that search engine crawlers are allowed to hit the README file of a repo but effectively NONE of the other files in it.

Which means that if you keep your project documentation on GitHub in a docs/ folder it won't be indexed!

You need to publish it to a separate site via GitHub Pages, or use https://readthedocs.org/
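For anyone who wants to check a path against wildcard rules like these, here is a rough Python sketch. It is hand-rolled and assumes the common crawler convention that `*` matches any run of characters and a trailing `$` anchors the end of the path; it only handles Disallow lines, ignores Allow precedence and per-agent groups, and uses just the two rules from the excerpt above rather than the full file. The helper names and example paths are mine, not GitHub's.

```python
import re

def rule_to_regex(rule: str) -> re.Pattern:
    """Translate one robots.txt path rule into a regex anchored at the path start."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape the literal pieces and let each '*' match any run of characters.
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_disallowed(path: str, disallow_rules: list[str]) -> bool:
    """True if any Disallow rule matches the beginning of the path."""
    return any(rule_to_regex(rule).match(path) for rule in disallow_rules)

# Only the two rules quoted in the excerpt; the real file has many more.
RULES = ["/*/pulse", "/*/tree/"]

# Illustrative paths (the branch and folder names are placeholders).
repo_root_blocked = is_disallowed("/apache/kafka", RULES)
docs_folder_blocked = is_disallowed("/apache/kafka/tree/trunk/docs", RULES)
```

Under those assumptions the repo root is not blocked by these two rules, while anything under a `/tree/` path is.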

(Side note: I just noticed https://github.com/ekansa/Open-Context-Data is explicitly listed in the robots.txt for GitHub - the only repo that gets a mention like that. I'd love to know the story behind that!)

saurik|3 years ago

A public git repository is definitely crawlable. Google seems to have given up actively going out of their way to index things that are hard to crawl as they got so big and important it was easier to just tell people "thou must do X or we won't index you and you want to be indexed", but increasingly the content I want to find is in weird little silos.

sebosp|3 years ago

Curious: if I had the list of repos, is there anything besides licensing (i.e. rate limiting/throttling) that forbids me from running `while read -r url; do git clone "$url" data; ./train data; rm -rf ./data; done`? Similar question: the code search across all repos provided by the GitHub UI gets throttled pretty fast, so what do people do? (Not suggesting in a hundred(?) years to do the while loop for this, though ;))
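For what it's worth, a slightly more careful version of that loop might look like the Python sketch below. It assumes `git` is on the PATH, the two-second pause is an arbitrary guess rather than any documented limit, and `handle` stands in for the hypothetical `./train` step. It does not answer whether doing this is permitted.

```python
import shutil
import subprocess
import tempfile
import time
from pathlib import Path
from typing import Callable, Iterable

def clone_repo(url: str, dest: Path) -> None:
    """Shallow-clone a repository; --depth 1 fetches only the latest commit."""
    subprocess.run(["git", "clone", "--depth", "1", url, str(dest)], check=True)

def process_repos(
    urls: Iterable[str],
    handle: Callable[[Path], None],
    clone: Callable[[str, Path], None] = clone_repo,
    delay_seconds: float = 2.0,  # arbitrary pause between clones (an assumption)
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Clone each repo into a temp dir, run `handle` on it, then delete it."""
    processed = 0
    for url in urls:
        workdir = Path(tempfile.mkdtemp(prefix="repo-"))
        try:
            clone(url, workdir / "data")
            handle(workdir / "data")  # stand-in for the `./train data` step
            processed += 1
        finally:
            # Always remove the checkout, even if cloning or handling failed.
            shutil.rmtree(workdir, ignore_errors=True)
        sleep(delay_seconds)
    return processed
```

The `clone` and `sleep` parameters are injectable so the loop can be exercised without touching the network.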

Nuzzerino|3 years ago

That doesn’t change anything regarding the actual point of the comment.

papito|3 years ago

Seems like the giants that were nearly synonymous with "Internet", Google and Amazon, are rapidly deteriorating and creating a massive market opportunity.

Pure speculation, but innovative companies at first, they started over-hiring and bloating, using questionable interviewing techniques (puzzles, Leetcode), taking on thousands of employees who were just there to game the system, coast, and collect the check.

It just looks like they stopped caring.

asddubs|3 years ago

and it just straight up ignores keywords even when there are matches containing all of them. Google has become so much worse, and yes, part of it is that there's a ton of spam, which is also a problem, but it has gotten worse in other respects too

fIREpOK|3 years ago

> As one just example, I searched for a unique error message in code that exists on GitHub, is in a fairly popular repo, and is not new and Google just could not find it. That seems like a very basic failure.

I have recently almost completely stopped using Google's search engine because I am very often offered zero search results for simple queries (usually involving quotes, though). It's so bad I can't even believe it.

Note: I've been a Google search user since it started... Gmail since Beta, etc...

At one point, I thought that maybe they started punishing ad-block users excessively.

pjmlp|3 years ago

Right now searching on Google is way worse than Yahoo, Altavista or Ask Jeeves.

Not only do we get tons of ads back as first results, Google keeps rewriting the queries into whatever "helpful" nonsense.

MuffinFlavored|3 years ago

> but Google’s results are so compromised

I read this constantly here (echo chamber) and I can't help but feel it's a little biased/overdramatic.

dimator|3 years ago

Honestly, I'd be all for using DDG exclusively, but I find myself doing !g (their Google redirect operator) when I don't find what I want on DDG, and it's almost always the top result on Google. And this happens daily.

MagicMoonlight|3 years ago

Most tech companies have ruined their products now. They’ll have 10,000 engineers and 15 iterations of the UI but you try and buy a hard drive and it’s a box with an SD card taped inside.

It’s time for competitors to start wiping them out.

pharmakom|3 years ago

Why do they even bother with the SD card?

karmakaze|3 years ago

Seems like the worst time, unless they're doing so with ChatGPT and the like. What regular search lacks is context and a natural way of refining queries by adding context that doesn't always work well with keywords.

bloodyplonker22|3 years ago

Not only have they become compromised from a technical standpoint, for some searches in particular, the results have been modified to be heavily politically biased and woke.

selimthegrim|3 years ago

Don’t downvote them until you check out Frank Zappa’s discography.

trilbyglens|3 years ago

My guess is that there is now so much ML black-box shit going on in the search algo that no one can reasonably tell you why it returns what it does.

gonzo41|3 years ago

Do you think Google cares that it's losing its edge? How do they not know it's getting worse?
