This just proves all the "suspicions" privacy-conscious users have had about large corporations fingerprinting users, often in very obvious ways. There's often no better place to find ideas for surveillance than the people conscious about being surveilled.
I found it VERY amusing that if you go to r/SEO, just yesterday there were moderators and flaired users (you know, the elites of the SEO community, lol) insisting much of this was "debunked" years ago.
They of course deleted their posts, but the threads are still up. What a den of scammers over there.
And it's Apache licensed, which grants a patent license. Some of the comments refer to specific aspects of how page rank is calculated. Pagerank itself is past patent protection but I wonder if this also accidentally might grant licenses to other patents.
> My anonymous source claimed that way back in 2005, Google wanted the full clickstream of billions of Internet users, and with Chrome, they’ve now got it. The API documents suggest Google calculates several types of metrics that can be called using Chrome views related to both individual pages and entire domains.
What answer do the engineers at google working on this have for this violation of privacy?
I am not an engineer at Google, but this is what I would say if I were.
We don't know who you are, you are just a number in a database, and we don't even know what number, we just get the total number of visits for each website, not who visited it. It is like counting cars on a highway, not following your car. Plus, it serves the useful purpose of providing you with better search results, the terms and conditions allow it, and it can be disabled.
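The "counting cars on a highway" model can be sketched in a few lines. This is a toy illustration of aggregate-only telemetry, not Google's actual pipeline; the event shape and names are hypothetical:

```python
from collections import Counter

def aggregate_visits(events):
    """Tally visits per site while discarding user identifiers.

    Each event is a (user_id, site) pair; only per-site totals survive
    aggregation, like counting cars without reading the plates.
    """
    return Counter(site for _user, site in events)

counts = aggregate_visits([
    ("u1", "example.com"),
    ("u2", "example.com"),
    ("u1", "news.example.org"),
])
# counts["example.com"] == 2, with no per-user trail retained
```

The catch skeptics point to, of course, is that the raw URLs still leave the browser; any such aggregation happens on Google's side, where the full clickstream has already arrived.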
Personal (not work related opinion): This basically can’t happen with things like DMA and GDPR. DMA in particular means you can’t share data across “products” without explicit consent. So you could for example collect websites that don’t work for the purposes of improving Chrome, but not then share that with the Ads/Search orgs for personalisation or targeting, as far as I understand the legislation.
Personal opinion about work at Google (still not googles opinion) I’m consistently impressed with how seriously this stuff is taken and the amount of work that goes into making sure that things like this sharing can’t happen accidentally, and that user choice is respected. The engineers on the ground are absolutely making sure this all works, and most of us care deeply about user privacy. I have personally worked both on implementing new features that significantly push forward privacy, and on implementing privacy controls for regulatory purposes.
> What answer do the engineers at google working on this have for this violation of privacy?
The same answer you probably have for the millions of questions about the things you do that some other people find offensive to their personal views and beliefs.
Sometimes I wonder how much better the internet would be if hits on Google weren't directly tied to revenue for Google itself through its ad program. I am certain Google has made the internet and the world a worse place to live.
As a user of Kagi and search.marginalia.nu I can tell you:
Quite a bit.
So much so that now that I have what "everyone" asked Google for for years - that is, blacklists - I hardly use them.
Why? Because with Kagi I get much better results out of the box.
I am fairly sure Googlers will tell me there are multiple safeguards to prevent the inclusion of Google ads from affecting ranking, to which I just have to say that the results speak for themselves.
Please note: I have only used Kagi for two years. I am only one user. But I am a user with 20 years of experience with Google, and that has got to count for something.
Google was really great and revolutionary, they helped zillions of small companies to thrive. It was another cycle.
But now it is like the media before the 90s: you need to pay a lot of money to be in the center pages of the newspaper.
Hopefully, now that we are talking about LLMs, they seem like one of the answers to search engines in general. Beyond AI, I see LLMs as a good evolution beyond PageRank.
A little general, but lately I use the expression "Complexity as Scam". Google always pointed to their "algorithms" and played with the term as if algorithms couldn't be adjusted to do whatever you want them to do. Initially the coined term was sound because it was based on a scientific paper, but it eventually evolved, and it seems the original PageRank idea has detoured from being a "pure" graph algorithm.
Another context where I use "Complexity as Scam" is Web3. It is like Matryoshka dolls, where there is always one more layer of complexity to prove a point, and it never ends.
It's not black and white. There was a lot of junk that was forced on us and that was removed thanks to Google. But I agree the direct relationship is inherently corrupting.
I mean... maybe, but not really. The first problem of the internet was that there simply wasn't that much content. The first internet companies were the broadband providers, like AOL, who were developing content themselves.
Google and the ad ecosystem they acquired was basically the flywheel that spurred content creation at scale. Anyone could jump in, follow a few guidelines and earn a living by producing content on the internet. The Youtube acquisition and monetization followed the same pattern.
Over time the market consolidated and got less and less competitive: fewer platforms with complete control of traffic and one-sided revenue-sharing agreements. The guidelines, so to speak, on how content should look and feel were made algorithmically stricter and stricter until everything looks, feels, sounds and reads the same.
The problem right now is that the platforms are still tightening their grip, and it's all tied to the approach of using AI to replace the content creators on the platforms, from Google to Spotify to Meta, and funneling the spared money to shareholders. And while the web has been shitty for a few years now, we're seeing a sudden drop in quality because the average user has no recourse or alternative, and the average creator lacks the means of distribution and monetization (not just publishing; that's been solved) to even find, let alone meet, the new kinds of demand.
I'm certain that in a few years this will even out: new search engines, new aggregators and new feeds will emerge, but the content - money - network problem triangle remains as a fundamental problem of the internet.
I imagine it would be a different flavour to what we have today, but the same intensity. Anything that so deeply penetrates daily life across the globe is going to bring enormous problems with it.
There is something truly strange about the idea that people "trust" a website operator and can rely on it to provide them with useful information when that same operator is well known to be secretive, deceptive and dishonest in order to protect its own interests. It's like imagining that a fact witness who tells the truth on some occasions and lies on others is credible.
I work in search and didn't find anything surprising in here. But that's mostly because I've just assumed Google has been lying for years about many things, such as not using click data or Chrome data.
I've directly seen people who have successfully manipulated search rankings by having logged-in chrome users search for a term, and then click on a given page. Works like a charm (though may not stick once the manipulation is done, unless organic users also prefer it).
If anyone is surprised about chrome sending urls to Google, you can turn the “feature” off by unchecking “Make searches and browsing better” in the sync section of Google chrome settings.
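For managed machines, the same toggle can reportedly be forced off via Chrome's enterprise policy `UrlKeyedAnonymizedDataCollectionEnabled`. Below is a sketch that writes such a policy file; the policy name should be verified against Chrome's current enterprise policy list, and the demo writes to a temp directory rather than the real managed-policy path:

```python
import json
import os

# Demo path; on Linux, Chrome actually reads managed policies from
# /etc/opt/chrome/policies/managed/ (writing there requires root).
policy_dir = "/tmp/chrome-policy-demo"
os.makedirs(policy_dir, exist_ok=True)

# Believed to back the "Make searches and browsing better" toggle;
# check the name against Chrome's enterprise policy documentation.
policy = {"UrlKeyedAnonymizedDataCollectionEnabled": False}

with open(os.path.join(policy_dir, "disable-url-collection.json"), "w") as f:
    json.dump(policy, f, indent=2)
```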
"But what if I don't want my own computer to build and share a detailed profile of everyone I know, everywhere I go, all my preferences, and how to manipulate me?"
"Well obviously it's your fault for not picking the 'Don't Be Cool' option on subpage 27b-6, duh!"
> Thousands of documents, which appear to come from Google’s internal Content API Warehouse, were released March 13 on Github by an automated bot called yoshi-code-bot
Does anyone know more about yoshi-code-bot and how were these documents suddenly published?
Was it a script misconfiguration? A manual push? Something else?
As soon as they add the ability to configure shortcuts, I'd be more than happy to. After several years of requests, we're finally seeing some movement on their end.
- Experience, expertise, authoritativeness, and trustworthiness (“E-E-A-T”) might not matter as directly as some SEOs think.
- Content and links are secondary when user intention around navigation (and the patterns that intent creates) are present.
- Classic ranking factors: PageRank, anchors (topical PageRank based on the anchor text of the link), and text-matching have been waning in importance for years. But Page Titles are still quite important.
- For most small and medium businesses and newer creators/publishers, SEO is likely to show poor returns until you’ve established credibility, navigational demand, and a strong reputation among a sizable audience.
TL;DR: Clickbait + bot farms are the way to go. No wonder the internet is going to shit.
Most of the factors in ranking a page are no surprise. But I was surprised that having product reviews on your site is apparently a demotion? Surely many people are searching to find just that?
Years ago I had a site for deep fryer reviews. The whole thing existed to make money from Amazon’s affiliate program. I hadn’t personally used ANY of the deep fryers. Was just writing reviews based on features and other people’s reviews. In short, I ranked high in Google and added nothing of value to the world with that site.
There was a brief period of time where I made decent money with it until Google deranked all the product review websites.
This is likely more about reviews with affiliate links. 99.99% of those are people reviewing absolutely nothing, just copying reviews and putting their own affiliate link.
Sites spam low quality product reviews with affiliate links to Amazon. This is done by "reputable" sites as well. I don't blame Google for down ranking this meta.
“xx,xxx five star reviews” I’ve found is a modern day over-marketed product trope. It feels well within the realm of reasons that this ends up serving as a useful heuristic.
Most of these have been outright publicly denied by Google employees, despite people showing with A/B tests that things like CTR and backlinks impacted rankings.
I would usually call this a dupe but this article and the other one from SparkToro are completely different even if they are on the same topic.
Haven’t had a chance to look at the API myself but the first impressions are that a lot of this was suspected by SEOs, but Google kept rejecting the ideas. Looks like clicks increase ranking for sure, which means click farms definitely have a legitimate business solution to offer.
I have used both for many years, and now, I see little difference in practice. I am leaning more towards Firefox these days. Main change is that I now use Firefox as my main mobile browser for ad blocking reasons. A few websites don't work on Firefox, I use Chrome for these few.
I don't consider it a problem to use two browsers at the same time. I usually don't do the same thing with them, so having separate profiles can be an advantage.
Note that privacy is not the reason why I am using Firefox. It is just that I think that knowing both is a good thing, and they are both good browsers, so why not? In some cases Firefox is better, in others Chrome is better; most of the time they are interchangeable.
I've been using Firefox since Chrome forced users to sign in to the browser with their Google account, and I'm quite happy.
The only time it's a problem is when a site detects Firefox and won't display unless you're using Chrome or IE. I've only seen that a couple of times in the years since I switched back.
Firefox is better than Chrome [in the privacy aspect]... but still pretty terrible.
It sends a lot of "analytics" and "tracking" to some of Mozilla's servers, but if you inspect the requests, those servers are actually behind Google's CDN, and Google does the TLS termination.
So... Google has access to all the data that Mozilla sends when it phones home. Some of it even has a unique identifying ID.
I've been using Firefox since the days when it had the other name. Meanwhile, I use Floorp [1], which is based on Firefox but offers many more possibilities for customization. I am very satisfied, except for the stupid name...
I found it interesting that the docs mention "site2vec" scores. This implies, I think, a variant of word2vec or document2vec, but for the full site; so probably a vector sum of the doc2vec scores of all individual pages?
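If that speculation is right, a naive "site2vec" could be as simple as pooling per-page embeddings. This is purely a guess at what the leaked name might mean, not anything stated in the documents; a mean is used here rather than a raw sum so that large sites don't end up with larger-magnitude vectors just by having more pages:

```python
import numpy as np

def site_vector(page_vectors):
    """Pool per-page embedding vectors into one site-level vector."""
    return np.mean(np.stack(page_vectors), axis=0)

pages = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
v = site_vector(pages)  # array([0.5, 0.5])
```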
I wonder about this. If I click a link and read it and I find that it's garbage (e.g. got ranked based on SEO rather than useful content) does it count as a successful click? Worse yet, some of these sites have blatant errors that are only discovered after examination.
This is relevant to technical subject matter. Other searches, such as shopping, may not suffer from this kind of problem (or I have not noticed it).
I also wonder how Google knows a click is successful. If I open a link in another tab, does the browser tell Google how long I lingered on the site? Perhaps Chrome does but I use Firefox.
Once you get to the top 1-3 results, CTR (click through rate) is a much bigger ranking factor. Google knows how long people stay on pages and whether they click and back out immediately. This is important for E-Commerce, because Google doesn't want Site #1 to be mostly out of stock even though they have better links.
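A toy version of CTR-aware reranking might look like the sketch below. The weights and thresholds are entirely hypothetical; the impression floor guards against thin data, and a real system would also model dwell time and "pogo-sticking" back to the results page:

```python
def rerank_top(results, ctr_stats, min_impressions=1000):
    """Blend a base relevance score with observed click-through rate.

    ctr_stats maps url -> (clicks, impressions). Results with too few
    impressions fall back to their base score alone.
    """
    def score(r):
        clicks, impressions = ctr_stats.get(r["url"], (0, 0))
        if impressions < min_impressions:
            return r["base_score"]
        return 0.7 * r["base_score"] + 0.3 * (clicks / impressions)
    return sorted(results, key=score, reverse=True)

results = [
    {"url": "a.example", "base_score": 0.9},
    {"url": "b.example", "base_score": 0.8},
]
ctr_stats = {"a.example": (10, 2000), "b.example": (900, 2000)}
ranked = rerank_top(results, ctr_stats)
# b.example overtakes a.example: users click it far more often
```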
> Prior to the email and call, I had neither met nor heard of the person who emailed me about this leak. They asked that their identity remain veiled
And yet the journalist included a screenshot with one of the weakest blurs I've ever seen... Why would you not excise the person's video portion completely? What good does it serve to have it included in the story? Even if that portion is faked, why would you offer potential signals like skin complexion, hair color, background picture, etc.? Why...
The author is Rand Fishkin, who's not a journalist. He's the founder of SparkToro and Moz, both companies that provide tooling and analytics for SEO.
I haven't looked deeply into Fishkin's companies, but I wouldn't expect either to be on the user's side when it comes to privacy. Both companies seem to monetize clickstream data and personal information from users who probably didn't give informed consent.
If the source was trying to get this information to a responsible journalist who cares about privacy, I have no idea why they'd approach a company (not even a news organization) who seems to fund the erosion of user privacy.
Isn't this the same type of "swirl" blur that Interpol was able to reverse even 10 years back? With advancements since then you're basically handing evidence on a silver platter.
To make it worse, he made clear when the call had happened, and you have:
1) Who was in the call
2) When the call happened
3) A blur instead of a complete black out
I'm not sure I would feel safe reporting stuff to journalists nowadays.
That also struck me as odd. And seemingly a violation of journalistic best-practices of protecting sources. I sure hope this was done with consent of the anonymous source.
It's also clearly from Google Meet so... yeah. If he was worried about retribution (from Google, anyway) then they probably wouldn't have been using a Google service.
Hopefully this doesn't surprise anyone. If Google actually told us correct information about how the search algorithm works, it would be abused immediately.
What I find most interesting about this is that a lot of supposed "smart" algorithms of Big Tech are in fact a patchwork of "dumb" rules and human-picked winners. This would explain why the quality of search results is failing to keep up with developments in LLMs.
This also explains why it's impossible for incumbents to unseat the winners in many search categories -- because they've literally been picked as the winners by humans at Google.
Looking at my Twitter/X feed, I also see an oddly similar dynamic. Certain accounts appear to have been manually boosted, showing up all the time -- whereas others posting even the same exact content will never appear.
Silicon Valley will loudly tell you all about how wonderful they are at "democratizing"; however, if you look under the surface, it appears they're just hand-picking the winners.
Maybe this is an unpopular opinion, but if a search algorithm is truly designed to showcase the best content, then making it transparent shouldn't lead to manipulation.
> A sample of statements from Google representatives (Matt Cutts, Gary Illyes, and John Mueller) denying the use of click-based user signals in rankings over the years.
If there are really 14,000 attributes, most of them will have a weight near 0 and are thus irrelevant. If they were all heavily weighted, the ranking would be rendered meaningless by the sheer number of attributes.
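The intuition that most of the 14,000 weights must sit near zero is easy to demonstrate with synthetic numbers (nothing here comes from the leak):

```python
import numpy as np

rng = np.random.default_rng(0)

# 14,000 synthetic attribute weights: almost all drawn tightly around
# zero, with a handful of dominant signals mixed in.
weights = rng.laplace(loc=0.0, scale=0.001, size=14_000)
weights[:20] = rng.uniform(1.0, 5.0, size=20)

magnitude = np.abs(weights)
top20_share = np.sort(magnitude)[-20:].sum() / magnitude.sum()
# the 20 heavy weights carry most of the total magnitude, so the
# remaining ~13,980 attributes barely move the final ranking
```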
p3rls|1 year ago
https://www.reddit.com/r/SEO/comments/1d1eqjj/comment/l5tvfw...
https://www.reddit.com/user/WebLinkr/
I love how reddit is turning into the new SEO scam overnight because of this stuff. Great work as always, Danny Sullivan!
raxxorraxor|1 year ago
It certainly is not "to improve the net or advertising" - that would be the lying part.
Google has done some good for the net, but the scales of their contributions slowly but steadily move to the negative side.
Workaccount2|1 year ago
No matter what, whatever we ended up with was going to be shitty and exploitive.
DarkNova6|1 year ago
A barrier whose erosion has been well documented over the last 10 years.
1vuio0pswjnm7|1 year ago
https://ipullrank.com/google-algo-leak
ec109685|1 year ago
Creepy.
andrybak|1 year ago
Before that, you can make it audible: <https://github.com/berthubert/googerteller>
chx|1 year ago
Created 1,891 commits in 19 repositories
All 19 are under googleapis.
This looks like a bot Google uses to publish their stuff on GitHub, so it was likely a misconfiguration.
precompute|1 year ago
Boosting "organic traffic":
- Brand matters more than anything else
BillFranklin|1 year ago
Notably, for people on HN, it looks like there is indeed an internal initiative to promote small personal blogs :-)
> smallPersonalSite (type: number(), default: nil) - Score of small personal site promotion go/promoting-personal-blogs-v1
llmblockchain|1 year ago
Java, is that you?!
deely3|1 year ago
> Omit internet tropes.
yieldcrv|1 year ago
While bigger marketplaces have other ways of driving ranking
Ringz|1 year ago
[1]: https://floorp.app/en/
rpgbr|1 year ago
[0] https://github.com/ungoogled-software/ungoogled-chromium
badgersnake|1 year ago
var words = query.split(" ")
var results = executeQuery("SELECT al.url, al.desc FROM AdWords aw INNER JOIN adlinks al ON aw.id = al.id WHERE aw.word IN :words")
if (results.size < 30) { /* TODO: call search engine */ }
return results
ilyazub|1 year ago
Same service wrappers from two years ago: https://github.com/googleapis/google-api-php-client-services...
cyanydeez|1 year ago
// TODO: search
pr337h4m|1 year ago
https://hexdocs.pm/google_api_content_warehouse/0.4.0/api-re...
trogdor|1 year ago
Is there evidence of that in the leaked documents?
eitland|1 year ago
I understand some of this is a direct contradiction of things they have said in court previously?