The solution proposed by Kagi—separate the search index from the rest of Google—seems to make the most sense. Kagi explains it more here: https://blog.kagi.com/dawn-new-era-search
Google has two interlocked monopolies: one is the search index and the other is their advertising service. We often joked about what would happen if Google offered reasonable and non-discriminatory pricing for access to their index, both to themselves and to others, AND allowed anyone to put whatever ads they wanted on those results. That would change the landscape dramatically.
Google would carve out their crawler/indexer/ranker business and sell access to themselves and others, which would give that business an income that did NOT go back to the parent company (it would have to be disbursed internally as capex or opex for the business).
Then front ends would have a good shot. DDG, for example, could front the index with the value proposition of privacy. Someone else could front the index with a value proposition of no ads, ever. A third party might front that index attuned to specific use cases like literature search.
I.e., knowing which users clicked which search results.
Without the click stream, one cannot build or even maintain a good ranker. With a larger click stream from more users, one can make a better ranker, which in turn makes the service better so more users use it.
End result: monopoly.
The only solution is to force all players to share click stream data with all others.
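To make the click-stream argument concrete, here is a minimal sketch of re-ranking results by smoothed click-through rate. It is a toy model under stated assumptions, not a description of how Google or anyone else actually ranks: the function name `rerank_by_clicks`, the `prior_ctr` and `prior_weight` parameters, and the example URLs and counts are all hypothetical.

```python
def rerank_by_clicks(candidates, impressions, clicks, prior_ctr=0.1, prior_weight=20):
    """Order candidate URLs by smoothed click-through rate (toy model).

    prior_ctr and prior_weight are assumed values: every URL starts with
    `prior_weight` imaginary impressions at a CTR of `prior_ctr`.
    """
    def score(url):
        shown = impressions.get(url, 0)
        clicked = clicks.get(url, 0)
        # Additive smoothing: with few impressions the score stays near the prior.
        return (clicked + prior_ctr * prior_weight) / (shown + prior_weight)

    return sorted(candidates, key=score, reverse=True)


# Hypothetical click stream for a single query.
candidates = ["a.example", "b.example", "c.example"]
impressions = {"a.example": 1000, "b.example": 1000, "c.example": 5}
clicks = {"a.example": 50, "b.example": 300, "c.example": 1}

ranked = rerank_by_clicks(candidates, impressions, clicks)
print(ranked)
```

With only five impressions, `c.example` is scored mostly by the prior rather than by evidence. That is the point being made: an engine with little traffic has little signal to separate good results from bad ones.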
> Google has two interlocked monopolies, one is the search index
The index is the farthest thing from a monopoly Google has - anyone can recreate it. Heck, you can even just download Common Crawl to get a massive head start.
Then why do we see all of these alt search engines and SEO services building out independent indexes? Why don't the competitors cooperate in this fashion already?
This sounds like a solution contrived to advantage companies that want access to this data rather than an actual economically valid business model. If building an index and selling access to it is a viable business, then why isn't someone doing it already? There's minimal barrier to entry. Blekko has an index. Are you selling access to it for profit?
This just in: small search engine company thinks it's a great idea for small search engine companies to have the same search index as Google.
Also, I love this bit: "[Google's] search results are of the best quality among its advertising-driven peers." I can just feel the breath of the guy who jumped in to say "wait, you can't just admit that Google's results are better than Kagi's! You need to add some sorta qualifier there that doesn't apply to us."
Kagi is just a meta search engine. They are already using Google's search index. They just find it too expensive. Guess they need to show ads to pay for the searches.
Crawling the internet is a natural monopoly. Nobody wants an endless stream of bots crawling their site, so googlebot wins because they’re the dominant search engine.
It makes sense to break that out so everyone has access to the same dataset at FRAND pricing.
My heart just wants Google to burn to the ground, but my brain says this is the more reasonable approach.
This is similar to the natural monopoly of root DNS servers (managed as a public good). There is no reason more money couldn't go into either Common Crawl, or something like it. The Internet Archive can persist the data for ~$2/GB in perpetuity (although storing it elsewhere is also fine imho) as the storage system of last resort. How you provide access to this data is, I argue, similar to how access to science datasets is provided by custodian institutions (examples would be NOAA, CERN, etc).
Build foundations on public goods, very broadly speaking (think OSI model, but for entire systems). This helps society avoid the grasp of Big Tech and their endless desire to build moats for value capture.
Of all the bad ideas I've heard of where to slice Google to break it up, this... is actually the best idea.
The indexer, without direct Google influence, is primarily incentivized to play nice with site administrators. This gives them reasons to improve consideration of both network integrity and privacy concerns (though Google has generally been good about these things, I think enough damage has been done regarding privacy that the brand name is toxic, regardless of the behaviors).
A caching proxy costs you almost nothing and will serve thousands of requests per second on ancient hardware. Actually there's never been a better time in the history of the Internet to have competing search engines since there's never been so much abundance of performance, bandwidth, and software available at historic low prices or for free.
Google search is a monopoly not because of crawling. It's because of all the data it knows about website stats and user behavior. The original Google idea of ranking based on links doesn't work because it's too easily gamed. You have to know what websites are good based on user preferences, and that's where you need to have data. It's impossible to build anything similar to Google without access to large amounts of user data.
> so googlebot wins because they’re the dominant search engine.
I think it's also important to highlight that sites explicitly choose which bots to allow in their robots.txt files, prioritizing Google, which reinforces its position as the de facto monopoly even when other bots are technically able to crawl them.
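As an illustration of the robots.txt point, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `allowed_agents` helper, and the `ExampleNewBot` user agent are made up for the example. It shows what a Googlebot-only policy would look like, not how common such policies are.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: Googlebot may crawl everything, every other bot is blocked.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""


def allowed_agents(robots_txt, agents, url):
    """Return {agent: may_fetch} for the given robots.txt text and URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


result = allowed_agents(
    ROBOTS_TXT,
    ["Googlebot", "bingbot", "ExampleNewBot"],
    "https://example.com/page",
)
print(result)
```

Under this policy a well-behaved new crawler has to skip the site entirely, even though nothing technically stops it from fetching the page.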
> Crawling the internet is a natural monopoly. Nobody wants an endless stream of bots crawling their site,
Companies want traffic from any source they can get. They welcome every search engine crawler that comes along because every little exposure translates to incremental chances at revenue or growing audience.
I doubt many people are doing things to allow Googlebot but also ban other search crawlers.
> My heart just wants Google to burn to the ground
I think there’s a lot of that in this thread and it’s opening the door to some mental gymnastics like the above claim about Google being the only crawler allowed to index the internet.
Are sites really that averse to having a few more crawlers than they already do? It would seem that it’s only a monopoly insofar as it’s really expensive to do and almost nobody else thinks they can recoup the cost.
Assuming the simplified diagram of Google’s architecture, sure, it looks like you’re just splitting off a well-isolated part, but it would be a significant hardship to do it in reality.
Why not also require Apple to split off only the phone and messaging part of its iPhone, Meta to split off only the user feed data, and for the U.S. federal government to run only out of Washington D.C.?
This isn’t the breakup of AT&T in the early 1980s where you could say all the equipment and wiring just now belongs to separate entities. (It wasn’t that simple, but it wasn’t like trying to extract an organ.)
I think people have to understand that and know that what they’re doing is killing Google, and it was already on its way into mind-numbed enterprise territory.
> Apple to split off only the phone and messaging part of its iPhone
Ooh, can we? My wife is super jealous of my ability to install custom apps for phone calls and messaging on Android, it'd be great if Apple would open theirs up to competition. Competition in the SMS app space would also likely help break up the usage of iMessage as a tool to pressure people into getting an iPhone so they get the blue bubble.
You jest, but splitting out just certain Internet Explorer features was part of the Microsoft antitrust resolution. It's what made Chrome's ascendancy possible.
I mean it's just data. You can just copy it and hand it over to a newly formed competing entity.
You're not even really dealing with any of these shared-infrastructure, public-property, private-property, merged-infrastructure issues.
Yeah, sure, there are mountains of racks of servers, but those aren't that hard to get (tariffs TBD).
I think it'll be interesting just to try and find some collection of ex-Google execs who'd actually like to go back to the "don't be evil" days, and just hand them a copy of all the data.
I simply don't think we have the properly elected set of officials to implement antitrust at any scale. The DOJ is now permanently politicized and corrupt, and Citizens United means corps can outspend "the people" lavishly.
Antitrust would mean a more diverse and resilient supply chain, creativity, more employment, more local manufacturing, a reversal of the "awful customer service" as a default, better prices, a less corrupt government, better products, more economic mobility, and, dare I say it, more freedom.
Actually, let me expound upon the somewhat nebulous idea of more freedom. I think we all hear about shadow banning or outright banning, with utter silence and no appeals process, by large internet companies that have a complete monopoly on some critical aspect of Internet usage.
If these companies, enabled by their cartel control, decide they don't like you, or are told by a government not to like you, it is approaching as big a burden as being denied the ability to drive.
Not a single one of those is something oligarchs or a corporatocracy has the slightest interest in.
This solution would also yield search engines that will actually be useful and powerful like old Google search was. They have crippled it drastically over the years. Used to be I could find exact quotes of forum posts from memory verbatim. I can't do that on Google or YouTube anymore. It's really dumbed down and watered down.
I feel like there's some conceptual drift going on in Kagi's blog post wrt their proposed remedy.
They argue that the search index is an essential facility, and per their link "The essential facilities doctrine attacks a form of exclusionary conduct by which an undertaking controls the conditions of access to an asset forming a ‘bottleneck’ for rivals to compete".
But unlike physical locations where bridges/ports can be built, the ability to crawl the internet is not excludable by Google.
They do argue that the web is not friendly to new crawlers, but what Kagi wants is not just the raw index itself, but also all the serving/ranking built on top of it so that they do not have to re-engineer it themselves.
It's also worth noting that Bing exists and presumably has its own index of the web, and no evidence has been presented that the raw index content itself is the reason that Bing is not competitive.
That's like asking the foxes how the farmer should manage his chickens. Kagi is a (wannabe) competitor. Likewise, YC's interest here is in making money by having viable startups and having them acquired.
I also don't think crawling the Web is the hard part. It's extraordinarily easy to do it badly [1] but what's the solution here? To have a bunch of wannabe search engines crawl Google's index instead?
I've thought about this and I wonder if trying to replicate a general purpose search engine here is the right approach or not. Might it not be easier to target a particular vertical, at least to start with? I refuse to believe Google cannot be bested in every single vertical or that the scale of the job can't be segmented to some degree.
Google's C-suite is clearly not thinking ahead here. They could have helped to slow down the anti-trust lawsuits by opening up their search index to whichever AI company wants to pay for it. Web crawling is expensive, and lots of companies are spending wild amounts of money on it. There is a very clear market arbitrage opportunity between the cost of crawling the web and Google's cost of serving up their existing data.
Would the search index contain only raw data about the websites? Or would some sort of ranking be there?
If it's the latter, it's a neat way to ask a company to sell their users' data to a third party, because any kind of ranking comes via aggregation of users' actions. Without involving any user consent at all.
Then you'd just end up with all the ads being scams, and people not wanting to search on Google, because all the top results are scams instead of things they might actually be interested in that are not scams.
Separating the index creates a commodity data layer that preserves Google's crawling investment while enabling innovation at the ranking/interface layer, similar to how telecom unbundling worked for ISPs.
It's such a ridiculous proposal that would completely destroy Google's business. If that's the goal fine, but let's not pretend that any of those remedies are anything beyond a death sentence.
If they're dominating or one of only two or three important options in multiple other areas and the index is the only reason... I mean, that's a strong argument both that they're monopolists and that they're terrible at allocating the enormous amount of capital they have. That's really the only thing keeping them around? All their other lines of business collectively aren't enough to keep them alive? Yikes, scathing indictment.
> It's such a ridiculous proposal that would completely destroy Google's business.
It won't. My bet is that Bing and some other indexes are 95% OK for the average Joe. But relevance ranking is a much tougher problem, and "google.com" is a household brand with many other functions (maps, news, stocks, weather, knowledge graph, shopping, videos), and that's the foundation of Google's monopoly.
I think this shared index thing will actually kill competition even more, since every player will now use only the index owned by Google.
I mean, they're still going to be the number 1 name in adtech and analytics. And they're still gonna have pretty decent personalized ads because of analytics.
Plus, that's just one part of their business. There's also Android, which is a money printing machine with the Google Store (although that's under attack too).
ChuckMcM|9 months ago
Google has two interlocked monopolies: one is the search index and the other is their advertising service. We often joked about what would happen if Google offered reasonable and non-discriminatory pricing for access to their index, both to themselves and to others, AND allowed anyone to put whatever ads they wanted on those results. That would change the landscape dramatically.
Google would carve out their crawler/indexer/ranker business and sell access to themselves and others, which would give that business an income that did NOT go back to the parent company (it would have to be disbursed internally as capex or opex for the business).
Then front ends would have a good shot. DDG, for example, could front the index with the value proposition of privacy. Someone else could front the index with a value proposition of no ads, ever. A third party might front that index attuned to specific use cases like literature search.
It would be a very different world.
londons_explore|9 months ago
I.e., knowing which users clicked which search results.
Without the click stream, one cannot build or even maintain a good ranker. With a larger click stream from more users, one can make a better ranker, which in turn makes the service better so more users use it.
End result: monopoly.
The only solution is to force all players to share click stream data with all others.
mike_d|9 months ago
The index is the farthest thing from a monopoly Google has - anyone can recreate it. Heck, you can even just download Common Crawl to get a massive head start.
indolering|9 months ago
fallingknife|9 months ago
nashashmi|9 months ago
CobrastanJorji|9 months ago
Also, I love this bit: "[Google's] search results are of the best quality among its advertising-driven peers." I can just feel the breath of the guy who jumped in to say "wait, you can't just admit that Google's results are better than Kagi's! You need to add some sorta qualifier there that doesn't apply to us."
sfpotter|9 months ago
tobias3|9 months ago
mullingitover|9 months ago
It makes sense to break that out so everyone has access to the same dataset at FRAND pricing.
My heart just wants Google to burn to the ground, but my brain says this is the more reasonable approach.
toomuchtodo|9 months ago
This is similar to the natural monopoly of root DNS servers (managed as a public good). There is no reason more money couldn't go into either Common Crawl, or something like it. The Internet Archive can persist the data for ~$2/GB in perpetuity (although storing it elsewhere is also fine imho) as the storage system of last resort. How you provide access to this data is, I argue, similar to how access to science datasets is provided by custodian institutions (examples would be NOAA, CERN, etc).
Build foundations on public goods, very broadly speaking (think OSI model, but for entire systems). This helps society avoid the grasp of Big Tech and their endless desire to build moats for value capture.
shadowgovt|9 months ago
The indexer, without direct Google influence, is primarily incentivized to play nice with site administrators. This gives them reasons to improve consideration of both network integrity and privacy concerns (though Google has generally been good about these things, I think enough damage has been done regarding privacy that the brand name is toxic, regardless of the behaviors).
oceanplexian|9 months ago
How so?
A caching proxy costs you almost nothing and will serve thousands of requests per second on ancient hardware. Actually there's never been a better time in the history of the Internet to have competing search engines since there's never been so much abundance of performance, bandwidth, and software available at historic low prices or for free.
hkpack|9 months ago
Thus being even slightly in front of others is reinforced and the gap only widens.
tananaev|9 months ago
wslh|9 months ago
I think it's also important to highlight that sites explicitly choose which bots to allow in their robots.txt files, prioritizing Google, which reinforces its position as the de facto monopoly even when other bots are technically able to crawl them.
1vuio0pswjnm7|9 months ago
Aurornis|9 months ago
Companies want traffic from any source they can get. They welcome every search engine crawler that comes along because every little exposure translates to incremental chances at revenue or growing audience.
I doubt many people are doing things to allow Googlebot but also ban other search crawlers.
> My heart just wants Google to burn to the ground
I think there’s a lot of that in this thread and it’s opening the door to some mental gymnastics like the above claim about Google being the only crawler allowed to index the internet.
unknown|9 months ago
[deleted]
mattmaroon|9 months ago
rubitxxx2|9 months ago
Why not also require Apple to split off only the phone and messaging part of its iPhone, Meta to split off only the user feed data, and for the U.S. federal government to run only out of Washington D.C.?
This isn’t the breakup of AT&T in the early 1980s where you could say all the equipment and wiring just now belongs to separate entities. (It wasn’t that simple, but it wasn’t like trying to extract an organ.)
I think people have to understand that and know that what they’re doing is killing Google, and it was already on its way into mind-numbed enterprise territory.
lolinder|9 months ago
Ooh, can we? My wife is super jealous of my ability to install custom apps for phone calls and messaging on Android, it'd be great if Apple would open theirs up to competition. Competition in the SMS app space would also likely help break up the usage of iMessage as a tool to pressure people into getting an iPhone so they get the blue bubble.
gamblor956|9 months ago
AtlasBarfed|9 months ago
You're not even really dealing with any of these shared-infrastructure, public-property, private-property, merged-infrastructure issues.
Yeah, sure, there are mountains of racks of servers, but those aren't that hard to get (tariffs TBD).
I think it'll be interesting just to try and find some collection of ex-Google execs who'd actually like to go back to the "don't be evil" days, and just hand them a copy of all the data.
I simply don't think we have the properly elected set of officials to implement antitrust at any scale. The DOJ is now permanently politicized and corrupt, and Citizens United means corps can outspend "the people" lavishly.
Antitrust would mean a more diverse and resilient supply chain, creativity, more employment, more local manufacturing, a reversal of the "awful customer service" as a default, better prices, a less corrupt government, better products, more economic mobility, and, dare I say it, more freedom.
Actually, let me expound upon the somewhat nebulous idea of more freedom. I think we all hear about shadow banning or outright banning, with utter silence and no appeals process, by large internet companies that have a complete monopoly on some critical aspect of Internet usage.
If these companies, enabled by their cartel control, decide they don't like you, or are told by a government not to like you, it is approaching as big a burden as being denied the ability to drive.
Not a single one of those is something oligarchs or a corporatocracy has the slightest interest in.
486sx33|9 months ago
giancarlostoro|9 months ago
dang|9 months ago
Dawn of a new era in Search: Balancing innovation, competition, and public good - https://news.ycombinator.com/item?id=41393475 - Aug 2024 (79 comments)
Eridrus|9 months ago
They argue that the search index is an essential facility, and per their link "The essential facilities doctrine attacks a form of exclusionary conduct by which an undertaking controls the conditions of access to an asset forming a ‘bottleneck’ for rivals to compete".
But unlike physical locations where bridges/ports can be built, the ability to crawl the internet is not excludable by Google.
They do argue that the web is not friendly to new crawlers, but what Kagi wants is not just the raw index itself, but also all the serving/ranking built on top of it so that they do not have to re-engineer it themselves.
It's also worth noting that Bing exists and presumably has its own index of the web, and no evidence has been presented that the raw index content itself is the reason that Bing is not competitive.
jmyeet|9 months ago
I also don't think crawling the Web is the hard part. It's extraordinarily easy to do it badly [1] but what's the solution here? To have a bunch of wannabe search engines crawl Google's index instead?
I've thought about this and I wonder if trying to replicate a general purpose search engine here is the right approach or not. Might it not be easier to target a particular vertical, at least to start with? I refuse to believe Google cannot be bested in every single vertical or that the scale of the job can't be segmented to some degree.
[1]: https://stackoverflow.blog/2009/06/16/the-perfect-web-spider...
9cb14c1ec0|9 months ago
ankit219|9 months ago
If it's the latter, it's a neat way to ask a company to sell their users' data to a third party, because any kind of ranking comes via aggregation of users' actions. Without involving any user consent at all.
onlyrealcuzzo|9 months ago
ethan_smith|9 months ago
luckydata|9 months ago
alabastervlog|9 months ago
riku_iki|9 months ago
It won't. My bet is that Bing and some other indexes are 95% OK for the average Joe. But relevance ranking is a much tougher problem, and "google.com" is a household brand with many other functions (maps, news, stocks, weather, knowledge graph, shopping, videos), and that's the foundation of Google's monopoly.
I think this shared index thing will actually kill competition even more, since every player will now use only the index owned by Google.
const_cast|9 months ago
I mean, they're still going to be the number 1 name in adtech and analytics. And they're still gonna have pretty decent personalized ads because of analytics.
Plus, that's just one part of their business. There's also Android, which is a money printing machine with the Google Store (although that's under attack too).
ketzo|9 months ago
Disposal8433|9 months ago
[deleted]