I've been using Kagi for a while (almost two years now!) and it's been nothing but excellent!
Lenses are very useful (the Reddit lens is in every second search I do), and I personally really like the AI features they are working on.
The quick assist, triggered by ending a search query with a question mark, produces a short AI-generated summary of the top few results, and I use it constantly.
The new, more advanced assistant, which can run searches (optionally constrained to lenses) and lets you pick any model, is also excellent. It basically means I don't need a ChatGPT/Claude subscription, as Kagi covers that very well.
All in all, great product which I'm happy to pay for.
> The Google Search Index is a unique and irreplaceable resource within the digital ecosystem. Mandating fair access to it or treating it as an essential facility could address the core issues...
The article estimates the Google Search Index at 12.5PB. If Kagi thinks that is a big enough moat to be the primary target then, well, I suppose they should know. But I'm also skeptical. You could fit that on about 50 Hetzner SX295 servers, so about $20k/month, plus the cost of gathering the data. It is surely a huge resource.
But weighed against the combination of Google Search + AdWords + Android + YouTube + Chrome, all in a single company? To me a 12.5PB search index feels like small change in comparison.
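The storage arithmetic above can be reproduced in a few lines. The per-server figures (about 250 TB usable and about $400/month) are not Hetzner's published specs; they are simply back-derived from the comment's own 12.5 PB / 50 servers / $20k numbers, so treat the output as illustrative only.

```python
import math

# Assumed per-server figures, back-derived from the comment:
# 12.5 PB / 50 servers = 250 TB each; $20k / 50 servers = $400 each.
USABLE_TB_PER_SERVER = 250
USD_PER_SERVER_MONTH = 400

def storage_cost(index_pb: float) -> tuple[int, int]:
    """Return (servers needed, monthly USD) to store index_pb petabytes."""
    index_tb = index_pb * 1000  # decimal units: 1 PB = 1000 TB
    servers = math.ceil(index_tb / USABLE_TB_PER_SERVER)
    return servers, servers * USD_PER_SERVER_MONTH

print(storage_cost(12.5))  # (50, 20000)
print(storage_cost(180))   # (720, 288000)
```

Plugging in the later 180 PB raw-crawl figure gives roughly 720 servers and about $288k/month under the same assumptions, and that still ignores crawling, replication, ranking data and serving hardware.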
> The article estimates the Google Search Index at 12.5PB.
I realize there was a mistake in the estimated number (thanks for pointing it out; it should be closer to 180 PB for raw crawl data). Since the figure is speculative, and it does not account for the other data needed to actually rank pages, or for the hardware needed to answer in under 500ms at a scale of billions of queries per day, it can be misleading about the true effort involved, so I edited that datapoint out of the article.
You are right that just crawling a large number of pages (millions, even billions) is straightforward (e.g. [1]). The hard part is creating a web-scale searchable index of a certain quality level, which is simply impossible to do anymore, for many reasons that would require another article to explain. By their own account, Microsoft has spent $100bn over the last 20 years trying to match it, and most people agree it is still not even close. At some point you reach diminishing returns. To use the analogy from the article, it is akin to someone trying to rebuild the entire US railroad network today: it sounds plausible, but not really in practice. That train left the station in the early 2000s.
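To get a feel for the serving side mentioned above (billions of queries per day, each answered in under 500ms), daily volume can be converted to average queries per second, and Little's law (average in-flight requests = arrival rate × time in system) then gives average concurrency. The 5 billion/day figure is an assumption standing in for the unspecified "billions", and real traffic peaks well above the daily average.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def serving_load(queries_per_day: float, latency_s: float) -> tuple[float, float]:
    """Return (average queries per second, average in-flight queries).

    Concurrency uses Little's law: L = lambda * W, where lambda is the
    arrival rate and W the time each query spends in the system.
    """
    qps = queries_per_day / SECONDS_PER_DAY
    return qps, qps * latency_s

# Assumption: 5 billion queries/day, each taking the full 500 ms budget.
qps, in_flight = serving_load(5e9, 0.5)
print(round(qps), round(in_flight))  # 57870 28935
```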
This puts it in enough perspective for me to ask: why doesn’t a university create a public/open source search index? Seems like a way to get a ton of attention.
Moreover, archive.org has all the data and data storage capabilities many times over. What prevents them from creating an open source search engine?
I don't buy this number. Text-only Common Crawl is 20TB. Remove spam and dupes and you're at under 10TB of current useful data, which you can parse and index on a single server nowadays.
It's the full Google index history, with full HTML, that is probably 12PB; the useful part of the search engine is much smaller.
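A toy version of "remove dupes, then index on a single server": exact duplicates are dropped by hashing normalized text, and the remaining documents go into an in-memory inverted index that answers AND queries. This is a minimal sketch with hypothetical function names; real near-duplicate detection (e.g. shingling or SimHash) and spam filtering are far more involved, and a 10TB corpus would need an on-disk index rather than Python sets.

```python
import hashlib
import re
from collections import defaultdict

def tokenize(text: str) -> list[str]:
    """Lowercase and keep alphanumeric runs only (deliberately crude)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Skip exact duplicates (by hash of normalized text), then map term -> doc ids."""
    seen_hashes = set()
    index = defaultdict(set)
    for doc_id, text in docs.items():
        terms = tokenize(text)
        digest = hashlib.sha256(" ".join(terms).encode()).hexdigest()
        if digest in seen_hashes:
            continue  # same normalized text as an earlier doc
        seen_hashes.add(digest)
        for term in terms:
            index[term].add(doc_id)
    return index

def search(index: dict[str, set[str]], query: str) -> set[str]:
    """AND query: ids of docs that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

docs = {"a": "Kagi search lenses", "b": "kagi  SEARCH lenses!", "c": "Google search index"}
index = build_index(docs)
print(sorted(search(index, "search")))  # ['a', 'c']  ("b" was a duplicate of "a")
```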
I assume that the major hurdle is not storing an equivalently sized search index, but building one from scratch. Crawling takes time, and Google has had a many-year head start.
I've been using (and occasionally paying for) Kagi on and off for a couple of years now. I truly think they're building something interesting and valuable! While I haven't agreed with every product decision they've made, the founder is very good at both understanding his business and explaining their decisions. This is a well-crafted explainer of the search business and the monopoly case, and much better for sharing with less tech-savvy peers than most mainstream media explainers on this subject!
I think it's instructive to look at the early history of Google and Facebook. In their early years they did not really pull the ad-revenue levers and just focused on growing users (i.e. "Don't Be Evil"), until about a decade after launching their respective services.
Similarly, Netflix is only now starting an ad-supported model after years of subscription-only service.
Eventually the temptation of multiple revenue sources (i.e. subscription AND advertising) will likely be too great, due to:
- An IPO, and Wall Street demanding net income growth (e.g. FB/Google)
- Private equity buying the company and needing to pay back leveraged debt
- The number of customers willing to offer up a credit card for search stagnating, so a lower-cost ad tier appears and the ad infrastructure built for it gets applied to the paid tiers
> Google has built a massive index of the internet that covers close to 100% of the accessible web.
While their index (of other people's stuff) is enormous, it is far from including everything. Plenty of content is easily disqualified, and people would be screaming if content farms were included. What even is a content farm nowadays? With LLMs, one can return a reasonable article for any query, rich in links to other pages that don't exist but could, and that could then be indexed as part of the accessible web.
If you make a new website with a few thousand pages and a few thousand images, it takes quite a while for Google to pick up the entire thing, if it even bothers to.
Google tries to fill the results page with a small subset of websites. That is a good thing for users most of the time, and the easiest ad money, but horrible for new players.
It used to be quite common for bloggers (and others) to follow everything written about them or on topics of interest. Google Blog Search and Technorati were very useful for that kind of discovery.
The average user might never have noticed, but when it was killed off, the WWW stopped being a community.
We can pretend the index is still there, but if you can't get to it, it's much like the LLM content.
This essentially advocates for the same thing laid out in Cory Doctorow's The Internet Con: How to Seize the Means of Computation, the point being that requiring open protocols from Big Tech will enable competition and innovation. This will return the creative inspiration to the technologists. I completely agree with it, and I hope we are reaching an inflection point where walled data gardens are cracked open.
Who knows what websites and pages are even on the web?
There's no index to the web that I know of apart from Google and DuckDuckGo and maybe this Kagi thing.
I want to explore the web - surely search isn't the only way to use the web?
I imagine it could be fun to explore the web through lists and graphs of interest, where I can hop from here to there via lists of links, graphs, nodes or something?
There have been some websites in the past that allowed one to browse WWW content by IP address, covering what seemed to be the full range of IPv4 address space. For example, a page with a list of IP address ranges where each range is a hyperlink. One could then drill down by following hyperlinks to a specific IP address and view whatever was hosted at that address (the default host, in the case of virtual hosting). Not sure why these websites haven't survived. Quite useful, IMHO.
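The drill-down those IP-browsing sites offered can be sketched with Python's standard `ipaddress` module: each page lists the child ranges of the current prefix, eight bits at a time, until it reaches individual hosts. The function name and the 8-bit step are assumptions for illustration; a real site would render each range as a hyperlink and, at the last level, fetch whatever the default host serves.

```python
import ipaddress

def drill_down(prefix: str, step: int = 8) -> list[str]:
    """List the child ranges one level below `prefix` (e.g. a /8 splits into 256 /16s)."""
    net = ipaddress.ip_network(prefix)
    if net.prefixlen >= net.max_prefixlen:
        return []  # a single host: nothing left to drill into
    new_prefix = min(net.prefixlen + step, net.max_prefixlen)
    return [str(sub) for sub in net.subnets(new_prefix=new_prefix)]

top = drill_down("0.0.0.0/0")
print(len(top), top[:2])  # 256 ['0.0.0.0/8', '1.0.0.0/8']
```

Intended for IPv4; with IPv6 the same 8-bit step would produce astronomically many pages before reaching hosts.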
DNS zone files are a decent starting point for exploring the web. Not every registered domain name has an associated website but most do. The largest zone files are available to the public for free.
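As a rough sketch of turning a zone file into a list of domains to explore, the function below (hypothetical, not from any particular tool) collects the owners of NS records, i.e. the delegated domains. It handles `$ORIGIN`, relative names, optional TTL/class fields, and lines that inherit the previous owner, but it simplifies the RFC 1035 master-file format: for example, it strips `;` comments naively and would mishandle a semicolon inside a quoted TXT record.

```python
def domains_from_zone(lines, origin: str = "") -> list[str]:
    """Collect delegated domain names (owners of NS records) from a simplified zone file."""
    domains = set()
    owner = None
    for raw in lines:
        line = raw.split(";", 1)[0].rstrip()  # drop comments and trailing whitespace
        if not line.strip():
            continue
        if line.startswith("$ORIGIN"):
            origin = line.split()[1].rstrip(".").lower()
            continue
        if line.startswith("$"):
            continue  # other directives, e.g. $TTL
        fields = line.split()
        if not line[0].isspace():  # a leading blank means "same owner as before"
            name = fields.pop(0).lower()
            if name == "@":
                owner = origin
            elif name.endswith("."):
                owner = name.rstrip(".")  # absolute name
            else:
                owner = f"{name}.{origin}" if origin else name  # relative name
        # Skip the optional TTL and class that may precede the record type.
        while fields and (fields[0].isdigit() or fields[0].upper() in {"IN", "CH", "HS"}):
            fields.pop(0)
        if fields and fields[0].upper() == "NS" and owner and owner != origin:
            domains.add(owner)  # apex NS records (the zone's own servers) are excluded
    return sorted(domains)

SAMPLE = """\
$ORIGIN com.
$TTL 86400
@        IN NS a.gtld-servers.net.
example  172800 IN NS ns1.example.net.
         172800 IN NS ns2.example.net.
kagi     NS ns1.kagi.com. ; trailing comment
ns1.kagi A  192.0.2.1
""".splitlines()

print(domains_from_zone(SAMPLE))  # ['example.com', 'kagi.com']
```

Because it accepts any iterable of lines, it can be fed an open file handle, which matters for zone files with hundreds of millions of lines.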
While not quite what you're looking for, Kagi has a "Small Web" feed of sites that are semi-curated blogs. [1][2] I don't know how often it is updated, but I like to poke around every now and then to see what's going on in people's corners of the internet.
Google indexes widely. DuckDuckGo and Kagi have small specialised indexes and as such rely on the larger indexes like Google, Bing and Mojeek. DuckDuckGo used to use Yandex. More information here: https://www.searchenginemap.com/
I think Kagi is correct that the way we explore information on the internet will look very different in X years, given all the changes LLMs will bring. The real question is what it will look like.
I don't think it looks like search today. Google got where they were because they were 10x better than everything else and had an experience focusing on what mattered at the time. I don't think the 10x experience will look like ten blue links. I don't know what that next experience is, but I'll know it when I see it.
> Apple has stated that Bing does not match Google’s search result quality, and they are unwilling to compromise on user experience by offering subpar results.
I wouldn't take this statement at face value. This is most likely a BS PR excuse for Apple to maintain their current deal with Google. I wouldn't expect anything less from any large corporation looking to protect $20 billion in annual revenue.
Universally, every single time I've used Bing, including this week, my response was to scroll through the entire first page of results, swear, and open Google, where, with the exact same search terms, what I wanted was the first result.
Perplexity lets a firm switch on SSO and give Perplexity to employees without a big barrier to entry. So we bought it for our employees; if they use it, great, and if they don't, great. Even though we're a small startup, this is true of almost any SaaS we find that lets us control the login to stay regulation-compliant. If you like it, show how it helps your day, and the SaaS lets us control the login, we'll "just pay for it".
We will not, however, pay some four or five figure SSO tax for every SaaS. We'd be bankrupted.
Kagi should do this, or at least enable domain-specific OIDC/OAuth2 (the ubiquitous "Sign in with" or "Continue with" buttons, like http://xsplit.com/user/auth or http://id.atlassian.com/login, since Microsoft and Google accounts cover almost all businesses) and then just bill the same as the individual Pro price plus usage pricing.
As it stands, we reimburse employees who buy Kagi individually; this costs us more than Kagi itself, and means it only ever happens one-off.
PS. Don't get me started on macOS and iOS apps that have no retail-priced version available. Apple provides no way for a firm to give employees IAP subscription apps, whether on BYOD or managed devices. We can, and do, provide any retail-priced app for both BYOD and MDM. It shows up in a catalog on the device, people install it, and you get a retail sale. Thank you to those devs who make a retail version available, even if it's 2x-5x the annual cost. Empowering employees with apps is a no-brainer if devs just let firms pay them to do it.
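A minimal sketch of the domain-based sign-in being asked for: once an OIDC library has validated the ID token, the verified email domain decides which company account (if any) the user is billed under. The mapping, org ID and function name are hypothetical; providers may also offer stronger signals (Google ID tokens for Workspace users carry an `hd` hosted-domain claim, for instance) that a real implementation would prefer over parsing the email string.

```python
from typing import Optional

# Hypothetical mapping from a company's email domain to its billing account.
ORG_BY_DOMAIN = {"example.com": "org_123"}

def org_for_login(claims: dict) -> Optional[str]:
    """Return the org a user should be billed under, or None for an individual account.

    Assumes `claims` come from an ID token whose signature, issuer,
    audience and expiry have already been validated by an OIDC library.
    """
    if not claims.get("email_verified"):
        return None  # never trust the domain of an unverified email
    _, at, domain = claims.get("email", "").rpartition("@")
    if not at:
        return None  # no "@" in the claim
    return ORG_BY_DOMAIN.get(domain.lower())
```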
I admit I haven’t used Perplexity much, but isn’t that already mostly covered by the researcher[0] and newly added assistant that can do web searching [1]?
As someone who hasn’t used paid search services, what sort of problems has this solved for other HN users that make it worth it? How does it compare to “AI” based search tools like Perplexity?
For me, Kagi is super fast and provides high quality, customizable results. Web search Just Works. When I use a new computer/browser and accidentally search with Google, it is a viscerally unpleasant surprise.
As for Perplexity, I got a free year of it because I bought a Rabbit R1. I tried it and wasn't impressed. I use Kagi's AI assistant all the time. It's my primary way of getting information from the web. I just type a free-form question into my address bar, append !expert for general questions or !code for technical ones, and seconds later my question is answered and I'm back to work.
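The address-bar workflow described above (a free-form question with a trailing `!expert` or `!code`) amounts to splitting a bang off the query and routing on it. This sketch only illustrates that idea: the mode names and routing table are hypothetical, Kagi's real bang handling isn't described in the thread, and this version only recognizes a bang as the last word.

```python
# Hypothetical routing table; only the bang names come from the comment.
BANG_MODES = {"!expert": "assistant:expert", "!code": "assistant:code"}

def route_query(raw: str) -> tuple[str, str]:
    """Split a trailing bang off a query and return (mode, remaining query)."""
    words = raw.strip().split()
    if words and words[-1].lower() in BANG_MODES:
        return BANG_MODES[words[-1].lower()], " ".join(words[:-1])
    return "web", " ".join(words)  # no recognized bang: plain web search

print(route_query("why is my build slow !code"))  # ('assistant:code', 'why is my build slow')
```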
I use the internet basically as an extension of my brain. There's very little barrier between having a thought and seeking information. Search is the usual way those two come together.
Google, these days, seems to mostly ignore whatever I've tried to search for and instead return results that I'd call "more popular". So the top results are mostly generic, useless results and below that it's mostly blog spam or wildly unrelated things.
This is especially bad when I'm looking for specific technical documentation or trying to understand unusual or obscure problems. Usually it's returning nothing useful at all.
Kagi returns results actually related to what I'm looking for more often than not.
The thing that convinced me to pay for it was a single search. I kept hitting _something_ that was causing the Apple TV to stop showing how much time was remaining in a show and instead show something else.
I went to Kagi and searched "netflix apple TV showing wrong time remaining" (my incomplete but best understanding of the problem at the time). As the fourth result, Kagi surfaced a page that explained what this was and how it was being triggered.
I went back to Google and searched the same. Top result was "If you can't change the time or time zone on your Apple Device" from Apple. Second was "Netflix audio is out of sync" from Netflix. With the benefit of knowing what the answer was, I did find a single relevant result about 25 results down mixed in with some blog spam on removing a show from "Continue Watching" on Disney Plus, listicles on hidden ways to make the Apple TV app on your phone even better, and a link to a Google Books copy of a 2008 Men's Health Magazine (?!).
Every time I accidentally end up back on Google it's... jarring to say the least.
Now it dawns on me that in this new era I can write an article for the public good and, by balancing the wording, place it ahead of the 3852 competing public-good articles on the search results page. Let's innovate.
This is an extremely disappointing post. In the past, I’ve enthusiastically supported and advocated for Kagi.
But Kagi advocating for using force to destroy its competitors is completely unacceptable to me and an admission that they do not believe they have a viable product.
Antitrust law is arbitrary and evil. If you make more money than your competitors, you have undue market power. If you price below your competitors, you are dumping. If you price the same as your competitors, you are colluding. The whole thing is a naked power grab by politicians and inferior companies.
This is a sad day. Kagi is the best thing that’s happened to the internet in the past decade. And now I have to stop my auto renew.
I don't see how Kagi is advocating destroying Google, let alone competitors. They write about potential actions the government might take to make the field more competitive.
Keep in mind that Google itself was government-funded when it got started, via the NSF and the university system. There are also a ton of subsidies, tax breaks for its subsidiaries, and direct payments from government (e.g. Google Cloud government contracts, military recruitment ads on YouTube...).
They're not advocating for the destruction of competitors. They're arguing FOR competitors. Google has already been ruled an illegal monopoly, and now it's time to figure out what to do about it. Kagi is saying that rather than split up products, require the protocols to be open and usable for all. That's it.
> But Kagi advocating for using force to destroy its competitors is completely unacceptable to me
Everyone is entitled to their own interpretation, but that is not what the article advocates at all. The article is about what is best for the user given the circumstances, whereas all other proposed remedies have focused on how to hurt Google, which the article argues would be counter-productive.
The ruling has already been made and a remedy will be chosen whether we agree with the ruling or not - so which one is the best for the users? The solution that is proposed in the article would actually mean increased competition in the space, including to Kagi.
cube2222|1 year ago
baybayblonde|1 year ago
Fire-Dragon-DoL|1 year ago
adamcharnock|1 year ago
NB: Happy Kagi-paying customer here.
freediver|1 year ago
[1] https://michaelnielsen.org/ddi/how-to-crawl-a-quarter-billio...
IgorPartola|1 year ago
arnaudsm|1 year ago
dmonitor|1 year ago
ldayley|1 year ago
Edit: wording
Edit 2: Can you imagine a world where Google's Internet Search Index is legally considered an "Essential Facility"!? https://law.stanford.edu/publications/essential-platforms/
somethoughts|1 year ago
pennybanks|1 year ago
throwaway14356|1 year ago
pierrefermat1|1 year ago
rychco|1 year ago
- I can blacklist low-value domains (such as geeksforgeeks) that dominate the top of many programming searches.
- I can increase/decrease the priority of domains or pin domains to the top of searches, such as official documentation for languages/libraries.
- I can use “Lenses” to filter results for programming/academic/forum results.
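The three customizations in the list above boil down to a per-user rerank step applied to results: drop blocked domains, move pinned domains to the top, and scale the rest by a per-domain weight. This is a hypothetical sketch (domains, weights and function names are illustrative), not how Kagi implements it; it also matches domains exactly after stripping `www.`, whereas a real blocklist would likely cover all subdomains.

```python
from urllib.parse import urlparse

# Hypothetical per-user preferences mirroring the list above.
BLOCKED = {"geeksforgeeks.org"}
PINNED = {"docs.python.org"}
WEIGHTS = {"developer.mozilla.org": 2.0, "pinterest.com": 0.5}  # >1 raises, <1 lowers

def domain_of(url: str) -> str:
    host = urlparse(url).hostname or ""
    return host.removeprefix("www.")

def personalize(results: list[tuple[str, float]]) -> list[str]:
    """Rerank (url, score) pairs: drop blocked, pin some, reweight the rest."""
    kept = [(u, s) for u, s in results if domain_of(u) not in BLOCKED]
    pinned = [u for u, _ in kept if domain_of(u) in PINNED]  # keeps original order
    rest = [(u, s * WEIGHTS.get(domain_of(u), 1.0))
            for u, s in kept if domain_of(u) not in PINNED]
    rest.sort(key=lambda pair: pair[1], reverse=True)
    return pinned + [u for u, _ in rest]

results = [
    ("https://www.geeksforgeeks.org/python-lists/", 0.9),
    ("https://pinterest.com/pin/1", 0.8),
    ("https://developer.mozilla.org/en-US/", 0.3),
    ("https://docs.python.org/3/tutorial/", 0.1),
]
print(personalize(results))
# ['https://docs.python.org/3/tutorial/', 'https://developer.mozilla.org/en-US/', 'https://pinterest.com/pin/1']
```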
abcdefg12|1 year ago
baggachipz|1 year ago
chrisweekly|1 year ago
andrewstuart|1 year ago
Does anyone know of anything like this?
1vuio0pswjnm7|1 year ago
AndroidKitKat|1 year ago
[1] https://blog.kagi.com/small-web
[2] https://kagi.com/smallweb
ColinHayhurst|1 year ago
pbronez|1 year ago
https://seirdy.one/posts/2021/03/10/search-engines-with-own-...
firecall|1 year ago
There was an attempt to push a return of Webrings I think...
But funnily enough, whilst browsing this thread, https://news.ycombinator.com/item?id=41389642
I commented to my colleagues:
Remember when people would find good websites and share them!
madrox|1 year ago
icar|1 year ago
bugtodiffer|1 year ago
But instead they use it to build a browser no one wants and an email service no one needs.
Yet their search website is still broken here and there...
poikroequ|1 year ago
jpc0|1 year ago
Bing is objectively worse.
skinkestek|1 year ago
- Current Google was bad compared to the original Google (it ignores my keywords, even if I use double quotes and Verbatim mode)
- DuckDuckGo and Bing managed to be worse
- Kagi is like old Google
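What the comment expects from double quotes is essentially a hard filter: every quoted phrase must appear verbatim, as a contiguous run of words, in each result. A minimal sketch, with hypothetical names and a deliberately crude tokenizer (lowercase alphanumerics only), assuming results arrive as a mapping of id to text:

```python
import re

def tokens(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())

def contains_phrase(doc: list[str], phrase: list[str]) -> bool:
    """True if `phrase` occurs as a contiguous run of tokens in `doc`."""
    n = len(phrase)
    return any(doc[i:i + n] == phrase for i in range(len(doc) - n + 1))

def verbatim_filter(query: str, docs: dict[str, str]) -> list[str]:
    """Keep only the docs that contain every double-quoted phrase in the query."""
    phrases = [tokens(p) for p in re.findall(r'"([^"]+)"', query)]
    return [doc_id for doc_id, text in docs.items()
            if all(contains_phrase(tokens(text), p) for p in phrases)]

docs = {"a": "How to fix Apple TV time remaining", "b": "Apple's TV remaining time guide"}
print(verbatim_filter('"time remaining" apple', docs))  # ['a']
```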
moonlion_eth|1 year ago
freefaler|1 year ago
However, I think the model will shift to something more like Perplexity.ai.
I've switched to Perplexity and for most of the searches it works better than Kagi.
They'd need to add something like this to survive in the long run, because for exploratory searches tools like Perplexity are really good.
Terretta|1 year ago
cube2222|1 year ago
[0]: https://help.kagi.com/kagi/ai/assistant.html#research
[1]: https://kagi.com/changelog#4529
aDyslecticCrow|1 year ago
pbronez|1 year ago
https://www.audiowaveai.com/p/2626-dawn-of-a-new-era-in-sear...
blackeyeblitzar|1 year ago
pbronez|1 year ago
nucleardog|1 year ago
endisneigh|1 year ago
I do wonder how far one can get charging for search.
mediumsmart|1 year ago
abtinf|1 year ago
kingstoned|1 year ago
baggachipz|1 year ago
freediver|1 year ago
Destiner|1 year ago
They are still trying to fight Google by building pretty much the same product,
while Perplexity is obviously in the lead by being AI-first.
freediver|1 year ago