To be fair, if my search engine is anything to go on, about 0.5-1% of the requests I get are from human sources. The rest are from bots, and not the kind run by people who haven't found out that I have an API, but bots that are attempting to poison Google or Bing's query suggestions (even though I'm not backed by either). From what I've heard from other people running search engines, it looks the same everywhere.
I don't know what Google's ratio of human to botspam is, but given how much of a payday it would be if anyone were to succeed, I can imagine they're serving their fair number of automated requests.
Requiring a headless browser to automate the traffic makes the abuse significantly more expensive.
If it's such a common issue, I would've thought Google already ignored searches from clients that do not enable JavaScript when computing results?
Besides, you already got auto-blocked when using it in a slightly unusual way. Google hasn't worked on Tor since forever, and recently I also got blocked a few times just for using it through my text browser that uses libcurl for its network stack. So I imagine a botnet using curl wouldn't last very long either.
My guess is it had more to do with squeezing out more profit from that supposed 0.1% of users.
I run a semi-popular website hosting user-generated content, although it's not a search engine; the attacks on it have surprised me, and I've eventually had to put in the same kinds of restrictions on it.
I was initially very hesitant to restrict any kind of traffic, relying on rate-limiting IPs on critical endpoints that needed low friction, and captchas on higher-friction, higher-intent pages such as signup and password reset.
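For anyone wondering what per-IP rate limiting on a critical endpoint might look like, here is a minimal token-bucket sketch. It is an illustration only, not the setup described above: the class names `TokenBucket` and `IpRateLimiter`, the in-memory dictionary, and the example rate and capacity are all assumptions.

```python
import time
from typing import Dict, Optional


class TokenBucket:
    """Allows `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity          # start full so the first burst is allowed
        self.updated: Optional[float] = None

    def allow(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        if self.updated is not None:
            # Refill in proportion to elapsed time, capped at capacity.
            elapsed = max(0.0, now - self.updated)
            self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class IpRateLimiter:
    """One bucket per client IP (hypothetical; never evicts idle entries)."""

    def __init__(self, rate: float, capacity: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, ip: str, now: Optional[float] = None) -> bool:
        bucket = self.buckets.get(ip)
        if bucket is None:
            bucket = self.buckets[ip] = TokenBucket(self.rate, self.capacity)
        return bucket.allow(now)
```

A handler would call `limiter.allow(client_ip)` and answer with HTTP 429 when it returns False. As the rest of this comment shows, this only helps while the attacker has few IPs to rotate through.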
Other than that, I was very liberal with most traffic, making sure that Tor was unblocked, and even ending up migrating off Cloudflare's free tier to a paid CDN due to inexplicable errors that users were facing over Tor that were ultimately related to how they blocked some specific requests over Tor with 403, even though the MVPs on their community forums would never acknowledge such a thing.
Unfortunately, given that Tor is a free rotating proxy, my website got attacked on one of these critical, compute-heavy endpoints through multiple exit nodes totaling ~20,000 RPS. Since then, I've reluctantly had to block Tor and a few paid proxy services that I discovered through my own research.
Another time, a set of human spammers distributed all over the world started sending a large volume of spam to my website, something like 1,000,000 spam messages every day (I still feel this was an attack coordinated by a "competitor" of some sort, especially given a small percentage of messages entitled "I want to get paid for posting" or along those lines).
There was no meaningful differentiator between the spammers and legitimate users: they were using real Gmail accounts to sign up, analysis of their behaviour showed they were real users as opposed to simple or even browser-based automation, and they were based out of the same residential IPs as legitimate users.
I, again, had to reluctantly introduce a spam filter on some common keywords, and although some legitimate users do get trapped from time to time, this was the only way I could get a handle on that problem.
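A keyword filter of that kind can be sketched roughly as below. This is not the actual filter: the phrase list is hypothetical (only "get paid for posting" echoes the example above), and `build_filter` and `is_spam` are names made up for illustration.

```python
import re
from typing import Iterable, Pattern

# Hypothetical phrases; a real list would come from the observed spam.
SPAM_PHRASES = ["get paid for posting", "work from home"]


def build_filter(phrases: Iterable[str]) -> Pattern[str]:
    """Compile one case-insensitive pattern matching any phrase as whole words."""
    parts = [
        # Tolerate any run of whitespace between the words of a phrase.
        r"\s+".join(re.escape(word) for word in phrase.split())
        for phrase in phrases
    ]
    if not parts:
        return re.compile(r"(?!)")  # empty list: a pattern that never matches
    return re.compile(r"\b(?:" + "|".join(parts) + r")\b", re.IGNORECASE)


SPAM_RE = build_filter(SPAM_PHRASES)


def is_spam(message: str) -> bool:
    return SPAM_RE.search(message) is not None
```

The word boundaries keep a message like "forget paid for posting" from matching, but a legitimate user who happens to write one of the phrases is still caught, which is the trade-off described above.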
I'm appalled by some of the discussions here. Was I "enshittifying" my website out of unbridled "greed"? I don't think so. But every time I come here, I find these accusations, which makes me think that as a website with technical users, we can definitely do better.
My impression is that there's less effort for them to go directly to headless browsers. There are several footguns in using a raw HTML parsing lib and dispatching HTTP requests. People don't care about resource usage, spammers even less, and many of them lack the skills.
Maybe you could require hashcash, so that people who wanted to do automated searches could do it at an expense comparable to the expense of a human doing a search manually. Or a cryptocurrency micropayment, though tooling around that is currently poor.
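For a rough idea of what that could look like, here is a minimal hashcash-style sketch. It is not the real hashcash stamp format (which uses SHA-1 and a dated header); SHA-256, the `resource:nonce` layout, and the names `mint` and `verify` are assumptions for illustration.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Number of zero bits at the start of the digest."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def verify(resource: str, nonce: int, bits: int) -> bool:
    """Cheap check for the server: a single hash."""
    digest = hashlib.sha256(f"{resource}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= bits


def mint(resource: str, bits: int) -> int:
    """Expensive search for the client: about 2**bits hashes on average."""
    for nonce in count():
        if verify(resource, nonce, bits):
            return nonce
    raise AssertionError("unreachable: count() never ends")
```

The asymmetry is the point: the searcher pays roughly 2**bits hash evaluations per query, while the search engine pays one hash to check it. Each extra bit doubles the expected cost for the client.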
Useful list, thank you!
I much prefer surfing without JavaScript. If I need to enable it for something temporarily, that’s fine. I just don’t want to leave it enabled all the time.
With Google search’s new landing page for those without JavaScript enabled, even if you enable JavaScript and reload, it just gives you the same ‘fail’ page. No matter what you do at that point Google deletes your search string and you need to retype it.
Just changed my default search to DDG and I’ll be looking into Kagi.
I recently discovered how great the ChatGPT web search feature is. Returns live (!) results from the web and usually finds things that Google doesn't - mostly niche searches in natural language that G simply doesn't get.
Of course, it uses JavaScript, which doesn't help with the problem discussed here.
But I do think that Google is internally seeing a huge drop in usage which is why they're currently running for the money. We're going to see this all across their products soon enough (I'm thinking Gmail).
I've been experimenting with creating single-site browsers[1] for all websites I routinely visit, effectively removing navigational queries from search engines; between that and Claude being able to answer technical questions, it's remarkable how rarely I even use browsers for day-to-day tasks anymore (as in web views with tabs and url bars).
We've been using the web (as in documents interconnected with links between servers) for a great number of tasks it was never quite designed to solve, and the result has always been awkward. It's been very refreshing to move away from the web browser-search engine duo for these things.
For one, and it took me a while to notice what was off, but there are like no ads anymore, anywhere. Not because I use adblockers, but because I simply don't end up directed to places where there are ads. And let me tell you, if you've been away from that stuff for a while, and then come back, holy crap what a dumpster fire.
The web browser has been center stage for a long while, coasting on momentum and old habits, but it turns out it doesn't need to be, and if you work to get rid of it, you get a better and more enjoyable computing experience. Given how much better this feels, I can't help but feel we're in for a big shift in how computers are used.
[1] You can just launch 'chrome --app=url' to make one. Or use Electron if you want to customize the UI yourself.
Can it find OLD articles? I generally don't like the idea of a search engine which requires me to be logged in to track my search history (and I do mostly use Google in incognito/private browser windows), but I might ignore that if it allows me to do the one thing that Google refuses to do on phones anymore (which might be a sign that they're gonna phase that out from desktop interfaces soon).
I believe the main intent is to block SERP analysers, which track result positions by keyword. Not that it would help a lot with bot abuse, but it will make regular SEO agency life harder and more expensive.
Last month Google also tightened YouTube policies, which IMHO is a sign that they are not reaching specific milestones, and that would definitely be reflected in Alphabet's stock.
They are going to make Google search even more broken than it is already? Be my guest! Since they are an ads business, I guess they don't really care about their search any longer, or they have sniffed some potential to gather even more information on users using Google, if they require running JS for it to work. Who knows. But anyone valuing their privacy has long left anyway.
You could probably get it working with declarative shadow dom, streaming in the AI generated content at the end of the html document and slotting it into place. There are no doubt a lot of gotchas but at first glance it seems feasible. Here’s a demo I found of something like that: https://github.com/dgp1130/out-of-order-streaming
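To make the idea a bit more concrete, here is a minimal server-side sketch of out-of-order streaming with declarative shadow DOM. It is an illustration under assumptions, not the linked demo's code: `stream_page`, the slot names, and the markup are made up, and the gotchas mentioned above are not handled.

```python
from html import escape
from typing import Iterable, Iterator, Sequence, Tuple


def stream_page(
    slots: Sequence[str], late_content: Iterable[Tuple[str, str]]
) -> Iterator[str]:
    """Yield HTML chunks; display order comes from `slots`, not arrival order."""
    # The host element stays open while streaming, so later children
    # are still appended to it by the HTML parser.
    yield '<!doctype html><html><body><div id="page">'
    # The declarative shadow root fixes where each named slot is rendered.
    yield '<template shadowrootmode="open">'
    for name in slots:
        # The text inside <slot> is fallback content shown until something arrives.
        yield f'<section><slot name="{escape(name)}">Loading...</slot></section>'
    yield "</template>"
    # Content can arrive in any order; the slot attribute puts it in place.
    for name, text in late_content:
        yield f'<p slot="{escape(name)}">{escape(text)}</p>'
    yield "</div></body></html>"
```

Each chunk would be flushed to the client as soon as it is produced, so the slow (e.g. AI-generated) part can be sent last and still appear wherever its slot sits in the layout, without any JavaScript.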
> Everyone I know under 25 has stopped using Google search altogether.
Completely unhinged take. Everyone I know under 25, as someone under 25, uses Google search at least an order of magnitude more than they use AI querying.
Well, I read the HN headline and said to myself, I bet this requirement is pitched as "...to enhance the user experience...", and, yep, it's there.
That's akin to a response to some incident where companies "Take [user security etc.] seriously", when the immediate thought is, yeah, but if you did, that [thing] probably wouldn't have happened.
Dunno why I wrote all that - I don't use Google search, because I wanted to enhance (aka unenshitten) my search experience.
Honestly, I wouldn't be surprised if Google required some proof-of-work done on the browser host's CPU/GPU to validate searches and thereby make it infeasible for bots.
That brings up an interesting conundrum. If PoW were implemented, could known-valid accounts (i.e. in good standing for over a decade) be switched over to PoS instead? Or paying accounts?
PoW could be written into infrequent pages such as the registration page and reset password page. It could run while the user fills in the form. I might implement this on some sites that get attacked.
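A rough sketch of how the server side of that might work for a form: issue a signed, timestamped challenge when the page is rendered, let the browser search for a nonce while the user types, and check everything on submit. All names and constants here (`SECRET`, `DIFFICULTY_BITS`, `MAX_AGE_SECONDS`, `issue_challenge`, `check_submission`) are hypothetical, and `solve` merely stands in for the JavaScript that would run in the browser.

```python
import hashlib
import hmac
import secrets
import time
from typing import Optional

SECRET = secrets.token_bytes(32)   # hypothetical per-deployment signing key
DIFFICULTY_BITS = 16               # assumed difficulty; tune per endpoint
MAX_AGE_SECONDS = 600              # assumed lifetime of a challenge


def _sign(payload: str) -> str:
    return hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()


def issue_challenge(now: Optional[float] = None) -> str:
    """Embedded in the form; signed so the server need not store it."""
    issued = int(time.time() if now is None else now)
    payload = f"{secrets.token_hex(8)}.{issued}"
    return f"{payload}.{_sign(payload)}"


def has_work(challenge: str, nonce: int, bits: int) -> bool:
    """True if SHA-256(challenge:nonce) starts with `bits` zero bits."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return int.from_bytes(digest, "big") >> (256 - bits) == 0


def solve(challenge: str, bits: int = DIFFICULTY_BITS) -> int:
    """Stand-in for the browser: brute-force a nonce."""
    nonce = 0
    while not has_work(challenge, nonce, bits):
        nonce += 1
    return nonce


def check_submission(challenge: str, nonce: int, now: Optional[float] = None,
                     bits: int = DIFFICULTY_BITS) -> bool:
    now = time.time() if now is None else now
    try:
        salt, issued, tag = challenge.split(".")
        issued_at = int(issued)
    except ValueError:
        return False  # malformed challenge
    expected = _sign(f"{salt}.{issued}")
    if not hmac.compare_digest(tag.encode(), expected.encode()):
        return False  # not issued by this server
    if not 0 <= now - issued_at <= MAX_AGE_SECONDS:
        return False  # expired or from the future
    return has_work(challenge, nonce, bits)
```

Note that this sketch does not stop one solved challenge from being replayed within its lifetime; a real deployment would presumably also remember which challenges have been used.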
marginalia_nu|1 year ago
shiomiru|1 year ago
supriyo-biswas|1 year ago
palmfacehn|1 year ago
marcus0x62|1 year ago
kragen|1 year ago
ForHackernews|1 year ago
This seems like yet another example of Google and friends inviting the problem they're objecting to.
nilslindemann|1 year ago
Search engines which require JavaScript:
Google, Bing, Ecosia, Yandex, Qwant, Gibiru, Presearch, Seekr, Swisscows, Yep, Openverse, Dogpile, Waldo
Search engines which do not require JavaScript:
DuckDuckGo, Yahoo Search, Brave Search, Startpage, AOL Search, giveWater, Mojeek
yla92|1 year ago
spectre3d|1 year ago
niutech|1 year ago
anArbitraryOne|1 year ago
phoronixrly|1 year ago
fsflover|1 year ago
puttycat|1 year ago
marginalia_nu|1 year ago
black3r|1 year ago
lemoncookiechip|1 year ago
post-it|1 year ago
ronjouch|1 year ago
at0mic22|1 year ago
zelphirkalt|1 year ago
gazchop|1 year ago
Joeri|1 year ago
bythreads|1 year ago
Object content as lazy
Embed lazy
Image lazy
Link rel=import (not supported that widely, though)
Heck, if you wanted to get REALLY cute you could use multipart/x-mixed-replace headers.
Or SSE
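For the SSE option, the wire format is simple enough to sketch. The helper below formats one event for a `text/event-stream` response; `sse_event` and its parameters are names made up for illustration, and it assumes `event` and `event_id` contain no line breaks.

```python
from typing import Optional


def sse_event(data: str, event: Optional[str] = None,
              event_id: Optional[str] = None) -> bytes:
    """Serialize one server-sent event (fields, then a blank line)."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    if event_id is not None:
        lines.append(f"id: {event_id}")
    # Multi-line payloads become one "data:" field per line.
    normalized = data.replace("\r\n", "\n").replace("\r", "\n")
    for line in normalized.split("\n"):
        lines.append(f"data: {line}")
    # A blank line terminates the event.
    return ("\n".join(lines) + "\n\n").encode("utf-8")
```

The server would write these chunks to a response served with `Content-Type: text/event-stream` and keep the connection open. Unlike the lazy-loading options above, the browser side is normally consumed through the `EventSource` API.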
DanielHB|1 year ago
blindriver|1 year ago
Everyone I know under 25 has stopped using Google search altogether.
I think the only people disabling JavaScript must be GenX graybeards such as myself or security experts.
markasoftware|1 year ago
elicksaur|1 year ago
nibbles|1 year ago
[deleted]
throeurir|1 year ago
ant6n|1 year ago
kragen|1 year ago
elbowjack65|1 year ago
niutech|1 year ago
ChrisArchitect|1 year ago
Google.com search now refusing to search for FF esr 128 without JavaScript
https://news.ycombinator.com/item?id=42719865
EvanAnderson|1 year ago
scotty79|1 year ago
croes|1 year ago
A.k.a ads
1vuio0pswjnm7|1 year ago
jazzyjackson|1 year ago
koakuma-chan|1 year ago
Alifatisk|1 year ago
niutech|1 year ago
linker3000|1 year ago
can16358p|1 year ago
dotancohen|1 year ago
KingOfCoders|1 year ago
danielktdoranie|1 year ago