> Some users may knowingly install this software on their devices, lured by the promise of “monetizing” their spare bandwidth.
Sounds like they’re targeting networks even when the users are OK participating in them, which is precisely what you’re saying is OK.
As for malware enrolling people into the network, it depends on whether the operator is doing it or whether the malware comes from third parties trying to get a portion of the cash flow. In the latter case the network would be a victim twice over, with Google also attacking it.
Getting rid of malware is good. A private for-profit company exercising its power over the Internet, not so much. We should have appropriate organizations for this.
Many are "compensated" (in the way of software they didn't pay for), so the real question is that of disclosure (where many software vendors check the box in the most minimal way possible, by including it as fine print during the install).
> Ones which you pay for and which are running legitimately, with the knowledge (and compensation) of those who run them.
The problem is, it is by default unethical to have residential users be exit nodes for VPNs - unless these users are lawyers or technical experts.
No matter what you do as a "residential proxy" company - you cannot prevent your service being used by CSAM peddlers, and thus you cannot prevent your exit nodes from being the ones whose IP addresses show up when the FBI comes knocking.
What I learn: proxy networks run by large corps are good, the true internet is bad. I understand that we are often talking about malware, worms, etc. that enable this. However, I find it disturbing to so often hear libertarian speech from the tech scene, while the same people feel very comfortable taking over state powers, like policing, to save the world.
> These efforts to help keep the broader digital ecosystem safe supplement the protections we have to safeguard Android users on certified devices. We ensured Google Play Protect, Android’s built-in security protection, automatically warns users and removes applications known to incorporate IPIDEA SDKs, and blocks any future install attempts.
Nice to see Google Play Protect actually serving a purpose for once.
Residential proxies are the only way to crawl and scrape. It's ironic for this article to come from the biggest scraping company that ever existed!
If you crawl at 1Hz per crawled IP, no reasonable server would suffer from this. It's the few bad apples (impatient people who don't rate limit) who ruin the internet for both users and hosters alike. And then there's Google.
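For what it's worth, the "1Hz per crawled IP" rule is cheap to enforce on the crawler side. A minimal sketch in Python, assuming the limit is keyed by hostname rather than by resolved IP; `PoliteCrawler` and `wait_turn` are made-up names for illustration, not part of any real crawler:

```python
import time
from urllib.parse import urlparse


class PoliteCrawler:
    """Enforces a minimum delay between requests to the same host."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds; 1.0 means at most 1 Hz per host
        self._clock = clock               # injectable so the limiter can be tested
        self._sleep = sleep
        self._last_hit = {}               # host -> time of the most recent request

    def wait_turn(self, url):
        """Block until it is polite to hit this URL's host; return seconds waited."""
        host = urlparse(url).netloc
        now = self._clock()
        last = self._last_hit.get(host)
        delay = 0.0
        if last is not None:
            # Wait out whatever is left of the interval since the last request.
            delay = max(0.0, self.min_interval - (now - last))
        if delay > 0:
            self._sleep(delay)
        # Record when this request is actually allowed to go out.
        self._last_hit[host] = now + delay
        return delay
```

Call `wait_turn(url)` right before each fetch. Requests to different hosts don't wait on each other, so total crawl throughput stays high while each individual server sees at most one request per second.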
First off: Google has not once crashed one of our sites with GoogleBot. They have never tried to bypass our caching and they are open and honest about their IP ranges, allowing us to rate-limit if needed.
Residential proxies are not needed if you behave. My take is that you want to scrape stuff that site owners do not want to give you, and you don't want to be told no or perhaps pay for a license. That is the only case where I can see you needing residential proxies.
One thing about Google is that many anti-scraping services explicitly allow access to Google and maybe a couple of other search engines. Everybody else gets to enjoy the CloudFlare captcha, even when crawling at reasonable speeds.
I'd still like the ability to just block a crawler by its IP range, but these days nope.
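To illustrate what range blocking looks like when a crawler does publish stable ranges, here is a rough Python sketch using the standard `ipaddress` module. The CIDR blocks are reserved documentation ranges standing in for a crawler's real ones (an assumption for the example), and `is_blocked` is a hypothetical helper:

```python
import ipaddress

# Placeholder ranges taken from the reserved documentation blocks; a real
# list would come from whatever ranges the crawler operator publishes.
BLOCKED_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in ("192.0.2.0/24", "198.51.100.0/24", "2001:db8::/32")
]


def is_blocked(addr, ranges=BLOCKED_RANGES):
    """Return True if the client address falls inside any blocked range."""
    ip = ipaddress.ip_address(addr)
    # An IPv4 address is never "in" an IPv6 network (and vice versa),
    # so mixed lists are safe to check this way.
    return any(ip in net for net in ranges)
```

This only helps while the crawler stays inside a handful of known ranges. Once every request arrives from a different residential address, there is no range left to list.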
1 Hz is 86400 hits per day, or 600k hits per week. That's just one crawler.
Just checked my access log... 958k hits in a week from 622k unique addresses.
95% is fetching random links from a u-boot repository that I host, which is completely random. I blocked all of the GCP/AWS/Alibaba and of course Azure cloud IP ranges.
It's now almost all coming from "residential" and "mobile" IP address space, from completely random places all around the world. I'm pretty sure my u-boot fork is not that popular. :-D
Every request is a new IP address, and available IP space of the crawler(s) is millions of addresses.
I don't host a popular repo. I host a bot attraction.
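The arithmetic behind those numbers, as a quick Python check. The log figures are the rounded ones quoted in the comment (958k hits, 622k addresses), so the ratio is approximate:

```python
SECONDS_PER_DAY = 24 * 60 * 60

# One crawler at a steady 1 request per second.
hits_per_day = 1 * SECONDS_PER_DAY   # 86400
hits_per_week = hits_per_day * 7     # 604800, i.e. roughly 600k

# Rounded figures from the access log quoted above.
log_hits = 958_000
unique_addrs = 622_000
hits_per_addr = log_hits / unique_addrs  # about 1.54 requests per address
```

So the week's traffic is about 1.6 times what a single 1 Hz crawler would produce, but at roughly 1.5 requests per address, any per-IP rate limit never triggers.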
My understanding is that routing through residential IPs is a part of the business of some VPN providers. I don't know how above board they are on this (as in notifying customers that this may happen, however buried in the usage agreement, or even allowing them to opt out).
But my main point is that the whole business is "on the up and up" vs some dark botnet.
Anyone could scrape the net, then modern scrapers came along with their shitty code and absolutely no respect. The reason why so many of us block or throttle scrapers is because they misbehave. They don't back off, they try to bypass caches, and if they crash a site they don't adjust; they just pound it into the ground again when it's back. We managed to talk to one large AI company, who didn't really want to fix anything but told us that they'd be fine with us just rate limiting them, as if we somehow owed them anything. They just get a stupidly low rps now, even though we'd let them go faster if they'd just fix their bot.
Some sites don't want you scraping, but it's their content, their rules. We don't really care, but we have to due to the number and quality of the bots we're seeing. This is in my mind a 100% self-imposed problem from the scrapers.
I'm actually a little shocked seeing that there was a WebOS variant of the residential proxying SDK endpoint. Does that mean there might be a bit more unchecked malware lurking behind the scenes in the LG ecosystem?
Personally I'm surprised they didn't have a Samsung option.
I keep my brand new LG C5 totally disconnected from the internet and use my Apple TV for movie watching. I’m not going to trust a company like LG to secure their devices.
Google shows a sample of the IOCs, but Google Trust Services has issued a number of the SSL certs for those domains, and they have not been revoked (yet?).
Only looking at the:
- a8d3b9e1f5c7024d6e0b7a2c9f1d83e5.com
- af4760df2c08896a9638e26e7dd20aae.com
- cfe47df26c8eaf0a7c136b50c703e173.com
Looks like a standard MD5 hash domain pattern of which currently there are:
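For anyone who wants to grep their own DNS or proxy logs for names of this shape, a minimal Python sketch. The assumption here is that "MD5 hash domain pattern" means a label of exactly 32 hex characters under `.com`, which is what the three domains above look like; `looks_md5_like` is a hypothetical helper, and a match only means the name has that shape, not that it belongs to this network:

```python
import re

# 32 lowercase hex characters (the length of an MD5 hex digest), then ".com".
MD5_LIKE_DOMAIN = re.compile(r"[0-9a-f]{32}\.com")


def looks_md5_like(domain):
    """Return True if the domain is a bare 32-hex-character label under .com."""
    return MD5_LIKE_DOMAIN.fullmatch(domain.strip().lower()) is not None


samples = [
    "a8d3b9e1f5c7024d6e0b7a2c9f1d83e5.com",
    "af4760df2c08896a9638e26e7dd20aae.com",
    "cfe47df26c8eaf0a7c136b50c703e173.com",
    "example.com",
]
matches = [d for d in samples if looks_md5_like(d)]
```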
They have a robust KYC that appears to serve, at least in large part, as a way to stay off the shit list of companies with the resources to pursue recourse.
Source: went through that process, ended up going a different route. The rep was refreshingly transparent about where they get the data and why they have the KYC process (aside from regulatory compliance).
Ended up going with a different provider who has been cheaper and very reliable, so no complaints.
I've helped multiple people remove residential proxy malware that was turning their network into a brightdata exit node, and they had no idea / did not consent to it. Why is Google selectively targeting one provider while letting others operate freely?
I've had enough of companies saying "you're connecting from an AWS IP address, therefore you aren't allowed in, or must buy enterprise licensing". Reddit is an example which totally blocks all data to non-residential IPs.
I want exactly the same content visible no matter who you are or where you are connecting from, and a robust network of residential proxies is a stepping stone to achieving that.
If you look at the article, the network they disrupted pays software vendors per-download to sneakily turn their users into residential proxy endpoints. I'm sure that at least some of the time the user is technically agreeing to some wording buried in the ToS saying they consent to this, but it's certainly unethical. I wouldn't want to proxy traffic from random people through my home network, that's how you get legal threats from media companies or the police called to your house.
I live in the UK and can't view a large portion of the internet without having to submit my ID to _every_ site serving anything deemed "not safe for the children". I had a question about a new piercing and couldn't get info on it from Reddit because of that. I tried using a VPN, but those are blocked too. Luckily, I work at a company selling proxies so I've got free proxies whenever I want, but I shouldn't _need_ to use them.
I find it funny that companies like Reddit, who make their money entirely from content produced by users for free (which is also often sourced from other parts of the internet without permission), are so against their site being scraped that they have to objectively ruin the site for everyone using it. See the API changes and killing off of third party apps.
Obviously, it's mostly for advertising purposes, but they love to talk about the load scraping puts on their site, even suing AI companies and SerpApi for it. If it's truly that bad, just offer a free API for the scrapers to use - or even an API that works out just slightly cheaper than using proxies...
My ideal internet would look something like that, all content free and accessible to everyone.
> I want exactly the same content visible no matter who you are or where you are connecting from
The reason those IP addresses get blocked is not because of "who" is connecting, but "what"
Traffic from datacenter address ranges to sites like Reddit is almost entirely bots and scrapers. They can put a tremendous load on your site because many will try to run their queries as fast as they can with as many IPs as they can get.
Blocking these IP addresses catches a few false positives, but it's an easy step to make botting and scraping a little more expensive. Residential proxies aren't all that expensive, but now there's a little line item bill that comes with their request volume that makes them think twice.
> We need more residential proxies, not less
Great, you can always volunteer your home IP address as a start. There are services that will pay you a nominal amount for it, even.
There's a company that pays you to keep their box connected to your residential router. I assume it sells residential proxy services, maybe also DDoS services, I don't know. It's aptly named Absurd Computing.
I still "run" a small ISP with a few thousand residential IPs from my scraping days. The requirements are laughable and costs were negligible in the early 2000s.
This blog post comes from the company that used to promise "don't be evil", one that steals water for data centers from villages and towns via shady deals, whose whole premise is stealing other people's stuff, claiming it as their own, locking them out and selling their data. Who made them the arbiter of the internet? No one!!!
They just stole this, and now they get on their high horse to tell people how to use the internet? You can eff right off, Google.
[+] [-] progbits|1 month ago|reply
Yes, proxies are good. Ones which you pay for and which are running legitimately, with the knowledge (and compensation) of those who run them.
Malware in random apps running on your device without your knowledge is bad.
[+] [-] vlovich123|1 month ago|reply
[+] [-] throwoutway|1 month ago|reply
And ones that have all the indicators of compromise of Russia, Iran, DPRK, PRC, etc
[+] [-] CodeMage|1 month ago|reply
[+] [-] bdcravens|1 month ago|reply
[+] [-] mschuster91|1 month ago|reply
[+] [-] riedel|1 month ago|reply
[+] [-] bettystaplesmd|1 month ago|reply
[deleted]
[+] [-] xyzzy_plugh|1 month ago|reply
[+] [-] edg5000|1 month ago|reply
[+] [-] mrweasel|1 month ago|reply
[+] [-] Ronsenshi|1 month ago|reply
Rules For Thee but Not for Me
[+] [-] megous|1 month ago|reply
[+] [-] whartung|1 month ago|reply
[+] [-] scirob|1 month ago|reply
[+] [-] mrweasel|1 month ago|reply
[+] [-] a456463|1 month ago|reply
[+] [-] kotaKat|1 month ago|reply
[+] [-] wincy|1 month ago|reply
[+] [-] dewey|1 month ago|reply
[+] [-] kingforaday|1 month ago|reply
If you look at some of the others (not listed in Google's IOC), they tend to have a pattern with their SSL certs, e.g.:
- 0e6f931862947ad58bf3d1a0c5a6f91f.com
- 17e4435ad10c15887d1faea64ee7eac4.com
Would there be any reason any of these would be legitimate?
[+] [-] AugustoCAS|1 month ago|reply
The largest companies in this space that do similar things (oxylabs, brightdata, etc.) have similar tactics but are based in a different location.
[+] [-] chatmasta|1 month ago|reply
[+] [-] 7thpower|1 month ago|reply
[+] [-] walletdrainer|1 month ago|reply
[deleted]
[+] [-] avastel|1 month ago|reply
Note that even after the disruption, I'm still able to route millions of requests/day through IPIDEA's network.
[+] [-] Rasbora|1 month ago|reply
You can check if your network is infected here: https://layer3intel.com/is-my-network-a-residential-proxy
[+] [-] niedbalski|1 month ago|reply
[+] [-] tclancy|1 month ago|reply
[+] [-] walletdrainer|1 month ago|reply
When the Chinese do this? Very bad.
[+] [-] VladVladikoff|1 month ago|reply
[+] [-] londons_explore|1 month ago|reply
[+] [-] ndiddy|1 month ago|reply
[+] [-] JDye|1 month ago|reply
[+] [-] Aurornis|1 month ago|reply
[+] [-] direwolf20|1 month ago|reply
What will you be proxying? Nobody knows! I haven't had the police at my house yet.
Seems a great way to say "fuck you" to companies that block IP addresses.
You may see a few more CAPTCHAs. If you have a dynamic IP address, not many.
[+] [-] tokyobreakfast|1 month ago|reply
I run a honeypot and the amount of bot traffic coming from AWS is insane. It's like 80% before filtering, and it's 100% illegitimate.
[+] [-] yuliyp|1 month ago|reply
[+] [-] nine_k|1 month ago|reply
[+] [-] crtasm|1 month ago|reply
[+] [-] xg15|1 month ago|reply
[+] [-] BoredPositron|1 month ago|reply
[+] [-] a456463|1 month ago|reply
[+] [-] packetslave|1 month ago|reply
[deleted]
[+] [-] direwolf20|1 month ago|reply
[+] [-] brikym|1 month ago|reply
[+] [-] arewethereyeta|1 month ago|reply
[+] [-] g947o|1 month ago|reply
Sounds like "malicious activity" == "scraping activities that don't come from Google"
[+] [-] nubinetwork|1 month ago|reply
[+] [-] buddylw|1 month ago|reply