If it were me, and I wasn't willing to just block the traffic, I might just set a 128 kbps limit on it and call it a day[1]. Eventually, the other side will figure out that their fetchers are all backed up and work out how to do their job without burning so much bandwidth.
[1] Yeah, that can be a bit of a pain to set up depending on the server settings, but some people have to pay for bandwidth and server resources, so it's probably worthwhile.
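For illustration, a cap like that could also be enforced in application code with a token bucket. This is a minimal sketch, not a real traffic shaper; the 128 kbps figure comes from the comment above, and the class name and per-request granularity are assumptions:

```python
import time

class TokenBucket:
    """Crude bandwidth cap: the bucket refills at `rate` bytes/second,
    up to `burst` bytes, and a send is allowed only if enough tokens
    have accumulated."""

    def __init__(self, rate, burst, now=time.monotonic):
        self.rate = rate        # 16_000 bytes/s is roughly 128 kbps
        self.burst = burst
        self.tokens = burst
        self.now = now          # injectable clock, handy for testing
        self.last = now()

    def allow(self, nbytes):
        t = self.now()
        self.tokens = min(self.burst, self.tokens + (t - self.last) * self.rate)
        self.last = t
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False            # caller should sleep or fail the request
```

A server would call `allow()` before writing each response chunk for the throttled client and stall (or error out) whenever it returns `False`.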
Why not just return a 429 (Too Many Requests) if the specific repo has been requested by Google not too long ago (e.g., within the last hour)?
It's a standard response code, and with a bit of luck Google will scale back the requests accordingly. If not, this will still allow the proxy to operate properly without burning too many server resources.
(I understand that this may leave some customers unhappy, since the proxy may be up to an hour stale. If that's the case, the server could also add a check for whether there have been pushes since the last request, though this is quite a bit more complex.)
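A minimal sketch of that freshness check, assuming an in-memory map of last-fetch times and the one-hour window suggested above (the function and variable names are hypothetical):

```python
import time

FRESHNESS_WINDOW = 3600   # one hour, per the suggestion above
_last_fetch = {}          # repo path -> unix time of the last allowed fetch

def check_mirror_fetch(repo, now=None):
    """Return an HTTP status and headers for a mirror's fetch of `repo`.
    Repeat fetches inside the window get 429 plus a Retry-After hint."""
    now = time.time() if now is None else now
    last = _last_fetch.get(repo)
    if last is not None and now - last < FRESHNESS_WINDOW:
        retry_in = int(FRESHNESS_WINDOW - (now - last))
        return 429, {"Retry-After": str(retry_in)}
    _last_fetch[repo] = now
    return 200, {}
```

The push-aware variant described above would simply delete the repo's entry from `_last_fetch` on every push, so the next fetch goes through immediately.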
Or you could mess with a random percentage of the requests: tarpit them, drop random packets, reply with malformed answers etc. If you keep the percentage low they might have a fun time debugging :)
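For illustration only, a sketch of the "random percentage" idea as a request hook; the 5% fraction and 30-second stall are arbitrary assumptions:

```python
import random
import time

TARPIT_FRACTION = 0.05   # fraction of requests to mess with (arbitrary)
TARPIT_DELAY = 30.0      # seconds to stall a tarpitted request (arbitrary)

def maybe_tarpit(rng=random.random, sleep=time.sleep):
    """Stall a small random fraction of requests before handling them.
    Returns True if this request was tarpitted."""
    if rng() < TARPIT_FRACTION:
        sleep(TARPIT_DELAY)
        return True
    return False
```

The injectable `rng` and `sleep` are there so the behavior can be tested without actually waiting.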
The rate limit is unlikely to cause a problem. Google has been crawling the web since its very start, and the internal services which fetch resources from external web servers are extremely resilient. Some request fails? Some request is slow? It's not going to slow down other requests. Maybe these services aren't being used for Go, but the expertise is on tap.
(These kinds of services are also supposed to rate-limit their requests.)
Keeping the rate-limited connection alive eats up resources. In this case, the organisation executing the DDoS attack has virtually limitless resources.
Regardless of whether or not Drew is abrasive (I've never dealt with him so have no opinion), this is on Google.
If a single service (Go's module crawler) is doing this to multiple third parties (and it is, by design) then requiring Drew to implement a workaround is the same thing as relying on every (relevant to the issue) third party to do the same thing. Which is bad design.
This should be fixed at source or switched off.
As for the workaround itself it's true that going on a crawler exclusion list will not stop SourceHut being usable, only make it a little stale, but that doesn't matter. The problem is not at Drew's end. And once again for this workaround to be an effective solution every (relevant to the issue) third party needs to do the same thing. Which is bad design.
As a general rule, with excessive polling the system doing the polling is the one at fault. And as a general rule, if one system impacts many, fix at the originator. If Google devs don't know this, then why on earth are they getting the salaries they do?
Can you help me understand the problem with the workaround, which is precisely to have Google's proxy not excessively poll DeVault's service? It really seems like DeVault's real argument here isn't about the impact of this on his service, but that he doesn't like the design of the proxy. He has a lot of standing to complain about impacts on his service, but essentially no standing to complain about designs he finds sub-par.
> I was banned from the Go issue tracker for mysterious reasons, so I cannot continue to nag them for a fix.¹ I can’t blackhole their IP addresses, because that would make all Go modules hosted on git.sr.ht stop working for default Go configurations (i.e. without GOPROXY=direct). I tried to advocate for Linux distros to patch out GOPROXY by default, citing privacy reasons, but I was unsuccessful. I have no further recourse but to tolerate having our little-fish service DoS’d by a 1.38 trillion dollar company. But I will say that if I was in their position, and my service was mistakenly sending an excessive amount of traffic to someone else, I would make it my first priority to fix it. But I suppose no one will get promoted for prioritizing that at Google.
> [1]: In violation of Go’s own Code of Conduct, by the way, which requires that participants are notified of moderator actions against them and given the opportunity to appeal. I happen to be well versed in Go’s CoC given that I was banned once before without notice — a ban which was later overturned on the grounds that the moderator was wrong in the first place. Great community, guys.
> In the meantime, if you would prefer, we can turn off all refresh traffic for your domain while we continue to improve this on our end. That would mean that the only traffic you would receive from us would be the result of a request directly from a user. This may impact the freshness of your domain's data which users receive from our servers, since we need to have some caching on our end to prevent too frequent fetches.
Disclosure: I was on the Go team at Google until earlier this month. Dealing with DeVault's bad faith arguments is one of the few things I won't miss of that job.
I think they are right not to obey robots.txt in this case. If I tell Go to download a module, it shouldn't follow robots.txt, because I am a human and I requested it. This is similar to having private iCal URLs on your server: the right thing would be to deny them in robots.txt (in case a crawler found a leaked link), but a service that is monitoring a specific iCal calendar should still fetch it.
Basically, robots.txt is more about what should be crawled than how. In this case it appears that the traffic is desired in general, but it is happening far too often. robots.txt does have a primitive rate-limiting config, but that seems to be a minor part of the file.
Of course like anything there is nuance and there is definitely some middle ground between crawlers and humans.
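For reference, the rate-limiting knob alluded to above is the non-standard Crawl-delay directive, which Python's standard-library parser happens to expose (the GoModuleMirror user-agent token here is a made-up example, not the proxy's real one):

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt asking one crawler to wait 60s between requests.
robots_txt = """\
User-agent: GoModuleMirror
Crawl-delay: 60
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

delay = parser.crawl_delay("GoModuleMirror")   # 60 for the matching agent
other = parser.crawl_delay("SomeOtherBot")     # None: no entry applies
```

Crawl-delay was never standardised, and several large crawlers ignore it, which fits the comment's point that it is a minor part of the file.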
> Yesterday, GoModuleMirror downloaded 4 gigabytes of data from my server requesting a single module over 500 times (log attached). As far as I know, I am the only person in the world using this Go module.
I like golang as a developer, but this is a terrible implementation. I'm somewhat tempted to say that blocking the Google IP addresses is the correct answer, in that it will force some sort of wider action (Linux distros setting `GOPROXY=direct`, Google fixing their code, or, unfortunately, golang modules moving off SourceHut).
> Anyone who's receiving too much traffic from proxy.golang.org can request that they be excluded from the refresh traffic, as we did for git.lubar.me. Nobody asked for sr.ht to be added to the exclusion set, so as far as it's concerned nothing has changed.
one can definitely see what people are promoted for at Google :)
> the Go Module Mirror runs some crawlers that periodically clone Git repositories with Go modules in them to check for updates.
> The service is distributed across many nodes which all crawl modules independently of one another, resulting in very redundant git traffic.
Basically, they slapped together a very inefficient alpha, and obviously people aren't promoted for fixing such glaring inefficiencies to make it even into a half-reasonable beta.
And that is just hilarious - as if people at Google have never heard of CDNs, HEAD requests, git fetch, etc. Of course they know all that; it is really just the arrogance of an 800 lb gorilla toward a "small fish" (https://github.com/golang/go/issues/44577#issuecomment-85692...), with passive-aggressive blackmail as the cherry on top:
> In the meantime, if you would prefer, we can turn off all refresh traffic for your domain while we continue to improve this on our end. That would mean that the only traffic you would receive from us would be the result of a request directly from a user. This may impact the freshness of your domain's data which users receive from our servers
It sucks having to work around something like this, but maybe the following would work: only allow the first checkout from a given Go node and blackhole later accesses. If the repository is modified or a certain amount of time has elapsed, reset and allow a download again.
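A sketch of that gate, assuming the server can cheaply look up the repository's current tip commit; the names, the per-client keying, and the 24-hour reset window are assumptions:

```python
import time

REFRESH_INTERVAL = 24 * 3600   # "a certain time" from the comment; 24h assumed
_seen = {}                     # (client, repo) -> (tip commit, time of last clone)

def allow_clone(client, repo, tip_commit, now=None):
    """Allow the first clone per client; refuse repeats until the repo's
    tip changes or REFRESH_INTERVAL elapses."""
    now = time.time() if now is None else now
    key = (client, repo)
    prev = _seen.get(key)
    if prev is not None:
        prev_commit, prev_time = prev
        if prev_commit == tip_commit and now - prev_time < REFRESH_INTERVAL:
            return False       # redundant clone: blackhole or reject it
    _seen[key] = (tip_commit, now)
    return True
```

Keying on the client address is what makes this target the mirror's redundant nodes without affecting ordinary one-off cloners.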
Also, if you want to escalate, I wonder if there is a way to create a fake git repository that expands to a huge amount of data when cloning, but uses minimal bandwidth on the server side. Set up a bunch of those on some other host, and use them from go, and wait until google notices. Something like this: https://github.com/ilikenwf/git-zlib-bomb
The problem is that a) you want it to work properly for users of Go modules hosted on your Git forge, and b) you don't want to waste time working around a problem that the GoProxy engineers could solve with a more intelligent (but maybe more complex) design.
I'm wondering how much load this is sending to GitHub. GitHub has a ton of golang packages, including many non-popular ones that wouldn't otherwise get much traffic. A refresh job running full clones many times a day must be burning up bandwidth and compute over there as well. I suppose it's a drop in the bucket for GitHub's usage, but it's got to be a huge number.
I wouldn't be surprised if half the value of the Microsoft acquisition was to get GitHub's infra closer to the super-scale pipes and peering that Azure/Office/Update/etc leverage. I would expect they probably run tens to hundreds of PBs per month.
> I was banned from the Go issue tracker for mysterious reasons[1], so I cannot continue to nag them for a fix.
> [1] In violation of Go’s own Code of Conduct, by the way, which requires that participants are notified of moderator actions against them and given the opportunity to appeal. I happen to be well versed in Go’s CoC given that I was banned once before without notice — a ban which was later overturned on the grounds that the moderator was wrong in the first place. Great community, guys.
What happened to "don't be evil" I wonder? (And I know, it's that Google has now become a big grown up corporation)
What I find shocking is not Google doing this, or even how the Go team didn't care about the issue and apparently banished him, but how many people here think this is acceptable behavior and take the time to defend Google and justify this behavior, blaming not Google but Drew because "it is hard to deal with him".
The fact that a programming language calls home to Google by default should make it a non-starter for most sane developers. The fact that it calls home so it can DDoS other sites is low-key hilarious. And you'd think Google would know how to, like... operate an efficient CDN, perhaps?
Like, if this was managed by a competent company, you'd think this service would be akin to putting Cloudflare in front of your servers: It should minimize the heck out of your traffic because Google is able to cache it and serve it to the masses more efficiently. But it's Google, so it Googles.
Being blocked from the issue tracker for not being okay with being DDoS'd is peak Google. I am sure you were accused of a Code of Conduct violation, because declaring a CoC violation is much, much cheaper than fixing their infrastructure.
You can phone anywhere you want; the proxy is configurable, and there are (from what I can see) several independent implementations of the proxy itself. Go defaults to Google's proxy; few people change the default because Google's proxy is very good.
Go was originally advertised as Google "sponsoring" Rob Pike & Co. to create a programming language. For a while it even sounded like the Java, Python, and C++ devs at Google didn't like it.
The website focused on the programming language, was basically in Plan 9 style, no nonsense, the programming language sane and very cross-platform.
Things changed. Now one has to scroll through big company brands and marketing blabla to even get a glimpse of the language; there are big Google logos, and everything is JavaScript-heavy. Really well written[1] tutorials got hidden, and there are now tutorials with mistakes and bad code that got pointed out[2] but are "won't fix". A programming language where people used to admit that new() is kind of redundant and maybe wasn't the best idea now tries to pull in random programming language features[3]. On top of that, it was a programming language that started out embracing the fact that there are many different operating systems, officially supporting even Plan 9, DragonFly, etc., but after already having started to hollow that out by creating tiers, it seems they now want to shift away from that completely[4]. And then of course there are things like the module proxy causing issues.
People got their hopes up, but it almost feels like Google is pulling an embrace, extend, extinguish on Go - or at least its original design. But I still hope that I am just seeing things that aren't there.
> The fact that a programming language calls home to Google by default should make it a non-starter for most sane developers.
How is it any different from RubyGems, npm, PyPI, or any other package repository? In most you can bypass it by using git repos, but almost no one does that. And the GOPROXY does offer real benefits, such as preventing left-pad problems.
As others have said, if you really think this is a huge problem for you then it's easy to disable, which is actually easier than what most other package managers offer, but I don't really see the harm in the first place.
Are you going to get upset at Node for calling home to Microsoft (npm, owned by GitHub, owned by Microsoft) when using the supplied package management tool?
That's what big organisations do; it's their DNA to flood and exhaust smaller ones. They just don't care about anybody not at their scale. Ignorant egoists.
I'm not a Go user, so I don't really know how the Go proxy works, but if webmasters had a way to opt out via robots.txt or specific status codes (429), and the proxy at Google then sent a response to the client meaning "try it yourself" (i.e., retry in direct mode), that could have been an adequate solution for this scenario.
Athens, a self-hosted free Go proxy implementation, implements rate-limiting for GitHub (with some pretty horrific GitHub-specific behavior). This suggests GitHub does apply rate limits that aggressive Go proxies have to cope with.
There is a very simple way to get them to stop sending the 0.5 QPS that is described as a DDoS.
The linked bug shows another user successfully saying "please opt me out", and Google building the feature to do that within a week.
Drew has for some reason chosen not to ask for an opt out, even though it appears trivial and would probably be fixed by the weekend if he asked for it.
Calling it 0.5 QPS to downplay the severity is willfully ignoring the complaint in the article. It's not just a query; it's a full git clone of the entire repo with its history. That's a huge difference.
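The figures from the linked issue make the distinction concrete: 4 GB over 500 fetches of one module works out to megabytes per "query", at a request rate that rounds to zero:

```python
total_bytes = 4 * 1000**3   # "4 gigabytes" from the issue (decimal GB assumed)
fetches = 500               # "over 500 times", so this is a lower bound

per_fetch_mb = total_bytes / fetches / 1000**2   # ~8 MB moved per fetch
avg_qps = fetches / (24 * 3600)                  # ~0.006 queries per second
```

So the 0.5 QPS framing and the 4 GB/day framing can both be true: the rate is tiny, but each "query" is a full clone.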
" I can’t blackhole their IP addresses, because that would make all Go modules hosted on git.sr.ht stop working for default Go configurations (i.e. without GOPROXY=direct)"
This is absolutely shocking behaviour, and I’m mortified at the precedent that it sets.
https://github.com/golang/go/issues/44577#issuecomment-85692...
> "EFAIL" is an alarmist puff piece written by morons to slander PGP and inflate their egos. [...]
https://github.com/golang/go/issues/30141#issuecomment-46427...
Being right isn't the most important thing in the world, nor does it wash away all transgressions.
This seems wrong. I guess I always assumed that robots.txt applied to non-humans.
I suspect it's just a cultural thing inside Google.
From https://github.com/golang/go/issues/44577#issuecomment-78924...
> yes we make a fresh clone every time
[1] https://go.dev/doc/effective_go
[2] https://groups.google.com/g/golang-dev/c/kC7YZsHTw4Y/m/u0_d9...
[3] https://github.com/golang/go/issues/21498#issuecomment-11322...
[4] https://github.com/golang/go/discussions/53060
https://github.com/gomods/athens/blob/723c06bd8c13cc7bd238e6...
Food for thought.