I'm not quite sure what you're saying here. Are you claiming that you can push a commit to userA/project and then view it under the Github web interface for userB/project (assuming one repo is forked from the other)?
If so it seems like that's a relatively easy fix for Github, just check if the commit is actually contained in userB's fork of the repo.
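A rough sketch of what such a containment check could look like, assuming a fork network that shares one object store. The commit ids, the `shared_parents` map, and the branch heads are all made up for illustration; nothing here reflects how GitHub actually stores or serves commits.

```python
from collections import deque


def is_reachable(parents, heads, target):
    """Return True if `target` can be reached from any of `heads` by following parent links."""
    seen = set()
    queue = deque(heads)
    while queue:
        commit = queue.popleft()
        if commit == target:
            return True
        if commit in seen:
            continue
        seen.add(commit)
        queue.extend(parents.get(commit, ()))
    return False


# Hypothetical shared object store for a fork network: commit id -> parent ids.
shared_parents = {
    "a1": [],
    "b2": ["a1"],
    "c3": ["b2"],  # tip of userB/project
    "x9": ["b2"],  # pushed only to userA/project
}

user_b_heads = ["c3"]

# "b2" is part of userB's history, "x9" is not, even though both sit in the shared store.
print(is_reachable(shared_parents, user_b_heads, "b2"))
print(is_reachable(shared_parents, user_b_heads, "x9"))
```

The point of the sketch is only that "the object exists in the shared store" and "the object is reachable from this repo's refs" are different questions.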
It's currently possible to make a PR to the github dmca repo that merges youtube-dl, and so violate their TOS.
When you make that PR, you do not transfer any of youtube-dl to github. Their repo already has those objects.
So all you have done is pushed a repo containing a specific hash, which is now a TOS violation, in that specific case at least. I wrote "Certain commit hashes are rapidly heading toward being illegal" because of that specificity.
> I'm not sure how I feel about adding deliberate boobytraps to repos that seems to be purely punitive.
It also feels like it would violate the license of the repository. I see that one of the author's repos is LGPL [1], which allows redistribution of the source. Restricting how that source is distributed seems to me like a violation of that license, if not in letter at least in spirit.
(Of course, it goes without saying that I'm not a lawyer, which is why I said it feels like a violation.)
I'd find it incredibly childish and needlessly divisive to make your repo impossible to host on a given provider just because you don't like the provider. This does nothing but make the tech community worse for everyone and less accessible. Newcomers shouldn't have to navigate petty politics to get started with tech and people already here shouldn't have to have their time wasted with it when they're just trying to be productive.
The author claims the commit hash is bad and links to an article... but that claim seems entirely unsubstantiated? It's not the hash that is bad, it's the files that are bad. If you manage to hack git to get the commit without the files (clever trick by him), you don't hit any legal issues.
The hash is probably an excellent heuristic for spotting commits from the original YouTube-DL Git log, and so GitHub probably scans for that when searching for rehosted copies.
Totally down with the potential conflict of interest or implications of GitHub, but it’s not GitHub specifically that’s zealous about DMCA. Any server and any host of that server is going to be subject to it. The only appeal of a smaller or private server is less visibility, but legally it’s the same. DMCA isn’t going anywhere.
Well, GitHub/Microsoft could go on a PR campaign and say "we're not going to honor this RIAA DMCA since we know yt-dl isn't violating the DMCA!" but then GitHub/Microsoft would be opening themselves up to a lawsuit from (basically) the entire music industry. The amount of goodwill MSFT loses over this (hopefully isolated) incident has to be worth a few orders of magnitude less than the tens of millions of dollars that would be burned to actually fight the RIAA.
Smaller hosts could get away with not honoring DMCAs since the RIAA likely isn't going to waste resources actually filing a lawsuit, but this yt-dl situation seems like the perfect setup for the RIAA to set a precedent outlawing video/music downloaders if someone were to actually fight them on it (and until then, they can continue to take down video/music downloaders until someone does counter it).
Does github actually have to honor every request that comes in? I thought the youtubes of the world did that because of the volume, but hypothetically, couldn't they do some due diligence and push back on requests they don't think are valid instead of just taking them down and requiring the repository owner to appeal? I'm sure it would be more expensive for them, but it's still a choice.
This sounds like that social engineering challenge one guy did - where he said "I lose if you convince me to post <token> anywhere in my github repo", and someone won it by saying "you should put that challenge on your website" since the website code was in his github repo.
I read the post twice but don't quite get what the author wants to share. Is it some git(hub) trick I don't understand, just a way of saying Github's/Microsoft's monopoly is bad, or something else altogether?
Github complies with the DMCA, recently famously through the DMCA takedown of youtube-dl. The writer, Joey Hess (a relatively well-known OSS programmer, moreutils, for instance), is choosing to interpret this as their banning any repo that hosts the commit hash for youtube-dl (which they banned).
So he is jokingly showing you a way to create a Github poison pill. The joke is that if you include this 'illegal' hash you subject the repo to being DMCA'd out (not really, but it's sort of an 'imagine if'). Which means his repo is now unpublishable on Github (for long anyway).
It's kind of hacker humour in the sort of vein that you'd expect from an old-schooler like him. You sort of have to get the joke to get the joke.
It's a way to sneak in the hash of a commit (the latest youtube-DL commit) in a non-obvious way.
A git repo is fundamentally composed of 3 object types:
* Blob (effectively a file)
* Tree (which refers to collection of Blobs and Trees, effectively a dir)
* Commit (comment/ author info and various other metadata, a Tree, and a previous Commit)
This is what the git tools are used to dealing with. However git submodules changed things. With submodules, a Tree object can also contain a commit object! Together with the .gitmodules metadata file, this allows you to include another git repo inside your repo.
Joey leveraged that ability to add the youtube-dl latest master commit into his repo as a submodule, but deleted the .gitmodules file so that git wouldn't be verbose about it.
And that's how you sneak a commit into a repo.
This additionally leverages GitHub's data model (which is really the git data model), where they just have this huge DB of all the objects of all the repos on GitHub. So effectively by including this commit in a repo on GH, you're making it refer to the entire (buried) youtube-dl repo, but you need to be sneaky to be able to see it.
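For anyone who wants to poke at the object model described above, here is a minimal sketch in Python. It hashes objects the way git does (SHA-1 over a `<type> <size>\0` header plus the body) and builds a tree whose second entry uses mode `160000`, which is how git records a submodule entry (a "gitlink"). The commit id below is a made-up placeholder, not the youtube-dl commit, and the file and directory names are assumptions for illustration.

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """Hash an object the way git does: sha1(b"<kind> <size>\\0" + body)."""
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def tree_entry(mode: str, name: str, object_id_hex: str) -> bytes:
    """One tree entry: b"<mode> <name>\\0" followed by the raw 20-byte object id."""
    return f"{mode} {name}".encode() + b"\0" + bytes.fromhex(object_id_hex)


# A blob is just the file contents.
blob_id = git_object_id("blob", b"hello world\n")

# Placeholder commit id (hypothetical, does not refer to any real commit).
PLACEHOLDER_COMMIT = "0123456789abcdef0123456789abcdef01234567"

# A tree with a regular file and a gitlink. The gitlink entry carries only
# the 20-byte commit id; none of that commit's files are stored here.
tree_body = (
    tree_entry("100644", "README", blob_id)
    + tree_entry("160000", "vendored", PLACEHOLDER_COMMIT)
)
tree_id = git_object_id("tree", tree_body)

print("blob:", blob_id)
print("tree:", tree_id)
```

Note that nothing in the tree depends on `.gitmodules`: the file only tells git where to fetch the submodule from, so deleting it leaves the commit id in the tree untouched.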
They're using the fact that YouTube-DL (sp?) is being removed by GitHub automatically when rehosted in new repositories, due to a DMCA complaint (and likely overreach). If someone rehosts his repository onto GitHub, the repository will be taken down, and the rehosting account likely banned.
It's Git performance art/protest art, more or less.
Git has a mechanism called "submodules" which allows one repo to reference another repo. It doesn't actually include any of the content from the second repo: all that's included is the URL (in the file .gitmodules) and a commit hash (in Git's view of the directory structure).
When you clone a Git repository with submodules and you pass an option to git clone, or if you run a git submodule command later, Git will make a nested clone at whatever subdirectory path contains the submodule, and check out that commit. If you make commits in the submodule (and, hopefully, remember to push them), the outer repo will appear to be modified, and you can git add the changes, which will record the new commit hash, but only the hash.
The purpose is to deal with vendoring or sometimes making changes to large repositories owned by someone else without copying the whole source code into your own repository. (It's also occasionally used as a mechanism to split up very large repositories that are owned by the same group/organization, e.g., if you have a test suite or large data files or something, or a shared library used by multiple projects, you might find it helpful to use submodules.) https://github.com/ceph/ceph is an example repo that uses them - "ceph-object-corpus" and "ceph-erasure-code-corpus" at the top level are submodules, and if you click on the .gitmodules file, you'll find that there are other submodules inside src/, too. This is also an example of how GitHub handles submodules also hosted on GitHub - it will link you to the submodule at that commit.
So, GitHub does at least some parsing of submodules. The author is claiming that if you include a banned commit hash as a submodule in the history of your own project, and then promptly delete it, it won't noticeably increase the size of your Git repo, nor will you actually include any of the banned content yourself, but GitHub will potentially prevent that history from being pushed. As a result, you have a Git repo that GitHub would not accept.
Since the author both does not use GitHub and wants to encourage others to do the same, this would be a fairly effective way of forcing that.
(The article doesn't show GitHub actually refusing to accept the history containing this submodule, though - it only shows the submodule having been created, locally. Although maybe the author's argument is simply that pushing it violates the Terms of Service even if it's not blocked by automated means.)
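To make the "URL lives in .gitmodules, hash lives in the tree" split concrete, here is a small sketch that reads a `.gitmodules`-style file with Python's standard `configparser`. The sample contents, the submodule name, and the `example.com` URL are invented for illustration; the helper `parse_gitmodules` is likewise hypothetical and only handles the simple `path`/`url` case.

```python
import configparser

# Hypothetical .gitmodules contents (names and URL are made up).
SAMPLE_GITMODULES = """\
[submodule "vendor/lib"]
    path = vendor/lib
    url = https://example.com/someone/lib.git
"""


def parse_gitmodules(text: str) -> dict:
    """Map each submodule name to its path and url."""
    parser = configparser.ConfigParser(interpolation=None)
    # Strip indentation so each "key = value" line is read as its own option.
    parser.read_string("\n".join(line.strip() for line in text.splitlines()))
    result = {}
    for section in parser.sections():
        # Section names look like: submodule "vendor/lib"
        name = section.split('"')[1]
        result[name] = {
            "path": parser[section]["path"],
            "url": parser[section]["url"],
        }
    return result


modules = parse_gitmodules(SAMPLE_GITMODULES)
print(modules)
```

Notice what is missing from the parsed result: there is no commit hash anywhere in this file. The pinned commit is recorded in the outer repo's tree, which is why removing `.gitmodules` does not remove the hash.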
Has anyone read github's terms of service? Github is the one violating them, not the users.
Yes, github is disabling the accounts of people who post youtube-dl. No, this is not in compliance with github's ToS.
github says: “Please note that re-posting the exact same content that was the subject of a takedown notice without following the proper process is a violation of GitHub’s DMCA Policy and Terms of Service.” Ref: https://torrentfreak.com/github-warns-users-reposting-youtub...
github's DMCA policy says: "One of the best features of GitHub is the ability for users to "fork" one another's repositories. What does that mean? In essence, it means that users can make a copy of a project on GitHub into their own repositories. As the license or the law allows, users can then make changes to that fork to either push back to the main project or just keep as their own variation of a project. Each of these copies is a "fork" of the original repository, which in turn may also be called the "parent" of the fork.
GitHub will not automatically disable forks when disabling a parent repository. This is because forks belong to different users, may have been altered in significant ways, and may be licensed or used in a different way that is protected by the fair-use doctrine. GitHub does not conduct any independent investigation into forks. We expect copyright owners to conduct that investigation and, if they believe that the forks are also infringing, expressly include forks in their takedown notice." Ref: https://docs.github.com/en/free-pro-team@latest/github/site-...
I have very little sympathy for github here. They ought to follow the DMCA process. Per the DMCA process, I can't dispute the takedown of the main Youtube-dl repo, since I'm not a party to the process. If I post youtube-dl to github, it should stay up, pending a DMCA notice from RIAA. If such a notice is sent, I should then have the right to dispute that.
Therein lies the problem. Encouraging people to put their resume on centralized platforms (be it Facebook, LinkedIn, or Github) puts their entire career at risk. It's better to have your own domain and self-host; that way you aren't locked in and at risk of losing your entire professional history if you get banned or a service goes offline.
Why not call Github Microsoft knowing that Microsoft owns Github?
Microsoft has refused to challenge a highly dubious DMCA notice when they have more than enough resources to do so. Isn't Microsoft the party at fault given that their lawyers know the DMCA notice doesn't have a leg to stand on?
probably doing the ol' delayed switcheroo like facebook is doing with whatsapp - maintain product as is for a couple of years to avoid immediate backlash while slowly enmeshing and branding all the tools in the product.
then boom one day it's called azure officeserver or some similar microsoft style name and monetised like crazy with github being a deprecated sub-project left to wither and die.
it is NOT in their interest to defend github in any way. any negative press attached to the product literally doesn't matter to them since it's not associated with microsoft
That said, this is a very clever use of / exposure of the current systems in place. Unintended consequences of bad laws should be highlighted like this more regularly.
joeyh, if you're reading this, some feedback: on Android Chrome, at least, your top navbar overlays your article. Suggest you update your CSS for mobile views
So why not just write a tiny script that goes through all youtube-dl commits and recommits them with a space character added to some file? You would then have a clone with different hashes, but the same content.
Since Git's DAG is a hash tree where each commit points to its parent(s) and its own hash is derived from that, then presumably it would suffice to do this only with the first commit.
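A quick way to convince yourself of the cascade is to hash a toy chain of commit objects. This is only a sketch under assumptions: the author identity and timestamps are invented, every commit reuses git's well-known empty tree id, and instead of adding a space to a file it adds a space to the first commit's message, which is a simpler stand-in for "change anything in the root commit".

```python
import hashlib

# Id of git's empty tree, reused for every commit so the content never changes.
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"


def commit_id(tree: str, parent, message: str) -> str:
    """Hash a commit object built in git's text format."""
    lines = [f"tree {tree}"]
    if parent:
        lines.append(f"parent {parent}")
    # Hypothetical identity and timestamp, fixed so runs are repeatable.
    lines.append("author Example <example@example.com> 0 +0000")
    lines.append("committer Example <example@example.com> 0 +0000")
    body = ("\n".join(lines) + "\n\n" + message + "\n").encode()
    header = f"commit {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def build_chain(trees, first_message):
    """Return the commit ids of a linear history, oldest first."""
    ids, parent = [], None
    for i, tree in enumerate(trees):
        message = first_message if i == 0 else f"change {i}"
        parent = commit_id(tree, parent, message)
        ids.append(parent)
    return ids


trees = [EMPTY_TREE] * 3
original = build_chain(trees, "initial commit")
rewritten = build_chain(trees, "initial commit ")  # one extra trailing space

for old, new in zip(original, rewritten):
    print(old[:12], "->", new[:12])
```

Only the root commit was edited, yet every id in the chain differs, because each commit embeds its parent's id. The trees (the actual content) are identical in both histories.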
First of all, GitHub is nobody's "résumé", and secondly you can put a git repo anywhere you like; there are any number of hosting sites or easy ways to just host it yourself.
There are a LOT of companies just spamming github repo owners with job interviews nowadays
my last job was through an automated email i got about my github resume hashtags. iirc all they look for is certain tags and a regular commit history after which the recruiter takes a brief look at the repo and fires off the email if the criteria are met.
I feel good for Joey because he's really been one of the few who stuck to his principles and kept from hosting on Github even though it probably would have been way easier to do so.
The git-annex project (and I'm sure many of his other ones) has been going strong without Github and I appreciate him for that. He's finally being vindicated I believe (and eventually many of the warners against these free-as-in-beer but not free-as-in-software services have been proven right)
I think we need to work towards the future where DVCS's are _distributed_ and federated. Every time you give anyone centralized control, they will abuse it. I'm not aware of any exceptions.
[+] [-] slimsag|5 years ago|reply
1) Publish arbitrary commits under your https://github.com/my/project URL, e.g. a fake https://github.com/my/project/blob/<faked_commit>/README.md in your project describing how to install it that actually describes installing malware.
2) Publish those commits under your name, with your email address, and GitHub will prominently display it as if you made the commit (most do not use GPG signatures, and most do not know to look for "Verified" anyway)
It seemed only a matter of time before this behavior got abused for something (anti-DMCA action is perhaps the best outcome of this situation I can imagine..)
[+] [-] eslaught|5 years ago|reply
[+] [-] eminence32|5 years ago|reply
> If you commit or post content to this repository that violates our Terms of Service, we will delete that content and may suspend access to your account as well
My reading of this is that github wants to stem the flood of garbage pull requests to their [dmca](https://github.com/github/dmca/pulls?q=is%3Apr+is%3Aclosed) repo, which I guess is probably reasonable.
I don't quite see where github is banning certain commit hashes or tree hashes on sight.
I certainly respect the author's desire to not publish their repos to github (and indeed one of the nice things about git is how easy it is to self host). And I can understand not wanting others to fork the repos to github (due to objections about the company itself), but I'm not sure how I feel about adding deliberate boobytraps to repos that seems to be purely punitive.
[+] [-] joeyh|5 years ago|reply
[+] [-] sillysaurusx|5 years ago|reply
https://github.com/github/dmca/pull/8210
https://github.com/github/dmca/pull/8207
https://github.com/github/dmca/pull/8203
https://github.com/github/dmca/pull/8202
DMCAs would be much better served OwO'ified.
[+] [-] d3nj4l|5 years ago|reply
1: https://git.joeyh.name/index.cgi/haskell-mountpoints.git/tre...
[+] [-] WhyNotHugo|5 years ago|reply
"Hey, GitHub has stupid practices which are just terribly harmful for the community. Also, here's my source code."
Someone comes along and pushes that same repo to GitHub. GitHub bans them. This is exactly what you'd been warned of!
[+] [-] jimmaswell|5 years ago|reply
[+] [-] dgrin91|5 years ago|reply
Am I missing something?
[+] [-] Fellshard|5 years ago|reply
[+] [-] bloaf|5 years ago|reply
[+] [-] erling|5 years ago|reply
[+] [-] judge2020|5 years ago|reply
[+] [-] mayneack|5 years ago|reply
[+] [-] feanaro|5 years ago|reply
[+] [-] Qwertious|5 years ago|reply
[+] [-] Aachen|5 years ago|reply
[+] [-] renewiltord|5 years ago|reply
[+] [-] AceJohnny2|5 years ago|reply
[+] [-] Fellshard|5 years ago|reply
[+] [-] geofft|5 years ago|reply
[+] [-] BoorishBears|5 years ago|reply
[+] [-] woofie11|5 years ago|reply
youtube-dl doesn't fall under any of the restrictions in github's ToS either. ref: https://docs.github.com/en/free-pro-team@latest/github/site-...
[+] [-] blitblitblit|5 years ago|reply
[+] [-] monksy|5 years ago|reply
Create an empty repo elsewhere.
With your environment do a: git remote rm origin; git remote add origin <new environment> ; git push -u origin master
You've filled up the new repo with what you have, history and all.
[+] [-] vfclists|5 years ago|reply
[+] [-] kkarakk|5 years ago|reply
[+] [-] Fellshard|5 years ago|reply
[+] [-] rendall|5 years ago|reply
[+] [-] tirz|5 years ago|reply
"If you use one of my open-source repository, you can get in trouble, mouhahaha"
It feels like a dick move to me. But then, I realize that the RIAA bans can now be used as an anti RIAA partnership against GitHub and others, which is... pretty cool.
Definitely a feature that we need \o/ !
[+] [-] lk23jlkoij|5 years ago|reply
[+] [-] pwdisswordfish0|5 years ago|reply
[+] [-] booleandilemma|5 years ago|reply
This is why aliens don't visit us.
[+] [-] joshenders|5 years ago|reply
[+] [-] jbb67|5 years ago|reply
I mean really. what a stupid article
[+] [-] kkarakk|5 years ago|reply
seems enough to call github a resume to me.
[+] [-] timmit|5 years ago|reply
[+] [-] mlindner|5 years ago|reply
[+] [-] notsureaboutpg|5 years ago|reply
[+] [-] m0zg|5 years ago|reply