
Finding Unlicensed Repos on Github

22 points | niggler | 13 years ago | aniggler.tumblr.com | reply

27 comments

[+] jcr|13 years ago|reply
Your script is a great step in the right direction, but unfortunately, having a file specifically named "LICENSE" is not universal. It's almost as common to see a file named "COPYING" and plenty of other alternatives are used.

More importantly, any license stated in the actual source files supersedes the additional "LICENSE" file. There are also the headaches of binary files, such as images, where verifying the license is just painful.

I wish it were as easy as running a script. It would be a good idea if GitHub enforced some convention to make sure the free repos they provide for open source really are licensed under an OSI-approved license.

[+] niggler|13 years ago|reply
"It would be a good idea if GitHub enforced some convention to make sure the free repos they provide for open source really are licensed under an OSI-approved license."

I wholeheartedly agree. Every once in a while you stumble upon a landmine: https://github.com/stephen-hardy/xlsx.js/issues/8

[+] wereHamster|13 years ago|reply
I use "UNLICENSE" for projects licensed under that license, as recommended.

From http://unlicense.org/: You would traditionally put the above statement into a file named COPYING or LICENSE. However, to explicitly distance yourself from the whole concept of copyright licensing, we recommend that you put your unlicensing statement in a file named UNLICENSE. Doing so also means that your project can more easily be found on e.g. GitHub or Bitbucket, enabling others to reuse your code in their own unencumbered public domain projects.

[+] geraldcombs|13 years ago|reply
The license specified in a LICENSE or COPYING file may not match the rest of the source code. Debian's devscripts package has a utility called "licensecheck" that will scan a directory and report the license used by each file. Chromium has a wrapper called "checklicenses.py" that checks the output of licensecheck against a list of incompatible licenses. If someone were to take the next step of letting you point checklicenses.py at a remote repository the world would be a better place.
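
To make the idea concrete, here is a minimal Python sketch of a per-file scan followed by a check against a list of disallowed licenses. It is not how licensecheck or checklicenses.py are implemented; the marker phrases, the function names `guess_license` and `flag_files`, and the 30-line header window are all assumptions chosen for illustration.

```python
import re

# Hypothetical marker phrases; a real scanner recognizes far more licenses
# and copes with phrases wrapped across comment lines, which this does not.
MARKERS = {
    "MIT": re.compile(r"Permission is hereby granted, free of charge", re.I),
    "GPL": re.compile(r"GNU General Public License", re.I),
    "Apache-2.0": re.compile(r"Apache License,? Version 2\.0", re.I),
}


def guess_license(source_text, header_lines=30):
    """Guess a license from the first few lines of one source file."""
    header = "\n".join(source_text.splitlines()[:header_lines])
    for name, pattern in MARKERS.items():
        if pattern.search(header):
            return name
    return "UNKNOWN"


def flag_files(files, disallowed):
    """Return {path: license} for files whose guessed license is disallowed."""
    flagged = {}
    for path, text in files.items():
        found = guess_license(text)
        if found in disallowed:
            flagged[path] = found
    return flagged


files = {
    "a.c": "/* Permission is hereby granted, free of charge, to any person */\n",
    "b.c": "/* This program is free software under the GNU General Public License */\n",
    "c.c": "int x;\n",
}
print(flag_files(files, {"GPL", "UNKNOWN"}))
# prints {'b.c': 'GPL', 'c.c': 'UNKNOWN'}
```

Treating "UNKNOWN" as disallowed is a deliberate choice in this sketch: a file with no recognizable header is reported rather than silently passed.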
[+] mh-|13 years ago|reply
is checklicenses.py suitably licensed for that? :)
[+] graue|13 years ago|reply
This is a really important problem. I can't count the number of times I have seen something promising on GitHub only to notice on closer inspection that it has no license attached at all. Or, almost as bad, it just says "License: MIT" at the end of the readme with no link or actual license text, which I doubt (IANAL) is legally meaningful.

Sometimes I respond by filing an issue, titled "License?", where I gently suggest applying the MIT or Apache license or something similar. Usually that's what the authors intended, they just forgot, and are receptive to adding it. But I still kind of hate to be That Guy. Even though it's important, it feels like I am trying to correct someone's grammar. I wish more people would step up and be That Person so I wouldn't have to be.

I do generally use MIT-LICENSE.txt on my repos (and some other variations on older ones) so I agree that a slightly more general script would be nice if we are to solve this programmatically.

[+] niggler|13 years ago|reply
"But I still kind of hate to be That Guy. Even though it's important, it feels like I am trying to correct someone's grammar. I wish more people would step up and be That Person so I wouldn't have to be."

I can assure you that this is probably worse than anything you've done: https://github.com/stephen-hardy/DOCX.js/issues/1

" Many people assume that code on github is open source, but that is far from the truth. In fact, the Microsoft Office Extensible File License exemplifies Open Source Trolling: each clause an insult to the diligent readers' intellect.

The real problem with all code under the license is that it muddies the water. Microsoft could use your code as a reason to take legal action against others who genuinely try to innovate and use DOCX as a data format. You hurt the community far more than you help by releasing pseudo-open source code. Ever think that those lawyers might want you to do this so that they can go after others later on?

@stephen-hardy I think you are a reasonable person, and I might be niggling a bit, but neither of us want to see innovation stifled by myriads of lawsuits because one person's effort to release code created a miasma around a beloved software product. Let this be a clarion call, and please share with your coworkers and superiors: unless the code can be released in a proper open source format, it's better that you don't release it. "

[+] dottrap|13 years ago|reply
I agree. What's worse, it is often hard to find a way to contact the author to ask about the license, or they flat-out ignore you.
[+] orta|13 years ago|reply
We have this issue in the Obj-C community with CocoaPods. One of the best choices we've made lately is to refuse libraries that do not have a license. I definitely wish GitHub would make you put some kind of license on a repo if you are going to make it public.
[+] niggler|13 years ago|reply
Is that an automated process or do people manually inspect proposed libraries?
[+] mkelley|13 years ago|reply
I agree - GitHub being the great repository it is for Open Source projects, it really wouldn't be a bad idea to have some sort of reminder to users when creating a repo to add a file detailing the license the code is being released under.
[+] CodeCube|13 years ago|reply
I really wish github had an automated tool to add a license file (by easily choosing from a list of existing licenses, of course). I always neglect to include a license on my projects ... and then procrastinate doing it afterwards.
[+] niggler|13 years ago|reply
" automated tool to add a license file (by easily choosing from a list of existing licenses, of course)"

I see the merits of that (it would be nice to see that option in the "Add Repository" page), but I worry that licenses would then be set by autopilot (without actually considering whether the license is applicable), creating even more problems.

[+] Zolomon|13 years ago|reply
Cool stuff! But you forgot to license your code. ;)
[+] niggler|13 years ago|reply
Added :) See, even for small things it's easy to forget. Hopefully someone makes a script to remind you to license your gists.
[+] gsiener|13 years ago|reply
Quick plug for License Audit: http://licenseaudit.pivotallabs.com

We developed that at Pivotal Labs since we need to pay attention to licenses while working w/ our clients. Just connect your repos, add licenses to your whitelist, and get updates if you're not in compliance. Feedback welcome!

[+] rubbingalcohol|13 years ago|reply
This is a great post and a good crack at addressing the problem. Your post raising awareness of the importance of clear licensing is probably a more valuable contribution than your script itself. I've lost track of how many times I had to pass up on a good project on Github because of an unclear licensing situation.
[+] niggler|13 years ago|reply
"I've lost track of how many times I had to pass up on a good project on Github because of an unclear licensing situation."

A situation many of us have experienced. Until today, I thought I was alone in my concerns regarding licensing.

" your script itself"

It's a gist for a reason. If I truly thought it was the best starting point for a proper "license niggler", I would have made it a proper repo :) This fits my particular licensing scheme (only using a LICENSE file).

[+] NathanKP|13 years ago|reply
You should also check for the existence of a package.json file at the root, which is the Node.js style. These files can contain licensing information for Node modules and projects, and personally I think this is a better pattern than making a separate LICENSE file.
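
As a rough illustration, reading the declared license out of a package.json could look like the Python sketch below. The function name `license_from_package_json` is hypothetical, and the handling of the older object form (`{"type": ..., "url": ...}`) is an assumption about legacy packages rather than something stated in this thread.

```python
import json


def license_from_package_json(text):
    """Return the license declared in package.json contents, or None."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return None  # not valid JSON, so nothing can be read from it
    if not isinstance(data, dict):
        return None
    declared = data.get("license")
    # Usual form: "license": "MIT"
    if isinstance(declared, str) and declared.strip():
        return declared.strip()
    # Assumed legacy form: "license": {"type": "MIT", "url": "..."}
    if isinstance(declared, dict) and isinstance(declared.get("type"), str):
        return declared["type"]
    return None


print(license_from_package_json('{"name": "demo", "license": "MIT"}'))
# prints MIT
print(license_from_package_json('{"name": "demo"}'))
# prints None
```

A `None` result only means no license was declared in this file; the repository may still carry a LICENSE or COPYING file.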
[+] niggler|13 years ago|reply
I don't disagree, but that's only applicable to Node.js code. There's no tradition of using package.json for Fortran or C.

That said, I do like the overall theme of developing a language-agnostic way of indicating licenses (because, as jcr also mentioned, checking for LICENSE doesn't cut it).

[+] nevir|13 years ago|reply
Is that legally binding? Many licenses have an explicit clause that they are to be duplicated with the source.

Here's the clause from the MIT License, for example:

> The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

[+] nevir|13 years ago|reply
You should check for any root-level file with all-caps LICENSE in its name. Other common examples:

LICENSE.txt, LICENSE.md, MIT-LICENSE, MIT-LICENSE.txt, etc.
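
A minimal Python sketch of that broader check, assuming the root-level file names have already been listed somehow. The function name `find_license_files` is hypothetical, and including COPYING in the pattern is an assumption borrowed from jcr's comment rather than part of this suggestion.

```python
import re

# Match names containing all-caps LICENSE (which also covers UNLICENSE and
# MIT-LICENSE.txt) or, as an added assumption, COPYING.
LICENSE_NAME = re.compile(r"LICENSE|COPYING")


def find_license_files(root_entries):
    """Return the root-level names that look like license files."""
    return [name for name in root_entries if LICENSE_NAME.search(name)]


entries = ["README.md", "MIT-LICENSE.txt", "src", "COPYING", "UNLICENSE", "license-notes.txt"]
print(find_license_files(entries))
# prints ['MIT-LICENSE.txt', 'COPYING', 'UNLICENSE']
```

The match is case-sensitive on purpose, so a lowercase name such as license-notes.txt is not reported.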

[+] niggler|13 years ago|reply
A general tool would do that :) I try to use LICENSE for my projects.