
Show HN: Effort to clone unmaintained SourceForge projects to GitHub

146 points | hydragit | 10 years ago | a-sf-mirror.github.io

82 comments

[+] pavlov|10 years ago|reply
Why Github? Copying from one commercial provider to another doesn't solve the fundamental problem. Using git helps, but most of those old repos will never get cloned.

In 10 years time, Github may be the tired old service that gets acquired by a hedge fund that decides to monetize their repos. Such things are part of the corporate lifecycle.

[+] scrollaway|10 years ago|reply
> In 10 years time, Github may be the tired old service that gets acquired by a hedge fund that decides to monetize their repos. Such things are part of the corporate lifecycle.

So fix it in 10 years. Git makes that easy.

Point me to a good alternative to github that matches all your ideals. A free alternative to github - free as in beer, unless you're willing to fund this effort yourself, of course?

We migrated one of our projects from Sourceforge to Github, and all the stallmen came out from under their rocks to tell us how Github is evil, how Savannah is the only true alternative, pah. "Absolute freedom of software" is nice but it's not the only requirement. Savannah has the usability of a rusty wrench and will probably shut down without warning long before Github "turns evil".

Some people are just so far detached from reality when suggesting that stuff isn't perfect. Github is pretty damn amazing. If you want to use foss alternative like Gitlab, more power to you, but that doesn't make them ideal in every situation.

[+] goldfeld|10 years ago|reply
Ten years from now for all we know we could all have so much cheap storage and bandwidth and good, open p2p software that all coders get to archive their own full copy of github's repos. So the focus should be on getting today's job done now.
[+] Klathmon|10 years ago|reply
Do you have any other suggestions? Hosting these repos on donated/personal machines is (IMO) significantly less likely to stand the test of time.

At least with a commercial entity there is a bit more "trust" involved that they won't disappear out of the blue one day. And if the time comes that Github starts to collapse, the process can be repeated.

Just because something isn't permanent doesn't mean it's pointless.

[+] hydragit|10 years ago|reply
For now, Github is not ad-ridden as SourceForge is. Github is monetizing some repositories: https://github.com/pricing I don't know if they're sustainable, but from my naive point of view, closed-source projects on github pay for the hosting of open-source projects on github.
[+] duskwuff|10 years ago|reply
Github may be a commercial provider, but at least it's a commercial provider based on an open protocol. If things do start going wrong at Github, escape is a "git clone" and a "git push" away.
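
A minimal sketch of that escape route, assuming git is installed and using `--mirror` so that every branch and tag comes along. The URLs and the `repo.git` working directory are placeholders for illustration, not real repositories, and the helper names are made up for this example.

```python
import subprocess


def mirror_commands(source_url, dest_url, workdir="repo.git"):
    """Return the git commands that copy every ref from source_url to dest_url."""
    return [
        # --mirror makes a bare clone holding all branches, tags and other refs
        ["git", "clone", "--mirror", source_url, workdir],
        # -C runs git inside the bare clone; push --mirror sends every ref
        ["git", "-C", workdir, "push", "--mirror", dest_url],
    ]


def migrate(source_url, dest_url, workdir="repo.git", dry_run=True):
    """Print the commands, and execute them only when dry_run is False."""
    for cmd in mirror_commands(source_url, dest_url, workdir):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder URLs: the default dry run only prints the two commands.
    migrate("https://github.com/example/project.git",
            "https://git.example.org/example/project.git")
```

Issues, wikis and pull requests live outside the git repository, so a move like this carries the history but not that surrounding data.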
[+] jlarocco|10 years ago|reply
Who cares?

If that happens, the projects can be re-hosted somewhere else. For the time being Github is the best option.

Sometimes the hypothetical situations free software people bring up hurt their cause more than they help.

[+] readme|10 years ago|reply
I doubt that. Github is a paid service and has several enterprise-level clients. If it's ever going to flip-flop, there will be quite a few warnings beforehand.
[+] hippich|10 years ago|reply
Somebody has to pay for git hosting. Who would be a better alternative, in your opinion?
[+] usaphp|10 years ago|reply
I like people like you, always slashing ideas and not suggesting your own...
[+] TazeTSchnitzel|10 years ago|reply
Why aren't you mirroring the binaries? These are vital for people in the future who do not have the time to set up a build environment for software from a decade ago.

I'd also echo the concerns of others about GitHub.

Proper archivists should do for SourceForge what they did for other projects. Archive Team, maybe? Looks like they have a wiki page: http://www.archiveteam.org/index.php?title=SourceForge

[+] bentpins|10 years ago|reply
This was in progress, 830GB was downloaded before a Sourceforge guy popped onto the IRC and said he's ok with the archiving, but that the robots.txt should be respected. This would put things at a practical standstill. So the downloading was paused, I'm not really sure what's happened in the week since.

Right now Xfire's videos, several URL shorteners' links, and Toshiba Support material are being archived. If you have spare cycles and bandwidth, and want to contribute, running an instance of the "ArchiveTeam Warrior" is pretty easy through docker or a VM. http://archiveteam.org/index.php?title=Warrior

[+] hydragit|10 years ago|reply
Regarding binaries, I know these could be useful and I'd like to provide them, but I'm afraid a "not (yet?) very popular mirroring project" has no way to show that it can be trusted with binaries. After all, if a known site like SF is untrustworthy, why would an unknown site be any more trustworthy?
[+] Osmium|10 years ago|reply
Honestly, this is a serious issue for my field. There are so many obscure academic binaries hosted on SF... I hope someone manages to mirror them. [The fact that a lot of the scientific community is so backwards in adopting modern coding standards is another conversation for another day.]
[+] estrabd|10 years ago|reply
Sourceforge is on the radar here, but maybe it's time to step it up.

http://www.archiveteam.org/index.php?title=Fire_Drill

Update: seems others have linked to archiveteam.org, so maybe that's the best route. Is the OP part of the AT effort or do they know about each other? Maybe they should.

[+] lcswi|10 years ago|reply
Nice! But in my opinion it would be better to help archiveteam with their efforts!
[+] hydragit|10 years ago|reply
I confess my ignorance regarding archive.org's various collections. There seem to be a lot of them, which one are you referring to?
[+] coliveira|10 years ago|reply
Agreed. This article sounds more like an advertisement for GH. Also, why not use other platforms like bitbucket? Centralizing everything at GH is the worst scenario for open source.
[+] jmkni|10 years ago|reply
Nice.

I agree with what the others are saying, there's a lot of source code for solving obscure problems that is only on Sourceforge.

One example I found recently is a program called QLumEdit. I recently had to figure out how to work with EuLumdat files, and if it wasn't for the source code for this program on Sourceforge I would have been completely stumped (well not quite, but it would have taken me ages).

If SF goes down the toilet, a lot of knowledge goes with it so this is awesome to see!

If anybody is interested, I was converting this code from C++ to .net, my horrible hacky unrefactored effort is here - https://github.com/bumblebeeman/eulum.net

I am planning to make this code nicer, and develop it into a WPF app when I have time!

I am getting pretty close too, here is my .net generated version of the images this program produces: http://imgur.com/PCmpnJ2

[+] ksherlock|10 years ago|reply
That's great. I started doing that myself (my own git server, not github) for some projects I care about. This effort seems to include a very narrow list, though.

For CVS, though, I suspect cvs-fast-export [1] will do a better job than git-cvsimport.

1 http://www.catb.org/esr/cvs-fast-export/
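
A rough sketch of what that conversion could look like, assuming you already have a local copy of the project's CVS repository (the `*,v` master files) and that both `cvs-fast-export` and `git` are on the PATH. As I understand its documented usage, cvs-fast-export reads master file names on stdin and writes a git fast-import stream on stdout; the function names and paths here are made up for illustration.

```python
import subprocess
from pathlib import Path


def find_masters(cvs_root):
    """List the RCS master files (*,v) under cvs_root, relative to it."""
    root = Path(cvs_root)
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*,v"))


def convert(cvs_root, git_dir):
    """Pipe cvs-fast-export's output into git fast-import in a fresh repository."""
    target = str(Path(git_dir).resolve())
    names = ("\n".join(find_masters(cvs_root)) + "\n").encode()
    subprocess.run(["git", "init", target], check=True)
    # Run the exporter inside the CVS tree so the relative names resolve.
    exporter = subprocess.Popen(["cvs-fast-export"], cwd=cvs_root,
                                stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    importer = subprocess.Popen(["git", "-C", target, "fast-import"],
                                stdin=exporter.stdout)
    # Drop our copy of the pipe so only the importer holds the read end.
    exporter.stdout.close()
    exporter.stdin.write(names)
    exporter.stdin.close()
    codes = (exporter.wait(), importer.wait())
    if any(codes):
        raise RuntimeError("conversion failed with exit codes %r" % (codes,))
```

Something like `convert("path/to/cvs/module", "converted.git")` would then leave the imported history in `converted.git`; a `git checkout` afterwards populates the working tree.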

[+] jaytaylor|10 years ago|reply
What about creating a torrent containing all these unmaintained SF projects (with binary downloads included)?

This would dramatically increase the odds that the content is never lost.

[+] nextweek2|10 years ago|reply
The problem with torrents is the lack of incremental update support. If the base torrent gets updated, it gets a new hash identifier. How do you know it's been changed, so that you can be sure you get the latest version? And when you do, the swarm effectively gets diluted, because some peers are on the new version and some are on older ones.
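
To illustrate the hash point with a small sketch: in the original (v1) BitTorrent format, the identifier is the SHA-1 of the bencoded "info" dictionary, which embeds the hash of every piece of the payload. The file name, contents and piece length below are made up for the example, and this is a simplified single-file torrent rather than a full client.

```python
import hashlib


def bencode(value):
    """Encode ints, bytes, str, lists and dicts using BitTorrent's bencoding."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must appear in sorted order.
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v)
             for k, v in value.items()),
            key=lambda kv: kv[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("cannot bencode %r" % type(value))


def info_hash(name, data, piece_length=16384):
    """Return the hex infohash of a single-file torrent holding data."""
    pieces = b"".join(
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    )
    info = {"name": name, "length": len(data),
            "piece length": piece_length, "pieces": pieces}
    return hashlib.sha1(bencode(info)).hexdigest()


if __name__ == "__main__":
    old = b"snapshot of the archive\n" * 2000
    new = old + b"one more project\n"  # a small addition to the payload
    print(info_hash("sf-archive.tar", old))
    print(info_hash("sf-archive.tar", new))  # a different identifier
```

Because the appended bytes change the last piece hash and the length, the two identifiers differ, so peers holding the old torrent and peers holding the new one end up in separate swarms.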
[+] frik|10 years ago|reply
> Currently, for each cloned project, we mirror its CVS repository and its website.

Please add "SVN" (Subversion)

[+] coliveira|10 years ago|reply
This seems much more like a temporary fix, not really a solution. A few years from now GitHub can do the same thing that sf did. This after all seems to be the fate of commercial companies that exploit open source, once they start to lose users to new competitors.
[+] egsec|10 years ago|reply
Note 1: Moving things to GitHub or elsewhere does not remove them from SourceForge. So SF can continue to host and enjoy links on unmaintained websites, search engines etc.

Note 2: If their business model is offering popular binaries and source, they can just copy these from other sites and repackage them. Open source software allows you to do this. If no one else is interested in bundling and monetizing, then they can buy traffic and still succeed.

Note 3: Remember that academy award winning movie from 1943? Not so great in today's light. While perhaps one of the goals of the Internet and cheap storage is to keep a copy of everything, and it's often better to not re-invent the wheel, if something falls by the wayside and it's needed, it will be re-created.

Note 4: There are plenty of websites which catalog useful abandonware that someone had to dig up a physical disk to recover. If the software has value, chances are someone will eventually repost it somewhere without a massive organized effort.

----

There is clearly value in moving some projects over to GitHub or elsewhere, but if some things are not migrated or moved, life will go on.

[+] GFK_of_xmaspast|10 years ago|reply
Per the historical record, that "academy award winning movie from 1943" was 'Casablanca'.