This seems a bit ridiculous. 900 TB costs about $22,500 in hard drives (assuming $100 per 4 TB HDD), without any redundancy. I wonder what their storage solution is like.
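For anyone who wants to check the arithmetic, here is a minimal sketch. It assumes decimal terabytes, the $100 per 4 TB drive price quoted above, and no redundancy at all; the function name `raw_drive_cost` is made up for illustration.

```python
import math

def raw_drive_cost(total_tb, drive_tb=4, drive_price=100):
    """Return (drive count, total price in USD) for storing total_tb with no redundancy."""
    # Round up: a partially filled drive still has to be bought.
    drives = math.ceil(total_tb / drive_tb)
    return drives, drives * drive_price

drives, cost = raw_drive_cost(900)
print(f"{drives} drives, ${cost:,}")
```

Any mirroring or parity scheme would multiply the drive count, so this is a floor rather than an estimate of a real deployment.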
Their website seems to be down: I'm just wondering, are they downloading everything accessible through the player, or just songs marked "Download"?
Even given that they could restrict themselves to songs marked okay to download, how much of that will be DJ mixes containing copyrighted songs?
I'm just wondering because Soundcloud actually has support to specify your copyright terms, which does not default to "everyone can download this", so it's an interesting case..
The website works now, it just says "selective content"
I'm sure SC serves up a lot of content per day, but how do you think they will react by suddenly having someone download all of their 900 TB or whatever it is in one day? How much will Archiveteam be contributing to SC's downfall by suddenly causing them a huge unexpected bill?
As someone who really wants the SC content backed up properly, I nonetheless see how this raises some interesting legal issues.
I think in most cases ArchiveTeam's actions have been copyright infringement on some level. They just don't care, and they find keeping user content safe from unrepentant deletion to be more important.
If you've ever seen a Jason Scott talk, he isn't the sort of guy who gives a shit if you DMCA him while he's sucking up all of your bandwidth archiving your content two days before your servers shut down.
> I'm sure SC serves up a lot of content per day, but how do you think they will react by suddenly having someone download all of their 900 TB or whatever it is in one day? How much will Archiveteam be contributing to SC's downfall by suddenly causing them a huge unexpected bill?
Obviously, there's more than just the bandwidth cost, but assuming they pay $0.02/GB for CDN traffic, we're talking about $18k. It's not nothing, but I doubt it'd change their outlook in any meaningful way.
I should add that ArchiveTeam doesn't download everything in a day, but rather uses a distributed crawler (ArchiveTeam Warrior) run by volunteers. They rate-limit the crawl as needed so as not to overwhelm the site being archived.
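A rough sketch of the numbers in this comment, for illustration only. The $0.02/GB price and the 900 TB total come from the thread; the 30-day crawl window is purely an assumption to show what spreading the download out does to the average rate, and both function names are hypothetical.

```python
def egress_cost_usd(total_tb, price_per_gb=0.02):
    """CDN egress cost, using decimal units (1 TB = 1000 GB)."""
    return total_tb * 1000 * price_per_gb

def average_gbit_per_s(total_tb, days):
    """Average throughput needed to move total_tb in the given number of days."""
    total_bits = total_tb * 1e12 * 8
    seconds = days * 24 * 60 * 60
    return total_bits / seconds / 1e9

print(f"egress: ${egress_cost_usd(900):,.0f}")
# 30 days is an assumed crawl window, not a figure from ArchiveTeam.
print(f"average rate over 30 days: {average_gbit_per_s(900, 30):.2f} Gbit/s")
```

The cost is the same however long the crawl takes; only the load on the site at any given moment changes.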
In response to your concerns: I think ArchiveTeam simply doesn't care. They are very firm in their convictions, and they don't exactly listen to requests not to archive things.[0]
If you're curious, here[1] is the initial discussion that AT had. People bring up copyright concerns.
Hmm, I've been playing with IPFS lately, and just had an idea: Since IPFS is perfect for archival, Archiveteam could put their files on IPFS, and users could help out by pinning stuff on their local nodes. For example, I could ask their website to give me a 10 GB list of files to pin (if I wanted to "donate" 10 GB to them), and I'd keep them available.
The only problem is that I don't know whether IPFS has any way to gauge availability, so I'm not sure if the team could tell which files were only hosted by a few people.
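To make the idea concrete, here is a minimal sketch of how a coordinating site might hand a volunteer a list of files to pin. Everything in it is hypothetical: the `ArchivedFile` record, the `pick_files_to_pin` function, and above all the `replicas` count, which assumes the coordinator already knows how many people pin each file. That is exactly the availability information the comment above is unsure IPFS can provide.

```python
from dataclasses import dataclass

@dataclass
class ArchivedFile:
    cid: str         # hypothetical content identifier
    size_gb: float
    replicas: int    # assumed to be known to the coordinator

def pick_files_to_pin(files, quota_gb):
    """Greedily pick the least-replicated files that fit in the donated quota."""
    chosen, used = [], 0.0
    # Least-replicated first; among equals, smaller files first.
    for f in sorted(files, key=lambda f: (f.replicas, f.size_gb)):
        if used + f.size_gb <= quota_gb:
            chosen.append(f)
            used += f.size_gb
    return chosen

catalog = [
    ArchivedFile("a", 6, 1),
    ArchivedFile("b", 5, 0),
    ArchivedFile("c", 4, 3),
    ArchivedFile("d", 2, 1),
]
print([f.cid for f in pick_files_to_pin(catalog, quota_gb=10)])
```

A greedy pass like this is not an optimal packing; it just shows that the scheme depends on trustworthy replica counts.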
ArchiveTeam is putting the files into the Internet Archive's archives. While IPFS is great, I'm not sure I agree it's good for archival, because it depends on the availability of the IPFS network. The Internet Archive does work to make sure that there are sufficient backups for them to recreate their archive.
I don't believe this is true. IPFS doesn't have any built-in way of easily distributing parts of an archive, doesn't support (as far as I know) any form of erasure coding, which makes overhead quite high, and requires that you use its own weird block + hash scheme for integrity.
It's also very immature: we don't know if IPFS will be around in 10 years, and we don't know what kinds of bugs it will have.
IPFS is a great tool and it has its uses but I don't think archiving is one of them yet.
Why doesn't SoundCloud want my money? There are no ads, and no paid plans for listeners. A lot of songs in my library disappear once the artist gets big and wants some cash from iTunes. I would have no problem paying for access to these songs, but it's just not possible. I would also like to buy band posters, t-shirts, vinyl, CDs, show tickets, etc., but that's not possible either. It's like they are actively avoiding revenue streams. I don't get it.
Wondering this too. Why don't they sell tracks directly? I'm always wondering what various tracks are in the house mixes I listen to and would love to add them to my cart right there.
It's a node tool built a few years ago to download the playlists of users through your command line. Might be helpful for a situation where you'd like to back up your own playlists.
You'll need to get an API key - not sure how feasible that is at this moment.
Archiveteam seems like a really cool project. What I was wondering (and couldn't find in the FAQ) is who is paying for all the storage. Is it donated by big tech companies?
The more I think about this, the more convinced I am that Archiveteam are actually detrimental (in the long run) to the well-being of the Internet.
Don't get me wrong, I appreciate the work they do, and without them lots of content would simply disappear. But solving this problem should be at the core of the protocol itself (Xanadu, anyone?[1]), not depending on the resources and goodwill of a single team.
Just like IPv6, I don't think the problem will be solved as long as there's a patch that somehow works.
Apt username. Yes, you're wrong. See, we don't have Xanadu and we don't have an in-protocol solution so somebody needs to do the dirty work here and AT stepped up and does an absolutely incredible job.
So as long as there's a patch that somehow works, we at least have a solution. If that patch weren't there, we'd have nothing.
>But solving this problem should be at the core of the protocol itself (Xanadu, anyone?[1]), not depending on the resources and goodwill of a single team.
I disagree. Expecting networks and software to act as an immutable medium is a fool's errand. The internet was never meant to provide a permanent cultural archive, and it's not actually a "problem" that it doesn't, because that's not what it's for.
Backups should be a service, not a feature of the network or the protocol itself. I think that what Archive Team does represents the correct way to approach the issue.
I downvoted this. Maybe we will get to this point one day, but in the meantime I think we should all appreciate the amount of conservation work they do for future generations. It's exactly like how NGOs help feed people while hoping that one day the system will be fixed.
I'm interested in people's opinions on the legality of this. They mention "Archive Team considers the SoundCloud service in danger and, as it hosts a lot of original content, finds it important to prepare to save it selectively (a full grab would be too big and would raise concerns of mass copyright infringement).", but how is downloading any portion of an artist's music not copyright infringement?
I've written my own Soundcloud offline audio player, but didn't distribute it because it was against their TOS.
> but how is downloading any portion of an artist's music not copyright infringement?
I had the same issue with backing up Geocities when it went down. I figured better safe than sorry, established a very easy deletion procedure for the copyright holders and have received only a very small number of nastygrams compared to an absolutely enormous number of messages from people that were happy their content got saved.
So at a guess, yes it is copyright infringement, no, it will not lead to trouble because most people are able to recognize a good faith effort when they see it.
To me that means "stuff that's most likely fine to preserve and most likely isn't found on other places".
Also, according to SoundCloud's ToS, by using the service you are granting all users rights "to use, copy, listen to offline, repost, transmit or otherwise distribute" your content. So if Archive Team downloads everything they can that does not in itself violate copyright (i.e. they are not Metallica songs), there should be no copyright issues.
It's really not much different from the Internet Archive as a whole. The vast majority of the content that the Internet Archive stores is copyrighted and not under CC, etc. The Archive mostly gets away with it because it is/was all public facing material and they bend over backwards (with retroactive robots.txt, etc.) to remove anything that the owner objects to.
Copyright only covers distribution. Merely having a copy of something is fine. When you download something from Soundcloud, that's them distributing a copy to you; presumably they have permission to do that. If you then distribute copies of it to other people, then it might be a copyright violation (there are exceptions, and varying laws, and, etc). Just holding on to something that was distributed to you isn't a copyright violation.
I don't know why SoundCloud does not have a huge, Wikipediaesque donation banner on every page illustrating the severity of their financial situation; it's embarrassing for them, but think of it like this: the artists have a right to know that the platform that manages their life's work needs their support.
"I don't know why SoundCloud does not have a huge, Wikipediaesque donation banner on every page illustrating the severity of their financial situation"
Maybe because they are a profit-oriented company that raised hundreds of millions of USD in venture capital. Asking for donations would be kinda unethical, and the founders would know that.
However, if SoundCloud were somehow transformed into a non-profit organization...
If these kinds of entities actually want to preserve resources, they shouldn't be generating a petabyte of bandwidth charges. Contact SoundCloud and come to an agreement that will be responsible.
Cool!! Based on the Dat project (http://datproject.org). These guys might become a good alternative to IPFS, especially if they reposition themselves as a message-based system, instead of just file sharing.
Just from a legal standpoint: isn't „...considers the SoundCloud service in danger...“ slander? Especially since the Internet Archive isn't a nobody (with maybe some inside information?).
Just remembered cases like Deutsche Bank vs Leo Kirch which are legal nightmares.
[+] [-] rnhmjoj|8 years ago|reply
[+] [-] ipsum2|8 years ago|reply
[+] [-] agumonkey|8 years ago|reply
[+] [-] spurlock|8 years ago|reply
[deleted]
[+] [-] radarsat1|8 years ago|reply
[+] [-] cyphar|8 years ago|reply
[+] [-] pfg|8 years ago|reply
[+] [-] Qub3d|8 years ago|reply
http://webcache.googleusercontent.com/search?q=cache:eCl5VSB...
[0]:http://webcache.googleusercontent.com/search?q=cache:fmhU2zS...
[1]:http://archive.fart.website/bin/irclogger_log/archiveteam?da...
[+] [-] StavrosK|8 years ago|reply
[+] [-] rakoo|8 years ago|reply
You can participate in the effort, of course. Have a few hundred GB and a good connection? Head over to http://archiveteam.org/index.php?title=INTERNETARCHIVE.BAK/g... and follow the steps!
[+] [-] cyphar|8 years ago|reply
[+] [-] Veratyr|8 years ago|reply
[+] [-] unicornporn|8 years ago|reply
[1] https://www.lockss.org/
[+] [-] cetra3|8 years ago|reply
https://github.com/cetra3/rustcloud
It would be an absolute shame if Soundcloud disappears. There has been so much music I have discovered on this service.
[+] [-] fvdessen|8 years ago|reply
[+] [-] skymt|8 years ago|reply
https://soundcloud.com/go
[+] [-] recursive|8 years ago|reply
[+] [-] kpennell|8 years ago|reply
[+] [-] Raphmedia|8 years ago|reply
There are ads. Audio ads between songs.
[+] [-] xtian|8 years ago|reply
[+] [-] kevinmannix|8 years ago|reply
[+] [-] naturalgradient|8 years ago|reply
[+] [-] skeletonjelly|8 years ago|reply
> The website is temporarily unable to service your request as it exceeded resource limit. Please try again later.
I suppose I prefer an archive over the blog being unavailable
[+] [-] probably_wrong|8 years ago|reply
[1] https://en.wikipedia.org/wiki/Project_Xanadu
[+] [-] jacquesm|8 years ago|reply
[+] [-] robert_foss|8 years ago|reply
There is no perfect solution in sight, and this is a good one until a better one comes along.
[+] [-] krapp|8 years ago|reply
[+] [-] Buetol|8 years ago|reply
[+] [-] manigandham|8 years ago|reply
[+] [-] ipsum2|8 years ago|reply
[+] [-] jacquesm|8 years ago|reply
[+] [-] nextlevelwizard|8 years ago|reply
[+] [-] ghaff|8 years ago|reply
[+] [-] unknown|8 years ago|reply
[deleted]
[+] [-] db48x|8 years ago|reply
[+] [-] fgandiya|8 years ago|reply
[+] [-] sk0g|8 years ago|reply
[+] [-] _pmf_|8 years ago|reply
[+] [-] imartin2k|8 years ago|reply
[+] [-] Sami_Lehtinen|8 years ago|reply
[+] [-] chinathrow|8 years ago|reply
[+] [-] Steeeve|8 years ago|reply
[+] [-] simonhfrost|8 years ago|reply
[+] [-] rapnie|8 years ago|reply
[+] [-] eatbitseveryday|8 years ago|reply
[+] [-] voltagex_|8 years ago|reply
Although, if anyone could tell me how to do that with AES-SAMPLE HLS video, I'd be very happy.
[+] [-] random_calc|8 years ago|reply
[+] [-] philfrasty|8 years ago|reply
[+] [-] imartin2k|8 years ago|reply
[+] [-] omarforgotpwd|8 years ago|reply