top | item 21737696

Verizon/Yahoo Blocking Attempts to Archive Yahoo Groups – Deletion: Dec. 14

1393 points | Diagon | 6 years ago | modsandmembersblog.wordpress.com

405 comments

[+] Diagon|6 years ago|reply
Extensive history is about to be lost. Despite being broken, many organizations still use it. Examples from that post:

A police cooperative in Washington DC with over 17,000 members that was using them as a network to communicate with their respective neighborhoods.

A phone company in the UK that assigns phone numbers using the groups and now will lose all those phone designations when it’s deleted.

A birding group in New Delhi with 2,000 members that has collected data and research on birds for TWO DECADES.

An adoption group in France that has been using it for years and years to communicate and share history and photos and more.

They would also have found: Numerous support groups for people who are suicidal or depressed.

Numerous medical groups for people to communicate more effectively with their doctors.

Numerous vet groups with 24 hr care advice for sick pets.

Numerous support and help groups for the elderly.

Numerous historical groups for WW2 veterans, Vietnam veterans, etc.

Numerous science groups that have used them for years and have all their research there.

Numerous fan fiction groups or arts groups that have shared their work for years.

[+] tedunangst|6 years ago|reply
> A phone company in the UK that assigns phone numbers using the groups and now will lose all those phone designations when it’s deleted.

Wow, somebody invented a database that's even worse than an Excel file on a network share.

(Also, how are they going to assign new numbers when archive.org takes over? Is archive.org going to give them write access?)

[+] ec109685|6 years ago|reply
Not to be flippant, but wouldn't one of the members of these groups have a copy of the group in their email? Given gmail and whatnot store things virtually indefinitely, couldn't the contents be recovered that way?

-EJ
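A minimal sketch of the email-recovery idea, assuming a member exported their mailbox as a standard mbox file (Gmail's Google Takeout export produces one) and that group posts can be recognized by the group's name appearing in the List-Id, To, or Cc header. The group name "birders" and the function names are hypothetical. This only recovers what was actually delivered to that one member's inbox, so files, photos, and posts from before they joined would still be missing.

```python
import mailbox
from email.utils import parsedate_to_datetime


def message_text(msg):
    """Return the first text/plain part of a message as str ("" if none)."""
    for part in msg.walk():  # walk() yields the message itself first
        if part.get_content_type() != "text/plain":
            continue
        raw = part.get_payload(decode=True) or b""
        charset = part.get_content_charset() or "utf-8"
        try:
            return raw.decode(charset, errors="replace")
        except LookupError:  # unknown charset name in the header
            return raw.decode("utf-8", errors="replace")
    return ""


def extract_group_messages(mbox_path, group_marker):
    """Collect messages whose List-Id/To/Cc headers mention group_marker.

    Returns (date, subject, text) tuples, oldest first; undated messages last.
    """
    found = []
    for msg in mailbox.mbox(mbox_path):
        headers = " ".join(str(msg.get(h, "")) for h in ("List-Id", "To", "Cc"))
        if group_marker.lower() not in headers.lower():
            continue
        try:
            date = parsedate_to_datetime(msg.get("Date"))
        except (TypeError, ValueError):  # missing or unparseable Date header
            date = None
        found.append((date, str(msg.get("Subject", "")), message_text(msg).strip()))
    # Naive and timezone-aware datetimes can't be compared directly, so sort on timestamps.
    found.sort(key=lambda item: (item[0] is None, item[0].timestamp() if item[0] else 0.0))
    return found
```

Usage would be something like `extract_group_messages("All mail.mbox", "birders")`, then writing the tuples out per group; pooling exports from several members and de-duplicating by Message-ID would fill more gaps.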

[+] betamaxthetape|6 years ago|reply
disclaimer: I'm a member of Archive Team who's helping coordinate the joining of Yahoo Groups in preparation for archival.

Yahoo's banning of a large number of the accounts we were using is a huge setback for us. In total we lost access to over 55,000 Yahoo Groups; many of these will now not be archived and will be lost when Yahoo deletes everything on December 14.

Particularly disastrous was the loss of access to all of the 30,000 Fandom (fanfic / fanart / etc..) groups that were requested to be archived by members of the fandom community. We're back to square one now, and it is looking increasingly likely that we're only going to be able to re-join (and therefore archive) a small percentage of these groups before December 14.

(And now for the inevitable, shameless plug...) We could really use some help! If you've got an hour or so, we could really use people to come and complete CAPTCHAs for us. (A CAPTCHA is needed to join every group). Instructions at: https://github.com/davidferguson/yahoogroups-joiner

[+] jstanley|6 years ago|reply
I tried to do this but upon clicking the purple "Join Group" button Yahoo is giving me an error saying my email address is not linked to a Yahoo account:

> Your email address is not linked to a Yahoo ID. To join this group, you need to link your email address to a Yahoo account.

When I click "link your email address", it just takes me to a page called "Personal info" which doesn't have any obvious way to link my email address.

So I'm not sure how to proceed.

EDIT: Solved it. I had initially only "verified" the account with a phone number, but you have to add an email address as well. It's now working.

For anyone who, like me, signed up for this and filled in the Google form, but then couldn't find the leaderboard URL after closing the tab, it is https://df58.host.cs.st-andrews.ac.uk/yahoogroups/leaderboar...

It seems to be working through a list in reverse alphabetical order. Watching the progress being made is quite satisfying. When I started it was on groups like "sciencefiction" and now it's moved on to "petzluverz".

[+] Diagon|6 years ago|reply
While the above post is concerned with Fandom groups, my concern is with groups that started doing early community-driven, biohacking-type research. There are medical test results and discussions of medical interventions. While that's my focus, I'm sure there's additional important material. We really need to save this data.
[+] rkagerer|6 years ago|reply
Thanks for fighting the good fight!

I assumed I could help by going to a web page and solving a bunch of captchas for you, but when I read those instructions I found there's more involved (forging a Yahoo account, installing an extension) and it turned me off.

If captchas are the bottleneck, maybe some generous soul here could figure out a way to automate the rest and just give me a page where I can solve captchas? Further reducing the friction might get you more uptake from the community: more monkeys like me banging at typewriters.

Sorry I wasn't more help, and best of luck with your efforts.
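One way to read that suggestion, sketched under assumptions: automation handles account setup and join requests, and the only thing shown to a human is the next group whose join is blocked on a CAPTCHA. `CaptchaQueue` and its methods are hypothetical and not part of the yahoogroups-joiner project; the reverse-alphabetical ordering just mirrors what jstanley observed the real joiner doing, and the per-volunteer counter stands in for its leaderboard.

```python
from collections import Counter, deque


class CaptchaQueue:
    """Hypothetical work queue: volunteers only ever see the next pending CAPTCHA."""

    def __init__(self, group_names):
        # Hand out groups in reverse alphabetical order (e.g. "sciencefiction" before "petzluverz").
        self._pending = deque(sorted(set(group_names), reverse=True))
        self._in_progress = {}  # group name -> volunteer currently solving it
        self.leaderboard = Counter()  # volunteer -> number of CAPTCHAs solved

    def next_task(self, volunteer):
        """Give a volunteer the next group whose join needs a CAPTCHA, or None if done."""
        if not self._pending:
            return None
        group = self._pending.popleft()
        self._in_progress[group] = volunteer
        return group

    def submit(self, group, solved):
        """Record the outcome; an unsolved group goes back to the front for someone else."""
        volunteer = self._in_progress.pop(group)
        if solved:
            self.leaderboard[volunteer] += 1
        else:
            self._pending.appendleft(group)
```

The design choice is that everything except the CAPTCHA stays server-side, so a volunteer's page would only need to call `next_task` and `submit`.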

[+] scarejunba|6 years ago|reply
I imagine you guys already know this, but considering we’re up against the deadline, I’d use a CAPTCHA-solving service (easy to google yourself) and Luminati to distribute the IP addresses, while swallowing my ethical qualms.
[+] wanderer2323|6 years ago|reply
It went pretty well for the first 10-20 or so groups, but now I get multiple rounds of the really annoying captchas (click until none remain) per group... Dammit, Yahoo...
[+] citruscomputing|6 years ago|reply
Shoutout to https://github.com/dessant/buster by the way!

`Buster is a browser extension which helps you to solve difficult captchas by completing reCAPTCHA audio challenges using speech recognition. Challenges are solved by clicking on the extension button at the bottom of the reCAPTCHA widget.`

[+] yots|6 years ago|reply
FYI: The extension offers many private groups that I can't join without approval, and that seems to disrupt the extension's flow.
[+] Aeolun|6 years ago|reply
Hah, this is fun! I've so far stumbled on a fantastic group with Sims 1 houses (pictures, and the actual lots), and a Dream Street fan-club, which of course prompted me to see who the hell they were.

I confess I'm doing this mostly to see what people posted on the internet at some point in time :)

Edit: All groups have around 1600 members... what causes this...

[+] tootie|6 years ago|reply
Is there any cited reason for the groups they're blocking?
[+] ar-jan|6 years ago|reply
btw, maybe Mechanical Turk could help with the captcha part?
[+] ar-jan|6 years ago|reply
Just solved a bunch of captchas, but Chrome crashed a few times during. Due to the addon?
[+] dartdartdart|6 years ago|reply
As an aside, is there any way to recover emails if I didn't sign into Yahoo for a year? I and a lot of others had up to 15 years of sentimental mail exchanged during that period :(
[+] john_moscow|6 years ago|reply
Forgive my naivety, but why would blocking of your accounts delete the data you have already backed up? This sounds like you are doing it the wrong WAY, IMO.
[+] Angostura|6 years ago|reply
Have you posted this on Reddit anywhere? Possibly /technology?

You might even get the admins to make an announcement.

[+] mehhh|6 years ago|reply
Have you considered using NordVPN for CAPTCHA bypass? They are a shady company, but their network of residential VPNs is impressive.
[+] pmoriarty|6 years ago|reply
There have to be some Verizon or Yahoo employees on HN who are reading this.

Can any of you shed some light on why Verizon and Yahoo aren't cooperating with the Archive Team to archive this valuable historical content?

(If you don't feel comfortable commenting with your regular HN account, maybe you could do so with a throwaway account?)

Also, is it possible for any of you to bring this issue to the attention of upper management and help them understand how important it is to archive this?

You Verizon/Yahoo employees have much more power to make a difference here than anyone of us from the outside can.

[+] ygthrowaway|6 years ago|reply
Probably not very helpful/informational but:

I work for VzM, but not historically directly on Yahoo products (product teams have been merged/consolidated etc. over the past few years, but there's still strong tendencies toward products people came from).

So I wouldn't be very clued into what's happening with Yahoo Groups internally. And I've heard nothing about this internally. At all.

As it stands, it's 2:30pm in SV, VzM is top of the HN frontpage, and not a single soul has mentioned it yet on internal Slack.

Will see if I can find out more.

[+] john_moscow|6 years ago|reply
Pure speculation, but if you publish something created by another person without their explicit permission, it may open you up to a lawsuit. If some groups required explicit approval by a moderator in order to read the posts, I would take that to mean they didn't want the content to go public.

So technically, some legal troll could post some copyrighted information, wait for it to be published on Archive, and then sue Archive for copyright infringement and Verizon for assisting it. As a non-profit, Archive will likely get away with just taking it down, but a for-profit Verizon is a wholly different story.

[+] logicallee|6 years ago|reply
how much storage do you think in total all of the Yahoo Groups content takes?
[+] Thorentis|6 years ago|reply
I'm genuinely curious from an ideological perspective, why archivists think all this material is worth saving?

People often compare the shutting down of sites or the banning of content (e.g. when Tumblr banned porn, or now Yahoo shutting down groups) to the burning of the Library of Alexandria. But there is a huge difference. The LoA held knowledge collated and collected by the best thinkers of the time. The Internet is not that. The Internet is an open platform where anybody can say anything like that. Most comment sections are filled with all sorts of material ranging from factual to entirely fictional.

I realise it is hard to decide what is worth keeping (and therefore erring on the side of saving it all), but I'd wager that the vast majority of archived content is not useful at all. The Wayback machine is a perfect example. Lots of great stuff, but that's a drop in the bucket compared to the vast amounts of useless, or even redundant information stored.

It is a lot of resources thrown at saving, not the equivalent of the Library of Alexandria, but the public toilet block graffiti wall.

Anybody want to share what drives them to do this?

[+] pariahHN|6 years ago|reply
Even if we still had the Library of Alexandria, it may have shed zero light on the actual lives of citizens. Archiving content on the internet means capturing thousands of individual level perspectives and experiences. We don't know what will end up being important to historians 50 or 100 years from now. I would bet there are dozens if not hundreds of historians that would give anything for a record of their favorite time period that contains even a fraction of the amount of content today's archive efforts are storing.

It's also not horrendously expensive: we are getting better and better at storage as well as data analysis techniques, so stuff that seems useless today may be useful 50 years from now and cost less to store than it does now. The key thing, again, being that we can't benefit from hindsight.

Even graffiti can give insight into a time period, even if that insight is that that time period had an unusually high number of graffiti artists.

[+] Nition|6 years ago|reply
Step 1: We only need to archive the genuinely good content.

Step 2: It will take a long time to look through all this content and determine which parts deserve keeping.

Step 3: We will inevitably leave out something that someone else thinks is worth keeping anyway.

Step 4: Let's just archive everything.

[+] shortformblog|6 years ago|reply
One man’s public toilet block graffiti wall is another’s Library of Alexandria. Let the historians and journalists decide what’s important and the archivists take their best crack at saving it.

I write a lot of historical content and often the most useful stuff I find—for example, old flyers or ads from the 1950s or 1960s—would have been considered trash by someone at the time.

So an archivist’s job isn’t to make a judgment. It’s to protect the data as they see fit.

[+] WalterBright|6 years ago|reply
> I'm genuinely curious from an ideological perspective, why archivists think all this material is worth saving?

It's easier to just save it all and let gawd sort it out.

You never know what some future person might find interesting. For example, my father took lots and lots of pictures, but they're all set in the living room and kitchen. No pictures of the rest of the house. I'm sure the thought of photographing other rooms simply never occurred to him as being interesting.

For another example, many people are interested in where/when/why certain words first appeared, like the origin of "OK". Massive archives of text that are searchable would help with this.
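Roughly how a searchable archive could answer "when did this word first appear?": an inverted index maps each word to the dated documents containing it, kept sorted so the earliest posting comes first. This is a minimal sketch; the document IDs and dates you feed it are up to you, and a real corpus would need better tokenization (punctuation, spelling variants like "o.k.") and more careful date handling.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"[a-z0-9']+")  # crude word tokenizer, applied to lowercased text


def build_index(documents):
    """documents: iterable of (doc_id, date, text). Returns token -> sorted [(date, doc_id), ...]."""
    index = defaultdict(set)
    for doc_id, when, text in documents:
        for token in TOKEN.findall(text.lower()):
            index[token].add((when, doc_id))
    # Sorting the postings puts the earliest (date, doc_id) pair first for each token.
    return {token: sorted(postings) for token, postings in index.items()}


def first_attestation(index, word):
    """Earliest (date, doc_id) containing `word` as a whole token, or None if never seen."""
    postings = index.get(word.lower())
    return postings[0] if postings else None
```

Building the index once makes each "first appearance" lookup a dictionary access rather than a rescan of the whole archive.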

[+] CamperBob2|6 years ago|reply
> It is a lot of resources thrown at saving, not the equivalent of the Library of Alexandria, but the public toilet block graffiti wall.

Ask an antiquarian about the value of graffiti in the ruins of Pompeii and other archaeological sites sometime. The great historians of the day wrote about their contemporary culture, while the vandals and miscreants and lowlifes and commoners contributed to that culture. Having access to both sources gives us a much more complete picture.

You don't know what's worth saving at the time you save it.

[+] empath75|6 years ago|reply
When I was at AOL I tried to get them to open source the Q-Link server code from the 1980s. Someone actually got it on DVD for me and everything, but after the Verizon merger they fired the entire legal team that was responsible for authorizing open source releases, and it just stalled.
[+] jedberg|6 years ago|reply
It's like the burning of the Library of Alexandria all over again.

We don't know exactly what was in the library when it burned. We assume it was all great works of intellectualism, but it could very well have been the fanfics of their time.

[+] frustyycomb|6 years ago|reply
There are a few groups I was a member of, like Lifters https://groups.yahoo.com/neo/groups/Lifters/info, which was an intensive technical development group in the field of propellerless, rocketless, jetless flight using only electronic high voltage.

Also, some of the politics groups were a great time capsule for around the Clinton/Bush election era.

A lot of earthquake researchers gathered on several earthquake groups as well, including Caltech seismologists and advanced amateurs, many of whom aren't around anymore.

Also, some of the info in these groups can be used to defeat patent applications, as it shows evidence of public prior concepts and art.

Yahoo Groups had somewhat more technically advanced users than modern sites like Reddit, because they were earlier and somewhat harder to use.

It's a lot of good-quality content.

Also, in the early days on these groups, spam and massive, controlled astroturfing account groups were pretty rare.

This is like losing 15 years of ancient Sumerian writings from a very interesting early time for the Internet.

[+] lazzlazzlazz|6 years ago|reply
This is a wake-up call to the entire world: we cannot take internet history for granted. We need affordable, decentralized means with long-term economic incentives to archive the digital world.

In a way, the digital world is far more fragile than the physical world. And the time to solve this is now.

[+] 8bitsrule|6 years ago|reply
Tragedy of The Cloud.

IIRC, Archive.org is still running its fundraiser today.

We need LOTS of publicly-sponsored and paid-for digital archival centers that, like libraries, are maintained for the common welfare. Or we could, you know, add that duty (and funding) to existing libraries! With -paid- archivists!

[+] dessant|6 years ago|reply
What prevents Verizon from donating the Yahoo Groups database to the Internet Archive? What does Verizon have to gain from preventing the archival of Yahoo Groups?
[+] Cougher|6 years ago|reply
We have examples of content that was destroyed because it was deemed trivial at the time, one example being the BBC's policy of erasing its television shows so the tape could be reused for new shows. The policy began with the idea that a television broadcast was a temporary communication like radio, and really, what possible reason could there be for people in the future to want to watch things like comedy shows, Dr Who, or news programs from the 60s, or the BBC's coverage of the Apollo moon landing? Surely the value of these cultural artifacts was not as great as the cost of video tape? https://en.wikipedia.org/wiki/Wiping#BBC
[+] userbinator|6 years ago|reply
The "dark side" of web scrapers has always been one step ahead with things like IP bans and CAPTCHA solvers, maybe it's time to get their assistance... as the old saying goes, "an enemy of an enemy is a friend".
[+] egfx|6 years ago|reply
In the early 2000s there existed two main ecosystems in mobile software, J2ME and BREW (not counting Symbian), the latter operated by Verizon. I had cofounded a QA consulting company that relied heavily on BREW’s highly extensive developer portal. Then one day, without warning, the developer portal disappeared. Luckily I had had the foresight to download all the documentation a week before. My cofounder, a Microsoft developer, was dumbfounded.
[+] Diagon|6 years ago|reply
Call For Action

https://modsandmembersblog.wordpress.com/taking-action/

Don't miss the sidebar with these links:

https://modsandmembersblog.wordpress.com/media-contacts/

https://modsandmembersblog.wordpress.com/contacting-verizon-...

https://modsandmembersblog.wordpress.com/contacting-verizon-...

Also, you can add these emails to the media contacts:

  "Reporter Katyanna Quach" <[email protected]>,
  "Managing editor Gavin Clarke" <[email protected]>,
  "Corey Wilson & Rachel Janc; Senior Director, Communications" <[email protected]>,
  "Pitches" <[email protected]>,
  "Rich Woods" <[email protected]>,
  "Paul Thurrott" <[email protected]>,
  "Brad Sams" <[email protected]>,
  "Kate Rayford, Media Inquiries" <[email protected]>,
  "Bryan Lowder (LGBTQ issues/culture)" <[email protected]>,
  "Torie Bosch (emerging technology effects on public policy and society)" <[email protected]>,
  "Jonathan Fischer (big tech, cities, media/internet culture)" <[email protected]>,
  "Susan Matthews, Health & Science" <[email protected]>,
  "Erika Allen, Executive Managing Editor" <[email protected]>,
  "Katie Drummond, SVP, Global Content" <[email protected]>,
  "Press, US" <[email protected]>,
  "Press, Canada" <[email protected]>,
  "Press, UK" <[email protected]>,
  "Pitches, Culture" <[email protected]>,
  "Pitches, Tech" <[email protected]>,
  "Issues" <[email protected]>
[+] zfxfr|6 years ago|reply
There must be something I am missing somewhere.

1) I have been a member of a group (Gann study group) for many years. Last Friday I received a notification from the owner explaining that the group was closing, so he had set up a new one somewhere else. I thought it would be nice to make a backup, so I found a Python script on GitHub (there are dozens of scripts in various languages there that can be used to back up a Yahoo group). It took me a couple of minutes to get it working, and a while later, voilà! I had it nicely packed on my hard drive. So why is it so hard to back up a group? I don't understand the problem.

2) "A phone company in the UK that assigns phone numbers using the groups and now will lose all those phone designations when it’s deleted."

What? Well, OK, why not... But they are a phone company. Surely someone there is able to scrape all this data? I don't get it. There are so many ways to extract data from a Yahoo group.

[+] gatherhunterer|6 years ago|reply
The current administration put Verizon’s chief counsel into the position of FCC Chairman. I would not expect Verizon to answer to anyone.

Also, it is a shame that the person in direct contact with Yahoo over this is sending angry emails in all caps. The Internet Archive deserves better.

[+] a3n|6 years ago|reply
Don't use free corporate services for shit you care about. Or think you may care about later.

Don't use any service that suffers from a single point of control.

How much anguish when Facebook inevitably either goes away or pivots entirely?

Or HN, for that matter?

[+] rthomas6|6 years ago|reply
Things like this are a good answer to when people question why internet centralization and walled gardens matter. If these things were hosted across thousands of servers, federated, or under a license that made them able to be copied, there would be no issue. This is only an issue in the first place because people posted content in a place and manner that made them give up ownership to it. One day, perhaps decades from now, Facebook is going to face the same problem. Twitter would, too, if it wasn't being archived by the Library of Congress.
[+] oieoeireoes|6 years ago|reply
Verizon claimed that the archivists violated the "terms of service" [1], but I couldn't find any reference to automation, downloading, crawling, or denial of service attacks that might apply.

Does anyone have an idea of exactly what term or terms were violated by the archivists?

[1] https://www.verizonmedia.com/policies/us/en/verizonmedia/ter...

[+] arianestrasse|6 years ago|reply
Just playing devil's advocate here. The way archivists are downloading the data could be said to disrupt the services, which is covered in the terms of service:

2. d. viii: "interfere with or disrupt the Services or servers, systems or networks connected to the Services in any way."

I'd also like to point out that the apparent spokesperson, Brenda Fowler, said in her open letter to Verizon that "If the problem is that all our attempts to rescue our archives in the time we have left is causing an overload or strain on your servers, then stop making us HAVE to work around the clock, and GIVE US MORE TIME. ..." Probably not the wisest thing to say right now.

Also, archiving the groups with automated tools is against the Use of Services rule, which states the following:

2. e: "Use of Services. You must follow any guidelines or policies associated with the Services. You must not misuse or interfere with the Services or try to access them using a method other than the interface and the instructions that we provide. ..."

As I mentioned in another comment, I really support the cause and am a big fan of archiving myself but it's unfortunately quite clear that Verizon is right at calling out the violations of "terms of service".

[+] kevingadd|6 years ago|reply
AFAIK they hadn't started doing mass-archiving either. They were still setting up.
[+] fl0under|6 years ago|reply
I had just recently been reading about Arweave [0], a sort of distributed file storage that claims to permanently store files/webpages using various incentives.

Seems like something like this would be a good way to archive this sort of information or build sites like Yahoo groups on top of this file storage in the first place.

[0] https://www.arweave.org/