Historical programming-language groups disappearing from Google

[+] jedberg|5 years ago|reply

It's funny, when I took a tour of the US Geological Survey, the curator of the collection hated Google (which was just a few blocks away). He said Google is great now, with all their maps, which were far more accurate and had better coverage than the USGS.

But what happens when they get bored with map data and get rid of it?

He had been ordered to turn over all of their historical arial archives for scanning by Google, and then told the USGS would no longer do arial scanning since Google was doing it. But there was no agreement for Google to turn over their arial scans back to the USGS.

At the time we all told him not to worry, Google would never remove data it had collected. Looks like he was a lot smarter than us.

[+] coliveira|5 years ago|reply

Well, that's the problem with the whole internet. Remember those pages created in the 90s/early 2000s? People thought they were sharing information with the whole world. It turns out that most pages created in the 90s are now inaccessible or have been siloed by big corporations. The fact that we allowed corporations to take over the internet made it an inhospitable place for everyone else without corporate backing.

[+] kerkeslager|5 years ago|reply

> He had been ordered to turn over all of their historical arial archives for scanning by Google, and then told the USGS would no longer do arial scanning since Google was doing it. But there was no agreement for Google to turn over their arial scans back to the USGS.

Jeez, that's horrifying. Literally just giving public assets to private corporations.

[+] tingletech|5 years ago|reply

> But there was no agreement for Google to turn over their arial scans back to the USGS.

That was poor negotiation by the USGS Solicitor's Office. Libraries participating in Google digitization programs negotiated to keep copies of their scanned materials in the HathiTrust Digital Library: https://www.hathitrust.org

[+] est31|5 years ago|reply

There are laws for book publishers, requiring that they send copies to your local government's central library. In the US it's the Library of Congress. Some of the books they don't keep, but they do filter them by which books are important and which aren't. Maybe the same should be done for "viral" posts, such arial scans, and other data deemed important.

[+] jeffbee|5 years ago|reply

The USGS is currently in the middle of an 8-year 1.1-billion-dollar program to develop a nationwide digital elevation model from aerial lidar. The data, which is freely accessible, is hosted on AWS. Cute story though. The hackernewses are going to eat it up.

https://www.usgs.gov/core-science-systems/ngp/3dep/3dep-data...

[+] ponker|5 years ago|reply

Companies are necessarily managed for the quarter and countries should be managed for the century.

[+] GekkePrutser|5 years ago|reply

And it's not even true. At least here in Europe (I can't comment on the US as I've never visited), Google Maps is really poor.

It's fine when you travel by car, but when I'm hiking through the hills I'm just walking through an empty square on Google Maps. Volunteer-driven OpenStreetMap is MUCH better. And there the data is actually open and safeguarded.

Governments should support that kind of project instead of corporate privacy-invading playtoys like Google Maps.

[+] wyclif|5 years ago|reply

In another life, I was a land surveyor and I did a lot of LiDAR work as well as heavy use of USGS data. Almost anybody except the most blinkered in that industry would have seen this coming, I think. It's just one more data point that convinces me that Google should be broken up or at least not allowed to silo previously public categories of data.

[+] parksy|5 years ago|reply

I had a similar, almost to-the-letter conversation when I did some web work for a much smaller GIS firm back in the day. I wanted to add that in my experience this isn't just a Google thing but an issue with governments and outsourcing in general.

Anecdotally, a close relative (and many others in her institute) designed entire curricula of learning modules for a government-owned nationwide technical college, back when online learning was newish, ~20 years ago (I think back when SCORM was fresh). These were tightly integrated into the traditional in-class offerings. A couple of years later a "trim the fat" government slashed internal capabilities and outsourced all "IT" hosting, management, etc.

All of the online learning modules (which would have cost millions in man-hours to develop) were literally handed over as "content" to a company who to this day offers them back to her institute under per-student licenses (that far exceed any "hosting" costs of these basically static resources) over a decade later. This company also profits off licensing to an array of pop-up online "institutes" that don't even approach the pedagogical context needed to ensure quality education outcomes from these resources.

Like a comedy of errors, from time to time some lecturer at her college will want to ask a question about the materials, and their boss directs them to the company's support (which is a paid service). After the issue escalates through the support tiers and they realise they need the expert knowledge of the author, she'll get an email with the question. The process can take days or weeks, when the lecturer could have walked into the office next door and asked her directly if the company hadn't stripped all author credits from the materials.

If the company decides to shift business models, or goes out of business, or is acquired and scuttled, these assets get blown to the winds.

There's a lot I could say about this situation, but essentially governments in general seem to devalue their assets at taxpayer expense. The IP of these assets could have been better handled rather than just given directly to the first company to win the contract all those years ago.

[+] raldi|5 years ago|reply

> historical arial archives

A font of knowledge

[+] pwdisswordfish2|5 years ago|reply

Is this a valid reason for not using Go?

I am a holdout.

(Not suggesting I am "smarter" than Go users, but I can foresee issues with Go being controlled by Google.)

[+] cosmodisk|5 years ago|reply

I think most of us are young enough to live to the point where we look into the mirror asking ourselves what we did 20 years ago, and nobody will really remember because, you know, a few bits here, a few bits there, and it all disappeared...

[+] globular-toast|5 years ago|reply

Not smarter. Wiser.

It sounds awful that Google has the best mapping data in the US. In the UK Google's data is awful, worse than OpenStreetMap and much worse than Ordnance Survey, the national mapping agency.

[+] Spooky23|5 years ago|reply

The funny thing is that this happened already when Google bought DejaNews and broke the interface after a year.

[+] elmo2you|5 years ago|reply

The underlying problem here might as well be considered a fundamental shortcoming of pure/fundamental capitalism. I make no claims about the value of alternatives, or even if there are any (better ones, that is).

Anything that is (no longer) of commercial value will be "phased out" and dismantled/destroyed. One might still stretch it a bit, by arguing that the commercial value of something can include its future potential value. But I personally know of not a single commercial company that ever chose that over short-term cost reductions and "profit optimizations".

Luckily, there are governments who acknowledge this shortcoming and build structures to compensate for it. But when governments decide to leave (almost) everything to commercial markets, then the importance of anything and everything can and will only be measured by its commercial (contemporary) value/profitability.

People have every right to vote for and support such a system. But then don't complain, when all that you will get is only what such system supports/provides.

[+] irrational|5 years ago|reply

Isn't killing projects Google's key strength?

[+] TheSpiceIsLife|5 years ago|reply

Like we have whitewashing and greenwashing, I propose the term:

Googlewashing - to proclaim “Google would never ...”

[+] stcredzero|5 years ago|reply

> He said Google is great now, with all their maps, which were far more accurate and had better coverage than the USGS... But what happens when they get bored with map data and get rid of it?

> Looks like he was a lot smarter than us.

If you would've asked me back when Google was new, and we all believed in "Don't be Evil," I would never have thought that Big Tech would end up being the Ministry of Truth and The Memory Hole.

[+] synack|5 years ago|reply

Just recently I collected all of the archives of comp.lang.ada I could find and imported them into a public-inbox repository. There's a gap around 1992 that I couldn't find a copy of, but it's otherwise complete. It took a few days to get everything into the right format and get SpamAssassin dialed in, but it would certainly be possible to do this for the other comp.* groups if one had the patience.

https://archive.legitdata.co/

https://archive.legitdata.co/comp.lang.ada/

https://public-inbox.org/README.html
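
For anyone attempting the same for another group, here is a rough Python sketch (not synack's actual tooling) of one way to spot a gap like the 1992 one: count messages per year from the Date headers of an mbox export, then list the years with no messages. The function names and the mbox input format are assumptions made for illustration; public-inbox and SpamAssassin are not involved here.

```python
import mailbox
from collections import Counter
from email.utils import parsedate_to_datetime


def year_of(date_header):
    """Return the year of an RFC 5322 Date header, or None if unparseable."""
    if not date_header:
        return None
    try:
        return parsedate_to_datetime(str(date_header)).year
    except (TypeError, ValueError):
        # Old Usenet posts often carry malformed dates; skip those.
        return None


def messages_per_year(mbox_path):
    """Count messages per year in an mbox file (hypothetical input format)."""
    counts = Counter()
    for message in mailbox.mbox(mbox_path):
        year = year_of(message.get("Date"))
        if year is not None:
            counts[year] += 1
    return counts


def missing_years(counts):
    """List years between the first and last seen year that have no messages."""
    if not counts:
        return []
    first, last = min(counts), max(counts)
    return [y for y in range(first, last + 1) if counts.get(y, 0) == 0]
```

Calling `missing_years(messages_per_year(path))` on an archive would return the empty years. A year with only a handful of messages would not show up, so a per-month count may be more useful for partial gaps.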

[+] sneeuwpopsneeuw|5 years ago|reply

I would personally very much appreciate it if the Ada resources could be placed or archived again on the internet. Lately I had the feeling even books were a better option for finding information about the language.

[+] kazinator|5 years ago|reply

The vast majority of the spam content is injected into these newsgroups via Google Groups itself, and is not even seen on other NNTP servers.

Blocking posting access to these newsgroups from GG is generally a good thing for those newsgroups.

Not being able to search the archive is the unfortunate collateral damage though. Google is not obliged to provide a Usenet archive, I suppose.

Formerly obtained deep links to the content also do not work!

If you formerly cited a comp.lang.lisp article by giving a direct link into Google Groups, people navigating it now get a permission error.

[+] dependenttypes|5 years ago|reply

What would be a good free NNTP server or NNTP archive?

[+] _kp6z|5 years ago|reply

Google's handling of these critical archives they were given is pretty abhorrent. The usenet archives should really be made public since there is no business value to them and they don't care about usenet.

[+] neilv|5 years ago|reply

When Google started, there was maybe an overall altruistic, visionary, principled culture among many pre-Web Internet-y people, and it looked like Google was of that same school of thought.

(This was at the same time that there was a gold rush of IPO plays, hiring anyone who could spell "HTML", and plopping them down in slick office space, Aerons for everyone, and lavish launch parties, with tons of oblivious posturing and self-congratulating. But Google stood out as looking technically smart, at least I believed the "Don't Be Evil", since that was the OG culture, and it seemed a savvy reference to behaviors in industry and awareness of the power that it was clear they would probably have.)

That might be why it wasn't surprising to hear of things like someone entrusting a bunch of old university backup tapes to Google's stewardship.

This has played out with mixed results, and I think Google could be doing much better for humanity and for techie culture.

[+] enneff|5 years ago|reply

Google didn’t kill Usenet; it was already pretty much dead. Web forums had all but taken their place (and where are their archives now? So much is lost).

If you look at the history, Google basically rescued the data from a collapsing Deja News, and made it available again. A nice gesture, which didn’t serve to benefit Google much in the long term.

If we want to preserve history then we can’t rely on for-profit companies. We need to instead fund non-profits whose specific charter is archival and preservation, like the Internet Archive.

[+] dragonwriter|5 years ago|reply

> The usenet archives should really be made public

Given the nature of Usenet, they were if anyone wanted them.

[+] eternalban|5 years ago|reply

> they don't care about usenet.

They cared enough about it to kill it.

[+] HenryKissinger|5 years ago|reply

Controversial question: Why should we preserve code that no one uses anymore? Why should we not allow some information to be simply lost?

[+] jeffbee|5 years ago|reply

The fact that nobody had enough fucks to give to archive these groups tells you everything you need to know about decentralized peer-to-peer proof-of-work blockchain nerd hobbies. This content exists on a completely open peer-to-peer content distribution network and here you are whining that one company -- the company that already rescued this archive in a midnight U-Haul run 20 years ago -- failed to archive it.

[+] dabockster|5 years ago|reply

Seriously! I have the same issue with a lot of modern online communities/projects too. They all assume whatever platform they're currently publishing on will be there forever.

Brb archiving my Twitter posts

[+] none10287|5 years ago|reply

Google has bought dejanews and has profited immensely from open source and open information.

So I do think they have an obligation either a) to make the whole archive available for anyone or b) maintain it properly.

Properly means restoring the fast UI from around 2004.

[+] icheishvili|5 years ago|reply

This type of behavior is why I can never consider GCP. How many people have been burned at this point by Google randomly shutting down something they rely on?

[+] john-shaffer|5 years ago|reply

I've had two Google accounts shut down in the last six months with no explanation. There is no appeal. The consumer services I've used (Feed Reader, Play Music) have been shut down, and the cloud service I was most interested in was luckily shut down before I was able to use it. (They used to have a service to resize & manipulate images in Blob Storage. I found a good AWS alternative[1] instead). I cannot rely on Google for anything at all, and definitely not for something as important as cloud services.

[1] https://github.com/awslabs/serverless-image-handler

[+] userbinator|5 years ago|reply

One thing that's become extremely clear to me over the last decade or so is that almost all tech companies simply do not care about the past, and I suspect at least part of that is so their narrative of progress can be subjected to fewer challenges from those who look back and compare.

Also, and this may be a bit of a tangential point, but the "deny the past because it has something bad" that Google has effectively done here is uncomfortably close to the set of recent and far more political events.

[+] SyneRyder|5 years ago|reply

> do not care about the past...

You just reminded me of a quote from an electronic music documentary 25 years ago. One of the Detroit techno artists insisted on taking the filmmakers to a historic theatre that had been left to crumble & turned into a car park:

"In America especially, nobody tends to care about these kinds of things. People in America tend to let this shit just die, let it go. No respect for the history. I, being a techno, electronic, high-tech futurist musician, I totally believe in the future! But as well, I believe in a historic and well kept past. I believe there are some things that are important. Now, maybe this is more important like this, because in this atmosphere, you can realize how much people don't care, how much they don't respect. And it can make you realize how much you should respect."

- Derrick May, DJ/Composer, Universal Techno (1996)

https://youtube.com/watch?v=tdox6H7FJBU&t=955s

The segment starts at 16:00 in the video and is about 2 minutes long.

[+] jolmg|5 years ago|reply

> almost all tech companies simply do not care about the past

You may be surprised that it's not just companies. It's not hard to find people who think it's better for old stuff to just be deleted.

[+] Animats|5 years ago|reply

"He who controls the present controls the past. He who controls the past controls the future" - Orwell, "1984"

[+] fmajid|5 years ago|reply

> Usenet predates Google's spam handling tools

In fact Usenet predates spam itself, since the first spam (Canter & Siegel) was on Usenet itself in 1994 (I was there).

[+] aidenn0|5 years ago|reply

Anyone know if anyone other than Google has newsgroup archives publicly accessible (The Internet Archive, maybe)?

[+] CrankyBear|5 years ago|reply

No, no, no. These groups and other Usenet groups archives must be preserved. They're our history.

[+] imhoguy|5 years ago|reply

Anyone looking for a hobby? It is time to become a data hoarder https://www.reddit.com/r/DataHoarder/

[+] rdiddly|5 years ago|reply

Either those Usenet groups are not part of the world, or they don't consist of information, or Google just failed at "organizing the world's information."

[+] WoodenChair|5 years ago|reply

I read the article and I read the threads here, and maybe I missed it—but why did these groups disappear? Were they banned due to bad words or a mistaken spam filter?

[+] summerlight|5 years ago|reply

https://www.lumendatabase.org/notices/search?utf8=%E2%9C%93&...

Looks like there have been mechanical legal complaints (likely automated; nearly all of them are the same Italian phrase), and that probably caused this instance of automated blocking going wild.

As an engineer I can understand the desire to automate everything, but please at least have some heuristics to detect this kind of easy-to-detect mechanical behavior before giving the model full authority to block anyone it doesn't like.
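
As a rough illustration of the kind of heuristic meant here (a sketch only; nothing is known about how Google's actual pipeline works), one could flag a batch of complaints as mechanical when a single normalized text accounts for most of it. The function names and both thresholds below are hypothetical.

```python
import re
from collections import Counter


def normalize(text):
    """Collapse whitespace and case so trivially varied copies compare equal."""
    return re.sub(r"\s+", " ", text).strip().lower()


def looks_mechanical(complaints, min_count=5, max_share=0.8):
    """Return True if one normalized text makes up at least max_share of the batch.

    min_count and max_share are made-up thresholds; batches smaller than
    min_count are never flagged.
    """
    if len(complaints) < min_count:
        return False
    counts = Counter(normalize(c) for c in complaints)
    _, top = counts.most_common(1)[0]
    return top / len(complaints) >= max_share
```

A flagged batch would then go to a human for review instead of straight to automated blocking. Exact-match counting is the simplest possible check; it would miss complaints that vary a few words each time.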

[+] jolmg|5 years ago|reply

> since there is no other comprehensive archive after Google's purchase of Dejanews around 20 years ago

Was I naive in thinking that The Internet Archive would have long archived this type of thing?

[+] msie|5 years ago|reply

WTF Google? Are you now so full of young programmers who have no respect for programming history? You’ve lost all geek cred, that’s for sure.

[+] mark_l_watson|5 years ago|reply

Too many people and companies don’t appreciate culture enough. Maintaining a cultural record should apparently not be left to just one company.

Thanks for posting this, it reminded me to donate again to archive.org, which I just did.

I use ‘culture’ to include anything creative, anything that we experience as humans. Everything should be preserved, schools should be well funded, as should the arts.

[+] lkirk|5 years ago|reply

Is this something that the internet archive would preserve?

332 comments