It's funny, when I took a tour of the US Geological Survey, the curator of the collection hated Google (which was just a few blocks away). He said Google is great now, with all their maps, which were far more accurate and had better coverage than the USGS.
But what happens when they get bored with map data and get rid of it?
He had been ordered to turn over all of their historical aerial archives for scanning by Google, and was then told the USGS would no longer do aerial scanning since Google was doing it. But there was no agreement for Google to turn over their aerial scans back to the USGS.
At the time we all told him not to worry, Google would never remove data it had collected. Looks like he was a lot smarter than us.
Well, that's the problem with the whole internet. Remember those pages created in the 90s/early 2000s? People thought they were sharing information with the whole world. It turns out that most pages created in the 90s are now inaccessible or have been siloed by big corporations. The fact that we allowed corporations to take over the internet made it an inhospitable place for everyone without corporate backing.
> He had been ordered to turn over all of their historical aerial archives for scanning by Google, and was then told the USGS would no longer do aerial scanning since Google was doing it. But there was no agreement for Google to turn over their aerial scans back to the USGS.
Jeez, that's horrifying. Literally just giving public assets to private corporations.
> But there was no agreement for Google to turn over their aerial scans back to the USGS.
That was poor negotiation by the USGS Solicitor's Office. Libraries participating in Google digitization programs negotiated to keep copies of their scanned materials in the Hathi Trust Digital Library https://www.hathitrust.org
There are laws for book publishers, requiring that they send copies to your local government's central library. In the US it's the Library of Congress. They don't keep all of the books, but they do filter them by which are important and which aren't. Maybe the same should be done for "viral" posts, aerial scans, and other data deemed important.
The USGS is currently in the middle of an 8-year 1.1-billion-dollar program to develop a nationwide digital elevation model from aerial lidar. The data, which is freely accessible, is hosted on AWS. Cute story though. The hackernewses are going to eat it up.
And it's not even true. At least here in Europe (I can't comment on the US as I've never visited), Google Maps is really poor.
It's fine when you travel by car, but when I'm hiking through the hills I'm just walking through an empty square on Google Maps. Volunteer-driven OpenStreetMap is MUCH better. And there the data is actually open and safeguarded.
Governments should support that kind of project instead of corporate privacy-invading playtoys like Google Maps.
In another life, I was a land surveyor and I did a lot of LiDAR work as well as heavy use of USGS data. Almost anybody except the most blinkered in that industry would have seen this coming, I think. It's just one more data point that convinces me that Google should be broken up or at least not allowed to silo previously public categories of data.
I had a similar, almost to-the-letter conversation when I did some web work for a much smaller GIS firm back in the day, but I wanted to add that in my experience this isn't just a Google thing but an issue with governments and outsourcing in general.
Anecdotally, a close relative (and many others in her institute) designed entire curricula of learning modules for a government-owned nationwide technical college, back when online learning was newish, ~20 years ago (I think back when SCORM was fresh). These were tightly integrated into the traditional in-class offerings. A couple of years later a "trim the fat" government slashed internal capabilities and outsourced all "IT" hosting, management, etc.
All of the online learning modules (which would have cost millions in man-hours to develop) were literally handed over as "content" to a company who to this day offers them back to her institute under per-student licenses (that far exceed any "hosting" costs of these basically static resources) over a decade later. This company also profits off licensing to an array of pop-up online "institutes" that don't even approach the pedagogical context needed to ensure quality education outcomes from these resources.
Like a comedy of errors, from time to time some lecturer at her college will want to ask a question about the materials. Their boss directs them to the company's support (which is a paid service); after the issue escalates through the support tiers and they realise they need the expert knowledge of the author, she'll get an email with the question. The process can take days or weeks, when the lecturer could have walked into the office next door and asked her directly if the company hadn't stripped all author credits from the materials.
If the company decides to shift business models, or goes out of business, or is acquired and scuttled, these assets get blown to the winds.
There's a lot I could say about this situation, but essentially governments in general seem to devalue their assets at taxpayer expense. The IP of these assets could have been handled better than just giving it directly to the first company to win the contract all those years ago.
I think most of us are young enough to live to the point where we look in the mirror and ask ourselves what we did 20 years ago, and nobody will really remember because, you know, a few bits here, a few bits there, and it all disappeared...
It sounds awful that Google has the best mapping data in the US. In the UK Google's data is awful, worse than OpenStreetMap and much worse than Ordnance Survey, the national mapping agency.
The underlying problem here might as well be considered a fundamental shortcoming of pure/fundamental capitalism. I make no claims about the value of alternatives, or even if there are any (better ones, that is).
Anything that is (no longer) of commercial value will be "phased out" and dismantled/destroyed. One might still stretch it a bit, by arguing that the commercial value of something can include its future potential value. But I personally do not know of a single commercial company that ever chose that over short-term cost reductions and "profit optimizations".
Luckily, there are governments who acknowledge this shortcoming and build structures to compensate for it. But when governments decide to leave (almost) everything to commercial markets, then the importance of anything and everything can and will only be measured by its commercial (contemporary) value/profitability.
People have every right to vote for and support such a system. But then don't complain when all that you get is only what such a system supports/provides.
> He said Google is great now, with all their maps, which were far more accurate and had better coverage than the USGS...But what happens when they get bored with map data and get rid of it?
> Looks like he was a lot smarter than us.
If you would've asked me back when Google was new, and we all believed in "Don't be Evil," I would never have thought that Big Tech would end up being the Ministry of Truth and The Memory Hole.
Just recently I collected all of the archives of comp.lang.ada I could find and imported them into a public-inbox repository. There's a gap around 1992 that I couldn't find a copy of, but it's otherwise complete. It took a few days to get everything into the right format and get SpamAssassin dialed in, but it would certainly be possible to do this for the other comp.* groups if one had the patience.
I would personally very much appreciate it if the Ada resources could be placed or archived again on the internet. Lately I had the feeling that even books were a better option for finding information about the language.
Google's handling of these critical archives they were given is pretty abhorrent. The Usenet archives should really be made public since there is no business value to them and they don't care about Usenet.
When Google started, there was maybe an overall altruistic, visionary, principled culture among many pre-Web Internet-y people, and it looked like Google was of that same school of thought.
(This was at the same time that there was a gold rush of IPO plays, hiring anyone who could spell "HTML", and plopping them down in slick office space, Aerons for everyone, and lavish launch parties, with tons of oblivious posturing and self-congratulating. But Google stood out as looking technically smart, at least I believed the "Don't Be Evil", since that was the OG culture, and it seemed a savvy reference to behaviors in industry and awareness of the power that it was clear they would probably have.)
That might be why it wasn't surprising to hear of things like someone entrusting a bunch of old university backup tapes to Google's stewardship.
This has played out with mixed results, and I think Google could be doing much better for humanity and for techie culture.
Google didn’t kill Usenet; it was already pretty much dead. Web forums had all but taken its place (and where are their archives now? So much is lost).
If you look at the history, Google basically rescued the data from a collapsing Deja News, and made it available again. A nice gesture, which didn’t serve to benefit Google much in the long term.
If we want to preserve history then we can’t rely on for-profit companies. We need to instead fund non-profits whose specific charter is archival and preservation, like the Internet Archive.
The fact that nobody had enough fucks to give to archive these groups tells you everything you need to know about decentralized peer-to-peer proof-of-work blockchain nerd hobbies. This content exists on a completely open peer-to-peer content distribution network and here you are whining that one company -- the company that already rescued this archive in a midnight U-Haul run 20 years ago -- failed to archive it.
Seriously! I have the same issue with a lot of modern online communities/projects too. They all assume whatever platform they're currently publishing on will be there forever.
This type of behavior is why I can never consider GCP. How many people have been burned at this point by Google randomly shutting down something they rely on?
I've had two Google accounts shut down in the last six months with no explanation. There is no appeal. The consumer services I've used (Feed Reader, Play Music) have been shut down, and the cloud service I was most interested in was luckily shut down before I was able to use it. (They used to have a service to resize & manipulate images in Blob Storage. I found a good AWS alternative[1] instead). I cannot rely on Google for anything at all, and definitely not for something as important as cloud services.
One thing that's become extremely clear to me over the last decade or so is that almost all tech companies simply do not care about the past, and I suspect at least part of that is so their narrative of progress can be subjected to fewer challenges from those who look back and compare.
Also, and this may be a bit of a tangential point, but the "deny the past because it has something bad" that Google has effectively done here is uncomfortably close to the set of recent and far more political events.
You just reminded me of a quote from an electronic music documentary 25 years ago. One of the Detroit techno artists insisted on taking the filmmakers to a historic theatre that had been left to crumble & turned into a car park:
"In America especially, nobody tends to care about these kinds of things. People in America tend to let this shit just die, let it go. No respect for the history. I, being a techno, electronic, high-tech futurist musician, I totally believe in the future! But as well, I believe in a historic and well kept past. I believe there are some things that are important. Now, maybe this is more important like this, because in this atmosphere, you can realize how much people don't care, how much they don't respect. And it can make you realize how much you should respect."
- Derrick May, DJ/Composer, Universal Techno (1996)
Either those Usenet groups are not part of the world, or they don't consist of information, or Google just failed at "organizing the world's information."
I read the article and I read the threads here, and maybe I missed it—but why did these groups disappear? Were they banned due to bad words or a mistaken spam filter?
Looks like there have been mechanical legal complaints (likely automated; nearly all of them are the same Italian phrase), and they probably caused this instance of automated blocking going wild.
As an engineer I can understand the desire to automate everything, but please at least have some heuristics to detect this kind of easy-to-detect mechanical behavior before giving the model full authority to block anyone it doesn't like.
Too many people and companies don’t appreciate culture enough. Maintaining a cultural record should apparently not be left to just one company.
Thanks for posting this, it reminded me to donate again to archive.org, which I just did.
I use ‘culture’ to include anything creative, anything that we experience as humans. Everything should be preserved, schools should be well funded, as should the arts.
[+] [-] jedberg|5 years ago|reply
[+] [-] coliveira|5 years ago|reply
[+] [-] kerkeslager|5 years ago|reply
[+] [-] tingletech|5 years ago|reply
[+] [-] est31|5 years ago|reply
[+] [-] jeffbee|5 years ago|reply
https://www.usgs.gov/core-science-systems/ngp/3dep/3dep-data...
[+] [-] ponker|5 years ago|reply
[+] [-] GekkePrutser|5 years ago|reply
[+] [-] wyclif|5 years ago|reply
[+] [-] parksy|5 years ago|reply
[+] [-] raldi|5 years ago|reply
A font of knowledge
[+] [-] pwdisswordfish2|5 years ago|reply
I am a holdout.
(Not suggesting I am "smarter" than Go users, but I can foresee issues with Go being controlled by Google.)
[+] [-] cosmodisk|5 years ago|reply
[+] [-] globular-toast|5 years ago|reply
[+] [-] Spooky23|5 years ago|reply
[+] [-] elmo2you|5 years ago|reply
[+] [-] unknown|5 years ago|reply
[deleted]
[+] [-] irrational|5 years ago|reply
[+] [-] TheSpiceIsLife|5 years ago|reply
Googlewashing - to proclaim “Google would never ...”
[+] [-] stcredzero|5 years ago|reply
[+] [-] synack|5 years ago|reply
https://archive.legitdata.co/
https://archive.legitdata.co/comp.lang.ada/
https://public-inbox.org/README.html
[+] [-] sneeuwpopsneeuw|5 years ago|reply
[+] [-] kazinator|5 years ago|reply
Blocking posting access to these newsgroups from GG is generally a good thing for those newsgroups.
Not being able to search the archive is the unfortunate collateral damage though. Google is not obliged to provide a Usenet archive, I suppose.
Formerly obtained deep links to the content also do not work!
If you formerly cited a comp.lang.lisp article by giving a direct link into Google Groups, people navigating it now get a permission error.
[+] [-] dependenttypes|5 years ago|reply
[+] [-] _kp6z|5 years ago|reply
[+] [-] neilv|5 years ago|reply
[+] [-] enneff|5 years ago|reply
[+] [-] dragonwriter|5 years ago|reply
Given the nature of Usenet, they were if anyone wanted them.
[+] [-] eternalban|5 years ago|reply
They cared enough about it to kill it.
[+] [-] HenryKissinger|5 years ago|reply
[+] [-] jeffbee|5 years ago|reply
[+] [-] dabockster|5 years ago|reply
Brb archiving my Twitter posts
[+] [-] none10287|5 years ago|reply
So I do think they have an obligation either a) to make the whole archive available for anyone or b) maintain it properly.
Properly means restoring the fast UI from around 2004.
[+] [-] icheishvili|5 years ago|reply
[+] [-] john-shaffer|5 years ago|reply
[1] https://github.com/awslabs/serverless-image-handler
[+] [-] userbinator|5 years ago|reply
[+] [-] SyneRyder|5 years ago|reply
You just reminded me of a quote from an electronic music documentary 25 years ago. One of the Detroit techno artists insisted on taking the filmmakers to a historic theatre that had been left to crumble & turned into a car park:
"In America especially, nobody tends to care about these kinds of things. People in America tend to let this shit just die, let it go. No respect for the history. I, being a techno, electronic, high-tech futurist musician, I totally believe in the future! But as well, I believe in a historic and well kept past. I believe there are some things that are important. Now, maybe this is more important like this, because in this atmosphere, you can realize how much people don't care, how much they don't respect. And it can make you realize how much you should respect."
- Derrick May, DJ/Composer, Universal Techno (1996)
https://youtube.com/watch?v=tdox6H7FJBU&t=955s
The segment starts at 16:00 in the video and is about 2 minutes long.
[+] [-] jolmg|5 years ago|reply
You may be surprised that it's not just companies. It's not hard to find people who think it's better for old stuff to just be deleted.
[+] [-] Animats|5 years ago|reply
[+] [-] fmajid|5 years ago|reply
In fact Usenet predates spam itself, since the first spam (Canter & Siegel) was on Usenet itself in 1994 (I was there).
[+] [-] aidenn0|5 years ago|reply
[+] [-] CrankyBear|5 years ago|reply
[+] [-] imhoguy|5 years ago|reply
[+] [-] rdiddly|5 years ago|reply
[+] [-] WoodenChair|5 years ago|reply
[+] [-] summerlight|5 years ago|reply
[+] [-] jolmg|5 years ago|reply
Was I naive in thinking that The Internet Archive would have long archived this type of thing?
[+] [-] msie|5 years ago|reply
[+] [-] mark_l_watson|5 years ago|reply
[+] [-] lkirk|5 years ago|reply