I have had a Google sheet I created years ago and edit periodically that has also been flagged but not removed. It has a number of URLs. I suppose it's possible at least one of the URLs is no longer legitimate.
There is a yellow banner "This file looks suspicious. It might be used to steal your personal information" and an option to "Request a review".
It also says "This file can still be viewed, edited, and shared, but users will see a warning that alerts them that the content may be harmful. These restrictions were put in place because this content violates Google Drive's Phishing policy."
There is no indication of why the file was flagged.
Despite my concerns about a human looking at my personal file, I bit the bullet and clicked review several times but the banner remains after several months. Google support hasn't been helpful even though I'm a paid user.
I have now received 5 emails from "Google Drive Safety" notifying me of this alleged violation.
Screw this technocratic neofeudalist garbage. I've been encrypting my entire Google Drive for a while now because of exactly this kind of overreach in the past.
If I were you I'd back the whole of your Google account up ASAP. It wouldn't be unheard of for Google to suspend your entire account for a violation on a single product.
Far more scary is that Google already has automated systems to classify content and probably works very closely with governments to keep a close eye on citizens.
> It has a number of URLs. I suppose it's possible at least one of the URLs is no longer legitimate.
> "These restrictions were put in place because this content violates Google Drive's Phishing policy."
> There is no indication of why the file was flagged.
Isn't your answer in the warning (albeit lacking some specificity)? If you have a bunch of URLs in there and admit not knowing about their legitimacy, it seems reasonable that at least one of them matches a database of known phishing URLs. Isn't it also reasonable to expect (and in many cases want) a company hosting publicly-shared files to notify users when content matches some heuristics for unsafe content, especially given the prevalence of phishing attempts?
That said, there may still be a case to argue that the review process, heuristic, lack of transparency, and other implementation details are flawed. But I don't have a problem with them posting a warning message on content that looks suspicious.
Complete removal, on the other hand (as in the case of OP) is another story, especially given Google's laughable process (or lack thereof) for appealing such cases.
It's a shame that Google are choosing to focus on policing what people think of as their private data.
Meanwhile, they seem to be taking their eye off the ball when it comes to the spam filtering that many of their users might wish for. In just the past week, I've received noticeable amounts of spam via: Gmail, Google Drive, Google Calendar, and Google Photos.
I'm puzzled about how Google are choosing to allocate their resources. It doesn't seem likely that governments would be asking Google to specifically police spreadsheets for possible phishing data. The owners of the files definitely aren't asking for their access to their own data to be cut off. So what are the origins of this effort?
I have to admit, I originally thought that Google Drive was the obvious choice over every alternative that existed, but I can see now that I would prefer an offline or privacy-driven alternative. The risk of losing files to a Google Drive machine learning black hole & then facing Google's customer service black hole might be small, but it's also nightmarish.
I taught data journalism at the graduate school level (not to the OP) and I regularly used Google Sheets as a channel for hosting data, which were typically U.S. public datasets, like city crime reports, Census demographics, election results, etc. This is in addition to hosting CSV and .sqlite files on the class homepage and Github repo.
The reason why Google Sheets was so handy was because it's a great interface for data exploration. I could show students features about the data, and sort/filter/highlight, without having to distribute/create spreadsheet files. I guess if Google Sheets ends up being draconian with its content filters, I could link to Github-hosted CSVs, but it's not quite the same.
Haven't checked my classroom Google sheets recently, but if they're untouched, it might have helped that they were all set to public view. Maybe the flagging algorithm looks at private sheets with much more suspicion.
Have you tried using anything non-Google recently? Everything else is much, much worse IMO. Remember we all mostly moved to Google because they were the only ones who could actually deal with spam, gaming, and the more sophisticated attacks at scale. Now that they're the only game in town, adversaries are focusing hard on getting past their filters etc, and some are getting through.
When you have literally millions of people whose livelihood depends on gaming algorithms, or getting past content filters, then some highly intelligent and motivated individuals are going to get through, that's a fact of life.
When someone figures out how to train their robots better than Google, then Google will have some competition, but they seem to be about a decade ahead of everyone else, and only accelerating relative to the rest of the field, so I'm doubtful that anyone is going to catch up any time soon.
"Sadly it won't even be the next generation as they're all rusted into the ecosystem with Chromebooks provided by their school districts. "
This is so true that it hurts. I went back to college later in my life starting in 2018. It's astonishing to me how ingrained Google services are to the kids I'm in school with. Even though we have Office 365 and a full suite of apps available to us for free, when we do group work the de-facto decision that students come to is to create Google drive shares, or shared Google docs that are tied to personal accounts. It gets so stupidly messy and the apps just seem so inferior. Everyone uses the same Slides template so everything looks the same. Ugh.
> Is this something researchers using Google Drive should be aware of?
This is something EVERYONE using Google Drive should be aware of. Your data on Google Drive is scanned by Google (and not only for policy violations), meaning it is not secured FROM Google, and false positives can and will cause your data to be lost and cause your account to be flagged. Using Google Sheets means that is your only copy of the data, so you also have no backup.
This is why I use local storage as my backup method. I just upgraded my total storage capacity to 28TB with 3 copies. I may add cloud backups once I figure out how to make sure my uploads are encrypted, but owning your own storage is the only surefire way to make sure nobody else can play games with your data.
The key point, to me, is that Google monitors all content. I don't think it's much of a surprise, but in the context of Google Drive, it's pretty terrible. Drive is an anti-product like this. It's like if Western Digital decided to update the disk firmware to just ship your data to them for scrutiny.
Stab in the dark but I imagine some of those campaign website domains lapsed, were picked up by domain expiry people because they probably have very good visibility (i.e. a lot of links to those pages from official/.gov sites), and had phishing or sketchy landing pages put on them.
Google then prunes these malicious pages from their search and then also flags any docs that have those links in them.
The cloud is someone else's computer. Act accordingly.
It's a shame this person lost their work. More and more stories like this need to come to light and we need to convince others the cloud is a risk, not a solution.
It was a solution. Google Drive was great until they turned user/customer hostile. Arguably, no one could really have seen this coming, because this doesn't even seem to make business sense. Google are seemingly acting against the reputation of their own services.
One of the killer features of a QNAP NAS is the built in cloud backup utilities for Google and Microsoft - it always amazes me how often I have to fight with people to turn them on.
You'd think after the original Photobucket meltdown people would be more cognizant of stuff like this, but here we (still) are :(
Part of the response is for users to trust the cloud less, but I wish part of the response would be to increasingly hold cloud storage companies accountable such that in the future they can't get away with this kind of abuse without severe consequences for them.
It's getting more and more apparent that moving Office/Productivity apps from your local system to the web is a huge mistake. In 20 years, I'm going to be like the guy who writes manuscripts in WordPerfect 4.
It's always been apparent to some of us. Unless I have an explicit need to actively collaborate with another person on some document I've continued to use locally saved LibreOffice documents for everything, and even then I still download a copy from Google Drive when finished/periodically.
Am thinking that a few of the domains on the list expired and were snapped up by malware/scammer sites causing the spreadsheet to seem super sus to a machine scanning/checking urls.
How do you know 'Google takes election stuff seriously'? Causation != effect. TBH I'm surprised the OP even asked the question 'why did an algorithm remove my stuff?'. No Google employee is going to tell you (even if they knew, which is quite doubtful given the complexities of machine learning). If you sign up for a service which is machine-managed, you are subject to the whims of the machine. Occam's razor suggests it is unlikely there is a conspiracy or malice at play. The machine is just doing its job for its masters, possibly badly, possibly not, but the general populace will never know one way or the other.
What did your voting tool do? I suspect Google, like many other tech companies, is trying to avoid having its platform used to manipulate election activities, and the definition of "manipulate" might be very broad. It being funded by a campaign might not matter to Google at all.
This is some heavy slippery slope shit. If you told people that this would be happening in 15 years back in 2007 you'd be called a paranoid conspiracy nut.
> It has a number of URLs. I suppose it's possible at least one of the URLs is no longer legitimate
Just a guess, but since it is a file which lists website urls one of those domains was probably flagged as a phishing site, and since your doc had a url with the same domain it got flagged as well. It wouldn't surprise me to find that one of those candidates' sites had actually been hijacked by some phisher. Unfortunately just visiting each site wouldn't necessarily reveal which one it was since they'd likely have tucked the php shell or other phishing stuff at some path away from the root of the site to avoid detection from the site's maintainer.
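To make the guess above concrete, here is a minimal sketch of how a scanner could match the URLs in a sheet against a list of flagged domains. It is only an illustration: the function name `flagged_urls`, the `.example` domains, and the blocklist itself are hypothetical, and nothing here is known to reflect how Google's scanner actually works.

```python
from urllib.parse import urlparse


def flagged_urls(urls, blocked_domains):
    """Return the URLs whose host is a blocked domain or a subdomain of one.

    Both arguments are plain iterables of strings; the blocklist is a
    stand-in for whatever database a real scanner might consult.
    """
    blocked = {d.lower().rstrip(".") for d in blocked_domains}
    hits = []
    for url in urls:
        host = (urlparse(url).hostname or "").lower().rstrip(".")
        parts = host.split(".")
        # Build the host plus every parent domain, e.g.
        # login.site.example -> site.example -> example
        candidates = {".".join(parts[i:]) for i in range(len(parts))}
        if candidates & blocked:
            hits.append(url)
    return hits


if __name__ == "__main__":
    sheet_urls = [
        "https://www.candidate-a.example/donate",
        "http://login.lapsed-campaign.example/wp-content/x.php",
    ]
    print(flagged_urls(sheet_urls, ["lapsed-campaign.example"]))
```

Note that the match is on the domain, not the page, which is why a single lapsed or hijacked domain could be enough to flag an otherwise harmless list, and why visiting the site's front page would not tell you much.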
This is our regular reminder from Google that "the Cloud" means "someone else's computer". Whatever you store in the cloud is not under your control.
The cloud can be convenient, but it's not under your control, so always also keep a copy that's under your control. Of course your local copy isn't secure either, so having a copy in the cloud is still a good idea, but it shouldn't be your only copy.
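For anyone keeping both a local and a cloud copy, a rough sketch of checking that the copies are still identical is below. It assumes you have already downloaded the cloud copy to a local path; the helper names `sha256_of` and `copies_match` are made up for this example.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(paths):
    """Return True if all given files have identical content.

    Raises FileNotFoundError if a copy is missing; returns False for an
    empty list, since there is nothing to confirm.
    """
    digests = {sha256_of(p) for p in paths}
    return len(digests) == 1
```

A matching digest only tells you the copies agree, not that either one is the version you wanted, so it complements rather than replaces keeping dated backups.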
I've found mega.nz to be a great alternative to Google Drive. I've never had issues with feature parity and it even has a proper Linux client.
The combination of flag-happy AI and the way Google will nuke your entire account without recourse* or a human ever being in the loop makes them a non-starter for me.
*Google might give you recourse if you can get a public uproar going on Twitter or HN
Lol, if only there were an open office like suite available with a libre license which could generate files in some sort of open documented format which could be shared by all through a clickable interface within a browser...
[+] [-] computer23|3 years ago|reply
[+] [-] nimbius|3 years ago|reply
https://dev.to/petarov/store-encrypted-files-in-google-drive...
[+] [-] kypro|3 years ago|reply
[+] [-] bsedlm|3 years ago|reply
I'd wager the "reviewer" is merely a more computationally expensive process.
[+] [-] raxxorraxor|3 years ago|reply
[+] [-] curiousgeorgio|3 years ago|reply
[+] [-] cube00|3 years ago|reply
80/20 incomplete machine learning model.
One day people and institutions will get the message and stop using Google for important things.
Sadly it won't even be the next generation as they're all rusted into the ecosystem with Chromebooks provided by their school districts.
[+] [-] jfoster|3 years ago|reply
[+] [-] danso|3 years ago|reply
[+] [-] px43|3 years ago|reply
[+] [-] dillutedfixer|3 years ago|reply
[+] [-] chaosharmonic|3 years ago|reply
I've taken to telling people for years now, for this exact reason, that friends don't let friends use Google's password manager.
[+] [-] mysterydip|3 years ago|reply
[+] [-] FredPret|3 years ago|reply
[+] [-] bedast|3 years ago|reply
Google does not provide support for this service.
[+] [-] silicon2401|3 years ago|reply
[+] [-] danso|3 years ago|reply
- The specific policy violated is "Phishing"
- Sheet was titled "Boston Election candidate websites"
- OP says it's a list of candidate names and their campaign websites
- Created in Oct 2021, removed in July 2022
- OP says a similar worksheet with many more records has not been removed
[+] [-] ibejoeb|3 years ago|reply
[+] [-] mmanfrin|3 years ago|reply
[+] [-] ABeeSea|3 years ago|reply
[+] [-] wolpoli|3 years ago|reply
[+] [-] andybak|3 years ago|reply
[+] [-] vorpalhex|3 years ago|reply
[+] [-] jfoster|3 years ago|reply
[+] [-] EricE|3 years ago|reply
[+] [-] domador|3 years ago|reply
[+] [-] golem14|3 years ago|reply
It's still pretty sucky.
[+] [-] jeffwask|3 years ago|reply
[+] [-] S201|3 years ago|reply
[+] [-] tholman|3 years ago|reply
[+] [-] TIPSIO|3 years ago|reply
I made a voting tool (legitimate and funded by a campaign) with a throw away email as the contact:
<state><voting>@gmail.com
It was removed. For the OP, this could be an auto flag error or this could be something else.
(Edit: I bet) Google takes the election stuff seriously.
[+] [-] pomatic|3 years ago|reply
[+] [-] LegitShady|3 years ago|reply
[+] [-] luxuryballs|3 years ago|reply
FTFY
[+] [-] lizardactivist|3 years ago|reply
[+] [-] colordrops|3 years ago|reply
[+] [-] bborud|3 years ago|reply
Update: thanks for tips. Will have a look when I get home.
[+] [-] layer8|3 years ago|reply
It integrates with NextCloud: https://nextcloud.com/collaboraonline/
[+] [-] haunter|3 years ago|reply
https://www.onlyoffice.com/download-workspace.aspx?from=defa...
[+] [-] sofixa|3 years ago|reply
[+] [-] jfoster|3 years ago|reply
[+] [-] xyclos|3 years ago|reply
[+] [-] EricE|3 years ago|reply
[+] [-] njharman|3 years ago|reply
[+] [-] bks|3 years ago|reply
[+] [-] ptman|3 years ago|reply
It's possible to self-host for free, if you use Google services because of the price: https://paul.totterman.name/posts/free-clouds/
[+] [-] mcv|3 years ago|reply
[+] [-] pilgrimfff|3 years ago|reply
[+] [-] nmstoker|3 years ago|reply
Could understand certain kinds of causes being applied to public / shared files but private files ought to be pretty much untouched.
Of course, that's probably not how it really works when you're relying on cloud providers.
[+] [-] rob_c|3 years ago|reply
Nah that's stupid...