I worked on the original (long since superseded) implementation of the metadata store for Google Drive, i.e. the system which was responsible for tracking file / folder relationships. The requirement to allow an item to appear in multiple locations was a huge complication, in part because of the way it interacted with permissions being inherited from a folder to the items in that folder. I imagine this change may be motivated by a desire to move away from that complex model, and whatever team owns that system now may be very happy to see it going away.
(IIRC, the requirement stemmed from the need to support the various applications that were being folded into / integrated with Google Drive, such as Photos which of course allows a photo to appear in multiple albums.)
This was my understanding as well. The original Drive was built effectively as a directed graph (with cycles allowed). Any file or folder could be stored in multiple locations. Permissions were set on a per-file basis, so two people viewing the same folder might see different sets of files.
Permissions were definitely a hard part of it: if you applied new permissions to a folder and all its children, the system had to walk the entire graph to update them.
This is the advantage of the Team Drive style structure that the Drive team put out. It follows the classic filesystem design of a tree, which allows for easier permissions modeling, among other things. It's also why all "hard links" are now becoming shortcuts / soft links.
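To make the graph-walk point concrete, here is a minimal Python sketch of pushing a permission down from a folder when items can have several parents and cycles are allowed. It is only an illustration under stated assumptions, not how Drive's metadata store actually worked: the `children` and `acl` dictionaries, the item names, and `apply_permission` are all hypothetical.

```python
from collections import deque


def apply_permission(children, acl, folder, user):
    """Grant `user` access to `folder` and everything reachable from it.

    `children` maps an item to the items it contains; `acl` maps an item
    to the set of users who can see it. Both are hypothetical stand-ins.
    """
    seen = {folder}          # guards against cycles and repeat visits
    queue = deque([folder])
    while queue:
        node = queue.popleft()
        acl.setdefault(node, set()).add(user)
        for child in children.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen              # every item that had to be touched


# Hypothetical layout: "report.doc" sits in two folders, and
# "archive" links back to "team", forming a cycle.
CHILDREN = {
    "team": ["report.doc", "archive"],
    "personal": ["report.doc"],
    "archive": ["team"],
}
ACL = {}
touched_team = apply_permission(CHILDREN, ACL, "team", "alice")
touched_personal = apply_permission(CHILDREN, ACL, "personal", "bob")
```

In this toy model the multi-parented file ends up visible to both users, because it is reachable from both folders. In a strict tree each item has exactly one parent, so the `seen` bookkeeping becomes unnecessary and an item's access can be derived from a single path to the root.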
I worked on another sync client's representation of filesystem structure, and came to the same conclusion. Hard links enable some cool behavior, but in retrospect added more complexity than anyone expected. Migrating to shortcuts / soft links seems very reasonable - I wish I had started there.
> various applications that were being folded into / integrated with Google Drive
The Photos/Drive integration was removed a long time ago. What other integrations were behind the original requirement? I'm curious to know if the extra complication was worth it in the long run and how long the integrations that needed this feature hung around for.
Shouldn't de-duping be an implementation detail of the virtual file system, completely hidden from the user?
Isn't that how file sharing/syncing services have worked since the MegaUpload days?
If I upload the same file to two separate folders it's because I want two separate copies. If I change one of the copies, I don't want it to change the other copy.
> If I upload the same file to two separate folders it's because I want two separate copies. If I change one of the copies, I don't want it to change the other copy.
If that's what you were doing, you won't get affected by this. This is about files/folders that were "hardlinked", which was difficult to do by accident. I think you had to hold Ctrl while dragging the file into another folder, or something like that. (The key to notice is that they're talking about one file being in multiple directories, not multiple files with identical contents.)
> If I upload the same file to two separate folders it's because I want two separate copies. If I change one of the copies, I don't want it to change the other copy.
That's precisely the behavior you'll get. You were allowed in the previous implementation to upload a file to one location and then put it in 2 locations, such that you would have changes in either location reflected the same way. This wasn't 'copying' a file, it was multiparenting it.
This is a concern of presentation, not the backing implementation: How to present (approximately) hardlinked files to users, with the possible complications of varying support for hard and soft links or other mechanisms across the various client implementations of Google Drive.
There's advantages and disadvantages to using either.
Precisely. I can understand if they want to change the wording on their interfaces going forward to encourage folks to use "shortcuts", but it sounds awful if they're going to do this to existing files/directories.
If someone has a Drive desktop client installed, has two source-code directories with some identical files in them, and modifies one of the identical files in just one of the directories, I can imagine they'd be very surprised when the other copy in the untouched directory also changes.
I'm on Linux where there's no official Drive client, so this won't happen to me. (I use Syncthing instead.)
What you are describing sounds like two distinct files with the same content. The change only affects the same file that has been "hard linked" into two separate folders. Copies of files are unaffected.
Google Drive changes causing issues like this are why I moved to Syncthing. In Google Drive, every so often I would have to de-duplicate a bunch of files with (1) appended to their names.
I think this is a good move – the cloning UX was a nightmare. I've moved many shared files to Team Drives because the language is easier for most people to understand.
I imagine this was a tough call for a PM, with a lot of cases to consider and account for given this is so embedded in the Drive product DNA.
"The process will replace all but one location of files and folders that are currently in multiple locations. The files and folders will be replaced with shortcuts."
Is there any way for the user to specify that they want a full copy of a file?
What happens if another user makes a copy of the file and alters it? Are both copies changed?
"The replacement decision will be based on original file and folder ownership, and will also consider access and activity on all other folders to ensure the least possible disruption for collaboration."
"You can’t opt-out of the replacement."
This might be a deal-breaker for some users. Why not just ask the user if they want a replacement versus a full copy?
> This might be a deal-breaker for some users. Why not just ask the user if they want a replacement versus a full copy?
A shortcut preserves semantics: working on the original file or working on a shortcut to the original file will both modify the same document. A full copy (creating a new document with the same contents as the original document at a point in time) would bring new semantics.
I couldn't figure this out from reading the article but perhaps someone here knows. Say I want to create a new edited version of a document without changing the original, so I first duplicate that doc and then edit the new copy of it. Does this new Drive behavior mean that the original document I copied from will be changed as well?
Google have been working up to this change for quite a while now. Rclone supports shortcuts from version v1.54.0 released on 2021-02-02. I've been impressed with the communication from Google and the care they've taken to keep things working in what must be a difficult transition.
I hope that this change can finally unlock the API to be able to return all the children of a given node recursively. Multiple parents make this much harder.
The Drive API doesn't have that at the moment and it makes traversing deep directory trees really painful.
An API search term to find the objects which have a given ID as an ancestor at any depth would be fantastic.
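For context, the workaround today is a client-side walk that issues one listing call per folder. Below is a rough Python sketch of that walk, assuming a hypothetical `list_children` callable that returns the child IDs of a folder. With the real Drive API, that callable would presumably wrap a paginated `files.list` request using a query of the form `'<folderId>' in parents`; here it is replaced by an in-memory stand-in so the sketch runs on its own.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List


def list_descendants(
    root_id: str, list_children: Callable[[str], Iterable[str]]
) -> List[str]:
    """Return the IDs of everything under `root_id`, breadth first.

    One `list_children` call is made per visited item, which is why
    deep trees are slow to traverse over a network.
    """
    found: List[str] = []
    seen = {root_id}
    queue = deque([root_id])
    while queue:
        folder_id = queue.popleft()
        for child_id in list_children(folder_id):
            if child_id not in seen:   # multi-parented items are reported once
                seen.add(child_id)
                found.append(child_id)
                queue.append(child_id)
    return found


# Hypothetical stand-in for the remote listing call.
FAKE_TREE: Dict[str, List[str]] = {
    "root": ["a", "b"],
    "a": ["shared"],
    "b": ["shared"],   # "shared" has two parents
    "shared": [],
}


def fake_list_children(folder_id: str) -> List[str]:
    return FAKE_TREE.get(folder_id, [])


descendants = list_descendants("root", fake_list_children)
```

The `seen` set is only needed because an item can be reachable through more than one parent. A real client would also skip the listing call for non-folder items, which this sketch does not bother to do.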
I use Google Drive as a trailing safety backup. Client installed on a machine that updates once a month with all the files from drive. Manual process. In case something happens and a folder gets wiped off of Google Drive I can walk over and transfer the files from the trailing safety backup since Google Drive actually copies the files.
Under the new process the files will no longer exist on the drive, but will now be links to files on Google Drive. Is that correct?
If you're actually copying the files then no; this removes the ability for a file ID to be able to exist in two places at once, but if you're copying drive->another medium->drive a new file ID is being created on the user's drive. This is regardless of whether or not the file is being de-duped behind the scenes.
Until today, I didn't know you could create hard links (i.e. "multi-parent" files/folders) in Google Drive.
How did one achieve this? I would like to know, because I'm wondering if I have unintentionally done so. I gather this is completely different from "Make a copy" in the web UI. So how did you do it?
So, if I intentionally have the same photo in two different directories, because I want to edit one while keeping the other as a fallback, GDrive has just killed off my failsafe. Do I understand this correctly?
If so, what more compelling reason could there be to go migrate right away to Dropbox (or a similar service)?
No, you did not understand that correctly. You are confusing the same file stored under two locations with two different files stored under two locations with the same contents. This has been discussed multiple times in other comments on this thread.
Slightly related, has anyone moved off Google Drive and into NextCloud or similar and been happy with it?
I'm losing access to the unlimited Google Drive storage that my uni provided and trying to figure out where I should move to.
A NAS would be great but at this moment I'm too nomadic to want to worry about that.
I'm fine with paying but would rather pay an organization that's very respecting of privacy and less likely to nuke your account without warning if you do something they don't like.
I switched from Dropbox to Nextcloud, and within a couple of months from Nextcloud to Google Drive. Really didn't like the software and it was so slow since I wasn't self-hosting it, but renting it from Hetzner.
Currently thinking of switching from Google Drive to Syncthing, since the new Google Drive clients suck and Google is going to be making my service worse with the new G Suite changes.
This is unrelated to that. It isn't about de-duping data, it's about symbolic links ("shortcuts") vs. hard links ("multi-parenting"). You can still make copies of files as usual, and the content can be de-duped (or not) transparently to users.
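As a filesystem analogy for that distinction (not a description of Drive's own behavior), the following Python sketch creates a hard link and a symbolic link to the same file on a POSIX system and shows where the two differ. The file names and the `link_demo` helper are made up for illustration.

```python
import os
import tempfile


def link_demo(workdir):
    """Create a hard link and a symlink to one file and compare them."""
    original = os.path.join(workdir, "original.txt")
    hard = os.path.join(workdir, "hard.txt")
    soft = os.path.join(workdir, "soft.txt")

    with open(original, "w") as f:
        f.write("v1")

    os.link(original, hard)      # a second name for the same file
    os.symlink(original, soft)   # a pointer to the path of the original

    # Writing through the hard link changes the one underlying file,
    # so reading through the symlink sees the new content.
    with open(hard, "w") as f:
        f.write("v2")
    with open(soft) as f:
        via_soft = f.read()

    # Removing the original name leaves the hard link usable,
    # while the symlink now points at nothing.
    os.remove(original)
    with open(hard) as f:
        via_hard = f.read()

    return {
        "via_soft_before_delete": via_soft,
        "via_hard_after_delete": via_hard,
        "soft_is_dangling": os.path.islink(soft) and not os.path.exists(soft),
    }


with tempfile.TemporaryDirectory() as tmp:
    result = link_demo(tmp)
```

Neither kind of link is a copy: edits made through either one land in the same file. The difference only shows up when the original name goes away, which is the part of the old multi-parent behavior that shortcuts do not reproduce in this analogy.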
This whole article sounds like "we're remaking the backend, and no longer support the same file being in multiple locations, so we're just going to break any users using that feature."
My understanding is that this is mostly an attempt to improve the UX around permissions. The inherited sharing and permissions for files in multiple folders were incredibly confusing.
They have disallowed this for Shared Drives from the start, as shared drives have ownership and strictly hierarchical permissions. Now they want to bring this UX simplification to everywhere in Drive.
I'm sure they are happy to simplify the backend but this definitely makes the product less confusing. It does however make some rare workflows very complicated.
snewman | 4 years ago
kyrra | 4 years ago
zmj | 4 years ago
andybak | 4 years ago
layer8 | 4 years ago
grandpoobah | 4 years ago
dataflow | 4 years ago
deepl_derber | 4 years ago
lxgr | 4 years ago
sangeeth96 | 4 years ago
sowbug | 4 years ago
kevincox | 4 years ago
CPAhem | 4 years ago
I'm on Linux with Syncdocs for syncing Google Drive so will wait and see how it handles things.
akudlacek | 4 years ago
ndynan | 4 years ago
thesuperbigfrog | 4 years ago
Rygian | 4 years ago
markstos | 4 years ago
markstos | 4 years ago
Most people I know prefer symlinks for most uses, so this feels like better UX.
lupire | 4 years ago
rurp | 4 years ago
sanderjd | 4 years ago
nickcw | 4 years ago
pgrote | 4 years ago
judge2020 | 4 years ago
YPPH | 4 years ago
media-trivial | 3 years ago
ternaryoperator | 4 years ago
kccqzy | 4 years ago
jpollock | 4 years ago
I might want to have different copies as "snapshot" and "working"; de-duping them makes any version-control-like system mutable, doesn't it?
Rygian | 4 years ago
There is no de-duping mentioned anywhere in the Google support page.
raybb | 4 years ago
Only need a few hundred GB of space.
Hamuko | 4 years ago
IceWreck | 4 years ago
I'm using https://filebrowser.org/
You can run it on a VPS, NAS or homeserver.
If you want something managed, you can pay Hetzner for managed Nextcloud.
encryptluks2 | 4 years ago
eshack94 | 4 years ago
markstos | 4 years ago
whoomp12342 | 4 years ago
nitinagg | 4 years ago
sanderjd | 4 years ago
londons_explore | 4 years ago
kevincox | 4 years ago
johndfsgdgdfg | 4 years ago
[deleted]