When are you going to offer a cheaper plan with less storage for people who only need <50 GB?
Out of curiosity, how much does bandwidth usage contribute to your overall operational costs (compared to, for example, the cost of running the actual servers)? Would totally understand if you can't answer this :)
This is why I continue to use Dropbox for daily work and constantly changing files. The syncing is unmatched. It’s surprising how bad the others, like OneDrive and Google Drive, are in comparison.
OneDrive completed its rollout of differential sync in April 2020[1], after beginning in Sep 2019. This should improve OneDrive’s sync speed substantially.
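Differential (block-level) sync can be sketched with fixed-size block hashing. This is a simplified model, not either vendor's actual protocol; the 4 MiB block size and SHA-256 match what Dropbox has described publicly for its content hashing, but everything else here is illustrative:

```python
import hashlib

BLOCK_SIZE = 4 * 1024 * 1024  # 4 MiB, the block size Dropbox uses for content hashing

def changed_blocks(old, new, block=BLOCK_SIZE):
    """Differential sync in miniature: hash fixed-size blocks of the old
    and new file contents and return the block indices that need to be
    re-uploaded. Real clients keep the hashes from the previous sync
    rather than re-reading the old file, and may chunk more cleverly."""
    def hashes(data):
        return [hashlib.sha256(data[i:i + block]).digest()
                for i in range(0, len(data), block)]
    old_h, new_h = hashes(old), hashes(new)
    return [i for i, h in enumerate(new_h)
            if i >= len(old_h) or old_h[i] != h]
```

With fixed-size blocks, an edit confined to one block costs one block of upload, while an insertion near the start of the file shifts every subsequent block and defeats the scheme; rolling-hash chunking (as in rsync) is the usual fix for that.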
I recently switched from Dropbox because of the added device limitations on the free tier, and because I don't really want to pay 10 euro a month for 2 TB of space when I only need 10 GB. Got myself a Nextcloud instance for a third of the cost, and I have to say that the syncing absolutely sucks. It's so bad that I'm going to migrate away from it as well.
Not going back to Dropbox yet, though. I'd rather try out Google Drive, since I consider its consumer plans to be much better.
I'm more of a security-focused engineer so I'm most interested in the "specially crafted low-privilege jail". What protocol gets data in and out, not shared memory I'm sure? Do the jail processes also have to implement an RPC server (protobuf/gRPC/HTTP?) or is there another mechanism for giving them work and receiving results?
And yes, much of the overhead stems from the RPC server that needs to be implemented. For Lepton we used a raw TCP server (a simple fork/exec server) to answer compression requests: we would establish a connection, send the raw file on the socket, and await the compressed file on the same socket, with a strict SECCOMP filter applied to the jailed process. It was nice to avoid all of this for Broccoli, since it is implemented in the safe subset of Rust.
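A minimal sketch of that socket protocol — one connection per file, with a half-close signaling end-of-input — using a thread and zlib as stand-ins for the fork/exec'd jail process and the real codec (the actual Lepton wire details are not public here):

```python
import socket
import threading
import zlib

def _recv_all(conn):
    """Read from the socket until the peer signals end-of-file."""
    chunks = []
    while True:
        chunk = conn.recv(65536)
        if not chunk:
            return b"".join(chunks)
        chunks.append(chunk)

def compression_worker(conn):
    """Stand-in for the jailed process: one connection, one raw file in,
    one compressed file out. zlib is only a placeholder codec."""
    raw = _recv_all(conn)
    conn.sendall(zlib.compress(raw))
    conn.close()

def compress_via_socket(data):
    """Client side: send the raw file, half-close the socket to signal
    EOF, then await the compressed file on the same socket."""
    client, jail = socket.socketpair()
    worker = threading.Thread(target=compression_worker, args=(jail,))
    worker.start()
    client.sendall(data)
    client.shutdown(socket.SHUT_WR)  # end-of-file marker for the jail
    compressed = _recv_all(client)
    client.close()
    worker.join()
    return compressed
```

The appeal of this shape is that the jail needs almost no syscall surface — read, write, and close on an inherited file descriptor — which is what makes a strict SECCOMP filter practical.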
In my opinion broccoli does not go so well with bread (Brötli = bread roll in Swiss German), so some more fitting name suggestions are: Gipfeli (croissant), Weggli (milk roll), Pfünderli (500 g loaf), Bürli (crusty roll), Zöpfli (small braided loaf).
Savory with a touch of sweetness, Broccoli Bread cooks up like cornbread but offers fiber and calcium. The original name was Brot-cat-li (since files could be concatenated and compressed in parallel), but when we said it fast it sounded like "Broccoli" and the name stuck.
We heavily investigated zstd and met with its brilliant inventor, Yann, who provided amazing insights into the design and rationale behind zstd and why it is so fast and such an amazing technology. I also recompiled zstd into Rust using https://github.com/immunant/c2rust and tried various WebAssembly mechanisms to run it (I didn't get the WebAssembly build quite fast enough, and teaching c2rust to make the output safe would be quite a slog).
But the main reason we settled on Brotli was the second-order context modeling, which makes a substantial difference in the final size of files stored on Dropbox (several percent on average, as I recall, with some files getting much, much smaller).
And for the storage of files, especially cold files, every percent improvement imparts a cost savings.
Also, widespread in-browser support of Brotli makes it possible for us to serve Dropbox files directly to browsers in the future (especially since they are concatenatable). Zstd browser support isn't at the same level today.
> Pre-coding: Since most of the data residing in our persistent store, Magic Pocket, has already been Brotli compressed using Broccoli, we can avoid recompression on the download path of the block download protocol. These pre-coded Brotli files have a latency advantage, since they can be delivered directly to clients, and a size advantage, since Magic Pocket contains Brotli codings optimized with a higher compression quality level.
It looks like they did, but having an implementation in a memory-safe language was one of their requirements. Learning that was, for me, the most fascinating part of the article.
> Maintaining a static list of the most common incompressible types within Dropbox and doing constant time checks against it in order to decide if we want to compress blocks
There is also a format-agnostic, adaptive heuristic: stop compressing if the initial part (say, the first 1 MB) of the file seems incompressible. I'm not sure how widespread this is, but I've seen at least one piece of software doing it, and it worked well. It can be combined with other heuristics, such as entropy estimation.
This is a really interesting write-up of their use of Brotli! Makes me wonder if there might be a novel way I could leverage it beyond HTTP responses.
I never realized the advantages of Brotli over zlib could be so extensive; in particular, it appears they're getting a huge speed boost (I think in part because it's written in Rust).
> we were able to compress a file at 3x the rate of vanilla Google Brotli using multiple cores to compress the file and then concatenating each chunk.
Side note: I admit, at first I thought they were talking the Broccoli build system[0]
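The chunk-and-concatenate trick quoted above relies on the compressed format being concatenatable. Here is a sketch using gzip, which stands in for Brotli purely because multi-member gzip streams are likewise concatenatable and ship in the Python stdlib (Broccoli does the equivalent with Brotli's framing; the chunk size and worker count are arbitrary):

```python
import gzip
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 1 << 16  # 64 KiB chunks, purely illustrative

def parallel_compress(data, workers=4):
    """Compress fixed-size chunks independently and concatenate the
    compressed members. The underlying zlib code releases the GIL, so
    threads genuinely overlap; gzip.decompress() transparently reads
    all members back-to-back as one stream."""
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return b"".join(pool.map(gzip.compress, chunks))
```

The cost of this approach is a small ratio penalty: each chunk starts with an empty compression window, so matches can never reach back into the previous chunk.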
The tradeoff between client CPU time and upload speed is interesting. If they need to be able to output compressed text at 100 Mbps, that gives a budget of ~100 ns/byte, or pretty much what they would have been spending with zlib in the first place. But on my fiber connection I only have a budget of 10 ns/byte. Does that mean you'd use the equivalent of `brotli -q 1` for me? If so, doesn't the march of progress continually erode the advantages of compression in this use case?
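The per-byte budget arithmetic above works out as one second's worth of nanoseconds divided by the link's byte rate:

```python
def budget_ns_per_byte(link_mbps):
    """CPU time available per output byte if the compressor must keep
    pace with an upload link of `link_mbps` megabits per second."""
    bytes_per_second = link_mbps * 1e6 / 8
    return 1e9 / bytes_per_second

# 100 Mbps leaves 80 ns/byte (the ~100 ns/byte ballpark in the comment);
# a 1000 Mbps fiber link leaves only 8 ns/byte.
```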
They aren't on the same level of abstraction. Rsync currently uses zlib for block compression on the wire. Brotli/broccoli would be an alternative option.
Is there a pun between Broccoli and Brotli I'm not aware of? There's another Brotli compression tool called Broccoli (written in Go); is it just a coincidence?
Great question! We developed and deployed Lepton to losslessly encode JPEG image files. Lepton continues to deliver substantial storage and cost savings every year. You can read more about it here https://dropbox.tech/infrastructure/lepton-image-compression...
rcarmo | 5 years ago
Does this pave the way for a “lite” version of the Dropbox client that _only_ syncs files and has none of the “added value” bloat that has crept in of late?
That was one of the reasons I cancelled my paid plan: https://taoofmac.com/space/blog/2020/06/21/1600
Osiris | 5 years ago
I lucked out and have 2 free plans that have bonus storage from various promotions. I get about 25 GB per account. I haven't maxed either one.
I absolutely love the product. My wife scans a file, I can grab it right away. I'm at work and need some document (e.g., my driver's license photo), I hop on the website and download it.
I pay $5 for Backblaze to back up 5 TB. I don't want to spend $10 a month for storage I'll never use (I couldn't even keep that much synced on most of my devices), but I'd gladly pay $3-5 a month for 50-100 GB.
For now, I'll keep mooching with my free plan.
CJefferson | 5 years ago
Does Dropbox still upload everything, even if the user has uploaded it before?
[1] https://techcommunity.microsoft.com/t5/office-365/onedrive-c...
rspoerri | 5 years ago
:-)
vmchale | 5 years ago
IME it's faster than brotli and often has a better compression ratio.
[0] https://github.com/broccolijs/broccoli
myrloc | 5 years ago
Just kidding :) Great article. As others have said, the supporting data was very informative.