Broccoli: Syncing Faster by Syncing Less

daniel_rh | 5 years ago

Hi folks, I'm Daniel from Dropbox, and I am happy to answer any questions about this tech.

rcarmo | 5 years ago

Hi Daniel,

Does this pave the way for a “lite” version of the Dropbox client that _only_ syncs files and has none of the “added value” bloat that has crept in of late?

That was one of the reasons I cancelled my paid plan: https://taoofmac.com/space/blog/2020/06/21/1600

Osiris | 5 years ago

When are you going to offer a cheaper plan with less storage for people that only need <50GB?

I lucked out and have 2 free plans that have bonus storage from various promotions. I get about 25 GB per account. I haven't maxed either one.

I absolutely love the product. My wife scans a file, I can grab it right away. I'm at work and need some document (e.g., my driver's license photo), I hop on the website and download it.

I pay $5 for Backblaze to back up 5 TB. I don't want to spend $10 a month for storage I'll never use (I couldn't even keep that much synced on most of my devices), but I'd gladly pay $3-5 a month for 50-100 GB.

For now, I'll keep mooching with my free plan.

adamsvystun | 5 years ago

Out of curiosity, how much does bandwidth usage contribute to your overall operating costs (compared to, for example, the cost of running the actual servers)? I'd totally understand if you can't answer this :)

CJefferson | 5 years ago

My understanding is that Dropbox used to hash a file first, then check whether a copy had already been uploaded. That was removed because it was being used for piracy.

Does Dropbox still upload everything, even if the user has uploaded it before?
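The hash-then-check flow described above can be sketched as follows. This is a minimal illustration, not Dropbox's protocol: the 4 MiB block size, SHA-256 hashes, and the helper names `block_hashes` and `blocks_to_upload` are assumptions, and `server_has` stands in for whatever index a server would consult.

```python
import hashlib

# Assumptions for illustration only: 4 MiB blocks, SHA-256 block hashes.
BLOCK_SIZE = 4 * 1024 * 1024


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Split data into fixed-size blocks and return the hex SHA-256 of each."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def blocks_to_upload(data: bytes, server_has: set[str],
                     block_size: int = BLOCK_SIZE) -> list[int]:
    """Return indices of blocks whose hash the (hypothetical) server lacks."""
    return [
        idx
        for idx, h in enumerate(block_hashes(data, block_size))
        if h not in server_has
    ]
```

In this sketch, whether `server_has` holds only the uploading user's own blocks or every block on the service is the crux of the piracy concern: a service-wide set is what would let one account "upload" a file that another account already stored, without sending its bytes.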

manigandham | 5 years ago

This is why I continue to use Dropbox for daily work and constantly changing files. The syncing is unmatched. It’s surprising how bad the others, like OneDrive and Google Drive, are in comparison.

signal11 | 5 years ago

OneDrive completed its rollout of differential sync in April 2020[1], after beginning in Sep 2019. This should improve OneDrive’s sync speed substantially.

[1] https://techcommunity.microsoft.com/t5/office-365/onedrive-c...

Hamuko | 5 years ago

I recently switched from Dropbox because of the added device limitations for the free tier and because I don't really want to pay 10 euros a month for 2 TB of space when I only need 10 GB. I got myself a Nextcloud instance for a third of the cost, and I have to say that the syncing absolutely sucks. It's so bad that I'm going to migrate away from it as well.

Not going back to Dropbox yet, though. I'd rather try out Google Drive, since I consider its consumer plans to be much better.

gnramires | 5 years ago

I use Seafile, an open source alternative (with a German provider), and it works surprisingly well.

AaronFriel | 5 years ago

I'm more of a security-focused engineer, so I'm most interested in the "specially crafted low-privilege jail". What protocol gets data in and out? Not shared memory, I'm sure. Do the jail processes also have to implement an RPC server (protobuf/gRPC/HTTP?), or is there another mechanism for giving them work and receiving results?

daniel_rh | 5 years ago

Dropbox uses a toolbox similar to https://chromium.googlesource.com/chromiumos/docs/+/master/s...

And yes, much of the overhead stems from the RPC server that needs to be implemented. For Lepton we used a raw TCP server (a simple fork/exec server) to answer compression requests: we would establish a connection, send a raw file on the socket, and await the compressed file on the same socket. A strict SECCOMP filter was used for Lepton. It was nice to avoid all this for Broccoli, since it was implemented in the safe subset of Rust.
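The request/response pattern described above (send the raw file, read the compressed file back on the same socket) can be sketched roughly as below. This is an illustration under stated assumptions: a thread and `socket.socketpair()` stand in for the fork/exec server and its TCP connection, `zlib` stands in for Lepton's codec, and end-of-file is signalled by half-closing the socket with `shutdown(SHUT_WR)`, a detail the comment does not specify. No seccomp filter is applied here.

```python
import socket
import threading
import zlib


def recv_all(sock: socket.socket) -> bytes:
    """Read until the peer half-closes its side (recv returns b'')."""
    chunks = []
    while True:
        chunk = sock.recv(65536)
        if not chunk:
            return b"".join(chunks)
        chunks.append(chunk)


def serve_one(conn: socket.socket) -> None:
    """Handle one request: raw file in, compressed file out, on the same socket."""
    with conn:
        payload = recv_all(conn)
        conn.sendall(zlib.compress(payload, 9))  # zlib is a stand-in codec
        conn.shutdown(socket.SHUT_WR)            # signal end of the response


def compress_via_worker(data: bytes) -> bytes:
    """Client side: a thread stands in for the forked, sandboxed worker process."""
    client, server = socket.socketpair()
    worker = threading.Thread(target=serve_one, args=(server,))
    worker.start()
    with client:
        client.sendall(data)
        client.shutdown(socket.SHUT_WR)  # tell the worker the file is complete
        result = recv_all(client)
    worker.join()
    return result
```

Half-closing keeps the protocol framing-free: the worker needs no length prefix, only EOF, which suits a deliberately tiny, sandboxed server.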

rspoerri | 5 years ago

In my opinion, broccoli does not go so well with bread (brötli = bread roll in Swiss German), so here are some better-matching name suggestions: gipfeli (croissant), weggli (soft bread roll), pfünderli (500 g loaf), bürli (crusty bread roll), zöpfli (small braided loaf)

:-)

daniel_rh | 5 years ago

Savory with a touch of sweetness, Broccoli Bread cooks up like cornbread but offers fiber and calcium. The original name was Brot-cat-li (since files could be concatenated and compressed in parallel), but when we said it fast it sounded like "Broccoli" and the name stuck.

ipsum2 | 5 years ago

https://github.com/google/zopfli for the latter

glandium | 5 years ago

But it goes well with courgette. https://www.chromium.org/developers/design-documents/softwar...

kevincox | 5 years ago

The header on the page keeps hiding and reappearing as I scroll making it incredibly difficult to read.

vmchale | 5 years ago

Surprised they didn't look more at zstd.

IME it's faster than brotli and often has a better compression ratio.

daniel_rh | 5 years ago

We investigated zstd heavily and met with its brilliant inventor, Yann, who provided amazing insights into the design and rationale behind zstd and why it is so fast and such an amazing technology. I also recompiled zstd into Rust using https://github.com/immunant/c2rust and tried various WebAssembly mechanisms to run it (I couldn't get the WebAssembly build quite fast enough, and teaching c2rust to produce safe code would have been quite a slog).

But the main reason we settled on Brotli was its second-order context modeling, which makes a substantial difference in the final size of files stored on Dropbox (several percent on average, as I recall, with some files getting much, much smaller). For file storage, especially of cold files, every percent of improvement translates into cost savings.

Also, widespread in-browser support for Brotli makes it possible for us to serve Dropbox files directly to browsers in the future (especially since they are concatenatable). Browser support for Zstd isn't at the same level today.
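To give a feel for why context modeling helps, the sketch below compares the empirical entropy of a byte stream coded with one global distribution (order 0) against one whose distribution depends on the previous two bytes (order 2). It is a simplified illustration, not Brotli's scheme: RFC 7932 maps the two preceding bytes to one of 64 context IDs rather than modeling every two-byte pair, and these estimates ignore the cost of transmitting the model, so they are optimistic.

```python
import math
from collections import Counter, defaultdict


def order0_entropy(data: bytes) -> float:
    """Empirical bits per byte with a single global byte distribution."""
    if not data:
        return 0.0
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())


def order2_entropy(data: bytes) -> float:
    """Empirical bits per byte when each byte is conditioned on the two before it."""
    if len(data) < 3:
        return 0.0  # not enough data to form any order-2 context
    contexts: dict[bytes, Counter] = defaultdict(Counter)
    for i in range(2, len(data)):
        contexts[data[i - 2:i]][data[i]] += 1
    total_bits = 0.0
    for counts in contexts.values():
        m = sum(counts.values())
        total_bits += sum(-c * math.log2(c / m) for c in counts.values())
    return total_bits / (len(data) - 2)
```

On repetitive or natural-language input the order-2 figure is typically well below the order-0 one; that gap is what a context-modeling coder can exploit.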

jainr | 5 years ago

From the blog:

> Pre-coding: Since most of the data residing in our persistent store, Magic Pocket, has already been Brotli compressed using Broccoli, we can avoid recompression on the download path of the block download protocol. These pre-coded Brotli files have a latency advantage, since they can be delivered directly to clients, and a size advantage, since Magic Pocket contains Brotli codings optimized with a higher compression quality level.
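For the browser-facing case raised earlier in the thread, serving pre-coded files amounts to content negotiation: if the client advertises `br` in `Accept-Encoding`, the stored bytes go out unchanged with `Content-Encoding: br`. A minimal sketch, assuming an HTTP-style exchange; `decode_brotli` is a hypothetical callable supplied by the caller (Python's standard library has no Brotli decoder), and the parser ignores the `*` wildcard for brevity.

```python
from typing import Callable


def accepts_br(accept_encoding: str) -> bool:
    """True if the Accept-Encoding header lists 'br' with a non-zero q-value."""
    for item in accept_encoding.split(","):
        parts = [p.strip() for p in item.split(";")]
        if parts[0].lower() != "br":
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        return q > 0
    return False


def respond(stored_br: bytes, accept_encoding: str,
            decode_brotli: Callable[[bytes], bytes]) -> tuple[dict[str, str], bytes]:
    if accepts_br(accept_encoding):
        # Pre-coded path: ship the stored Brotli bytes untouched, no recompression.
        return {"Content-Encoding": "br"}, stored_br
    # Fallback: the client cannot decode Brotli, so decode on the server.
    return {}, decode_brotli(stored_br)
```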

repiret | 5 years ago

It looks like they did, but having an implementation in a memory-safe language was one of their requirements. Learning that was for me the most fascinating part of the article.

lifthrasiir | 5 years ago

> Maintaining a static list of the most common incompressible types within Dropbox and doing constant time checks against it in order to decide if we want to compress blocks

There is also a format-agnostic, adaptive heuristic: stop compressing if the initial part of the file (say, the first 1 MB) seems incompressible. I'm not sure how widespread this is, but I've seen at least one piece of software do it, and it worked well. It can be combined with other heuristics such as entropy estimation.
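Both heuristics from this comment, the constant-time lookup in a static list of incompressible types and the probe of the first 1 MB, fit in a few lines. A sketch under assumptions: the extension list and the 5% savings threshold are hypothetical, and `zlib` at its fastest level serves as a cheap probe rather than the production compressor.

```python
import os
import zlib

# Hypothetical list: the article's actual list of incompressible types is not given.
INCOMPRESSIBLE_EXTENSIONS = frozenset({".jpg", ".jpeg", ".png", ".mp4", ".zip", ".gz"})
SAMPLE_SIZE = 1 << 20  # probe only the first 1 MiB, as the comment suggests
MIN_SAVINGS = 0.05     # assumed threshold: the probe must shrink by at least 5%


def should_compress(name: str, data: bytes) -> bool:
    """Decide whether a file is worth compressing, cheapest check first."""
    ext = os.path.splitext(name)[1].lower()
    if ext in INCOMPRESSIBLE_EXTENSIONS:  # O(1) set lookup on the static list
        return False
    sample = data[:SAMPLE_SIZE]
    if not sample:
        return False  # nothing to gain on empty input
    probe = zlib.compress(sample, 1)  # fastest zlib level: a probe, not the real codec
    return len(probe) <= len(sample) * (1 - MIN_SAVINGS)
```

The extension check costs nothing and catches the common cases; the sample probe is format-agnostic and also catches incompressible files with misleading or missing extensions.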

no_wizard | 5 years ago

This is a really interesting write up of their use of Brotli! Makes me wonder if there might be a novel way I could leverage it beyond HTTP Responses.

I never realized the advantages of Brotli over zlib could be so extensive. In particular, it appears they're getting a huge speed boost (I think partly because it's written in Rust).

>we were able to compress a file at 3x the rate of vanilla Google Brotli using multiple cores to compress the file and then concatenating each chunk.
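The split, compress in parallel, then concatenate approach quoted above is sketched below with gzip substituted for Brotli, because gzip members concatenate natively (RFC 1952) and Python's `gzip.decompress` reads multi-member data; plain Brotli streams do not concatenate this way, which is the gap Broccoli fills. The chunk size and worker count are assumptions.

```python
import gzip
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 4 * 1024 * 1024  # assumed chunk size for illustration


def parallel_compress(data: bytes, chunk_size: int = CHUNK_SIZE,
                      workers: int = 4) -> bytes:
    """Compress chunks concurrently and join the results into one valid stream."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)] or [b""]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so members are joined in file order.
        members = list(pool.map(gzip.compress, chunks))
    return b"".join(members)
```

Because each chunk is compressed independently in this sketch, matches cannot reach back across chunk boundaries, so the ratio can be slightly worse than a single pass. A `ProcessPoolExecutor` would sidestep any question of whether the compressor releases the GIL.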

Side note: I admit that at first I thought they were talking about the Broccoli build system[0].

[0] https://github.com/broccolijs/broccoli

jeffbee | 5 years ago

The tradeoff between client CPU time and upload speed is interesting. If they need to be able to output compressed text at 100 Mbps, that gives a budget of ~100 ns/byte, or pretty much what they would have been spending with zlib in the first place. But on my fiber connection I only have a budget of 10 ns/byte. Does that mean you'd use the equivalent of `brotli -q 1` for me? If so, doesn't the march of progress continually erode the advantages of compression in this use case?
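The per-byte budget in this comment follows from converting the link rate to bytes per second. A small worked version, assuming decimal megabits and, for the "fiber connection", a 1 Gbps link (an assumption; the comment doesn't give the speed):

```python
def ns_per_byte(link_mbps: float) -> float:
    """CPU time per byte available if compression must keep pace with the link."""
    bytes_per_second = link_mbps * 1e6 / 8  # megabits/s -> bytes/s
    return 1e9 / bytes_per_second           # nanoseconds per byte


budget_100m = ns_per_byte(100)   # 100 Mbps link
budget_1g = ns_per_byte(1000)    # assumed 1 Gbps fiber link
```

Exactly, that is 80 ns/byte at 100 Mbps and 8 ns/byte at 1 Gbps, consistent with the comment's rounded ~100 ns and 10 ns figures.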

shadykiller | 5 years ago

Is it possible to use this as an rsync replacement?

zmj | 5 years ago

They aren't on the same level of abstraction. Rsync currently uses zlib for block compression on the wire. Brotli/broccoli would be an alternative option.

lanius | 5 years ago

Is there a pun between Broccoli and Brotli I'm not aware of? There's another Brotli compression tool called Broccoli (written in Go), just a coincidence?

nerdponx | 5 years ago

We codenamed the Brotli compressor in Rust “Broccoli” because of the capability to make Brotli files concatenate with one another (brot-cat-li).

tyingq | 5 years ago

Curious whether there's enough of any one file type that a specialized compressor for it would be worth the added complexity.

daniel_rh | 5 years ago

Great question! We developed and deployed Lepton to losslessly encode JPEG image files. Lepton continues to deliver substantial storage and cost savings every year. You can read more about it here https://dropbox.tech/infrastructure/lepton-image-compression...

andrewshadura | 5 years ago

I wonder whether Syncthing can use it.

Scaevolus | 5 years ago

None of the images are loading. :(

jainr | 5 years ago

Should be fixed now :)

rmhorn | 5 years ago

Good supporting data

ksoong2 | 5 years ago

Yeah, I really like how well the performance is quantified

myrloc | 5 years ago

Middle-out compression has shown considerable performance gains over the options investigated in the article. I wonder why it was not mentioned?

Just kidding :) great article. As others have said, supporting data was very informative.

53 comments