top | item 13580649

Intern Impact: Brotli compression for Play Store app downloads

313 points| abhikandoi2000 | 9 years ago |students.googleblog.com | reply

172 comments

[+] orliesaurus|9 years ago|reply
So wait, if I understand this article correctly she applied the compression because someone told her to? Or did she research it herself and apply the whole thing? I agree this is a bit like a "we're hiring interns" post
[+] dajohnson89|9 years ago|reply
The amount of negativity in the comments section here is astounding. How could you not be excited and happy for this promising young woman's achievement? No, her work will not put her on the shortlist for a Turing Award. But it is something any engineer should be proud of, and has real impact for millions of users.

You have a right to be unimpressed, but if you're taking the time to say "So what?" or "This is just a recruiting ad" then you should probably rethink. I never thought I'd say this, but the negativity here really indicates the kind of latent discrimination that so many URMs & women in tech complain about. I have literally no other explanation for it -- a senior engineer at Google could have implemented this compression and it would still be HN worthy, and nobody would be calling the blog article a fluffy PR piece.

[+] djrogers|9 years ago|reply
> a senior engineer at Google could have implemented this compression and it would still be HN worthy,

No, I doubt it would be. How many of the hundreds of little features in the google play store have been posted on HN with an article about the person who implemented them?

Also, I find it more than a little presumptuous of you to assume that any scoffing is due to sexism. I see the exact same cynicism and lack of awe in the posts below that I have come to expect from HN - regardless of gender or color of the person involved.

[+] amichal|9 years ago|reply
I agree, even if all she did was comment out one line of config and enable another line, doing it at scale, in production, is not to be done lightly. I suspect as a student intern she learned a lot about doing that and contributed a fair amount to making sure it was done carefully and correctly. I'm jealous actually and I change config lines like that every day.
[+] hathawsh|9 years ago|reply
In fact, this was not posted to the main Google blog, but rather a special "student" blog that's obviously designed for recruiting interns. Had I seen this post when I was going to school, I certainly would have seriously considered a Google internship.

So bravo, Anamaria, for completing important work for Google, and bravo, Google, for highlighting her work in an appropriate place. Not every good work needs to be earth shattering.

[+] therealmarv|9 years ago|reply
Even if this is a recruiting post... let's stick to the facts: Her work has nevertheless saved 1.5 PB per day. To all of the people who are saying that anybody can do this and implement this: I bet your excellent hacker skills cannot save 1.5 PB of traffic per day. And it's nice to see that an intern's work can have such great results. Kudos to Google that they have great programs for young people. And I'm also forever thankful to them that I had the opportunity to work as a Google Summer of Code student and mentor for them, which was a true life changer for me!
[+] chmln|9 years ago|reply
Well, HN is where rather technical discussions take place.

And from a technological viewpoint, some people here may find that swapping gzip/etc for brotli is not that astonishing. After all, she didn't invent or implement Brotli, but merely applied it.

Note, this is not to denigrate her work in any way - the results, scaled at Google's level, are very impressive.

It is sad to see this turn into a gender-focused discussion, as I'm sure we'd see all the same "meh" comments here regardless of intern's gender or any other physical traits.

Also, I've noticed this trend where the top comment is attacking over-exaggerated "astounding negativity". There's skepticism here under nearly every article, and the odd couple of dead, downvoted comments; not "astounding negativity" by any means. While it's challenging to resist the desire to virtue signal and collect karma points, please don't exaggerate.

[+] ljk|9 years ago|reply
> the negativity here really indicates the kind of latent discrimination that so many URMs & women in tech complain about

was with you until this. hn's negativity is across the board, it's very unlikely there's discrimination going on here imo

[+] NetStrikeForce|9 years ago|reply
I agree with you.

It is things like this that make a great engineer: not only spending decades coding an earth-shattering algorithm that will take the industry by storm, but also knowing how to save 1.5PB per day with a rather simple (I guess it's not that simple) decision.

HN is supposedly full of business-minded technologists. How we are missing the great impact this had on the business is mind-boggling.

I can't find an explanation for this reaction either.

[+] codegeek|9 years ago|reply
"has real impact for millions of users"

Come on. That is so not hip these days. She saved a bunch of PB ? who cares. If only she built a nice shiny js app, now that is something to talk about

/s

[+] prewett|9 years ago|reply
If my impression was anything to go by, I clicked on the original title of "Intern saves Google 15M GB every day" (something like that) expecting to hear what on earth could have saved that much bandwidth. Instead, I got an advertisement for how great this woman is. (But, now that you want to hire her, too bad, she's already taken.) There was just one sentence that talked about what I went there for: namely, she changed the compression algorithm. If the title were "Look how awesome this Google intern is" I wouldn't have had a problem with it. On the other hand, I wouldn't have clicked it, either.
[+] ocdtrekkie|9 years ago|reply
This is surely a great resume item for her, and it's a direct benefit to the place she interned, which is pretty awesome. But I think people are quite well attuned to Google's marketing/recruiting efforts these days, and how they compare to the average example. (It's like when such-and-such for-profit school brags about how its graduates work for companies A, B, and C, but more than likely, that's a very small percentage, and the results for the average graduate are lower.)

So yeah, it's awesome she did this, but most interns should not expect this out of their Google internship.

[+] whatshisface|9 years ago|reply
I wouldn't call this discrimination - I don't think HN has ever treated early-career projects with a light touch. There's a curmudgeonly atmosphere in nearly all of these threads!
[+] davb|9 years ago|reply
URMs?
[+] Ar-Curunir|9 years ago|reply
This is typical HN; a bunch of people who think highly of themselves for not being sheep working for the big companies, and for whom everything that these companies do is easy peasy stuff that they could do in their sleep.

If I had done something to save anybody 1.5 Petabytes of bandwidth per day, I would be very content for at least a few months. Congratulations to the intern for having such a lasting impact.

[+] sidcool|9 years ago|reply
+1 I am proud and jealous of her. She did something amazing during her internship that I haven't done in my 10 year professional career. Hell she very well might win a Turing. I would happily have her as my mentor.
[+] mattlondon|9 years ago|reply
+1 This article is a great example of women in tech doing real-world, impactful, "proper" engineering work that directly benefits millions of users.

We need more stories like this.

Please - if you criticised this article about how an engineer implemented a newly-published compression algorithm that saved 1.5 petabytes a day, please go take 2 minutes to think genuinely about why you criticised this article. Your 120 seconds of introspection will benefit our entire industry regardless of your motivations and conclusions.

[+] jdcarter|9 years ago|reply
> her work resulted in saving users an expected 1.5 petabytes (that's 1.5 million gigabytes) of data each day.

I'm guessing this is not a measure of data at rest, but data transferred over the network. The couple samples listed on the page ranged from 2.5% improvement to 20.3% (vs. zLib) so I guess they're extrapolating that out to all app downloads and updates across the world. Nicely done.

More generally, we've seen some great advances in compression lately. I've been using Facebook's zStandard [1] for compression in a product I'm currently working on, and I've been extremely pleased with both its speed and compression ratio. The days of "just use zLib" are coming to a close.

[1]: https://github.com/facebook/zstd

[+] rdtsc|9 years ago|reply
Are you worried at all about their patent stance? Currently I think it says if you litigate with Facebook you lose the license. Otherwise I agree zstd is looking like a very nice improvement in an area where most people think nothing happens. I especially like the dictionary compression bit.
[+] arenaninja|9 years ago|reply
Pretty cool that an intern was given this level of confidence. Less data for updating/installing applications is good no matter how you slice it
[+] mbesto|9 years ago|reply
I've worked with a fair number of people that graduated from the Mathematics and Informatics at Babeș-Bolyai University. I'm generally very impressed by them, and this is just another data point about areas of the world that get overlooked.
[+] Syzygies|9 years ago|reply
Can we get her to work for DropBox? Every time my iPad GoodReader syncs my 1,000+ papers, it has to check every file. It boggles the mind that they don't support some version of change records.
[+] falloutx|9 years ago|reply
I don't know what the config of your computer is, but Dropbox works like a charm for me. I have more than 600 gigs of data synced between Dropbox and my computer and it works pretty nicely. I never had to manually check whether a file has been transferred or not.
[+] tln|9 years ago|reply
Maybe the solution should start with GoodReader? I don't have syncing issues with the core app.
[+] bhouston|9 years ago|reply
I bet switching to LZMA would have saved even more. LZMA beats Brotli nearly every time. zStandard would likely have worked better as well. Brotli is very slow to compress.
[+] Someone1234|9 years ago|reply
That doesn't appear to be true:

https://cran.r-project.org/web/packages/brotli/vignettes/bro...

I'm sure you can use those results to argue that LZMA is superior in some way (e.g. compression speed) but it definitely isn't clear cut superior in other important ways (compressed size and decompression speed are inferior).

I can see why, given those results, that they would use Brotli over LZMA.

[+] gribbly|9 years ago|reply
I agree that LZMA beats Brotli in compression in the majority of cases (not by much); zStandard does not, however.

The thing that makes Brotli attractive though, is that it has high compression (again, very close to LZMA, sometimes even better) while decompressing MUCH faster than LZMA.

The big downside is that it is very slow in compressing, which makes it mainly suitable for 'compress once, decompress MANY times' type data.
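To make the "compress once, decompress MANY times" trade-off concrete, here is a minimal sketch that times compression and decompression separately. It uses only Python's standard library, so it compares zlib against LZMA rather than Brotli itself; the payload is a made-up stand-in for an app package, and the timings will vary by machine. With the third-party `brotli` package installed, `brotli.compress` and `brotli.decompress` could be passed to the same helper.

```python
import lzma
import time
import zlib

# Hypothetical payload: repetitive text standing in for an app package.
payload = b"".join(b"resource_%d = value_%d\n" % (i, i % 97) for i in range(20_000))


def measure(name, compress, decompress):
    """Return (name, compressed size, compress seconds, decompress seconds)."""
    t0 = time.perf_counter()
    packed = compress(payload)
    t1 = time.perf_counter()
    restored = decompress(packed)
    t2 = time.perf_counter()
    assert restored == payload  # the round trip must be lossless
    return name, len(packed), t1 - t0, t2 - t1


results = [
    measure("zlib level 9", lambda d: zlib.compress(d, 9), zlib.decompress),
    measure("lzma preset 9", lambda d: lzma.compress(d, preset=9), lzma.decompress),
]

for name, size, c_secs, d_secs in results:
    print(f"{name}: {size} bytes, compress {c_secs:.3f}s, decompress {d_secs:.3f}s")
```

For a download service the compress time is paid once per published file, while the decompress time is paid on every device, which is why the two are reported separately.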

[+] okreallywtf|9 years ago|reply
There isn't much information but this reads more like an advertisement for google internships than anything else. Not to denigrate her work, she could very well be brilliant and have gone above and beyond, but from how it reads they could be blowing it up to make it seem like every intern has a huge impact and you could too! Either way good for her, but not sure why this is so high up on HN.
[+] bobdole1234|9 years ago|reply
It's almost like Google might be recruiting.
[+] aylmao|9 years ago|reply
At the scale at which Google operates even small optimizations can save tons of data. Not saying this was a small contribution, but I agree, mostly internship advertising.

A lot of my friends who have interned at Google felt insignificant, Google needs this counter-marketing.

[+] power78|9 years ago|reply
It is strange. They were trying to decrease update sizes by using File-by-File patching, for example, but never decided to use the best applicable compression library until an intern came along?
[+] anon987|9 years ago|reply
It probably is - much like their Pokemon Go + Google Cloud article.

When will HN readers realize that they are furthering corporate agendas when they upvote fluff like this?

[+] iamleppert|9 years ago|reply
This compression technique seems to be based on the fact that they have a previous installation of an app that can be diffed and patched, so it wouldn't provide any benefit for first installations, only updates. But it still might be worth it for many applications. I remember I investigated a way to send and apply diffs of JavaScript code (using a JS version of patch) and store them in the browser using localStorage. However, at the time the performance wasn't good enough when compared in an end-to-end benchmark.

However, this has got me wondering as a general corollary for application delivery...would it just make more sense to use something like a well-pruned and compact git repo, and make the connections over HTTP with gzip compression? I'm not sure how space efficient the git repo is but may seem like an interesting project. I'm wary of using any Google technology, open source or not if it can be done yourself in an afternoon.

Does such a thing even exist?

[+] jknoepfler|9 years ago|reply
The phrasing makes it clear that this is not intended to wow a tech audience. It's a Google ad to parents, or something.
[+] Yuioup|9 years ago|reply
Mathematicians are the true programmers. I wish I was one.
[+] jedc|9 years ago|reply
"Google Student Blog" // "Google news and updates especially for students"

Important context for this blog post and the comments in this thread.

[+] bluedino|9 years ago|reply
On the other end of the spectrum, how much more energy has been used by the millions of Android phones uncompressing the app, applying the patch, and re-compressing the data?
[+] sp332|9 years ago|reply
This page is consistently crashing my Firefox content process. I'm running 51.0.1 64-bit on Win10. Anyone else having this problem?
[+] jfasi|9 years ago|reply
I've seen replies about how this is a "simple library swap" and so doesn't deserve the attention and recognition it has received. As someone who works at Google, not anywhere remotely near this project but with experience in similar projects, I'd like to shed some light on why this isn't a simple library swap, and seems from far away to have been both a tremendous accomplishment and a wonderful learning experience.

First off, there is no such thing as a library swap at Google. Our codebase is quite large. Like shockingly overwhelmingly large. Executing a change like this is almost certainly not a case of "swapping out one configuration line for another." It requires writing new code, testing it appropriately, updating any integration tests, updating documentation, etc. But the real fun starts when you're done coding...

There's the issue of frontend and backend. Serving Brotli-compressed data is great, but what if your app doesn't support it? If you're lucky, this will be handled by the underlying network layer but then you have to deal with...

Rollout. I don't know how many servers are dedicated to app updates, but I imagine it's a lot. I also imagine they're distributed geographically, across regions and probably even continents. Getting all those servers to support new features is a delicate, time-consuming process where any misstep will result in users noticing. It's not coding, but that's why it's called "software engineering" and not "coding engineering." But then once your servers are all up and running you have to deal with...

Versioning. Updating backend servers is bad enough, but at least you control them. What about that zoo of Android versions out in the wild? How do you ensure they all support these changes? Short answer: you don't. You design a strategy that will allow the rollout to happen gradually over a period of time, and closely monitor it to make sure nothing unintended is happening.

Then how do you turn down the old feature? When do you turn it down? You need to build and properly use instrumentation to determine the safest time to kill off the old feature. Or you could never kill it and commit to paying the cost in perpetuity. That's a design decision, and not a trivial one.
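As a rough illustration of the gradual rollout and instrumentation described above (not how Google actually does it), here is a minimal sketch. The client ID, the rollout percentage, and the `choose_encoding` helper are all hypothetical; the only real-world detail is that `br` is the HTTP content-coding token for Brotli.

```python
import hashlib
from collections import Counter

ROLLOUT_PERCENT = 10  # hypothetical: share of capable clients that get Brotli
served = Counter()    # simple instrumentation: how often each encoding is used


def in_rollout(client_id: str, percent: int) -> bool:
    """Deterministically map a client to a bucket in [0, 100)."""
    digest = hashlib.sha256(client_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < percent


def choose_encoding(client_id: str, accepted: set, percent: int = ROLLOUT_PERCENT) -> str:
    """Pick Brotli only for clients that advertise it and fall in the rollout."""
    if "br" in accepted and in_rollout(client_id, percent):
        encoding = "br"
    else:
        encoding = "gzip"  # the old path stays available as the fallback
    served[encoding] += 1
    return encoding
```

Hashing the client ID keeps each device in the same bucket across requests, so raising the percentage only ever adds clients, and the `served` counter is the kind of signal you would watch before deciding when to turn the old path down.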

But, odds are you're not the only feature being rolled out. You have to anticipate/deal with potential interactions with other features, rollbacks of other people's work, etc.

I could go on, but I think I've already demonstrated why this is by no means a trivial accomplishment, even for a full time engineer. Add to this the fact that every intern has to race against the clock to get ramped up on their project, making something of this complexity and with this large an impact happen deserves applause.

I should add, I'm speaking as myself here and not representing Google in any way.

[+] MtL|9 years ago|reply
Makes you wonder how much they'd save by using Courgette, like the Chrome team does.
[+] mnml_|9 years ago|reply
that's like 50 million dollars a year (in egress cost)
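A back-of-the-envelope check of that figure, as a sketch: the 1.5 PB per day and the "1.5 million gigabytes" conversion come from the article, while the price per GB is purely an assumption picked for illustration. What Google actually pays for egress is not public, and the article describes the saving as users' data rather than Google's bill.

```python
PB_PER_DAY = 1.5          # from the article
GB_PER_PB = 1_000_000     # the article equates 1.5 PB with 1.5 million GB
PRICE_PER_GB_USD = 0.09   # assumption: illustrative egress price per GB

gb_per_year = PB_PER_DAY * GB_PER_PB * 365
cost_per_year = gb_per_year * PRICE_PER_GB_USD

print(f"{gb_per_year:,.0f} GB/year, about ${cost_per_year / 1e6:.1f}M/year")
# prints "547,500,000 GB/year, about $49.3M/year"
```

Under that assumed price the estimate lands close to 50 million dollars a year; a different price per GB scales the result linearly.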
[+] jordache|9 years ago|reply
she didn't create a compression algorithm.

More akin to enabling GZIP in IIS...

[+] 16bytes|9 years ago|reply
If you had an intern that was responsible for turning GZIP on in IIS for a website that had 1B users it starts to become much more of an accomplishment.

Even small changes at that scale require careful analysis and coordination.