The story of Google Guava and patches

stephen | 14 years ago

I'll call bullshit. Either they care about external developers or they don't. This is saying they don't.

Google's culture seems insular and elitist. Besides Guava, they did the same thing with GWT (which, as much as I love GWT, didn't work out in the project's best interests, IMO), and now are doing the same thing with Dart (AFAICT).

Maybe in the 90s you could get away with this. But now if you don't have an active external community, whenever your old guard of Guava/GWT/Dart developers gets bored and leaves, the new guys that come in behind them aren't going to care nearly as much about the Google internal technologies vs. the true open source technologies they've been hacking on before/after their time at Google.

So the Google/internal technologies will eventually stagnate.

Perhaps internally-driven projects can get more stuff done in the short term (thanks to dedicated resources), but I think in the long term the external community out-innovates internal projects (due to the internal teams getting burdened with legacy requirements (cough GWT), politics, etc.). Dunno, that's my impression.

cromwellian | 14 years ago

I used to be an external contributor to open-source GWT, and then became a Google employee to work on the GWT team, and the situation really has nothing to do with elitism or culture.

It basically boils down to a matter of resources. When I started as an open source contributor to GWT, it was used by external developers, but not really used internally by Google, so changes made by external committers couldn't possibly break anything.

Slowly over time, more and more Google properties started using GWT, and suddenly you had the situation where an external user could submit a patch that passed all the GWT unit tests but broke major Google properties (e.g. AdWords, Google Groups, Wallet/Checkout, etc.). Google builds everything from head, so when you do an internal commit, not only do your unit tests run, but also the unit tests from every project that depends on GWT, so you find out very quickly if your patch broke real applications. This happens all the time. I commit, the pre-submit queue for GWT is green (all tests pass), then hundreds of other projects get their chance, and there's always 1 or 2 that break, not always because of GWT per se, sometimes because of bad code in those projects.
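The "build from head and run every dependent's tests" idea can be illustrated with a toy model. This is a hypothetical sketch, not Google's actual build or presubmit system (the thread does not describe how that works): given a map of which projects depend on which, it computes every project whose tests would have to run when one library changes. The class name `PresubmitPlanner` and the project names are invented for the example; the names only loosely echo the properties mentioned above.

```java
import java.util.*;

/** Hypothetical sketch: which projects' tests must run when one library changes? */
public final class PresubmitPlanner {
    // Maps each project to the projects it directly depends on.
    private final Map<String, Set<String>> dependsOn = new HashMap<>();

    public void addDependency(String project, String dependency) {
        dependsOn.computeIfAbsent(project, k -> new HashSet<>()).add(dependency);
    }

    /** Returns the changed project plus every project that transitively depends on it, sorted. */
    public SortedSet<String> projectsToTest(String changed) {
        // Invert the edges: dependency -> its direct dependents.
        Map<String, Set<String>> dependents = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : dependsOn.entrySet()) {
            for (String dep : e.getValue()) {
                dependents.computeIfAbsent(dep, k -> new HashSet<>()).add(e.getKey());
            }
        }
        // Breadth-first walk over the reversed edges collects all transitive dependents.
        SortedSet<String> result = new TreeSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(changed);
        result.add(changed);
        while (!queue.isEmpty()) {
            String current = queue.poll();
            for (String next : dependents.getOrDefault(current, Collections.emptySet())) {
                if (result.add(next)) { // add() returns false if already visited, so cycles terminate
                    queue.add(next);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        PresubmitPlanner planner = new PresubmitPlanner();
        planner.addDependency("adwords", "gwt");
        planner.addDependency("groups", "gwt");
        planner.addDependency("payments-ui", "gwt");
        planner.addDependency("checkout", "payments-ui");
        planner.addDependency("search", "guava");
        System.out.println(planner.projectsToTest("gwt"));
    }
}
```

Running `main` prints `[adwords, checkout, groups, gwt, payments-ui]`. Note that `checkout` only depends on GWT indirectly, which is exactly the kind of downstream breakage a library's own unit tests would never reveal, and which an external contributor without access to those projects could not see.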

The problem is, there is no way for external committers to get notified of internal (potentially confidential) apps breaking on their changes. This meant that every external commit would need to be reviewed and proxied by someone on the GWT team.

Now, back when we had 20+ people working on GWT, it wasn't hard. Now there are only 5 full-time committers, and it has become a lot more difficult to keep up with the external community and support internal users.

I have been internally advocating that we "re-open source GWT". That is, we make the "source of truth" an external repository, possibly re-hosting it on GitHub or Google Code, and fork it off from the internal version. We grant all of our best and most dedicated contributors rights to administer and commit on equal footing with Google employees, and we run external continuous build systems for it.

On the innovation front, I think it's true for gwt-user, but for the compiler, I've hardly gotten any external contributions for optimizations and almost all of the improvements in speed and code size have arisen internally. This may indicate that we should run separate open source projects for the compiler/tools and the libraries, splitting them up and separately managing them.

But the GWT team IMHO was never elitist, just saddened that contributions were piling up and we lacked the bandwidth to review and commit them in a timely manner. I feel bad about it, given the time people put in, and I've been spending time recently trying to collect all outstanding patches for landing into GWT 2.5.

js2 | 14 years ago

Not all of Google is that way. The Chromium project, I think, is a notable exception. Gerrit Code Review is happy to accept outside contributions. And technically the Git maintainer (Junio Hamano) is a Google employee these days.

Sadly, Kevin doesn't sound too good here:

> And here's the last thing. Be honest: if you were going to sign yourself up for doing all that work above... wouldn't you at least want to have the pleasure of writing the code for it yourself?

Code is code. (Well, as long as it's not awful/ugly code.) I'm as happy to marshal through someone else's code as write my own.

vineet | 14 years ago

Highlight: Stop submitting patches to Guava - it is too much work for us.

My Favorite response (from Martijn Verburg): "...could you guys work with the community to teach them to submit better proposals/patches? Many open source projects are able to do this from the Linux Kernel through to hobby projects like PCGen. Perhaps talking to their committer teams might give you some insights."

drats | 14 years ago

Interestingly, I heard that the Java people at Google rail against using Python for large projects because they supposedly get out of hand.

nknight | 14 years ago

Sounds like Martijn doesn't actually know much about Linux kernel development.

Patches get rejected from the kernel all the time for many of the same reasons listed in the linked post. It takes a long time for most new kernel contributors to get anything substantive in, and major changes almost never go in without huge reviews, fights, competing proposals, etc.

The only difference between that and what this post talked about is that for the kernel, a lot of the sausage making goes on in public (though far from all of it).

spaznode | 14 years ago

Maybe I'm alone in thinking this but having been in the position of reviewing more than a few non-trivial bugfix patches myself I think I might tend to agree with Kevin.

Sure it's great people are excited and want to contribute but all that excitement is due to the love and care people sweated in to making every single line in that codebase as perfect / performant / easy to understand as possible.

Patches almost never add to something like that, especially not on such a small, focused library. Truth be told, most of the time on open source projects you're accepting patches simply to get more community involvement and acceptance. Guava doesn't need acceptance; it has been lovingly accepted already. If you want open-armed love, go to Apache Commons.

If you want perfect performant code you can use and trust consistently go to guava.

I'm grateful and happy that it exists and it is a pleasure and delight every time I incorporate a little bit more into my codebase, slowly.

cheatercheater | 14 years ago

Gee, how does Linux ever make it! All of it is contributed! Oh: Torvalds just sits down and comes through on looking at the submissions.

dhanji | 14 years ago

A rebuttal: http://rethrick.com/guava

st3fan | 14 years ago

Am I the only one who is really annoyed by links to Google+ that can only be seen after signing in? I thought it was considered bad style to do that for NYT links here. Maybe the same holds true for G+ links?

dewitt | 14 years ago

This post should be publicly visible without being logged in (at least it is for me). But this is the second HN thread in a week where I've seen a comment like this, so can you send me your details (web browser, etc), so I can debug, please?

peeters | 14 years ago

You really have to take the good with the bad when it comes to Google sponsored libraries.

I'm usually won over by them because Google does truly great Java API design, their releases are relatively high quality (there is some assurance of quality when it's used internally at Google) and their libraries almost always enforce good design (they don't accept things that you would consider helpful if they think it will be easy to use incorrectly or abuse).

But with that comes the bad. If something's not helpful to Google, it won't have sponsorship to be added to the library. The library will always support only versions of Java that Google internally uses (Kevin has said before that it is unlikely that Guava will be expanded to cover even Java 6 any time soon).

So I'm enjoying using their libraries while they are current, but am fully aware that they might need to be forked eventually.

ary | 14 years ago

The point that seems to be missed here is that Google is eating their own dog food. Doing such they are hesitant to fix what "ain't broke." Were this merely code that was being thrown over the fence from time to time I'm sure you'd see a higher patch adoption rate.

robryan | 14 years ago

It seems there is open source which is set up for community contribution and open source which isn't. We tend to only really think of open source, ideally at least, in terms of projects that allow community contribution.

From what I have read Android is pretty similar? Very hard for developers to actually get some of their code merged.

Wondering now what other Google projects are like for outside contributions, Chromium etc.

MatthewPhillips | 14 years ago

Very few Google projects allow outside contributions. Go is one that does; no surprise there of course. There is an attitude pervasive at Google that open source is a great marketing tool, but that it's a one-way street. I think it's because of the high employment standards at Google; they cannot fathom how someone without an @google.com email address can do better than them.

Dart is the perfect example of this: developed in secret, in a dark room, and then dumped on the open source community. When no one was excited about it they shrugged their shoulders, confused about what they had done wrong.

I have no interest in Java, and I really hope someone forks this project and treats it like a real open source project.

MatthewPhillips | 14 years ago

tldr: It's more difficult to maintain a Java util library than it is to maintain the Linux kernel, so patches are not welcome. They'd love the community to do their bitch work though.

zem | 14 years ago

a better point is the preface to the guava project docs:

"The Guava project contains several of Google's core libraries that we rely on in our Java-based projects"

given that, it is fair to say "this is essentially an internal codebase, and we would prefer to develop it ourselves so that it fits our internal practices and standards; however, it is an extremely useful set of libraries, and we are happy to share it with the open source community so that you can use it too, if you like"

spaznode | 14 years ago

Sounds like a good article, wish I could read it on my mobile iOS device.

michaelneale | 14 years ago

For those who can't read the G+ post (yes, it can be a pain):

----------

The story with #guava and your patches

Guava users,

Many of you, when you request a feature for Guava, have submitted a patch to us with the implementation (or even pasted code directly into bug reports).

And we have almost never accepted any of these patches, let alone even read them. And I know this makes us look all manner of self-absorbed, arrogant and unappreciative. That's what I'd think in your shoes. So it's time I tried to explain to you more fully why it's like this.

I realize that from your perspective, you're handing us a shiny new feature on a silver platter. It should be making our decision easy, since the work is already done. It's a gift of your time and effort and you've already solved the problem and all we need to do is just accept it! Looked at that way, we're either idiots or jerks for not being interested.

But here's the part that I don't think many of you understand: the work you've done to produce that patch is actually minuscule compared to the total amount of work we have to do to put it in Guava. I know that it feels to you like you've certainly gotten us more than halfway there, but trust me, it's only scratched the surface.

- We have to work out whether the problem it's trying to solve is truly the right problem
- We have to work out whether the solution presented is truly the best solution we can come up with
- We have to find evidence in the internal Google codebase that users will actually use the proposed feature if we create it. If we are adding methods to our libraries that don't get used, it hurts our case when we try to argue to management that we're doing important work and need more staff.
- We have to figure out how it relates to the piles of legacy code we have floating around our libraries (that you, lucky folks you are, don't even see!), and how we would deal with migrating those users if they exist.
- We have to decide the best name and location for the new API. This is hard! We spend a lot of time in our API review meetings just batting names around.
- We have to review the code deeply. Our code reviews are grueling and go on for many rounds. When you look at the code in Guava it tends to look "obvious", but we work very hard to achieve that quality. It's only obvious in hindsight.
- In almost every case we have to completely rewrite the javadoc that first gets submitted to us. And this is very hard. Writing good documentation is probably the biggest challenge we ever face.
- The tests that were first written are rarely sufficient; we're going to need to add more. When we do, some usually fail.
- If the change touches on any existing functionality, we have to submit it to Google's global submit queue and analyze test results from many thousands of projects to make sure we won't break any internal users with it.
- If the change goes in, we have to deal with the machinery that gets that change integrated out to you in Guava.
- We then become responsible for fixing any bugs with it that come up over time, and dealing with the related feature requests it will touch off.
- And the code never "stays finished" in general; we are constantly performing various maintenance tasks over our whole library (or even the whole codebase of Google), to make various cross-cutting improvements, and every bit of new code added increases that burden.

There's more I'm leaving out, but you get the idea. Guava did not get to the level of quality it has by accident, but by us being really obsessive about these things.

Now, when the patch comes from outside Google, we have additional mechanical overhead. One of us has to sponsor the patch as if it's their own, converting it into an internal patch that can merge correctly (which isn't always as trivial as it sounds), and sending it for review to another member of the team. And because we are the ones most familiar with our own style, conventions, practices and pitfalls to avoid, etc., sometimes just doing that plus "cleaning up" the code to get it ready for review is already more time-consuming than if we had written it ourselves from the start. That doesn't even mean that the code sent to us in the patch was bad. It can be very good by most standards but still need a lot of rework for our purposes.

Remember, if your feature is valuable, then we're going to want it in Guava whether you provided a patch or not. Providing the patch doesn't make it more likely that we'll decide it's a good fit for Guava -- if anything it just puts us more on guard against that seductive temptation to think "but it's already mostly done anyway, might as well!"

And here's the last thing. Be honest: if you were going to sign yourself up for doing all that work above... wouldn't you at least want to have the pleasure of writing the code for it yourself? I love writing code -- that's why I do this! -- but such a large majority of my time goes into activities like those described above. If my job were all about just applying other people's patches, I would inevitably start hating it after a while. Let me have some fun sometimes, okay? :-)

I really hope this helps to understand why your patches seem to go into a black hole. I know that no matter what I say it will probably continue to seem unappreciative and condescending, and I apologize. I do recognize that you are just trying to help. But, if you really want to help, then keep an eye out for the times when we will ask for help on a particular issue, because that's where your time and energy will really do the most good!

Rantingly yours, KB

grinich | 14 years ago

Try this: http://www.readability.com/articles/7c9wdnlq

vilda | 14 years ago

Branching.

A simple feature everyone is trying to avoid, but sometimes it's like an open door out of a golden cage.

Guava is a "from the inside out" project presented as "take it or leave it". I would not personally blame Google for their attitude towards changes. But they have to state it explicitly; otherwise more contributors will feel betrayed after all the work they have invested!

cpeterso | 14 years ago

> If the change touches on any existing functionality, we have to submit it to Google's global submit queue and analyze test results from many thousands of projects to make sure we won't break any internal users with it.

Is there any public info about "Google's global submit queue"? I would love to learn more about such a huge automated test system.

nolliesnom | 14 years ago

Guava is a work of art created by the team that maintains it; what's the big deal if you can't add your own code to it? Merely a lost opportunity to advance your own vanity?

Respect their boundaries and let your experience inform your feedback to them. If you read what Kevin is saying, it is clear they are interested in hearing if, how, and why their library is helping or hurting your own project. They will probably listen if your feedback provides the answers they seek.

spullara | 14 years ago

The hard part of guava is deciding what to include and exclude and they base those decisions mostly on what is useful at Google. The code is easy and handing them a patch is pointless.

steinbrenner | 14 years ago

"Sam Berlin - Disclaimer: I don't know what patches to the Linux Kernel are typically like, nor PCGen. But, +Martijn Verburg, there's a pretty big difference between submitting a patch to a library and submitting a patch to a "project". Patches to libraries are typically changes/additions/removals to the API, whereas patches to 'projects' are typically changes to the internals. It's a whole lot easier to change the internals of something than it is to change the API. Changing the API means the effects can bubble outwards. Changing the internals is usually just optimizations or bug-fixes."

haha wow. this is why I love java programmers
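Sam Berlin's distinction between changing an API and changing internals can be sketched in Java. In this hypothetical example, the `FieldSplitter` interface and both implementations are invented for illustration and are unrelated to Guava's own `Splitter` class. The caller `countFields` is written once against the public interface, so swapping in an optimized implementation leaves it untouched, whereas renaming `split` or changing its signature would force every caller to change.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration: callers see only the public contract, not the internals. */
public final class ApiVsInternals {

    /** The public contract callers compile against (invented for this example). */
    interface FieldSplitter {
        /** Splits on commas and drops empty pieces. */
        List<String> split(String input);
    }

    /** Version 1: straightforward implementation built on String.split. */
    static final class SimpleSplitter implements FieldSplitter {
        @Override
        public List<String> split(String input) {
            List<String> parts = new ArrayList<>();
            for (String piece : input.split(",")) {
                if (!piece.isEmpty()) {
                    parts.add(piece);
                }
            }
            return parts;
        }
    }

    /** Version 2: an internal rewrite that scans characters directly; same contract. */
    static final class ScanningSplitter implements FieldSplitter {
        @Override
        public List<String> split(String input) {
            List<String> parts = new ArrayList<>();
            int start = 0;
            for (int i = 0; i <= input.length(); i++) {
                // Treat end-of-input like a trailing comma so the last piece is emitted.
                if (i == input.length() || input.charAt(i) == ',') {
                    if (i > start) {
                        parts.add(input.substring(start, i));
                    }
                    start = i + 1;
                }
            }
            return parts;
        }
    }

    /** A caller written once against the interface; it works with either version unchanged. */
    static int countFields(FieldSplitter splitter, String line) {
        return splitter.split(line).size();
    }

    public static void main(String[] args) {
        String line = "a,,b,c,";
        System.out.println(countFields(new SimpleSplitter(), line));   // 3
        System.out.println(countFields(new ScanningSplitter(), line)); // 3
    }
}
```

The internal rewrite is safe only because both versions honor the same documented behavior; the hard, outward-bubbling decision is what that contract says in the first place.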

jey | 14 years ago

How is this Java-specific? Same problem exists in any language.

codeonfire | 14 years ago

The simple solution for dealing with Google-scale bureaucracy is to fork and continue pushing forward. Then when that fork gets locked down, create a new one. Not every project is going to be bureaucracy-impaired, but there is a correct procedure when it is. Multi-round "grueling" code reviews, and API review meetings? W.T.F.

1010011010 | 14 years ago

Java is stupid.

eta_carinae | 14 years ago

Guava is one of the best Java libraries available today, and the fact that the bar for submitting patches is so high is a simple consequence of that.

You can't have it both ways.
