
Amount of profanity in git commit messages per programming language

285 points | AndrewVos | 15 years ago | andrewvos.com | reply

184 comments

[+] edw519|15 years ago|reply
Out of 929857 commit messages, I found 210 swear words (using George Carlin's Seven dirty words).

That set only includes [shit, piss, fuck, cunt, cocksucker, motherfucker, tits], so these are probably not meaningful results.

I have personally commented "asshole forgot to increment the counter" 527 times in 4 different languages.

[EDIT: 528 times in 5 different languages. Sorry, bitches.]
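
For readers who want to try the count themselves, here is a minimal sketch of tallying the seven words quoted above across a list of commit messages. The whole-word, case-insensitive matching rule and the `count_swears` name are assumptions; the thread does not say how the original script matched words. The sample messages are taken from elsewhere in this thread.

```python
import re
from collections import Counter

# The seven words quoted above from the article.
SEVEN_WORDS = ["shit", "piss", "fuck", "cunt", "cocksucker", "motherfucker", "tits"]

# Assumption: whole-word, case-insensitive matching.
PATTERN = re.compile(r"\b(" + "|".join(SEVEN_WORDS) + r")\b", re.IGNORECASE)


def count_swears(messages):
    """Return a Counter mapping each listed word to its number of occurrences."""
    counts = Counter()
    for message in messages:
        for match in PATTERN.findall(message):
            counts[match.lower()] += 1
    return counts


# Sample commit messages quoted elsewhere in this thread.
sample_messages = [
    "Fuck IE7.",
    "Tweaks the NUM_PATHS config value.",
    "fuck it. let's release",
]
counts = count_swears(sample_messages)
print(counts)
```

With whole-word matching, derivatives such as "fucking" are not counted, which is one concrete way a short list undercounts.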

[+] ZachPruckowski|15 years ago|reply
Yeah, any viable list of swear words has to include "damn" (and derivatives), "hell", and "ass" (and derivatives). I'd even go so far as to say that "crap" and "retard" (and derivatives) are sufficiently unprofessional that they belong on the list.
[+] eftpotrm|15 years ago|reply
Hmm, that's not very comprehensive for this purpose.

Years ago at a former employer, we discovered just after shipping a large quantity of quite sensitive demo materials that an outside contractor had managed to slip a hidden profanity into it. Oh, the joy that caused....

The immediate reaction though was that a few of the less polite and proper members of the organisation were tasked with producing as close to an exhaustive profanity list as they could so we could do a relatively thorough sweep. From memory that list was pushing 40 terms - I think I might still have a copy somewhere but the last thing this discussion needs is more swearing!

[+] ableal|15 years ago|reply
bitches

I cannot explain how much the injection of this in current use irks me.

Is this lack of civility, and appalling misogyny, really needed for satisfying self-expression nowadays?

[+] jakevoytko|15 years ago|reply
I never curse in my commit messages. That doesn't mean I don't want to! Cursing is a vice of mine, acquired through summers of cleaning bathrooms and picking up trash at a state park in high school. I use euphemisms when coding professionally, but it's easy to map my commit messages at old companies back to my original swear.

"Blameless" bug:

   Original: Now recalculates the height of the container element after repopulating
       the content.
   Translation: Did Bob test this fucking thing ONCE before he committed this?
Fixing my own mistake:

   Original: Tweaks the NUM_PATHS config value.
   Translation: Wow, I apparently have shit-for-brains. I hope nobody ran a build in
       the past 20 minutes.
Overdesigning:

   Original: Updates the object creation code per Bob's feedback.
   Translation: Another Goddamn FactoryFactoryBuilder?! I officially don't 
       understand this codebase.
Major cleanup needed:

   Original: Style tweaks needed for GCC compilation.
   Translation: OMFG. This isn't even valid C++. It doesn't even compile.
OK, I'm not perfect:

   Original: Fuck IE7.
   Translation: No seriously, fuck IE7.
[+] nostrademons|15 years ago|reply
I'd kinda like to see which swear words appear most often in commit messages. I'm guessing that "shit" and "fuck" are much more common than "cocksucker" and "motherfucker", and if that's not true, I want to know which language has the most cocksuckers and motherfuckers.
[+] danielsoneg|15 years ago|reply
Yeah, the pie chart doesn't quite cover it - I'd like to see both swear words per commit per language (if, say, Java has 10% of the swear words but 3% of the commits) and complexity of the swear words - a simple "Fuck" implies far less frustration than a "Motherfucking Cocksucker!"

Could develop quite a nice Programming Language Pain Index…

[+] AndrewVos|15 years ago|reply
From what I remember there were only one or two "motherfuckers".

I will post up some more data if anyone is interested.

[+] stcredzero|15 years ago|reply
How to offend members of 3 different programmer communities in 9 different ways with just one sentence: "It somehow makes sense that C++, Ruby, and JavaScript are all equally profane."
[+] bad_user|15 years ago|reply
Or that they are so relevant in 2011 :-)
[+] twymer|15 years ago|reply
Given that there were only 210 total swear words, the accuracy of this seems pretty questionable. It's possible that one guy could be responsible for a large percentage of swearing for a given language.
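
One way to check this concern, sketched under assumptions: given the author of each profane commit in a language, compute the share contributed by the single most prolific swearer. The author names and counts below are hypothetical, and `top_author_share` is an illustrative helper, not part of the original analysis.

```python
from collections import Counter


def top_author_share(swear_authors):
    """Return (author, share) for the author with the most profane commits.

    swear_authors holds one entry per profane commit, naming its author.
    """
    if not swear_authors:
        raise ValueError("need at least one profane commit")
    counts = Counter(swear_authors)
    author, n = counts.most_common(1)[0]
    return author, n / len(swear_authors)


# Hypothetical data: 20 profane commits in one language.
hypothetical_authors = ["alice"] * 12 + ["bob"] * 5 + ["carol"] * 3
author, share = top_author_share(hypothetical_authors)
print(f"{author} wrote {share:.0%} of the profane commits")
```

A high share for one author would suggest the per-language figure says more about that person than about the language.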
[+] rosser|15 years ago|reply
It's code comments vs. commit messages, but the prevalence of profanity in the Linux kernel tree suggests developers' use of blue speech is pretty widespread.
[+] pyrhho|15 years ago|reply
> It's possible that one guy could be responsible for a large percentage of swearing for a given language.

Sorry, Ruby!

[+] csphy|15 years ago|reply
I want to know the proximity of the curse to 'IE' in the Javascript commits
[+] snprbob86|15 years ago|reply
Pie chart? I have no idea how to interpret this...

http://www.flickr.com/photos/amit-agarwal/3196386402/sizes/l...

[+] DanielStraight|15 years ago|reply
Pie charts have a lot of drawbacks, sure, but it's ridiculous that we're at the point now where the first (and highest rated) response to a pie chart is always a negative comment about pie charts, regardless of how good or bad the pie chart is.

This one in particular is very clear:

C++, Ruby and Javascript have the most profanity. They're relatively equal to each other and collectively account for more than 50% of the swearing in commit messages.

C is next, with significantly less swearing.

C# and Java are roughly tied a bit below C.

Python and PHP have, comparatively, almost no swearing.

Was that really so hard? When the data is already subjective (what is and isn't a swear word) and intended almost solely for humor, do we really need more precision than a pie chart offers?

It is at best hyperbolic and at worst dishonest to say you "have no idea" how to interpret this. You have an idea. You just don't have precision.

[+] ph0rque|15 years ago|reply
A good weekend project would be to take an existing graphing library and make a wizard for it that would create a correct type of graph based on the data and your stated intentions with the data, as shown in the flowchart above.
[+] mnbvcxz|15 years ago|reply
Thanks for reminding me again why I don't bother reading these forums. One day I'll quit clicking links too.
[+] kd0amg|15 years ago|reply
> Note that I ripped an equal amount of commit messages per language so the results aren't based on how many projects there are per language.

I like how he had to tweak the data collection process to make the visualization method fit.

[+] cookiecaper|15 years ago|reply
That flow chart there is helpful, thanks.
[+] rflrob|15 years ago|reply
A neat idea, although I think the pie chart isn't really the right format. I'd prefer to see a bar graph, with the y-axis as (swears/million messages) or similar.
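
The suggested y-axis is a simple rate. A minimal sketch, using only the overall totals quoted in this thread (210 swear words in 929857 commit messages); `swears_per_million` is an illustrative name, and a per-language bar graph would need per-language message counts, which the thread does not give.

```python
def swears_per_million(swear_count, message_count):
    """Normalize a raw swear count to a rate per million commit messages."""
    if message_count <= 0:
        raise ValueError("message_count must be positive")
    return swear_count / message_count * 1_000_000


# Overall totals quoted in the thread.
overall = swears_per_million(210, 929857)
print(f"{overall:.1f} swears per million messages")
```

Applied to the overall totals, this comes out to roughly 226 swears per million messages.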
[+] r00k|15 years ago|reply
I wrote a post on my blog 4 years ago (!) with lots of examples of profanity in code comments.

It took a half-hour to write and has consistently gotten more traffic than the rest of my blog.

Ah well, give the people what they want: http://codeulate.com/2007/12/fcking-programming/

[+] JonnieCache|15 years ago|reply
To be fair, some of those are hysterical.

In particular:

    # no, no, no, no, no, no, no, no
    # no. fuck no. I am a fucking
    # moron.
I think it's the punctuation. It's like angry Japanese poetry.
[+] PonyGumbo|15 years ago|reply
I'm completely stunned that PHP is on the bottom here.
[+] sambeau|15 years ago|reply
PHP is very international - I wonder if it would rank higher if, say, German swearing was counted?
[+] phamilton|15 years ago|reply
A small child doesn't think a crayon is badly designed until he has used a pencil or a pen. Without a frame of reference, a PHP developer has little reason to swear at the code.
[+] vault_|15 years ago|reply
The only PHP devs using git are all super professional with the patience of saints?
[+] kunley|15 years ago|reply
Perhaps they don't write meaningful commit messages?
[+] joblessjunkie|15 years ago|reply
There is not much PHP on Github.

It's the same reason there is no profanity in FORTRAN or COBOL in this graph.

The author makes a terrible mistake: he does not normalize the graph relative to the total amount of code in each language.

There's a lot of Ruby on Github, which is why you can find plenty of Ruby profanity.

The graph is just about useless without this normalization.

[+] spoondan|15 years ago|reply
Gaming companies, start-ups, anyone looking for a "Ninja" or "Rock Star" all seem more likely to tolerate swearing and less likely to be using PHP. Additionally, I'd wager that PHP projects on GitHub tend to get fewer commits from hobbyists and other non-professionals.
[+] chc|15 years ago|reply
My interpretation is that anyone still working in PHP is long resigned to its frustrations.
[+] jrockway|15 years ago|reply
No Perl? I fucking swear all the time...
[+] JustinSeriously|15 years ago|reply
I ran it for Javascript, Ruby, and Perl, and I got this:

{"JavaScript"=>48, "Ruby"=>46, "Perl"=>28}

[+] cfontes|15 years ago|reply
Is it OK statistically to take, for example, all Ruby commits and 25% of the C++ ones and compare them? Another kind of chart would be nice... also some other params.
[+] jhamburger|15 years ago|reply
Why not? As long as the sample is random and equal in size.
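
A minimal sketch of "random and equal in size", assuming the commit messages for each language are already collected in lists. The data below is hypothetical and `equal_random_samples` is an illustrative helper; the thread does not show how the original script sampled.

```python
import random


def equal_random_samples(messages_by_language, seed=None):
    """Draw a same-sized random sample (without replacement) from each language.

    The sample size is the size of the smallest language's message list.
    """
    rng = random.Random(seed)
    size = min(len(msgs) for msgs in messages_by_language.values())
    return {
        lang: rng.sample(msgs, size)
        for lang, msgs in messages_by_language.items()
    }


# Hypothetical data: four times as many C++ messages as Ruby ones.
hypothetical_messages = {
    "Ruby": [f"ruby commit {i}" for i in range(100)],
    "C++": [f"c++ commit {i}" for i in range(400)],
}
samples = equal_random_samples(hypothetical_messages, seed=0)
print({lang: len(msgs) for lang, msgs in samples.items()})
```

Here all 100 Ruby messages are kept and a random 25% of the C++ messages are drawn, so the raw swear counts can be compared directly.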
[+] jefe78|15 years ago|reply
Well played!

I wonder if this has anything to do with: http://news.ycombinator.com/item?id=2247962

I know I plan to comment my python code a little differently now! Maybe that will help balance the numbers?

I know I'd be pretty vulgar if I programmed in C++/Javascript all day!

[+] mgrouchy|15 years ago|reply
I'm surprised there are so few commit messages with curse words in them. 210 out of 929857, that's like 0.02%. I would have thought that developers were more vulgar than that (I know I am).

Maybe if we looked at comments in source code we would get a better representation of the vulgarity of developers.

[+] m0hit|15 years ago|reply
Interesting. Of course I am thinking of the many ways that the results might not be representative, but that doesn't make it any less of a cool weekend project.

Would be great to see some context around where the most _profanities_ occur by language, and the kind used.

[+] jcw|15 years ago|reply
Yeah, this is funny, there is novelty here, etc. A story counting profanities in source code/commits/etc. pops up every now and then.

I've found that the only real profanity in a source code comment is "HACK".

My swear jar overflows with quarters.

[+] JCB_K|15 years ago|reply
Next time they do a test they should include "git". Let's see what happens.
[+] damoncali|15 years ago|reply
My favorite:

- fuck it. let's release