Yeah, any viable list of swear words has to include "damn" (and derivatives), "hell", and "ass" (and derivatives). I'd even go so far as to say that "crap" and "retard" (and derivatives) are sufficiently unprofessional that they belong on the list.
Hmm, that's not very comprehensive for this purpose.
Years ago at a former employer, just after we shipped a large quantity of quite sensitive demo materials, we discovered that an outside contractor had managed to slip a hidden profanity into them. Oh, the joy that caused...
The immediate reaction, though, was that a few of the less polite and proper members of the organisation were tasked with producing as close to an exhaustive profanity list as they could, so that we could do a relatively thorough sweep. From memory, that list was pushing 40 terms. I think I might still have a copy somewhere, but the last thing this discussion needs is more swearing!
I never curse in my commit messages. That doesn't mean I don't want to! Cursing is a vice of mine, acquired during high-school summers spent cleaning bathrooms and picking up trash at a state park. I use euphemisms when coding professionally, but it's easy to map my commit messages at old companies back to my original swear.
"Blameless" bug:
Original: Now recalculates the height of the container element after repopulating the content.
Translation: Did Bob test this fucking thing ONCE before he committed this?
Fixing my own mistake:
Original: Tweaks the NUM_PATHS config value.
Translation: Wow, I apparently have shit-for-brains. I hope nobody ran a build in the past 20 minutes.
Overdesigning:
Original: Updates the object creation code per Bob's feedback.
Translation: Another Goddamn FactoryFactoryBuilder?! I officially don't understand this codebase.
Major cleanup needed:
Original: Style tweaks needed for GCC compilation.
Translation: OMFG. This isn't even valid C++. It doesn't even compile.
OK, I'm not perfect:
Original: Fuck IE7.
Translation: No seriously, fuck IE7.
I'd kinda like to see which swear words appear most often in commit messages. I'm guessing that "shit" and "fuck" are much more common than "cocksucker" and "motherfucker", and if that's not true, I want to know which language has the most cocksuckers and motherfuckers.
Yeah, the pie chart doesn't quite cover it. I'd like to see both swear words per commit for each language (say, if Java has 10% of the swear words but only 3% of the commits) and the complexity of the swear words: a simple "Fuck" implies far less frustration than a "Motherfucking Cocksucker!"
Could develop quite a nice Programming Language Pain Index…
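Here is a minimal Python sketch of the per-language normalization described above. It assumes you already have counts of swear words and of commits for each language; this thread doesn't give per-language commit counts. `pain_index` is a hypothetical name. It divides a language's share of the swearing by its share of the commits. A value of 1.0 means the language swears exactly in proportion to how much it commits, and the Java example (10% of swears, 3% of commits) comes out at about 3.3. It makes no attempt at the "complexity" weighting.

```python
def pain_index(swear_counts, commit_counts):
    """Hypothetical "pain index": share of swears divided by share of commits.

    swear_counts and commit_counts both map language name -> count.
    1.0 means a language swears in proportion to how much it commits;
    above 1.0 means it is over-represented in the swearing.
    """
    total_swears = sum(swear_counts.values())
    total_commits = sum(commit_counts.values())
    if total_swears == 0 or total_commits == 0:
        return {}  # nothing to compare
    index = {}
    for lang, commits in commit_counts.items():
        if commits == 0:
            continue  # zero share of commits, so the ratio is undefined
        swear_share = swear_counts.get(lang, 0) / total_swears
        commit_share = commits / total_commits
        index[lang] = swear_share / commit_share
    return index
```

For the Java example, pad the counts out with a made-up "Other" bucket: `pain_index({"Java": 10, "Other": 90}, {"Java": 3, "Other": 97})`.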
How to offend members of 3 different programmer communities in 9 different ways with just one sentence: "It somehow makes sense that C++, Ruby, and JavaScript are all equally profane."
Given that there were only 210 total swear words, the accuracy of this seems pretty questionable. It's possible that one guy could be responsible for a large percentage of swearing for a given language.
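One way to test that worry is sketched below in Python. It assumes each swear word found can be traced back to its commit author; nothing in the thread says the original data kept authors. The `(language, author)` input format and the function name are hypothetical. For each language, the function reports the most prolific swearer and that person's share of the language's total.

```python
from collections import Counter, defaultdict


def top_author_share(swears):
    """For each language, return (top_author, share_of_that_language's_swears).

    `swears` is an iterable of (language, author) pairs, one per swear word.
    """
    by_lang = defaultdict(Counter)
    for lang, author in swears:
        by_lang[lang][author] += 1
    shares = {}
    for lang, authors in by_lang.items():
        author, count = authors.most_common(1)[0]
        shares[lang] = (author, count / sum(authors.values()))
    return shares
```

With only 210 swears overall, a share close to 1.0 would mean a language's slice of the pie mostly reflects one person's habits.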
It's code comments vs. commit messages, but the prevalence of profanity in the Linux kernel tree suggests developers' use of blue speech is pretty widespread.
Pie charts have a lot of drawbacks, sure, but it's ridiculous that we're now at the point where the first (and highest-rated) response to a pie chart is always a negative comment about pie charts, regardless of how good or bad the pie chart is.
This one in particular is very clear:
C++, Ruby and JavaScript have the most profanity. They're relatively equal to each other and collectively account for more than 50% of the swearing in commit messages.
C is next, with significantly less swearing.
C# and Java are roughly tied a bit below C.
Python and PHP have, comparatively, almost no swearing.
Was that really so hard? When the data is already subjective (what is and isn't a swear word) and intended almost solely for humor, do we really need more precision than a pie chart offers?
It is at best hyperbolic and at worst dishonest to say you "have no idea" how to interpret this. You have an idea. You just don't have precision.
A good weekend project would be to take an existing graphing library and build a wizard for it that picks the right type of graph based on the data and your stated intentions for it, as shown in the flowchart above.
A neat idea, although I think the pie chart isn't really the right format. I'd prefer to see a bar graph with the y-axis in swears per million messages, or similar.
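Here is a rough Python sketch of that chart, drawn as text rather than with a plotting library. It assumes you have per-language swear and message counts, which this thread doesn't give. The only real totals here are the 210 swears in 929,857 messages mentioned in another comment, which work out to roughly 226 swears per million messages.

```python
def swears_per_million(swears, messages):
    """Swear words per million commit messages."""
    return swears / messages * 1_000_000


def text_bar_chart(rates, width=40):
    """Render {label: value} as horizontal text bars, largest value first.

    The longest bar is `width` characters; the others are scaled to it.
    """
    peak = max(rates.values())
    lines = []
    for label, value in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
        bar = "#" * round(value / peak * width) if peak else ""
        lines.append(f"{label:<12}{bar} {value:.1f}")
    return "\n".join(lines)
```

You could pass `text_bar_chart` a dict built as `{lang: swears_per_million(s, m) for lang, (s, m) in counts.items()}`, where `counts` is a hypothetical mapping of language to (swears, messages). That gives one bar per language on a common scale.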
A small child doesn't think a crayon is badly designed until he has used a pencil or a pen. Without a frame of reference, a PHP developer has little reason to swear at the code.
Gaming companies, start-ups, anyone looking for a "Ninja" or "Rock Star" all seem more likely to tolerate swearing and less likely to be using PHP. Additionally, I'd wager that PHP projects on GitHub tend to get fewer commits from hobbyists and other non-professionals.
Is it statistically OK to take, for example, all the Ruby commits but only 25% of the C++ ones and compare them?
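Roughly speaking, yes, provided the 25% is a uniform random sample and you compare rates rather than raw counts. A raw count from the sample has to be scaled up by 1 / 0.25 before it is comparable with a full count. Here is a small Python illustration with made-up numbers.

```python
def scale_sample_count(count_in_sample, sample_fraction):
    """Estimate a full-population count from a uniform random sample.

    If only `sample_fraction` of the commits were examined (0.25 for a
    quarter of them), each sampled commit stands in for 1 / sample_fraction
    commits in the full set.
    """
    if not 0 < sample_fraction <= 1:
        raise ValueError("sample_fraction must be in (0, 1]")
    return count_in_sample / sample_fraction


def swears_per_commit(swears, commits_examined):
    """A rate, which stays comparable however many commits were examined."""
    return swears / commits_examined


# Made-up numbers: every Ruby commit vs. a 25% sample of C++ commits.
ruby_rate = swears_per_commit(4, 1000)  # all 1,000 Ruby commits examined
cpp_rate = swears_per_commit(3, 500)    # 500 of 2,000 C++ commits examined
cpp_total_estimate = scale_sample_count(3, 0.25)  # estimate for all 2,000
```

With counts this small, the gap between 4 and 3 swears is mostly noise. That is a bigger problem than the sampling itself when only 210 swears are spread across several languages.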
Another kind of chart would be nice... and some other parameters too.
I'm surprised there are so few commit messages with curse words in them. 210 out of 929,857 is about 0.02%. I would have thought developers were more vulgar than that (I know I am).
Maybe if we looked at comments in source code, we would get a better picture of how vulgar developers really are.
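Here is a naive Python sketch of that idea. It assumes only `#` and `//` line comments matter. It doesn't understand string literals or `/* */` block comments. The two-word list ("damn", "crap", from the first comment in the thread) is a placeholder, not a real profanity list.

```python
import re

# Matches the first '#' or '//' on a line and captures everything after it.
LINE_COMMENT = re.compile(r"(?:#|//)(.*)$")


def count_comment_swears(source, words):
    """Count hits of `words` (and words starting with them) in line comments.

    Naive on purpose: a '#' or '//' inside a string literal is treated as the
    start of a comment, and /* ... */ block comments are ignored entirely.
    """
    word_re = re.compile(
        r"\b(?:" + "|".join(map(re.escape, words)) + r")\w*", re.IGNORECASE
    )
    hits = 0
    for line in source.splitlines():
        match = LINE_COMMENT.search(line)
        if match:
            hits += len(word_re.findall(match.group(1)))
    return hits


# Placeholder word list; a real sweep would need a far longer one.
PLACEHOLDER_WORDS = ["damn", "crap"]
```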
Interesting. Of course I am thinking of the many ways that the results might not be representative, but that doesn't make it any less of a cool weekend project.
Would be great to see some context around where the most _profanities_ occur by language, and the kind used.
edw519 | 15 years ago
That set only includes [shit, piss, fuck, cunt, cocksucker, motherfucker, tits], so these are probably not meaningful results.
I have personally commented "asshole forgot to increment the counter" 527 times in 4 different languages.
[EDIT: 528 times in 5 different languages. Sorry, bitches.]
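A list like that seven-word set undercounts partly because of how it is matched. A whole-word search for those words misses every derivative. Below is a minimal Python sketch of the difference. It assumes a regex search over commit messages; the post doesn't say how its matching actually worked.

```python
import re

# The seven words quoted in the comment above.
SEVEN_WORDS = ["shit", "piss", "fuck", "cunt", "cocksucker", "motherfucker", "tits"]


def exact_matcher(words):
    """Match only the bare words: 'fuck' but not 'fucking'."""
    alternatives = "|".join(map(re.escape, words))
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


def prefix_matcher(words):
    """Also match derivatives that begin with a listed word: 'fucking', 'pissed'."""
    alternatives = "|".join(map(re.escape, words))
    return re.compile(r"\b(?:" + alternatives + r")\w*", re.IGNORECASE)


message = "Fucking IE7. Pissed off. Fuck it."
exact_hits = exact_matcher(SEVEN_WORDS).findall(message)
prefix_hits = prefix_matcher(SEVEN_WORDS).findall(message)
```

Here the exact matcher finds only the standalone "Fuck", while the prefix matcher also picks up "Fucking" and "Pissed". The prefix version has the opposite problem: it also flags innocent words such as "Pissarro".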
ZachPruckowski | 15 years ago
eftpotrm | 15 years ago
ableal | 15 years ago
I cannot explain how much the injection of this in current use irks me.
Is this lack of civility, and appalling misogyny, really needed for satisfying self-expression nowadays?
jakevoytko | 15 years ago
nostrademons | 15 years ago
danielsoneg | 15 years ago
AndrewVos | 15 years ago
I will post up some more data if anyone is interested.
stcredzero | 15 years ago
bad_user | 15 years ago
twymer | 15 years ago
rosser | 15 years ago
pyrhho | 15 years ago
Sorry, Ruby!
csphy | 15 years ago
snprbob86 | 15 years ago
http://www.flickr.com/photos/amit-agarwal/3196386402/sizes/l...
DanielStraight | 15 years ago
jedsmith | 15 years ago
ph0rque | 15 years ago
mnbvcxz | 15 years ago
kd0amg | 15 years ago
I like how he had to tweak the data collection process to make the visualization method fit.
cookiecaper | 15 years ago
AndrewVos | 15 years ago
rflrob | 15 years ago
r00k | 15 years ago
It took a half-hour to write and has consistently gotten more traffic than the rest of my blog.
Ah well, give the people what they want: http://codeulate.com/2007/12/fcking-programming/
JonnieCache | 15 years ago
In particular:
I think it's the punctuation. It's like angry Japanese poetry.
PonyGumbo | 15 years ago
sambeau | 15 years ago
phamilton | 15 years ago
vault_ | 15 years ago
kunley | 15 years ago
joblessjunkie | 15 years ago
It's the same reason there is no profanity in FORTRAN or COBOL in this graph.
The author makes a terrible mistake: he does not normalize the graph relative to the total amount of code in each language.
There's a lot of Ruby on GitHub, which is why you can find plenty of Ruby profanity.
The graph is just about useless without this normalization.
spoondan | 15 years ago
chc | 15 years ago
jrockway | 15 years ago
JustinSeriously | 15 years ago
{"JavaScript"=>48, "Ruby"=>46, "Perl"=>28}
cfontes | 15 years ago
jhamburger | 15 years ago
jefe78 | 15 years ago
I wonder if this has anything to do with: http://news.ycombinator.com/item?id=2247962
I know I plan to comment my python code a little differently now! Maybe that will help balance the numbers?
I know I'd be pretty vulgar if I programmed in C++/Javascript all day!
mgrouchy | 15 years ago
m0hit | 15 years ago
jcw | 15 years ago
I've found that the only real profanity in a source code comment is "HACK".
My swear jar overflows with quarters.
JCB_K | 15 years ago
damoncali | 15 years ago
- fuck it. let's release
khingebjerg | 15 years ago
chrismetcalf | 15 years ago
https://gist.github.com/198320
A one-liner I wrote that uses git blame to seek out who swears the most in a given codebase. Pretty fun.
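The gist itself isn't reproduced here, so this is a separate Python sketch of the same idea rather than that one-liner. It relies on `git blame --line-porcelain`, which repeats an `author <name>` header before every blamed line and prefixes each source line with a tab. The word pattern is a hypothetical placeholder.

```python
import re
import subprocess
from collections import Counter


def count_swears_by_author(porcelain, word_re):
    """Count matching source lines per author in `git blame --line-porcelain` output."""
    counts = Counter()
    author = None
    for line in porcelain.splitlines():
        if line.startswith("author "):  # skips 'author-mail', 'author-time', ...
            author = line[len("author "):]
        elif line.startswith("\t") and author is not None:
            if word_re.search(line):  # the tab-prefixed line is the source text
                counts[author] += 1
    return counts


def blame(path):
    """Run git blame on one tracked file; must be run inside a git work tree."""
    return subprocess.run(
        ["git", "blame", "--line-porcelain", "--", path],
        capture_output=True, encoding="utf-8", errors="replace", check=True,
    ).stdout


def swears_by_author_in_repo(word_re):
    """Blame every file listed by `git ls-files` and add up per-author counts."""
    files = subprocess.run(
        ["git", "ls-files"],
        capture_output=True, encoding="utf-8", errors="replace", check=True,
    ).stdout.splitlines()
    total = Counter()
    for path in files:
        total += count_swears_by_author(blame(path), word_re)
    return total


# Hypothetical word pattern; prefix matching also catches 'damned', 'crappy'.
WORDS = re.compile(r"\b(?:damn|crap)\w*", re.IGNORECASE)
```

Calling `swears_by_author_in_repo(WORDS).most_common(10)` from inside a repository lists the ten most profane authors. Submodule entries or unusual file names will make `git blame` fail, so a real script would need to skip those.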
antihero | 15 years ago