Yeah, most of this is just drawing conclusions from what is essentially noise. Especially the consistency measure: that's bound to be heavily affected by how many people use a given language. The so-called "tier 1" are not popular because they're consistent--they're consistent because they're popular. The same goes for the tier 3 languages in reverse: they probably have so little activity that a high variance is inevitable.
Also, number of lines per commit is not really a good measure of expressivity. I don't even see how it's a reasonable proxy: the number of lines I commit changes more depending on my project (is it simple or complex, for work or for fun?) than on my language.
Also, it makes little sense to consider very domain-specific languages like Puppet: you may as well talk about how expressive HTML or CSS are relative to general-purpose programming languages!
Basically, I think this article draws too many arbitrary conclusions on limited data.
"the number of lines I commit changes more depending on my project (is it simple or complex, for work or for fun?) than on my language."
This amounts to saying: Some days I eat eggs for breakfast, but other days I eat oatmeal for breakfast, so I am totally inconsistent in what I eat from day to day, therefore it would make no sense to include me on a survey about what people in my country eat.
You do understand that a single data point might be useless, but when combined with thousands or millions of other data points, it becomes useful? What you do on any one project does not matter, but what you do, combined with thousands of other developers, all averaged together, starts to get interesting.
If you honestly believed your own premise, then you would expect all the results to be the same -- there would be no variation between Fortran, Java, JavaScript, Clojure, or CoffeeScript, because, you know, everybody is different and does different stuff, and it's all so crazy, how could anybody make sense of it?
But we can make sense of it. All that is needed is a good understanding of probability and a sufficiently large data set.
Mind you, the article above might be total bunk. There might be lots wrong with the dataset. But it's not bunk for the reason you give: "the number of lines I commit changes more depending on my project (is it simple or complex, for work or for fun?) than on my language."
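To make the point concrete, the averaging argument is just the law of large numbers. Here is a toy simulation (made-up numbers, nothing to do with the article's actual dataset) in which per-project noise swamps every individual data point, yet a small per-language effect survives averaging:

```python
import random

random.seed(42)

def commit_size(language_factor):
    """One commit: dominated by project noise, nudged by a language effect."""
    project_noise = random.lognormvariate(3, 1)  # huge spread per data point
    return project_noise * language_factor

# Two hypothetical languages whose true verbosity differs by only 20%.
terse = [commit_size(1.0) for _ in range(100_000)]
verbose = [commit_size(1.2) for _ in range(100_000)]

# Any single pair of commits is uninformative, but the means recover the gap.
ratio = (sum(verbose) / len(verbose)) / (sum(terse) / len(terse))
print(round(ratio, 2))  # close to the true 1.2x difference
```

Whether the article's dataset is clean enough for this to work is a separate question; the simulation only shows that "my commits vary a lot" is not, by itself, an objection.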
Yes. It would have been better to publish the chart, and then on this line:
> What conclusions can we draw from this?
...answer the question with "None" and then explain why.
Following that with a deeper analysis of the myriad explanations for this data could have been interesting (e.g. what kinds of projects are strongly correlated with larger commits? How much does that vary by language and age of the project?), but unsupported conclusions are not.
I think language age is another point in support of your consistency argument. Older languages have been through more massive library changes and the like, whereas new or little-used languages have libraries that change far less, and therefore less churn. It's why Lua can be at one end and Boo at the other, or C# and Vala despite their similarities. Old languages with lots of users at the low-churn end of the spectrum are something to look at, though.
I'm not certain that commit size is a good measure of expressiveness. Even discounting obvious outliers like JavaScript (oops, 3241245 people had a commit with jQuery inside), it will be very dependent on the cultures of the different language communities.
Some languages have users that are less mature in VCS usage than others, some languages have users that spend a lot more time writing tests (which create larger commits) etc.
So true. I would normally have trouble believing that assembly is more expressive than C, but maybe the people who write assembly tend to be excellent at what they do? And JS is more expressive than Java, no? Maybe some of these are skewed by the sort of programmers that use them.
I agree: for all the time he spends on his results, as far as I can tell he never really defends the idea that commit size is in any way a good proxy for "expressiveness." It would be nice if he could explain why he chose commit size over something like Rosetta Code solution size or some other metric.
There is a lot of weak-ass criticism going on in this thread when the data -- whatever may be troubling about its methodology -- seems to almost perfectly back up the common experience among programmers. Yes, copy-and-paste doubtlessly affected the numbers for JavaScript, but I am not at all surprised to see JavaScript where it is.
Does anyone here really doubt that you can get more done with a single line of Python than a line of C/Java/C++? Same for Clojure/Common Lisp/Racket versus Python.
We might not take individual ranking too seriously, and none of this affects language choice when performance is a critical concern (though the spacing between Scala, OCaml, and Go is interesting and relevant to this), but do you guys honestly doubt the trend here? Does anyone have a strong counter-example? It seems like the authors may have had a decent notion with using LOC as a measure. There is no proof of this here, but I am intrigued by it.
The final conclusions in favor of CoffeeScript, Clojure, and Python are again pretty obvious. Is anyone going to suggest JavaScript or C++ is more expressive than any of these?
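For what it's worth, the per-line density being claimed is easy to demonstrate (a toy illustration, not data from the article): counting word frequencies is one line of Python with the standard library, where the C equivalent needs a hash table, manual memory management, and explicit loops.

```python
from collections import Counter

words = "the quick brown fox jumps over the lazy dog the end".split()

# One line builds the whole frequency table; in C this would mean a hash
# table implementation (or a library), allocation, and a loop.
freq = Counter(words)

print(freq["the"])          # → 3
print(freq.most_common(1))  # → [('the', 3)]
```

Of course, this settles nothing about whether commit size measures that density; it only illustrates the trend being appealed to.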
> There is a lot of weak-ass criticism going on in this thread when the data -- whatever may be troubling about its methodology -- seems to almost perfectly back up the common experience among programmers.
So?
I mean, really, I can come up with completely bogus metrics all day, and whenever one produces results that happen to align with the conventional wisdom in some domain, post an infographic using it -- but that doesn't make the metric meaningful.
> The final conclusions in favor of CoffeeScript, Clojure, and Python are pretty obvious, I would think.
So? A metric that has no intrinsic validity doesn't become valuable just because it produces conclusions which match what you would have assumed to be true (whether based on valid logic or not) before encountering the metric.
There's enough good data backing up that conclusion that there's no point using crappy data like in the article.
Nobody will argue C is more expressive than Python, but the data in the article doesn't support it. Just because something is true doesn't mean it's okay to support it with shoddy data.
LOC per commit isn't a proxy measurement of the expressiveness of a language. The entire premise of the article is flawed.
I personally think that the poor methodology of this post would never have survived to see the light of day if the conclusions did not match what programmers expect. Conversely the methodological flaws mean that we should be very careful about accepting the data for any conclusion beyond, "Well, it looks like what I expect."
I think this plot raises a different question: which languages are being abused by the development community?
JavaScript is way too expressive for its given position. I also believe Ruby is more expressive than Python, and yet the plot shows the opposite there as well.
This plot could have some interesting data, but there's far too much noise to really learn much from it.
Actually, I think that's exactly the problem. There is a general perception based on anecdotal experience, followed by a non-rigorous 'scientific' data 'experiment', followed by an analysis of results which throws out all the data that disagrees with the original perception. Look at the actual results: they don't really show a strong correlation with 'common experience among developers'.
The good thing about CoffeeScript is that it actually seems to deliver what it promised in the first place: terseness, expressiveness, simplicity, and good JavaScript output.
It is really a joy to develop with (though my being a Ruby programmer probably makes me a somewhat biased and enthusiastic candidate).
It's hard to tell if the fact that CoffeeScript is #6 (the highest ranked major language) and Javascript is #51 (second to last) is a reflection on how much of a shift CS is from JS, or on the quality of methodology and metrics in the OP.
I'd say it's a byproduct of the fact that only in-the-know developers are using CoffeeScript, while JavaScript is being written by every rank-and-file amateur web dev in the world. There is likely an inverse correlation between language popularity and median code quality.
I'm thinking that's mostly due to the testing methodology. In JS it's pretty common to start a project by downloading and committing jQuery, Underscore, and whatever framework dependencies you're using. And just speaking from experience, CoffeeScript is better, but it's not _that much_ better (I'd guess ~1.5 lines of JS per line of CS; the chart shows somewhere around 8:1).
Yeah, I agree — I really doubt it's that large of a shift, as I mentioned under "Specific language effects." I find it hard to believe that JavaScript could really be less expressive than C and assembly.
The difference between Fortran Free Format and Fortran Fixed Format should be enough to tell us that lines of code per commit is all about how much stuff you put on a line!
(Is `nl` significant in the language syntax? Were readable wide screen displays available when the code was written?)
Note that in fifth place -- ahead of Coffeescript, Clojure, Python, etc., etc. -- is eC, a language that's basically C plus a few OO features and a GUI library. It has, for instance, no automatic memory management (besides some rather primitive refcounting); neither dynamic typing nor type inference; no nice literal syntax for collection types; in short, whatever its merits, it is not an outstandingly expressive language.
There is surely some correlation between short commits and expressiveness, but they're far enough apart that I think the title is very misleading.
Terse isn't the same as expressive. Consider Scala vs. Java. In Java, defining a singleton involves implementing a design pattern in one of a few ways -- maybe you have a private constructor and a static initializer. In Scala you define your class with `object` and it's done. That's concise and expressive.
On the other hand, in Java, if your generic type parameter has an upper bound you write `class Foo<T extends Bar>`, while in Scala you write `class Foo[T <: Bar]`, which is just an abbreviation -- replacing a word with punctuation. One is good; one isn't.
It's pretty hard to disambiguate speed of development, fluidity and flexibility (which potentially increases LOC per commit by bundling multiple 'conceptual pieces') with expressiveness (which decreases LOC per commit) in a single LOC per commit metric.
The idea that a single commit corresponds to a 'single conceptual piece' is probably not very precise. It also doesn't measure for the complexity of the conceptual piece. It hasn't been established that the same level of 'conceptual pieces' are tackled across all programming languages per commit.
Just some thoughts. That said, given all the factors involved (more expressive conceptual pieces are perhaps tackled per commit in more expressive languages, which cancels out less expressive conceptual pieces being tackled per commit in less expressive languages -- and the simplicity of simpler / less expressive languages can lead to more actual features committed), I think the methodology kind of works. But, like others here, I wouldn't presume it's quite so simple underneath.
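For reference, a lines-per-commit metric of the sort under discussion can be computed from `git log --numstat` output. This sketch (my guess at the general shape of such a pipeline, not the article's actual code; the `log` sample is invented) also makes the objections concrete: a vendored library inflates a commit exactly as much as hand-written application code does.

```python
from statistics import median

# Simplified `git log --numstat` output: a COMMIT marker per commit,
# then "added<TAB>deleted<TAB>path" for each file touched.
log = """\
COMMIT
12\t3\tsrc/app.py
COMMIT
5\t2\tsrc/app.py
200\t0\tvendor/jquery.js
"""

def added_lines_per_commit(text):
    """Sum the 'added' column for each commit's files."""
    sizes = []
    for chunk in text.split("COMMIT\n")[1:]:
        added = sum(int(line.split("\t")[0])
                    for line in chunk.splitlines() if line.strip())
        sizes.append(added)
    return sizes

sizes = added_lines_per_commit(log)
print(sizes)          # → [12, 205]: the vendored jQuery dwarfs the real change
print(median(sizes))  # → 108.5
```

Nothing in the raw numbers distinguishes "one conceptual piece" from "jQuery plus a two-line fix," which is exactly the ambiguity being pointed out above.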
There are a few surprises in that list. I'd have thought both Coq and Lua would end up further to the left. I suppose the graph shows an interesting dynamic: expressiveness is not just due to a language's semantics, nor syntax, nor libraries. You need a healthy interplay between all three aspects of a programming language to truly be "expressive".
The whiskers on the Prolog and Logtalk plots would tend to confirm what PG has said in one place or another: that a language can be both higher-level and less expressive, apparently using a definition of 'expressive' related to line count. In that light I'm kind of surprised they do as well as they do in this post.
As I get more serious about Prolog I think it's kind of a shame more people aren't exposed to it (apart from apparently dreaded university classes). It's pretty impressive just how quickly one can write a parser, and for writing an "internal" DSL it rivals or exceeds Lisp (depending on one's taste).
This assumes that languages are used to solve similar classes of problems, or that, in aggregate, those different classes result in similarly-sized "feature" chunks.
I think it's fair to say that nobody's ever written a forum in Puppet, and that few people are working on web microframeworks in Fortran.
Interesting data, just need to be careful about what conclusions are drawn from it.
Are you referring to the programming language for calculators? Wikipedia doesn't make it seem that expressive, and I can't find much else associated with the acronym RPL.
> you can get more done with a single line of Python
Maybe not if you follow PEP 8 -- maybe so if you write really, really long lines ;-)
I've never understood this criticism. Compare this line of Python:
    x = 3
to this line of C:
    DoAllTheThings();
A single line of code is a bad comparison because it doesn't say anything about the underlying language or platform.
> CoffeeScript is #6 (the highest ranked major language) and Javascript is #51 (second to last)
That's a big red flag on this as a measure of expressiveness.
This premise is complete rubbish.