
Programming languages ranked by expressiveness

80 points | dsberkholz | 13 years ago | redmonk.com

88 comments

[+] tikhonj|13 years ago|reply
Yeah, most of this is just drawing conclusions from what is essentially noise. Especially the consistency measure: that's bound to be heavily affected by how many people use a given language. The so-called "tier 1" are not popular because they're consistent--they're consistent because they're popular. The same goes for the tier 3 languages in reverse: they probably have so little activity that a high variance is inevitable.

Also, number of lines per commit is not really a good measure of expressivity. I don't even see how it's a reasonable proxy: the number of lines I commit changes more depending on my project (is it simple or complex, for work or for fun?) than on my language.

Also, it makes literally no sense at all to consider very domain-specific languages like puppet: you may as well talk about how expressive HTML or CSS are relative to normal programming languages!

Basically, I think this article draws too many arbitrary conclusions on limited data.
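
A minimal sketch of the sample-size point above, not the article's method: it assumes made-up log-normal commit sizes drawn from the same distribution for every language, and the helper `spread_of_median` is hypothetical. Even with identical underlying behaviour, an estimate from a handful of commits wobbles far more than one from thousands.

```python
import random
import statistics


def spread_of_median(n_commits, n_trials=200, seed=0):
    """Std. dev. of the sample median over repeated draws of n_commits sizes."""
    rng = random.Random(seed)
    medians = []
    for _ in range(n_trials):
        # Hypothetical commit sizes: same log-normal distribution for everyone.
        sample = [rng.lognormvariate(3.0, 1.0) for _ in range(n_commits)]
        medians.append(statistics.median(sample))
    return statistics.stdev(medians)


small = spread_of_median(20)     # stands in for a rarely used language
large = spread_of_median(2000)   # stands in for a popular language

print(f"spread of median with 20 commits:   {small:.2f}")
print(f"spread of median with 2000 commits: {large:.2f}")
```

The only thing that differs between the two calls is the number of commits, so any gap in "consistency" here comes from popularity alone.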

[+] lkrubner|13 years ago|reply
"the number of lines I commit changes more depending on my project (is it simple or complex, for work or for fun?) than on my language."

This amounts to saying: Some days I eat eggs for breakfast, but other days I eat oatmeal for breakfast, so I am totally inconsistent in what I eat from day to day, therefore it would make no sense to include me on a survey about what people in my country eat.

You do understand that a single data point might be useless, but when combined with thousands or millions of other data points, it becomes useful? What you do on any one project does not matter, but what you do, combined with thousands of other developers, all averaged together, starts to get interesting.

If you honestly believed your own premise, then you would expect all the results to be the same -- there would be no variation between Fortran, Java, Javascript, Clojure or Coffeescript, 'cause, you know, everybody is different and does different stuff, and it's all so crazy, how can anybody make sense of it?

But we can make sense of it. All that is needed is a good understanding of probability and a sufficiently large data set.

Mind you, the article above might be total bunk. There might be lots wrong with the dataset. But it's not bunk for the reason you give: "the number of lines I commit changes more depending on my project (is it simple or complex, for work or for fun?) than on my language."
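
A rough sketch of the averaging argument, under loud assumptions: the commit sizes are simulated, the 1.3x "language factor" is invented, and `simulated_median` is a hypothetical helper. It only shows that a modest systematic difference can surface through much larger per-project noise once there are enough commits. It says nothing about whether the real dataset is free of bias.

```python
import random
import statistics


def simulated_median(language_factor, n_commits, seed):
    """Median simulated commit size for one made-up language."""
    rng = random.Random(seed)
    sizes = []
    for _ in range(n_commits):
        # Per-project variation, deliberately much larger than the language effect.
        project_noise = rng.lognormvariate(0.0, 1.5)
        sizes.append(20.0 * language_factor * project_noise)
    return statistics.median(sizes)


terse = simulated_median(1.0, 50_000, seed=1)     # hypothetical terse language
verbose = simulated_median(1.3, 50_000, seed=2)   # hypothetical verbose language

print(f"median lines per commit, terse:   {terse:.1f}")
print(f"median lines per commit, verbose: {verbose:.1f}")
```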

[+] rtfeldman|13 years ago|reply
Yes. It would have been better to publish the chart, and then on this line:

> What conclusions can we draw from this?

...answer the question with "None" and then explain why.

Following that with a deeper analysis of the myriad explanations for this data could have been interesting (e.g. what kinds of projects are strongly correlated with larger commits? How much does that vary by language and age of the project?), but unsupported conclusions are not.

[+] LukeHoersten|13 years ago|reply
I think language age is another point in support of your consistency argument. Older languages have had more massive library changes, etc., whereas new or little-used languages have libraries that change much less and therefore show less churn. It's why Lua can be at one end and Boo at the other, or C# and Vala despite their similarities. Old languages with lots of users at the low-churn end of the spectrum are something to look at, though.
[+] yxhuvud|13 years ago|reply
I'm not certain that commit size is a good measure of expressiveness. Even not counting obvious outliers like javascript (oops, 3241245 people had a commit with jquery inside), it will be very dependent on the cultures in the different languages.

Some languages have users that are less mature in VCS usage than others, some languages have users that spend a lot more time writing tests (which create larger commits) etc.
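
To make the outlier point concrete, here is a small sketch with made-up numbers (the commit sizes and the 9000-line vendored commit are assumptions, not data from the article). It compares how one wholesale library check-in moves the mean and the median of lines changed per commit.

```python
import statistics

# Hypothetical lines changed per commit in a small project.
ordinary_commits = [12, 8, 30, 5, 22, 17, 9, 14]
# Made-up size for a third-party library committed wholesale.
vendored_commit = 9000

with_vendor = ordinary_commits + [vendored_commit]

mean_before = statistics.mean(ordinary_commits)
mean_after = statistics.mean(with_vendor)
median_before = statistics.median(ordinary_commits)
median_after = statistics.median(with_vendor)

print(f"mean:   {mean_before:.1f} -> {mean_after:.1f}")
print(f"median: {median_before:.1f} -> {median_after:.1f}")
```

How much a single vendored commit distorts a ranking therefore depends on which summary statistic is used and on how common such commits are in a given language's culture.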

[+] alexvr|13 years ago|reply
So true. I would normally have trouble believing that assembly is more expressive than C, but maybe the people who write assembly tend to be excellent at what they do? And JS is more expressive than Java, no? Maybe some of these are skewed by the sort of programmers that use them.
[+] kevinnk|13 years ago|reply
I agree. For all the time he spends on his results, as far as I can tell he never really defends the idea that commit size is in any way a good proxy for "expressiveness." It would be nice if he could explain why he chose commit size over something like Rosetta Code size or some other metric.
[+] rohern|13 years ago|reply
There is a lot of weak-ass criticism going on in this thread when the data -- whatever about its methodology is troubling -- seems to almost perfectly back up what is the common experience among programmers. Yes, copy-and-paste doubtlessly affected the numbers for JavaScript, but I am not at all surprised to see JavaScript where it is.

Does anyone here really doubt that you can get more done with a single line of Python than a line of C/Java/C++? Same for Clojure/Common Lisp/Racket versus Python.

We might not take individual ranking too seriously, and none of this affects language choice when performance is a critical concern (though the spacing between Scala, OCaml, and Go is interesting and relevant to this), but do you guys honestly doubt the trend here? Does anyone have a strong counter-example? It seems like the authors may have had a decent notion with using LOC as a measure. There is no proof of this here, but I am intrigued by it.

The final conclusions in favor of CoffeeScript, Clojure, and Python are again pretty obvious. Is anyone going to suggest JavaScript or C++ is more expressive than any of these?

[+] dragonwriter|13 years ago|reply
> There is a lot of weak-ass criticism going on in this thread when the data -- whatever about its methodology is troubling -- seems to almost perfectly back up what is the common experience among programmers.

So?

I mean, really, I can come up with completely bogus metrics all day and, whenever one produces results in a domain that happen to align with CW in that domain, post an infographic using it, but that doesn't make that metric meaningful.

> The final conclusions in favor of CoffeeScript, Clojure, and Python are pretty obvious, I would think.

So? A metric that has no intrinsic validity doesn't become valuable just because it produces conclusions which match what you would have assumed to be true (whether based on valid logic or not) before encountering the metric.

[+] stiff|13 years ago|reply
If you only validate research via checking if it "agrees with experience", then what's the point of doing it in the first place?
[+] jlarocco|13 years ago|reply
There's enough good data backing up that conclusion that there's no point using crappy data like in the article.

Nobody will argue that C is more expressive than Python, but the data in the article doesn't actually show that Python is more expressive. Just because something is true doesn't mean it's okay to support it with shoddy data.

LOC per commit isn't a proxy measurement of the expressiveness of a language. The entire premise of the article is flawed.

[+] btilly|13 years ago|reply
I personally think that the poor methodology of this post would never have survived to see the light of day if the conclusions did not match what programmers expect. Conversely the methodological flaws mean that we should be very careful about accepting the data for any conclusion beyond, "Well, it looks like what I expect."
[+] jdonaldson|13 years ago|reply
I think this plot begs a different question... which languages are being abused by the development community?

Javascript is way too expressive for its given position. I also believe ruby is more expressive than python, and yet the plot shows the opposite there as well.

This plot could have some interesting data, but there's far too much noise to really learn much from it.

[+] apalmer|13 years ago|reply
Actually, I think that's exactly the problem. There is a general perception based on anecdotal experience, followed by a non-rigorous 'scientific' data 'experiment', followed by analysis of results which throws out all the data which disagrees with the original perception. Look at the actual results: they don't really show a strong correlation with 'common experience among developers'.
[+] igouy|13 years ago|reply
>>you can get more done with a single line<<

Maybe not if you follow PEP 8 -- maybe so if you write really really long lines ;-)

[+] jmcqk6|13 years ago|reply
>Does anyone here really doubt that you can get more done with a single line of Python than a line of C/Java/C++?

I've never understood this criticism before. Compare this line of Python:

x = 3

To this line in C:

DoAllTheThings();

A single line of code is a bad comparison because it doesn't say anything about the underlying language or platform.

[+] VeejayRampay|13 years ago|reply
The good thing about CoffeeScript is that it actually seems to deliver what it promised in the first place: terseness, expressiveness, simplicity and good JavaScript output.

It is really a joy to develop with (though my being a Ruby programmer probably makes me a somewhat biased and enthusiastic candidate).

[+] btucker|13 years ago|reply
Now I want to see the same thing, but based on the length of the commit message instead.
[+] chrisdevereux|13 years ago|reply
Wait... JavaScript and CoffeeScript end up at opposite extremes while having near-enough identical semantics?

That's a big red flag on this as a measure of expressiveness.

[+] danso|13 years ago|reply
It's hard to tell if the fact that CoffeeScript is #6 (the highest ranked major language) and Javascript is #51 (second to last) is a reflection on how much of a shift CS is from JS, or on the quality of methodology and metrics in the OP.
[+] tootie|13 years ago|reply
I'd say it's a byproduct of the fact that only in-the-know developers are using CoffeeScript, while JavaScript is being written by every rank-and-file amateur web dev in the world. There is likely an inverse correlation between language popularity and median code quality.
[+] vec|13 years ago|reply
I'm thinking that's mostly due to the testing methodology. In JS it's pretty common to start a project by downloading and committing jQuery, Underscore, and whatever framework dependencies you're using. And just speaking from experience, CoffeeScript is better, but it's not _that much_ better (I'd guess ~1.5 lines of JS per line of CS; the chart shows somewhere around 8:1).
[+] dsberkholz|13 years ago|reply
Yeah, I agree — I really doubt it's that large of a shift, as I mentioned under "Specific language effects." I find it hard to believe that JavaScript could really be less expressive than C and assembly.
[+] igouy|13 years ago|reply
> let the results speak for themselves

The difference between Fortran Free Format and Fortran Fixed Format should be enough to tell us that lines of code per commit is all about how much stuff you put on a line!

(Is `nl` significant in the language syntax? Were readable wide screen displays available when the code was written?)
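
A toy illustration of the line-width effect, assuming the usual limits of a 66-character statement field for fixed-form Fortran (columns 7 to 72) and 132 characters for free form. The statement and the helper `physical_lines` are made up, and `textwrap` only approximates how a programmer would break a long statement.

```python
import textwrap

# One made-up logical statement, 124 characters long.
statement = (
    "result = alpha * beta + gamma * delta - epsilon / zeta + eta * theta "
    "- iota * kappa + lambda_ * mu - nu / xi + omicron * pi_"
)


def physical_lines(text, width):
    """Number of physical lines needed when wrapping at the given width."""
    return len(textwrap.wrap(text, width=width))


narrow = physical_lines(statement, 66)   # fixed-form style limit (assumed)
wide = physical_lines(statement, 132)    # free-form style limit (assumed)

print(f"lines at width 66:  {narrow}")
print(f"lines at width 132: {wide}")
```

The logic is identical in both cases; only the allowed line width changes the count that a lines-per-commit metric would see.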

[+] gjm11|13 years ago|reply
Note that in fifth place -- ahead of Coffeescript, Clojure, Python, etc., etc. -- is eC, a language that's basically C plus a few OO features and a GUI library. It has, for instance, no automatic memory management (besides some rather primitive refcounting); neither dynamic typing nor type inference; no nice literal syntax for collection types; in short, whatever its merits, it is not an outstandingly expressive language.

There is surely some correlation between short commits and expressiveness, but they're far enough apart that I think the title is very misleading.

[+] tootie|13 years ago|reply
Terse isn't the same as expressive. Consider Scala vs Java. In Java, defining a singleton involves implementing a design pattern in one of a few ways. Maybe you have a private constructor and a static initializer. In Scala you define your class with `object` and it's done. That's concise and expressive.

In Java, if your generic class has an upper bound you write class Foo<T extends Bar> while in Scala you write class Foo[T <: Bar] which is just an abbreviation. Replacing a word with punctuation. One is good, one isn't.

[+] tmsh|13 years ago|reply
The real measure is features shipped.

It's pretty hard to disentangle speed of development, fluidity and flexibility (which potentially increase LOC per commit by bundling multiple 'conceptual pieces') from expressiveness (which decreases LOC per commit) in a single LOC per commit metric.

The idea that a single commit corresponds to a 'single conceptual piece' is probably not very precise. It also doesn't account for the complexity of the conceptual piece. It hasn't been established that the same level of 'conceptual piece' is tackled per commit across all programming languages.

Just some thoughts. That said, given all the factors involved, I think the methodology actually kind of works: more expressive languages perhaps tackle more expressive concepts per commit, which cancels out less expressive languages tackling less expressive concepts per commit, and that in turn balances against the way simplicity in simpler, less expressive languages can lead to more actual features committed. But, like others here, I wouldn't presume it's quite so simple underneath.

[+] sharkbot|13 years ago|reply
There are a few surprises in that list. I'd have thought both Coq and Lua would end up further to the left. I suppose the graph shows an interesting dynamic: expressiveness is not just due to a language's semantics, nor syntax, nor libraries. You need a healthy interplay between all three aspects of a programming language to truly be "expressive".
[+] fusiongyro|13 years ago|reply
The whiskers on the Prolog and Logtalk plots would tend to confirm what PG had said in one place or another, that a language can be both more high level and less expressive, apparently using a definition of expressive related to line count. In that light I'm kind of surprised they do as well as they do in this post.

As I get more serious about Prolog I think it's kind of a shame more people aren't exposed to it (apart from apparently dreaded university classes). It's pretty impressive just how quickly one can write a parser, and for writing an "internal" DSL it rivals or exceeds Lisp (depending on one's taste).

[+] sliverstorm|13 years ago|reply
Hah! Take that, Python aficionados!

-- Perl user

[+] Mithaldu|13 years ago|reply
You'll find that no Perl user would actually stoop to that level.
[+] pekk|13 years ago|reply
What matters more is how much is expressed to you when you have to read your own code again in a year - or when someone else does

-- Python user

[+] mikebabineau|13 years ago|reply
This assumes that languages are used to solve similar classes of problems, or that, in aggregate, those different classes result in similarly-sized "feature" chunks.

I think it's fair to say that nobody's ever written a forum in puppet, and that few people are working on micro-[web]frameworks in Fortran.

Interesting data, just need to be careful about what conclusions are drawn from it.

[+] jug6ernaut|13 years ago|reply
Does this data take into account the total lines of code in the project being committed to? Would this not have an effect on the results?
[+] dsberkholz|13 years ago|reply
No, it's a combined total across all ~7.5 million open-source projects in Ohloh, so (potentially real) effects like that are averaged out.
[+] billsix|13 years ago|reply
>One proxy for this is how many lines of code change in each commit

This premise is complete rubbish.

[+] coolsunglasses|13 years ago|reply
APL and J would win this handily, but we don't use those languages for a reason.