Most used words in programming languages

194 points| Walkman | 9 years ago |anvaka.github.io | reply

182 comments

[+] Noughmad|9 years ago|reply
In most languages, the top words are self/this and function/def. On the other hand, in Go, the most common word is "err", and "if err != nil" are three of the top four words. I really wonder how big a part of the code in Go is just propagating errors.
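For readers who have not seen the pattern, here is a minimal sketch of what that propagation tends to look like. The functions parsePort and loadAddr are hypothetical names made up for illustration, not taken from any real project:

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// parsePort is a hypothetical helper that converts a string to a port number.
func parsePort(s string) (int, error) {
	n, err := strconv.Atoi(s)
	if err != nil {
		return 0, err // propagate the parse error unchanged
	}
	if n < 1 || n > 65535 {
		return 0, errors.New("port out of range")
	}
	return n, nil
}

// loadAddr calls parsePort and hands any error up to its own caller.
func loadAddr(host, port string) (string, error) {
	p, err := parsePort(port)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("%s:%d", host, p), nil
}

func main() {
	addr, err := loadAddr("localhost", "8080")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(addr)
}
```

Each layer repeats the same three-line check, which is roughly why "if", "err" and "nil" would cluster at the top of a word count.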
[+] grabcocque|9 years ago|reply
One day, maybe half a decade from now, Rob Pike will wake up from a nightmare and suddenly realise they got this one wrong. Maybe it'll occur at the same time as the generics one.
[+] rebeccaskinner|9 years ago|reply
There are some ways to make error handling much less annoying in Go. Since error is just an interface, it's pretty easy to make monadic constructs that can carry the error information and let you write pipelined code. IMHO it makes the code much cleaner and easier to read, but a lot of golang enthusiasts haven't really adopted the idea because it's quite different from the established idioms around error handling. Hard-core gophers love them some verbose and stupidly explicit code; but I'm hoping that as the language gets adoption in the wider community the voice of reason will win here.
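One possible shape for such a construct, as a rough sketch only: the result type and its then method below are hypothetical names chosen for illustration, not an established Go idiom or library API. The carrier holds a value plus the first error seen, and later steps are skipped once an error is present:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// result is a hypothetical carrier: a string value plus the first error seen.
type result struct {
	val string
	err error
}

// then runs f only if no earlier step failed; otherwise it passes the error along.
func (r result) then(f func(string) (string, error)) result {
	if r.err != nil {
		return r
	}
	v, err := f(r.val)
	return result{val: v, err: err}
}

// trim removes surrounding whitespace and never fails.
func trim(s string) (string, error) { return strings.TrimSpace(s), nil }

// double parses s as an integer and returns twice its value as a string.
func double(s string) (string, error) {
	n, err := strconv.Atoi(s)
	if err != nil {
		return "", err
	}
	return strconv.Itoa(n * 2), nil
}

func main() {
	r := result{val: " 21 "}.then(trim).then(double)
	fmt.Println(r.val, r.err)
}
```

The error is checked once at the end of the pipeline instead of after every call, at the cost of fixing the value type to string in this pre-generics style.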
[+] s_kilk|9 years ago|reply
Same with many node apps. Every other statement is:

    if (err !== null) {
      return callback(err);
    }
[+] flgr|9 years ago|reply
I was thinking about forking the language just so I could add a Maybe monad.

I now typically deal with this via a Must() function that takes two values and returns the non-err value or panics if there was an err.

That means you can call it like con := Must(db.Connect(...)).(db.Connection)
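A minimal sketch of how such a Must() helper could be written, assuming the interface{}-based style implied by the type assertion above. The connect function is a hypothetical stand-in for something like db.Connect:

```go
package main

import (
	"errors"
	"fmt"
)

// Must returns v when err is nil and panics otherwise.
func Must(v interface{}, err error) interface{} {
	if err != nil {
		panic(err)
	}
	return v
}

// connect is a hypothetical stand-in for a call such as db.Connect.
func connect(dsn string) (string, error) {
	if dsn == "" {
		return "", errors.New("empty DSN")
	}
	return "conn:" + dsn, nil
}

func main() {
	// A call returning (T, error) can be passed directly as Must's two arguments.
	c := Must(connect("demo")).(string)
	fmt.Println(c)
}
```

The type assertion is needed because Must returns interface{}; a wrong assertion would itself panic at run time.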

[+] AsyncAwait|9 years ago|reply
To be honest, part of the reason for this is that in Go, 'self' is any name you see fit in the particular context, otherwise it may still be self.
[+] oelmekki|9 years ago|reply
This made me smile as well. While I'm now used to it, I have trouble saying there's nothing wrong with having `err` be a more common word than `if` :)
[+] z3t4|9 years ago|reply
Letting errors surface is a good thing.
[+] fagnerbrack|9 years ago|reply
"err". Because "error" is too hard to type, right?
[+] nonsince|9 years ago|reply
Lots of uses of "the" in C++ code, massively skewed by giant projects that have the massive section of boilerplate copyright info in a comment at the top of every single file. I didn't realise until I looked at this, but although that's really common to see in C, C++, and Java you barely ever see it in the languages that I spend time in (Rust, Haskell, C#, Python, various flavours of Lisp, JavaScript).
[+] adrianN|9 years ago|reply
Wait another fifteen to twenty years or so until Python and Javascript are "enterprise ready" and this will change.
[+] konsnos|9 years ago|reply
In C# the first word is "summary" which is massively influenced by the way Visual Studio formats comments. I wonder though if this is an indication that a lot of people write comments.
[+] chriswarbo|9 years ago|reply
Would be nice to see Haskell included; would ".", "$", "<$>", "<*>", ">>=", etc. count as words? ;)

Made even better by the use of the language's logo as the word cloud's shape; the Haskell logo is ">λ=" https://www.haskell.org/static/img/haskell-logo.svg

[+] TuringTest|9 years ago|reply
The layout algorithm for the word cloud is awesome! How is it made?
[+] ddavis|9 years ago|reply
The slow adoption of modern C++ is very apparent. Some things are not even listed: forward, unique_ptr, shared_ptr, tuple, constexpr. nullptr is much lower than NULL and move is quite low. These features are now 6+ years old! ;)
[+] mschuetz|9 years ago|reply
Smart pointers really need a native way to be specified, similar to how we use & to declare references and * for pointers. Filling your code with shared_ptr<Something> and the like is just not going to win the majority over, even if it's useful.
[+] kosma|9 years ago|reply
This might just be my Python upbringing, but... am I the only one to be troubled by Go's single-letter words? I've always found Go code very hard to read because it isn't self-descriptive at all.
[+] gribbly|9 years ago|reply
Depends, if it's self evident in the context what it means, like in a method which is declared as (p *player) Attack, p makes perfect sense (to me) rather than typing 'player' everywhere in the function, just like typing i instead of index.

In short, I don't think there is a particular rule that applies, that said my impression is that a lot of variable names in Go code are typically three letters or at least two, like err, buf, src, dst, ok etc.
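To make the receiver example above concrete, here is a small sketch. The player type and its fields are hypothetical and exist only for illustration:

```go
package main

import "fmt"

// player and its fields are hypothetical, for illustration only.
type player struct {
	name string
	hp   int
}

// Attack uses the short receiver name p; its type is visible in the signature.
func (p *player) Attack(target *player, dmg int) {
	target.hp -= dmg
	fmt.Printf("%s hits %s for %d\n", p.name, target.name, dmg)
}

func main() {
	a := &player{name: "a", hp: 10}
	b := &player{name: "b", hp: 10}
	a.Attack(b, 3)
	fmt.Println(b.hp)
}
```

Within a method this short, the declaration of p is never more than a few lines away from its uses.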

[+] kps|9 years ago|reply
Go is a language in the almost-forgotten algebraic tradition (loosely descending Fortran → Algol → BCPL → C → Go), in which people prefer E=mc² to MULTIPLY REST-MASS BY SPEED-OF-LIGHT-IN-A-VACUUM BY SPEED-OF-LIGHT-IN-A-VACUUM GIVING ENERGY.
[+] the_duke|9 years ago|reply
Did you mean single-syllable words?
[+] Walkman|9 years ago|reply
When you have a strong static type system like Go has, you don't need descriptive names that much, because even single letter variable names are evident, because the declarations are there, close to the variable names. It needs a bit of getting used to if you only programmed dynamic or scripting languages.
[+] dustinmoris|9 years ago|reply
I am quite surprised that "self" is so much more used than the next word in the list "if". I know many languages where "self" is not a keyword at all, but I cannot think of a single language where "if" is not a keyword.

EDIT: Oops, I missed that there is a language filter. Ignore my comment :)

[+] x1798DE|9 years ago|reply
This is broken down by language; the linked list is for Python, so it just means that self is used more frequently than if in Python.
[+] goatlover|9 years ago|reply
I don't think if is a keyword in Smalltalk. IO might not have it either.
[+] RodericDay|9 years ago|reply
assignments are probably way more common than if-driven program flow control
[+] ape4|9 years ago|reply
Besides "return", C/C++ doesn't have a particular word that stands out. Probably because you just write things, e.g. you don't put "function" in front of a function.
[+] macygray|9 years ago|reply
Cool stats about Java. "import" is the most frequently used word. It looks like everything already exists in Java, so just import all the things, add some glue code and you are done.
[+] gravypod|9 years ago|reply
Then why isn't this the case for Python? It's equally kitchen-sink, right?
[+] mschuetz|9 years ago|reply
The world would be a much better place if self(Python) and this(Javascript ES6 classes) were implicit.
[+] mpjme|9 years ago|reply
This would improve a lot if they filtered out comments.
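As a rough illustration of what such filtering could involve, here is a naive sketch that drops `//` line comments before counting identifier-like words. It is not how the linked site works (that is not described in this thread); countWords and the regular expression are assumptions made for the example:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// word matches identifier-like tokens (an assumption about what counts as a "word").
var word = regexp.MustCompile(`[A-Za-z_][A-Za-z0-9_]*`)

// countWords tallies identifier-like words, ignoring everything after "//" on each line.
// It is deliberately naive: block comments and "//" inside string literals are not handled.
func countWords(src string) map[string]int {
	counts := make(map[string]int)
	for _, line := range strings.Split(src, "\n") {
		if i := strings.Index(line, "//"); i >= 0 {
			line = line[:i]
		}
		for _, w := range word.FindAllString(line, -1) {
			counts[w]++
		}
	}
	return counts
}

func main() {
	src := "// the error path\nif err != nil {\n\treturn err // the usual\n}"
	c := countWords(src)
	fmt.Println(c["err"], c["the"])
}
```

A real implementation would need a per-language tokenizer, since comment syntax differs between languages.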
[+] WatchDog|9 years ago|reply
I was thinking the opposite, I would like to see a version with all of the language reserved symbols removed.
[+] chriswarbo|9 years ago|reply
Really? I think it's interesting to see "TODO" feature quite prominently in Python, for example :)
[+] donatj|9 years ago|reply
I think it speaks well towards the ferocity and forced completeness of Go's error handling that err is the most used word.
[+] di4na|9 years ago|reply
I would say it shows that this is a place where the compiler could help :D
[+] cpsempek|9 years ago|reply
I do appreciate this, but word clouds really are a terrible visualization method for text data. With regard to the Python example, I cannot grasp at all whether the frequencies of self and None are similar or drastically different. The table on the value is more informative and less likely to be misread.
[+] enitihas|9 years ago|reply
Looking at Scala, the difference between val and var is huge, with val being at 2nd, and var at 38.
[+] virtualwhys|9 years ago|reply
Usage of `var` as idiomatic Scala is an oft-used trolling mechanism.

I wonder what percentage of `_` usage is value/type discarding in pattern matching and type signatures vs. function application.

Would be nice to somehow ditch `case`:

    adt match {
      Foo(x) if cond x => ...
      Bar(x) => ...
    }

    pairs.map{ (a,b) =>
      ...
    }
[+] nrinaudo|9 years ago|reply
And else is used more than if...
[+] Walkman|9 years ago|reply
One can draw very interesting conclusions based purely on this. Examples:

- Python developers do not follow Clean Code (à la Uncle Bob) as much as Ruby developers, because the if statement is more frequent than def and return.

- Ruby makes it possible to write in a much more functional style than Python. OR Ruby developers like to develop more in a functional style than Python developers.

- People don't really care about good variable names ("a" is a terrible variable name in scripting languages like JS and Python, still top 11)

- PHP developers might practice "return early" in functions (more return than function keywords) OR their functions just do too much :)

[+] brak1|9 years ago|reply
'a', 'the' etc seem to be from comments, not 'actual' code

(click them and you can see examples of how each word is being used)

[+] dagw|9 years ago|reply
- Ruby makes it possible to write in a much more functional style than Python. OR Ruby developers like to develop more in a functional style than Python developers.

I instinctively think you might be right (at least with your second statement), but what are you using as your metric here?

[+] Insanity|9 years ago|reply
I was really amused by the logos / names in the word clouds. Nice project!
[+] pcwalton|9 years ago|reply
The Rust compiler seems somewhat overrepresented in this data set. I see "ccx", "fcx", and "CrateContext", which are only used in the Rust compiler itself.
[+] questerzen|9 years ago|reply
It is probably a sign of a good language that the words used most should be similar in frequency to their use in pseudo code. When words like "end" (Ruby), "self" (Python), "import" (Java), "err"/"error" (Go and Node) are over-represented, it's likely a sign that the language is introducing accidental complexity. By this metric Swift looks astonishingly sane.
[+] realworldview|9 years ago|reply
It would be interesting to see the frequency of words found in comments (TODO, FIXME, LATER, OMG) for each language too.
[+] cven714|9 years ago|reply
Pretty cool! Some unexpected results, or at least not what I guessed. "summary" as the top for C#, "SELECT" all the way down at #43 for SQL, "err" as the top for Go (I'm sure that will spawn some pleasant discussion).
[+] pepve|9 years ago|reply
I think for SQL their sample includes just ".sql" files, which tend to contain schema definitions and data dumps, hence CREATE and INSERT. Most of it is not handwritten, either.