Felienne Hermans: How patterns in variable names can make code easier to read

[+] DonHopkins|3 years ago|reply

I like to use "big-endian" naming molds (love that term!) to define sets of names that, when alphabetized, place related variables next to each other (e.g. in a completion menu or browser).

For example, left_foo and right_foo are little-endian, since the least significant word comes first, so they'll be a long distance away from each other in an alphabetized list.

But foo_left and foo_right are big-endian, since foo is more significant than left or right. So they will appear one after the other in an alphabetized list.
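A minimal sketch of that sorting effect, in Python. The names foo and bar are placeholders, not taken from any real code base:

```python
# Placeholder names: "foo" and "bar" stand in for any two objects.
little_endian = ["left_foo", "right_foo", "left_bar", "right_bar"]
big_endian = ["foo_left", "foo_right", "bar_left", "bar_right"]

# Alphabetize each set, as a completion menu or symbol browser would.
sorted_little = sorted(little_endian)
sorted_big = sorted(big_endian)

print(sorted_little)  # ['left_bar', 'left_foo', 'right_bar', 'right_foo']
print(sorted_big)     # ['bar_left', 'bar_right', 'foo_left', 'foo_right']
```

In the little-endian list the two foo names are separated by a bar name; in the big-endian list each object's names sit side by side.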

Common suffix words are _x _y _z or _min _max, or _left _right _top _bottom, or even singletons like _enabled _loaded _error etc.

But when you combine multiple dimensions together in names, you need to think of which dimensions are more significant, based on how the variables are used, so use foo_x_min foo_x_max, if the positions are important, or foo_min_x foo_min_y, if the ranges are more important.
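As a rough illustration of that trade-off (Python, with foo as a placeholder prefix), the word order decides which names end up as neighbors once the list is alphabetized:

```python
from itertools import product

axes = ["x", "y"]
bounds = ["min", "max"]

# Axis is the more significant word: the min and max of each axis sort together.
axis_first = sorted(f"foo_{a}_{b}" for a, b in product(axes, bounds))

# Bound is the more significant word: the x and y of each bound sort together.
bound_first = sorted(f"foo_{b}_{a}" for a, b in product(axes, bounds))

print(axis_first)   # ['foo_x_max', 'foo_x_min', 'foo_y_max', 'foo_y_min']
print(bound_first)  # ['foo_max_x', 'foo_max_y', 'foo_min_x', 'foo_min_y']
```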

Sometimes it's hard to decide or ambiguous, so just try to be predictable and consistent with the rest of the code. Think of which variables should appear closest to each other in an alphabetical list.

And avoid middle-endian or random-endian (or sentence-grammar-order-endian) like the plague. A variable name should probably not be a grammatically correct sentence.

Another really annoying linguistic naming smell is "smurfing," where all of class Smurf's instance variables have smurf_ prefixes. Or where all the classes, methods, or instance variables have an "xyz_" prefix where "xyz" is the name of the project or library. Arrgh!!!

[+] elliekelly|3 years ago|reply

I really like this concept but I find it a bit frustrating that the name for the naming convention doesn’t follow its own convention. Shouldn’t it be called “endian-big”? ;)

[+] mankyd|3 years ago|reply

There's an interesting question that arises when you say "when you alphabetize them place related variables next to each other".

Let's say you have some non-trivial class that includes, among others, some 2D rectangular data: an x, y, width, and height. They're all related, but they don't naturally occur near each other without a little massaging:

coordX, coordY, sizeWidth, sizeHeight?

xMin, xMax, yMin, yMax?

coordXMin, coordXMax, coordYMin, coordYMax?

I generally agree with your sentiment, but there's a reason "naming things" is one of the hardest problems in computer science :)

[+] lmm|3 years ago|reply

"Smurfing" and "big-endian" are the same thing though!

IMO a big alphabetical list of everything in your project is not a useful or important thing. Use a language that has good support for hierarchical namespaces, and use them.

[+] SnowHill9902|3 years ago|reply

Agreed. When dealing with real values, it helps to make the units explicit: weight_lb, length_cm.
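A small sketch of how unit suffixes read in practice (Python). The helper functions and sample values are made up for illustration; only the conversion factors are standard definitions:

```python
KG_PER_LB = 0.45359237  # exact by definition of the avoirdupois pound
CM_PER_IN = 2.54        # exact by definition of the inch

def lb_from_kg(weight_kg: float) -> float:
    """Convert kilograms to pounds; the names say which unit goes in and out."""
    return weight_kg / KG_PER_LB

def cm_from_in(length_in: float) -> float:
    """Convert inches to centimeters."""
    return length_in * CM_PER_IN

weight_kg = 70.0
weight_lb = lb_from_kg(weight_kg)
length_in = 10.0
length_cm = cm_from_in(length_in)
```

With the unit in every name, a line like `weight_lb = weight_kg` looks wrong at a glance, without needing to trace where the value came from.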

[+] reidjs|3 years ago|reply

Loved how short and to the point it was. If you don’t have time to watch, the idea is it’s incredibly rare for two devs to come up with the same names for vars. To increase the odds of coming up with the same names for vars, you should agree on naming conventions (name molds) as a team. Sounds obvious, but great science is often confirmations or denials of the obvious.

[+] extrememacaroni|3 years ago|reply

Patterns in the way code looks in general are invaluable for parsing code quickly, especially in areas that you're somewhat familiar with. You can discard/ignore big chunks of code very quickly and go straight to where you think the relevant part is if they look as you'd expect at a glance. If they don't, it's sort of like a cache miss: you think "What the hell, why don't people autoformat their goddamn files before saving?" and then read those bits of code just to make sure they're not hiding any surprises, before formatting them properly.

It's the difference between taking, say, 2 seconds to read a method, and 10 or more.

I can only assume people who don't treat code formatting as a rule read every.single.thing.line.by.line.every.time.

[+] nickjj|3 years ago|reply

Aren't linguistic names a form of mold?

I find Rails' conventions are very good around this, for example datetime fields end with _at and dates end with _on. This way you end up with variable names like published_at or published_on depending on if you care about the time or not. It sounds so natural.
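The same suffix convention carries over to other languages. A sketch in Python rather than Rails, with a hypothetical Article class invented for the example:

```python
from dataclasses import dataclass
from datetime import date, datetime

@dataclass
class Article:  # hypothetical model, not from Rails or any real schema
    published_at: datetime  # _at: a point in time, time of day included
    published_on: date      # _on: a calendar day only

published_at = datetime(2022, 3, 14, 9, 30)
article = Article(published_at=published_at, published_on=published_at.date())

print(article.published_on)  # 2022-03-14
```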

The idea of using ? to end a variable name for booleans is great too.

It's the opposite of cognitive load because you can glance at a name and know what it is without knowing more about it. If the implementer of a linguistic named function does bad things to break the expected behavior then you shouldn't blame the method -- that's a user error.

Personally I find consistent names more important for CLI tools. kubectl's CLI is good in this department: you can predict how each command works by knowing the pattern. They went with a "verb noun" style. I don't think one is necessarily better than the other, but being consistent does help for CLIs because you often need to recall what to run from memory, from CTRL+r history, or by running the command incorrectly to get a help menu on what you can run. A code editor, however, gives you a lot more help with auto-complete or buffer-complete for function or variable names.

For naming things in programming, I'm not 100% convinced a hard pattern based standard makes sense because naming is very subtle, sometimes you want the emphasis on the "thing" or an emphasis on the "action" depending on the context -- basically which one is more important for that specific instance.

For her open question of "what would you name a variable for storing the maximum number of orders per month", that's an incomplete question. What's the context behind it? Is this variable defined as a constant somewhere? What other functions are in that module or class? How do you plan to use the variable? Will it be used in more than 1 spot? Is it part of a library that third party folks can use or limited to 1 code base? Will there be other similar variables, such as getting weekly or yearly orders?

[+] DonHopkins|3 years ago|reply

My name is Don, so every time I see a column called "createdon" I think it's a boolean flag that you can set true to create me. I wish the db designer would use snake case instead of mashing all the words together. But then again, I keep my ssh key in a file called donkey.pem.

The "big-endian naming mould" suggests naming it orders_per_month_max, since orders is the object (most significant), per_month is a count of orders (secondary significance), and max is a constraint of the order per month count (least significant).

Then you can use other parallel names in the same big-endian pattern, like orders_per_month orders_per_year orders_per_year_max orders_per_second_min refunds_per_year_average etc, and they will all sort next to their closely related names, instead of the "inline max" or "prefix max" scrambling the alphabetical order.
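A rough Python sketch of what that buys you in a completion menu. The complete function is a stand-in written for this example, not any editor's real API:

```python
names = [
    "orders_per_month", "orders_per_month_max", "orders_per_second_min",
    "orders_per_year", "orders_per_year_max", "refunds_per_year_average",
]

def complete(prefix: str, candidates: list[str]) -> list[str]:
    """Return the candidates that start with prefix, alphabetized."""
    return sorted(name for name in candidates if name.startswith(prefix))

print(complete("orders_per_month", names))
# ['orders_per_month', 'orders_per_month_max']

# With a "prefix max" name, the same query misses the related variable.
prefix_style = ["max_orders_per_month", "orders_per_month"]
print(complete("orders_per_month", prefix_style))
# ['orders_per_month']
```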

[+] hinkley|3 years ago|reply

These are the sorts of 'style guides' that we need. I started boycotting 'style' meetings at new companies ages ago because it always turned into a bunch of people using up all of their time, energy, and social capital arguing about where the curly brackets go and how whitespace should be handled. These are things a machine can do for you. We shouldn't be wasting our breath on them.

As far as 'consistent' names go, there are multiple dimensions of sameness. Using the same word for all instances of the same concept, not using the same word for other concepts, using consistent pluralization. Using same adjective/adverb/gerund form for related concepts. You are telegraphing sameness in these cases, and difference in others.

We have tried things similar to what you describe before, we just have dialed it in wrong. New-ish, good ideas often fall prey to bad execution. Hungarian notation, for instance, dictates that the variable name stays the same when the sense of the data changes, but is supposed to change when the implementation details shift. Which is exactly the opposite of what we want. If I fix a Y2K bug or a 2038 bug in due_on, I'm going to end up with a slightly different structure, but the deadline it represents is still 12 midnight. And if it's not, well, maybe we need a different convention for calendar day versus business day deadlines.

[+] nicoburns|3 years ago|reply

I also really like is_ and has_ prefixes.

[+] bjourne|3 years ago|reply

Very interesting video. I'm convinced that this is a very under-explored area of software engineering and that proper naming is at least 50% of developer productivity. Often it doesn't matter how well-structured a code base is, if the function and variable names are nonsensical the code will still be very hard to read.

[+] throwawayboise|3 years ago|reply

This is actually not a new idea at all.

I once worked in a place in the 1990s that took it to such an extreme that every table name, column name, and variable name had to be approved by a naming standards committee before it could go into production. IIRC the committee met once a month, maybe twice? Which was not ideal for the developers but changes only went to production once a month during a "change window" anyway.

Naming conventions can help with code readability, but don't let the process become more important than the goals.

[+] prettyStandard|3 years ago|reply

Agree. The name mold I like best is nounAdjective like Spanish rather than adjectiveNoun like English.

I wouldn't mind you poking holes in my logic here.

https://soft-wa.re/naming-conventions

To use her example, I would have chosen ordersPerMonthMax, which would probably sort alphabetically nicely with ordersPerDayMin and ordersPerYearAverage.

Now that I know "name-mold" would be a good query, I might find something better than the Spanish name-mold.

[+] jonahx|3 years ago|reply

               Wide Scope     Narrow Scope
             +-------------+-------------
    Function | Short Name  |  Long Name
             +-------------+-------------
    Variable | Long Name   |  Short Name
             +-------------+-------------

    I can’t quite explain why this works

I'll take a shot...

The general principle uniting all 4 quadrants of the table is: "Use names just long enough to be clear, but not longer."

Here's an illuminating exception to the heuristics: The use of the very short global "DB" for database.

We are really trying to balance two competing goals:

1. Brevity -- Don't explain what I already know. You mention this in relation to a tight loop variable: "I bet you didn’t need me to explain dL stood for Drivers License. It might have even annoyed you if I had spelled it out."

2. Clarity -- Don't confuse me. Don't make me look something up to figure it out.

Maximize brevity while retaining clarity.

Clarity is related to frequency of use. This relates to your comment: "How come the jQuery constructor feels much more natural than the native version? document.querySelectorAll('#appContainer')". It is annoying because we use it all the time... we don't need or want a verbose description.

If the thing is used everywhere, and especially if it is a general convention, assume familiarity. Sure, someone might be confused by "DB" the first time they ever see it, but it will quickly become part of their lexicon and remain so through repeated exposure. However, the same cannot be said for "CGTAO" as a stand in for "cudaGetTextureAlignmentOffset". In that case, the long form is what I want.

We handle these principles effortlessly with our use of "he" vs "John" vs "John Smith" vs "the John Smith you went to high school with" but for some reason have trouble with them when writing code.

[+] jackblemming|3 years ago|reply

This was noted in Code Complete too, so you're probably in good company.

[+] RhysU|3 years ago|reply

Definitely nounAdjective.

Alphabetically {a, b, c} × {Min, Max} is soo much nicer than the converse. Especially in lists dozens of items long.

[+] ojintoad|3 years ago|reply

The Programmer's Brain is my favorite read this year, highly recommend

https://www.manning.com/books/the-programmers-brain

[+] teddyh|3 years ago|reply

Making Wrong Code Look Wrong: https://www.joelonsoftware.com/2005/05/11/making-wrong-code-...

[+] theranger|3 years ago|reply

Could you please update the title with [video] so that we know what to expect from that link.

[+] azeirah|3 years ago|reply

It says youtube in the link

[+] beebeepka|3 years ago|reply

Like most things, it's a double-edged sword. I haven't worked with world-class developers so most of my experience is dealing with people who would benefit immensely from any linguistic practice.

If you think someone comes up with bad names, wait till they have to write a few sentences, or paragraphs.

[+] funstuff007|3 years ago|reply

She also has a number of talks on Excel (for the HN crowd) up on YouTube that are worth the watch.

[+] armchairhacker|3 years ago|reply

Sometimes I create variable names like "runProcessAsync" (instead of "asynchronouslyRunProcess"), "setIsActive" (instead of "setActive"), and even use shorthand vs non-shorthand (e.g. "src" vs "source") in different contexts.

It abuses the English language but makes code much easier for me to read. Most of the time I don't even realize I'm doing it.

But does it make the code easier for others to read? For the first two, I think so, and I've seen them in other projects. The last one probably not, and I try to avoid it and use more descriptive names (like "srcPath" and "srcData") when I spot myself doing it.

[+] gfaregan|3 years ago|reply

My only rule for variable names is to never use abbreviations.

[+] DonHopkins|3 years ago|reply

I strongly agree. There's only one correct way to spell a word, but many different possible abbreviations. The hard part is remembering just WHICH letters to leave out, not typing the letters.

[+] elevaet|3 years ago|reply

I have a terrible habit of mixing camelCase with snake_case. I'll start out using snake_case because I find it slightly more readable, but then use some library that has camelCase methods, and before I know it, it's all a bit of a hodge-podge. (Or is that hodge_Podge?)

[+] epgui|3 years ago|reply

You're a monster.

[+] astrange|3 years ago|reply

It seems to me that ideally, if a variable name is so predictable that you can name it by rules, that’d be an opportunity for the language to not require a name.

But in practice $0 and Haskell’s point-free styles can be annoying to read, so maybe what I want is the IDE to insert obvious names.

[+] marcosdumay|3 years ago|reply

You still have to say which of the many obvious things you are using here.

Point-free syntax has a different kind of namelessness, where if you have a single thing, you don't have to name it. And the $0 is really a limitation of the language, nobody ever thought it was a good thing.
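For readers who haven't seen the style, a loose approximation in Python (not Haskell, so this only imitates the idea). The compose helper is written for this example and is not a standard library function:

```python
from functools import reduce

def compose(*funcs):
    """Compose right to left: compose(f, g)(x) == f(g(x))."""
    return reduce(lambda f, g: lambda x: f(g(x)), funcs)

names = ["  ada ", " grace"]

# Named argument: the reader is told what each element is.
with_name = [name.strip().title() for name in names]

# Point-free: the single argument flowing through never gets a name.
clean = compose(str.title, str.strip)
point_free = list(map(clean, names))

print(point_free)  # ['Ada', 'Grace']
```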

[+] eckza|3 years ago|reply

This is, in essence, Hungarian notation. It's great.

I wrote about it a few months ago:

https://dev.to/jmpavlick/hungary-for-the-power-a-closer-look...

[+] Supermancho|3 years ago|reply

While there is evidence that Hungarian notation + camelCase is better for token usage (e.g. variables), underscores are better for the readability of other kinds of things, like unit test names or filenames. Because humans tend to shortcut, it becomes camelCase for everything, including other inappropriate attributes, out of laziness, which is aggravating. It's too bad that distinction has not been properly subjected to rigor yet.
