"Never abbreviate a variable" is a very strong statement, surely inspiring religious wars. And maybe there are edge cases: i as a loop counter, id for an integer primary key, whatever. But this example is something else entirely; "ttpfe" is honestly the worst variable name I have ever seen.
Historically, i wasn't even an abbreviation: it was the first letter in the range (I through N) that FORTRAN compilers assumed to be INTEGER, since they implicitly assigned types to variables based on the first letter of the name. The choice was probably further influenced by longstanding mathematical tradition, which uses i and j as indices.
(You could declare types and the compiler would respect it, leading to the old truism "GOD is REAL, unless declared INTEGER".)
(If you think that's the weirdest thing old FORTRAN did, look up the arithmetic IF statement sometime. Then, look up assigned GOTO.)
When hiring, we ask developers for sample code written specifically for the job application. We place a really high premium on readability. Occasionally we'll get code that is very concise: not just abbreviated variable names, but long, complex statements using ternary expressions and the like.
I'll usually ask them to resubmit code and really focus on readability and it turns out fine. But I think there might be a misconception among devs where the thought is that really compact code shows talent.
Our ideal coder is someone who cares deeply about performance and whose code is ridiculously easy to read and trace through. I mention performance because in some situations making things more verbose can affect it.
Whenever we interview a candidate, the one thing we place a high premium on is structure; the rest comes next.
Bad variable names and long, complex code can easily be adjusted. If the candidate doesn't have a good understanding of how to structure code (what goes where, for what purpose, separation of concerns...), it'll take much more effort to teach that candidate than to teach the bad variable namer.
Properly structured code naturally leads to better-performing software, and if/when a bug arises it'll be easier to spot and test.
All that being said, if a coder names variables 'a', 'aa', 'aaa', 'aaaa', that's a serious red flag.
The older I get, the less I value "cleverness" in programmers. I used to delight in writing code golf one-liners. But now I realize it detracts from solving the hard problems, and it makes simple problems harder.
In the 90's I worked on a FoxPro application, also for DoD. My boss, a retired Navy Captain, used very terse variable names, partly because he was a hunt-and-peck typist, and partly because earlier languages allowed only two-character names.
But FoxPro allowed variable names of any length. However, it only recognized the first ten characters. To my boss's chagrin, I often used names longer than ten characters, risking collisions with other long names. That never happened.
But ten years after leaving, I was hired to port the product to Visual FoxPro, which does recognize the whole name. Some of the early commits were "Fixed inconsistently-used long variable name..."
Of course, in those days we weren't using linters, or tests, or source control, or reproducible builds... and yet still had a business. No wonder Alan Kay calls computing "not quite a field."
I remember in college in the mid 80s trying to get something working in BASIC on one of the Apple IIs in the library, only to discover that while my earlier experience had been with a disk-based interpreter that supported 6-character names, the ROM-based interpreter only recognized the first 2 characters; it would accept longer names but not differentiate them.
Hilarity ensues...
And, yeah, I also did a lot of XBase in the late 80s, though usually "Clipper", rather than "Fox".
Batch files made a decent build process, and you could be disciplined with regular arc/zip files for source - not too unlike "make clean" and svn. We certainly had a lot of cruddy manual processes otherwise back then, though.
I was recently implementing a geometry algorithm which I looked up on Quora. It was described using typical vector notation, with variables like r, s, t, and u. Since I referenced the algorithm in the comments, I decided to use the same variable names in my code.
I think this is the right choice, but my code reviewer didn't. But he didn't click on the quora link.
Why is it okay for mathematicians to abbreviate things but not programmers? Is it because they deal in more abstract entities where the name is irrelevant?
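For context, that family of geometry algorithms is often written down with exactly those one-letter names. Here's a sketch of a common 2D segment-intersection formulation in that style; it's an illustration of the naming question, not necessarily the exact algorithm from the Quora link:

```python
# Segments p->p2 and q->q2, with directions r and s and parameters t and u,
# following the usual math-paper notation.

def cross(ax, ay, bx, by):
    """2D cross product (the z component of a x b)."""
    return ax * by - ay * bx

def segments_intersect(p, p2, q, q2):
    """Return the intersection point of segments p->p2 and q->q2, or None."""
    rx, ry = p2[0] - p[0], p2[1] - p[1]    # r: direction of the first segment
    sx, sy = q2[0] - q[0], q2[1] - q[1]    # s: direction of the second segment
    denom = cross(rx, ry, sx, sy)          # r x s
    if denom == 0:
        return None                        # parallel (collinear case omitted)
    qpx, qpy = q[0] - p[0], q[1] - p[1]    # q - p
    t = cross(qpx, qpy, sx, sy) / denom    # parameter along p->p2
    u = cross(qpx, qpy, rx, ry) / denom    # parameter along q->q2
    if 0 <= t <= 1 and 0 <= u <= 1:
        return (p[0] + t * rx, p[1] + t * ry)
    return None

print(segments_intersect((0, 0), (2, 2), (0, 2), (2, 0)))  # (1.0, 1.0)
```

With comments mapping each short name back to the source notation, the one-letter style stays traceable to the reference it was ported from.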
In math, notations are designed to make statements about the problem domain concise. Once you pass a certain degree of concision, longer names impede readability rather than enhancing it. That is because the ability to take in an entire complex expression or subexpression at a glance tells you things—and lets you see patterns—that wouldn't be as apparent if longer names were used. Programmers in the APL tradition understand this, but most programmers do not. (Many refuse to believe it's possible when they hear about it!)
In software, programmers have grown accustomed to a notion of readability that derives from large, complicated codebases where unless you have constantly repeated reminders of what is going on at the lowest levels (i.e. long descriptive names) there is no hope of understanding the program. In such a system, long descriptive names are the breadcrumbs without which you would be lost in the forest. But that is not true of all software; rather, it's an artifact of the irregularity and complexity of most large systems. It's far less true of concise programs that are regular and well-defined in their macro structure.
In the latter kind of system, there's a different tradeoff: macro-readability (the ability to take in complex expressions or subprograms at a glance) becomes possible, and it turns out to be more valuable than micro-readability (spelling out everything at the lowest levels with long names).
It also turns out that consistent naming conventions give you back most of what you lose by trading away micro-readability, and consistent naming conventions are possible in small, dense codebases. That of course is also how math is written: without consistent naming conventions and symmetries carefully chosen and enforced, mathematical writing would be less intelligible.
Edit: The fact that readability without descriptive names is widely thought to be impossible is probably because of how little progress we've made so far in developing good notations, and tools for developing good notations, in software. This may not be so hard to understand: it took many centuries to develop the standard mathematical notations and good ways of inventing new ones to suit new problems. Mathematics is the most advanced culture we have in this respect, and in computing we're arguably still just beginning to retrace those steps. If we wrote math the way we write software, mathematics as we know it wouldn't be possible.
Edit 2: The best thing on this is Whitehead's astonishingly sophisticated 1911 piece on the importance of good notation: http://introtologic.info/AboutLogicsite/whitehead%20Good%20N.... If you read it and translate what he's saying to programming, you can glimpse a form of software that would make what people today call "readable code" seem as primitive as mathematics before the advent of decimal numbers seems to us. The descriptive names that people today consider necessary for good code are examples of what Whitehead calls "operations of thought"—laborious mental operations that consume too much of our limited brainpower—which he contrasts to good notations that "relieve the brain of unnecessary work".
Applying Whitehead's argument to software suggests that we'll need to let go of descriptive names at the lowest levels in order to write more powerful programs than we can write today. But that doesn't mean writing software like we do now, only without descriptive names; it means developing better notations that let us do without them. Such a breakthrough will probably come from some weird margin, not from mainstream work in software, for the same reason that commerce done in Roman numerals didn't produce decimal numbers.
> "Is it because they deal in more abstract entities where the name is irrelevant?"
Partially, but also because math equations don't really have strong maintainability needs. Another mathematician isn't going to walk in 6 months later, get confused, and blow up the universe.
Though I'd argue that in fact math equations should be better named. They're rather abstractly named as a matter of convention, but they would be more easily understandable in many cases if variables were more carefully named.
Another risk of "porting" geometry algorithms, especially complex ones, directly from their mathematical expressions, is that you don't gain insight into why they work. This makes debugging later difficult, since nobody who wrote the code actually understood what the algorithm was doing. Forcing yourself to rename variables into something sensible will also force you to understand the mechanics of what's happening.
I find that as a piece of code skews more math- and geometry-centric, the variable names skew more towards math-like brevity as well. To some degree this is due to historical technical limitations (LAPACK and Fortran being one extreme: dgbsv, anyone?), but I see a ton of contemporary production code with one-letter variable names. R is a rotation, n is a normal, p is momentum, whatever. I think a lot of the historical notation conventions carry over to code, and most peeps working in the domain day to day are down with whatever baggage is brought along. This is fine until you try to read francophone code... try reading a French thesis and you'll find that they spurn all conventions the rest of the world has agreed upon :).
>Why is it okay for mathematicians to abbreviate things but not programmers? Is it because they deal in more abstract entities where the name is irrelevant?
for example, xyz^2 in a piece of written math means something different than it would in a program (in the math case we are obviously not taking the variable "xyz" and squaring it). I guess what I am trying to say with this example is that perhaps variable names in math are one symbol because concatenation, in many cases, already means "multiply".
If the variables had been named timeOfTenPercentHeightOnTheFallingEdge and timeOfTenPercentHeightOnTheRisingEdge it probably would have still been hard to notice that they had been swapped in that one line.
I think the catch is that the 'f' and 'r' keys are right next to each other, so if you accidentally type one instead of the other you'd get the other variable by mistake.
That said, you raise a fair point - we don't know how this bug got there. If it was a simple typo like above then the verbose names would have prevented it. If it was a logic error by the programmer (For whatever reason), then you're right that the name wouldn't matter because they typed the one they intended, it just wasn't the right one to use.
Yes, but the likelihood of mistyping them would be far less. 'f' and 'r' are right next to each other. 'Falling' and 'Rising' are a bit harder to unnoticeably fat-finger.
I definitely prefer more verbose names to abbreviated ones, but I'm not sure that never abbreviating a variable name is the right way to go either. Surely there's a middle ground between `Ttpfe` and `timeOfTenPercentHeightOnTheFallingEdge`?
I like using static types to avoid these sorts of problems. Modern languages like Swift, Rust and Haskell let you make zero-overhead type wrappers around other types.
So here they could have defined `newtype RisingEdge(Float)` and `newtype FallingEdge(Float)`, and then used those types in the function parameters as appropriate. Helps shorten function names to boot!
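The same idea can be sketched in Python with `typing.NewType`: a static checker like mypy flags swapped arguments, while at runtime the wrappers are plain floats with zero overhead. The `pulse_width` function here is a hypothetical example, not from the original code:

```python
from typing import NewType

# Distinct types for the two edge times, so they can't be swapped silently.
RisingEdge = NewType("RisingEdge", float)
FallingEdge = NewType("FallingEdge", float)

def pulse_width(rising: RisingEdge, falling: FallingEdge) -> float:
    # Passing the arguments in the wrong order is a type error under mypy.
    return falling - rising

print(pulse_width(RisingEdge(1.25), FallingEdge(3.75)))  # 2.5
```

Note that Python's `NewType` is checked only statically; a language with real zero-cost wrappers (Rust, Haskell, Swift) would also reject the swap at compile time.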
Use context. If you can't, refactor so you can. If you can't, well, you're SOL and stuck with timeOfTenPercentHeightOnTheFallingEdge.
Edit: e.g. if all of your variables are "timeOfTenPercentHeight...", cut that part out of the name. Full names can be just as bad as abbreviations - e.g. if they're all "timeOfTenPercentHeight..." then they all start to blend together.
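A minimal sketch of that trimming, using the thread's example names (the function itself is hypothetical): once the enclosing scope establishes the "ten percent height" context, the locals only need the distinguishing part.

```python
def pulse_width_at_ten_percent_height(rising_time, falling_time):
    # Not time_of_ten_percent_height_on_the_rising_edge etc.: the shared
    # prefix lives in the function name, so only rising/falling remain,
    # and the two names no longer blend together.
    return falling_time - rising_time

print(pulse_width_at_ten_percent_height(1.25, 3.75))  # 2.5
```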
One thing I often see elided in these naming wars is scope: am I the only person who gives longer names to globally visible things than to locals?
For example:
size_t find_foo(const char *source_text)
{
    const char *src = source_text;
    for (; *src != 'f'; src++) {
        /* ... do stuff ... */
    }
    /* ... do more stuff ... */
    return src - source_text;
}
Now the meaning of "source_text" would not be evident, except for the name. But just glancing at the usage shows that "src" is clearly a working cursor into the source text.
But if I called it "working_cursor" would that really explain anything to the reader? If anything, giving a detailed name risks misleading readers in much the same way as stale comments can mislead.
The problem was not that the variable was abbreviated. The problem was that the abbreviated variable was so similar to another abbreviated variable that was used for a similar purpose.
I would add - there's a bit of a difference between an abbreviation that just keeps something from being ungodly verbose (20 letters instead of 50), and an abbreviation that shortens something so much that the original meaning is completely lost ("Ttpfe"). This is especially true for things in context - "time of ten percent height on the falling edge" is needlessly verbose, but in context "ten_percent_falling_edge" would probably be perfectly fine. And indeed, that's really just an expanded version of "Ttpfe", which is what was used anyway.
If you shorten everything to five-letter names, it's not really that surprising that it becomes an issue. And I mean, what's the real point of abbreviations when they are so short that they make the code harder to read?
Edit: It's worth pointing out, though, that if this is really old C code it may be justifiable. Back in the olden days of C (older than C89, at least), only the first 8 characters of a symbol actually mattered: "blahblahone" and "blahblahtwo" would resolve to the same symbol. So shortening in this way could be necessary.
Yeah, the problem isn't that the airplane is on fire, the problem is that it crashed into another burning airplane as both tumbled uncontrolled through the sky, hurtling toward the earth.
I say this tongue in cheek: with current IDEs having such great autocompletion, has anyone experimented with coding far outside the ASCII character set? Programming-language restrictions aside, I recognize the obvious troubles this would cause. And yet I do a lot of work on simulations where math formulas are converted to code, which means lots of compounded Greek names like "omegaSquared" or "epsilonMinus". Naming decisions become more challenging as subscripts and superscripts are added, let alone matrix indices. At some point perhaps the symbolic name should be replaced with the descriptive name, such as "first_eccentricity_flat_to_fourth". But it sure would be nice to have access to something with such brevity.
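For what it's worth, Python 3 already accepts non-ASCII identifiers, so Greek letters from a formula can appear verbatim. A sketch, using a damped-frequency formula purely as an illustrative stand-in (not from the original simulations):

```python
import math

def ω_d(ω: float, ζ: float) -> float:
    """Damped natural frequency: ω_d = ω * sqrt(1 - ζ**2)."""
    return ω * math.sqrt(1 - ζ ** 2)

print(ω_d(10.0, 0.6))  # ~8.0
```

Whether teammates can comfortably type and grep for ω is, of course, the practical objection.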
I self-tested variable name lengths on my own code.
Three letters is enough to avoid most collisions. Words do not make sense yet.
At four letters most words become decipherable given an appropriate encoding.
At five letters a two word phrase may make sense.
I make a rough decision based on variable scope - shorter lifetime means shorter variable name, but I rarely go with just one letter as it reduces uniqueness.
If I need to use a really long phrase frequently I take a mathematician's approach and alias it to an abstract and highly unique symbol. The phrase may still exist in the addressed data structure, I just avoid it within the algorithm. Mathy code also has a tendency to encourage numbered variables, e.g. "x0, x1, x2".
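The aliasing approach above can be sketched like this; the geodesy names and constants are illustrative (WGS-84-like values), not taken from any particular codebase:

```python
import math

def meridian_radius(ellipsoid, lat_rad):
    # Alias the long descriptive phrases to short symbols for the algorithm;
    # the full names still live in the addressed data structure.
    a = ellipsoid["semi_major_axis"]
    e2 = ellipsoid["first_eccentricity_squared"]
    s = math.sin(lat_rad)
    # The dense formula stays readable with the short aliases:
    return a * (1 - e2) / (1 - e2 * s * s) ** 1.5

wgs84 = {"semi_major_axis": 6378137.0,
         "first_eccentricity_squared": 0.00669437999014}
print(meridian_radius(wgs84, 0.0))
```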
Heh. When I first learned to code as a kid, a variable was max two characters (the first of which had to be a letter A-Z and the other could be a letter or number).
I would suggest that the need for Englishy variable names is due to a weakness in programming languages and possibly the programming model itself. Why should a set of legitimate values for a computation benefit from how you refer to that set? Can that variable take on undesired values? Do you rely on that name and its comprehensibility to distinguish good from bad values? I sometimes find it hard to believe we still program this way.
We don't have to still program this way - you can write code with very strict types, with machine-checked proofs that it works correctly, etc, etc. We don't do this very often because it turns out this level of rigor is incredibly time-intensive.
While many of the Bell Labs guys didn't much like Pascal, Turbo Pascal managed to address almost all of the complaints I've ever seen, while preserving the good parts of Pascal (or Modula???).
Java must look irredeemable to the Bell Labs folks, though. Perhaps it is: UglyNames; limited structured constant literals; still too clunky lambdas for callbacks.
This is common in the sciences as well, since senior professors also had the 8-character limit (from Fortran [0]). Functions are also named this way (see LAPACK/BLAS).
Some people also hate typing out slightly longer variable names and whatnot. I try to emphasize that a section of code will tend to be read more times than it is written, and therefore readability is more important. It's a frustrating battle sometimes, though.
[0] Exacerbating the problem of using the wrong variable is the fact that much existing code uses implicit types...
TL;DR: Abbreviated variables are not always intuitive to others. I tend to agree. If you're going to use a pattern or abbreviation, be as non-creative as possible.
I think that his variable names section is utterly ridiculous, but it's a relevant read from a relatively prolific person, so worth sharing.
The same cognitive mistake could have happened with it spelled out. Two concepts closely related can easily be confused.