top | item 38831991

Subject-First Commit Messages

151 points| s4i | 2 years ago |github.com | reply

168 comments

[+] hn_throwaway_99|2 years ago|reply
TBH, at least at every company I've worked at recently, the style of the commits is more or less irrelevant to me save for one very important piece: The commit starts with the ID of the ticket for the code change (or some placeholder for very small commits where a ticket wasn't necessary, e.g. [no-tix]). The biggest benefit of this is that these commits can then be automatically linked to the ticket which often has a lot more detail and conversation.

99% of the time when I'm looking at commit history, I'm not just generically scanning. I'm usually looking for a particular change, or when something touched a particular file, etc. Even the example given in the linked page isn't really relevant to me, because I'm rarely "quickly scanning" a whole commit log for something in particular - I'd always just use a search feature (in GitHub or some git IDE plugin) for that.

[+] kelnos|2 years ago|reply
To me, the ticket id is extra info and goes at the end of the message body, not in the title at all.

I've seen far too many people think that putting the ticket id into the commit message can be a replacement for writing a good commit message. I've even seen people think that "Fix #123" is all they need to put in there.

The commit history should be the primary source of truth for what changes have happened.

I've also been at orgs where they've switched from one ticket tracking software to another, and haven't maintained history, or even when they do, the ticket ids end up changing afterward.

[+] lucb1e|2 years ago|reply
> 99% of the time when I'm looking at commit history, I'm not just generically scanning. I'm usually looking for a particular change, or when something touched a particular file,

Exactly this! In our company's commit messages, I start with the component or file I've touched, such as 'Config: update year in example value' or 'make check: reduce false positives for erroneous backslash detector' (context: we clone the repository for short-lived projects which result in a document, so having an up-to-date year value is handy and we check the PDF build for common issues)

Not only does it help others find when a particular change was made, it also gives context when they're scratching their head "what do you mean updated year value, where do we hardcode year values? ... Oooh, in the config file where we keep example values, right that makes sense!"

This also incorporates what u/twodave said in a sibling thread (https://news.ycombinator.com/item?id=38832441):

> when I'm looking at blame/history I'm usually very aware of what the code is doing already. My question at those times is less about "what" and more about "why".

By saying the change intends to reduce false positives, the regex change makes a lot more sense. Commit messages are often in the style of "improved backslash detector". You might wonder why the old version was broken, or how it is an improvement if it detects fewer instances of the problem. By knowing the why (false positive issues), things click into place.

[+] hmeh|2 years ago|reply
Author here. We have actually managed to use the same task tracking tool for the three years our project has been running: Linear. We don't do pull requests, and we don't put our work item number in our commit messages.

I can count on zero fingers the number of times I have ever wished I had this.

We do, however, keep daily work logs (each person submits them) so we have a log of everything that everyone worked on. We have Slack history, which is sometimes referred to.

I'm curious what folks find this useful for. What are the scenarios where one has to refer back to an issue from a commit?

By the way, I also have the experience of inheriting a code base where the commit messages were mostly (or just) issue numbers. Guess what I didn't inherit? The issue tracker. The commit messages were useless to me.

[+] imron|2 years ago|reply
> The commit starts with the ID of the ticket for the code change (or some placeholder for very small commits where a ticket wasn't necessary, e.g. [no-tix]).

Taking this a bit further you can make sure your branch names have the ticket in them, which you then extract in a git hook (prepare-commit-msg) and put as a trailer at the end of the message using ‘git interpret-trailers’

This reduces clutter when looking at a list of messages (eg git log --oneline) and makes it easier to script things based on the ticket numbers (other git commands can pull out the trailers)
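A minimal sketch of the branch-name-to-trailer idea, in Python. The ticket pattern (an uppercase project key, a dash, digits), the `Ticket` trailer key, and both function names are assumptions for illustration, not anything the comment above prescribes. The functions only transform strings; a real prepare-commit-msg hook would read the message file git passes as its first argument and write the result back.

```python
import re

# Assumed ticket shape: uppercase project key, dash, digits (e.g. AB-1234).
TICKET_RE = re.compile(r"[A-Z][A-Z0-9]+-\d+")


def extract_ticket(branch):
    """Return the first ticket ID found in a branch name, or None."""
    match = TICKET_RE.search(branch)
    return match.group(0) if match else None


def add_ticket_trailer(message, ticket, key="Ticket"):
    """Append 'key: ticket' as a trailer unless it is already present."""
    trailer = f"{key}: {ticket}"
    lines = message.rstrip("\n").split("\n")
    if trailer in lines:
        return message
    # Trailers form the last paragraph of a message, so extend an existing
    # trailer block and otherwise start a new paragraph. A lone subject
    # line such as "Config: update year" is never treated as a trailer.
    last_is_trailer = re.match(r"^[A-Za-z0-9-]+: \S", lines[-1]) is not None
    separator = "\n" if last_is_trailer and len(lines) > 1 else "\n\n"
    return "\n".join(lines) + separator + trailer + "\n"
```

In an actual hook, `git interpret-trailers --in-place --trailer "Ticket: AB-1234" <message file>` can do the appending instead of the hand-rolled `add_ticket_trailer`, which is here only to keep the sketch self-contained.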

[+] formerly_proven|2 years ago|reply
This tends to become useless after a few years because of tool or software migrations, changes in JIRA/whatever structure, ticket retention policies etc.
[+] avgcorrection|2 years ago|reply
I agree that the ticket id should go somewhere in the commit message. Certainly if things like pull requests are tied to tickets. However I get a little concerned that such sentiments are so popular here. Because the ticket id is a nice contextual link. But the commit message should be self-contained with regards to the change.[1]

It’s a version control system. That includes an explanation as well. The explanation for each revision is integral.

And I don’t think people here really disagree with that. Just that they might prefer to put all that integral information somewhere else—the information (princess) is in another castle, Mario. Why? Why not at least just copy-paste the relevant information into the version control system proper? I see this all the time; people often do the same with GitHub pull requests—paragraphs upon paragraphs in the pull request description and then just a “Merge <convoluted URL>” in the merge commit message!

It’s not that I blame people for this. Maybe Git is just so unergonomic that many would prefer to offload everything into another interface (even Jira).

[1] My tickets will usually have all the history. Maybe a lot of back and forth. The commit message though will just have what was done when all was said and done—no back and forth, just the result. And some of my tickets probably don’t even have a description body since what I ended up doing is just described in the commit message.

[+] dev_tty01|2 years ago|reply
I prefer to go the other way. Put the commit ID(s) and/or links in the ticketing system as part of ticket comments. Now if the ticket system changes or is exported to another system the commit ID(s) will still be in the record. They are much less likely to change plus you can have multiple commits for an issue.
[+] geraldwhen|2 years ago|reply
Wait until corporate drops jira, or migrates prem to cloud or vice versa with data loss, or revokes your access to an old project id the app was run on.

Git commit info needs to live in git. Ticketing systems will always die. The source remains.

[+] twodave|2 years ago|reply
I look at commit messages every day. I actually don't love a message that just describes the code change (I can figure that out on my own if I look at it long enough). That's fine for bigger or more misleading changes, but when I'm looking at blame/history I'm usually very aware of what the code is doing already. My question at those times is less about "what" and more about "why". Why was this change made? Sometimes that can be found by looking at accompanying tests (if the original author was so gracious), but if it's not in the commit (or only obtusely so, as many developers I know tend to commit in what I'll call "stream-of-consciousness" style) then I've got to go outside the codebase to research by interviewing people (assuming they're still around and remember) or trawling through our project management system to try and line up released features/fixes with the change. It quickly becomes an inexact science.

That said, in my experience, the larger an organization, the more commit messages get looked at. On my side projects where only 1-3 people are involved, commit messages only really show up on the PRs as things are going in. Sometimes we complete our own PRs to keep up the velocity since we're all working async part-time. Rarely if ever do we look back to figure out why we did something because the project is so small.

[+] masklinn|2 years ago|reply
I agree with your first paragraph, but I don't think it's super relevant to the article: the why of a commit usually takes some time to explain, and the subject line is nowhere near sufficient, so that usually goes into the first paragraphs. For non-trivial commits I try to follow the "why, what, how" template: first explain the issue / concern / background for the change, then clarify the change if the diff is not super obvious or is a bit large, and finally explain the implementation changes if there were options.

In my experience the former and latter are the most important when blame-diving, I want to know the context / driving force / use case for a change, but I also want to know the scope of consideration, because e.g. sometimes I'm thinking of a fix which was considered and rejected for still-valid reasons, while at other times it was just considered unnecessary back then (YAGNI) or the blockers have since been removed.

Either way the subject needs to be snappy, so that it can quickly be selected for consideration or not e.g. if I'm blame diving and I see a commit which just does indenting / formatting, I probably can skip over it without looking at the actual change.

[+] BoxFour|2 years ago|reply
> My question at those times is less about "what" and more about "why". Why was this change made?

Isn't the essence of the commit message body to delve into the 'why'?

The ability to swiftly glean the 'what' from the commit subject line aids in identifying the appropriate commit for review. The commit message body can then have a detailed explanation of the reasons behind the change.

[+] o11c|2 years ago|reply
> when I'm looking at blame/history

Those are two very different things when it comes to commit messages.

For history, the linked argument is arguing that subject-first is better. Personally I'd prefer `[component] verb details` though, which should fix the skippability issues mentioned elsewhere. And of course you can enable diffstat to get a bit of an idea of the structure.

For blame, you have the full file context in front of you. Some blame frontends only show author, but assuming you have the commit message, what you really want out of it is a rough idea of the historical significance of that line. So definitely verbs first.

[+] Hamuko|2 years ago|reply
>I actually don't love a message that just describes the code change (I can figure that out on my own if I look at it long enough).

Isn't the point of the commit message title precisely that you don't have to look at it long enough?

[+] r3trohack3r|2 years ago|reply
I personally prefer conventional commits.

https://www.conventionalcommits.org/en/v1.0.0/

The subject line says what the purpose of the change is (CI/CD, tests, bug fix, new feature), what component it primarily focuses on, and then a plain text one line summary.

For example:

    ci(GitLab): Add rust build directory to cache
    
    This should improve build times by caching most of the compile tree between builds. In testing, I’ve observed builds drop from 2m to 30s when scheduled on a node with a primed cache.
It also forces me to keep my commits focused. If I can’t create a one line summary of what I’ve done, it probably needs to be broken up into multiple commits.
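For anyone who wants to script against this format, here is a rough Python sketch that splits a subject line into type, scope, and summary. The regex is a simplification of the Conventional Commits grammar, and `Subject` and `parse_subject` are names made up for this example.

```python
import re
from typing import NamedTuple, Optional

SUBJECT_RE = re.compile(
    r"^(?P<type>[a-z]+)"          # change type, e.g. ci, fix, feat
    r"(?:\((?P<scope>[^)]+)\))?"  # optional scope in parentheses
    r"(?P<breaking>!)?"           # optional breaking-change marker
    r": (?P<summary>.+)$"         # colon, space, one-line summary
)


class Subject(NamedTuple):
    type: str
    scope: Optional[str]
    breaking: bool
    summary: str


def parse_subject(line):
    """Parse a conventional-commit subject line; return None if it does not fit."""
    m = SUBJECT_RE.match(line)
    if m is None:
        return None
    return Subject(m["type"], m["scope"], m["breaking"] is not None, m["summary"])
```

Applied to the subject above, `parse_subject("ci(GitLab): Add rust build directory to cache")` gives type `ci`, scope `GitLab`, and the summary text, while a plain "Fix typo" returns `None`.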
[+] ericrallen|2 years ago|reply
Conventional commits are great, especially if you add in commit linting.

Being able to programmatically increment semantic versions and automatically generate relevant changelogs is awesome.
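As a rough illustration of the version bump, the sketch below follows the mapping in the Conventional Commits spec: `fix` means a patch bump, `feat` a minor bump, and a `!` marker or a `BREAKING CHANGE:` footer a major bump. The function names are hypothetical, and real release tools handle more cases (pre-1.0 versions, reverts, and so on).

```python
import re

HEADER_RE = re.compile(r"^(?P<type>[a-z]+)(?:\([^)]+\))?(?P<breaking>!)?: ")
FOOTER_RE = re.compile(r"^BREAKING CHANGE: ", re.MULTILINE)


def bump_level(messages):
    """Return 'major', 'minor', 'patch', or None for a list of commit messages."""
    level = None
    for message in messages:
        header = HEADER_RE.match(message)
        if header is None:
            continue  # not a conventional commit; ignored in this sketch
        if header["breaking"] or FOOTER_RE.search(message):
            return "major"
        if header["type"] == "feat":
            level = "minor"
        elif header["type"] == "fix" and level is None:
            level = "patch"
    return level


def next_version(version, messages):
    """Apply the bump implied by the messages to a 'MAJOR.MINOR.PATCH' string."""
    major, minor, patch = (int(part) for part in version.split("."))
    level = bump_level(messages)
    if level == "major":
        return f"{major + 1}.0.0"
    if level == "minor":
        return f"{major}.{minor + 1}.0"
    if level == "patch":
        return f"{major}.{minor}.{patch + 1}"
    return version  # nothing release-worthy in these commits
```

Types such as `docs` or `ci` leave the version untouched here, which is one reasonable reading of the spec but still a choice.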

It’s also nice to implement Commitizen[0] for a little hand holding until folks get used to the linting.

I used to care a lot about doing things the way that felt right to me, but now I just want some common standard that is easy for everyone to follow, easy to automate, and easy to verify programmatically.

Things like conventional commits and semantic versioning aren’t perfect, but they are quite good and apply broadly to many use cases with common tooling and conventions.

--

[0]: http://commitizen.github.io/cz-cli/

[+] diarrhea|2 years ago|reply
But they’re not mutually exclusive. The two concepts play well together!
[+] rzzzt|2 years ago|reply
I find the word choice for types "feat" and "chore" infuriating for some reason. Feat?!
[+] rco8786|2 years ago|reply
I think I would prefer this style.

And also the discourse around commit messages and their various flavors may be one of the biggest wastes of time in terms of ROI in our industry.

I’ve been doing this for 14-15 years and can probably count on two hands the number of times I’ve needed to look at historical commit messages.

[+] safety1st|2 years ago|reply
What's tremendously useful for us to have in the commit message, is the ticket ID the work is related to in our issue management system, to the point where it is practically mandatory.

I see it as a matter of scoping communications. The issue management system includes a broader set of people than the git repo. You want the commit linked to the issue so that you can see the entire history behind the commit, including the business decisions, the designer making a call, the back and forth with QA etc. That can all be useful a year or two down the road if you want to understand why something was done a certain way.

Now there is some info about the change that perhaps only the devs would ever care about, and probably that can sit in the commit messages. That sort of happens naturally but there's not really a lot of it for us. But that seems to me to be the relevant thing from an organizational perspective, no one writes commit messages except for devs, and almost nobody except the devs reads them.

[+] azornathogron|2 years ago|reply
I look at past commits probably every week or two to track down why something is the way it currently is.

Having said that, I care very little about style and grammar of commit messages. Ideally I want them to say why the change was made, what the intended effect was, and where I can look for related work (bugs that tracked the work and have related commits or investigation attached to them, docs, whatever)

Hit rate on the information I'd like is not great, but if I get at least one of the three things then it's something I can work with.

[+] bob1029|2 years ago|reply
Blame & diffs are much more useful to me in the context of GitHub. I don't care about someone's subjective ~50 char description of a change. The important part is the pull request itself and that it references whatever issue prompted the change.

Commit messages have only ever been useful in tracking my own intermediate work products. Often times, I will leave helpful bread crumbs in my commit notes if I know I won't be working on something for a few days. We strictly do squash+merge, so I don't have to worry about these things causing trouble for others. All of our commits into master have some standardized "Issue #1234" note as automatically copied from the PR title (which we do have ~standardized).

If we didn't have some git wrapper like GitHub available, then I suspect we'd be significantly more aggressive with policy around our commit messages.

[+] naavis|2 years ago|reply
I don't know, I do it almost daily. So your mileage may vary.
[+] tsimionescu|2 years ago|reply
I personally look more often at past commits, but I've still never really had a need to scan a list of changes like this. I really don't see the point of the advantage they're claiming, even if it's real.

What I normally do is, while investigating a tricky bug and finding some lines that don't make sense, look through history to see why those lines were introduced (to see if they fix something else), and perhaps when (to see which versions may be affected by a hard-to-reproduce bug). But that typically pinpoints a commit already, and I just need the commit message to explain why it was there.

[+] marginalia_nu|2 years ago|reply
It does sort of feel a bit cargo culty sometimes. I expect commit messages are situationally important, as a function of the size and maturity of the project, the size of the development team (esp. the amount of developer churn).

That said, there's probably some fringe benefit to describing what you do in some fashion. It's a great way of making sure you understand what's happening yourself.

[+] lucb1e|2 years ago|reply
<10 instances of looking at the commit history in >10 years is exceptionally little. I think you may have a different use-case if your commits are write-only and crafting commit message texts is probably a waste of time at that rate

Which is a very interesting take, for what it's worth. Might be worth a blog post what circumstances lead to this working well for your organization!

[+] gherkinnn|2 years ago|reply
The subject-first style felt grating at first, but comparing the two I found myself reading more per line than I did in the verb-first approach.

Fix skip fix skip fix skip add skip

I knew nothing at the end. The second style had me grasp more, as there were no obvious hooks to skip without knowing what had changed.

But as others have said, commit messages are a cesspool of bikeshedding and ultimately useless, unless every commit is a self-contained chunk of work which it rarely is.

On the side, I enjoyed all of this author's thoughts in the repo. Good find.

[+] hmeh|2 years ago|reply
Author here. Thank you! I agree that there can be plenty of bike shedding about them, which is why we have an approach that is based on human psychology and can be backed up in that way. We don't often put too much more in the content of a commit message -- the diff, plus the repo it's in (we have many repos) tell you most of what you need to know.

If you want more context, every team member has a daily work log that is available to the entire team, so you can see more about what they worked on and why in that work log.

[+] minimaul|2 years ago|reply
I like a style like:

    codebase section: short summary

    longer description if necessary

eg:

    cocoaui: fix images being incorrectly aligned on high dpi displays

    We were calculating the position of the image in logical pixels, but not converting that to actual display pixels for rendering.

I think it's a really nice style that makes commit messages really easy to scan in a short log, and lets you ignore commits easily that don't touch the area you're interested in
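A small Python sketch of the "ignore commits that don't touch my area" part, assuming subjects follow the `section: summary` shape. The function names are invented for this example.

```python
def touches_section(subject, section):
    """True if the subject starts with 'section:' (compared case-insensitively)."""
    prefix, sep, _ = subject.partition(":")
    return bool(sep) and prefix.strip().lower() == section.lower()


def filter_by_section(subjects, section):
    """Keep only the subjects whose prefix names the given codebase section."""
    return [s for s in subjects if touches_section(s, section)]
```

Fed the output of `git log --oneline` with the hashes stripped, this narrows a short log down to one area without reading the other lines.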

[+] andai|2 years ago|reply
Spent 5 minutes trying to figure out where field_name is in the second list... I thought it was the same commits rewritten to be more scannable, but they are from a different project.

(I scanned, naturally, the article itself!)

Or more generally, I thought the point of the article would be that the function / variable names from the first list would be moved to the beginning of sentences. But in the 2nd example they are completely absent, favoring much vaguer / high level descriptions. So we are unable to see the before/after effect for the same list of commits, I think most of the potential impact of the author's intended point is lost.

(Also, the first half extolls the virtues of being able to read quickly, while the second half tells you to use longer sentences for everything for apparently no reason?)

[+] henning|2 years ago|reply
Like everything in software development, this is just personal preference without any actual evidence, a substitute for actually making software with quantitative quality metrics that users actually care about. Let's fiddle with our commit messages, PR formats, code formatting, etc. etc. etc. instead of addressing the fact that our website has 6 different fonts and makes 30 HTTP requests to show a few hundred bytes of text.
[+] idlephysicist|2 years ago|reply
I totally agree, I have seen people get so fixated on this in addition to:

- review feedback format

- variable naming

- keeping line lengths to 80 characters

- log message formats (sweet jesus the amount of time wasted debating that alone)

Whenever I see these conventions about commit formatting they generally seem to focus on the first line (notable exceptions being Git and the Linux kernel among others I assume), one line is rarely enough to describe the change – I’m not advocating for writing an essay for a 5 line change though in some cases it might be warranted.

Professionally I rarely see useful commit messages; by useful I mean something that someone without context could read and get a general understanding of why a change was made. Frequently I have seen “updates” / “wip” etc. in the master branch. More frequently I see that the quality of commit messages decreases with an engineer’s seniority – though like everything there are exceptions.

[+] hmeh|2 years ago|reply
Author here. You're well within your rights to believe it is personal preference, but it's somewhat hard to refute the fact that it's easier to scan lists when you put the most significant part (i.e., the part that varies the most) first. Imagine if every item on Amazon's list of products was of the format: "Product for purchase, Headphone, Sennheiser Momentum 4"
[+] knorker|2 years ago|reply
This seems super subjective, while being written as if it's not.

I find this way WAY harder to read/scan, and very much prefer Google's standard: https://google.github.io/eng-practices/review/developer/cl-d...

[+] hmeh|2 years ago|reply
Author here. The psychology portion isn't subjective -- things are easier to scan if the more important part is first. One would have a hard time refuting that. Also, there is some familiarity bias that comes into play when reading a commit message style that is foreign, so the experiment I offer may not be super compelling. We are used to what we are used to, and that takes some time to overcome. I know it took me a while to overcome it both in writing, scanning, and reading them. It was worth it for me and our team.
[+] Salgat|2 years ago|reply
In my experience consistency is more important than any specific style of commit messages. Similar to style guides for code; there's plenty of ways to approach it, but the real value is the consistency in how things are written across all developers.
[+] vitus|2 years ago|reply
> The second style likely feels foreign, and possibly uncomfortable. It's passive voice and present tense — all the things that we aren't supposed to make our commit messages.

Nitpick: it's passive voice and indicative mood (and sometimes subjunctive) -- there's nothing wrong with present tense. "Fix typo" is active voice, present tense, and imperative mood. (The Rails commits that dip into past tense bother me slightly, but whatever.)

Broadly, I have a few opinions on commit messages. The style doesn't really matter as much to me, although I'm a relatively strong adherent to "one-line summary, followed by paragraph(s) of additional context" (as is standard in the Linux kernel, and supposed best practice at Google even if it isn't universal by a long shot).

One is that the commit message should be useful for anchoring a search for potentially relevant changes, and for providing broader context re: why the change was made.

At the same time, I waffle between putting more description in the commit message, versus just commenting the code (or making the code clearer).

The last is more pragmatic: when I'm searching for a specific change, I'm often looking at the history of a particular file (blame or otherwise). I can quickly filter out all the "fix typo" messages or "[LSC]" (large-scale change, term used at Google for various company-wide code health refactors). Or if I'm trying to figure out which change introduced a bug, I'll probably bisect it one way or another (I often try to short-circuit that bisection toward the end to save the last few iterations). Either way, I don't actually spend that much time reading through commit messages until I've identified a potentially problematic change.
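The bisection itself is a plain binary search over history. A minimal Python sketch, assuming commits are ordered oldest to newest, the newest one is known to be bad, and the bug stays present once introduced. `first_bad_commit` and `is_bad` are hypothetical names; `git bisect` does this for real and copes with non-linear history.

```python
def first_bad_commit(commits, is_bad):
    """Return the earliest commit for which is_bad(commit) is true.

    Assumes commits[-1] is bad and that every commit after the first
    bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1
    # Invariant: the first bad commit lies somewhere in commits[lo..hi].
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid        # mid is bad, so the culprit is mid or earlier
        else:
            lo = mid + 1    # mid is good, so the culprit is later
    return commits[lo]
```

The number of test runs grows with the logarithm of the range, which is why skimming subjects to shrink the range only saves the last few iterations.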

[+] hmeh|2 years ago|reply
Author here. Thank you, updated.

We don't tend to put much in the commit messages themselves. The code often speaks for itself (and if it doesn't, it isn't likely that the author would have had much more to say about it at the time, as our norm is to comment "why" when necessary). We also have work logs we can refer to that sometimes have the why, or at least some of the context for the work we are trying to better understand.

[+] egnehots|2 years ago|reply
there are pros and cons to both verb-first and subject-first commit styles.

verb-first commits can be organised into a few categories. since there are only a few commit verbs (fix, add, refact..), always use the same verb prefix for bug and feature messages and you will have nice categories to scan/filter:

- add feature

- fix bug

- refactor logic

On the other hand, subject-first indeed puts the important part first and lets you search for a term:

- Instance configure template method called from constructor

- Store's project method is an alias for fetch

- Title is changed

Depending on the use case (are you often searching for something, or would you like to highlight the number of bugs/features in the last release?), one style is better suited than the other.
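Both use cases are easy to script. A small Python sketch, with made-up function names: counting by leading verb only works when the verb comes first, while term search works with either style.

```python
from collections import Counter


def count_by_verb(subjects):
    """Count subjects by their first word, lowercased (fix, add, refactor, ...)."""
    return Counter(s.split()[0].lower() for s in subjects if s.strip())


def search_term(subjects, term):
    """Return the subjects that mention the term anywhere, ignoring case."""
    needle = term.lower()
    return [s for s in subjects if needle in s.lower()]
```

On a subject-first log, `count_by_verb` just counts nouns, which is the trade-off described above.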

For important changes, I like the linux kernel style: oneline summary, details (problem, impact, solution..)

https://docs.kernel.org/process/submitting-patches.html

[+] TacticalCoder|2 years ago|reply
I like to be able to read the commit line following the sentence: "This commit shall ...".

For example from TFA:

"Fix code example in the field_name method"

gives:

"This commit shall fix code example in the field_name method".

OK, cool, I may or may not merge it, pull it, whatever-it. But I know what it'll do should I use it.

Now from what TFA recommends:

"Default reader batch size is 1000"

Means nothing. Tells nothing. Is 1000 good or bad? No clue. Is it causing a bug? A performance issue? Was it 1000 before the commit or after applying it? Zero information.

I'll pass and keep using the "best practice".

Using the best practices the commit would read:

"Change default reader batch size to 500" or "Change default reader batch size to 1000" or maybe "Add default reader batch size" (btw the commit line in TFA is so bad that I'm not sure at all it's 500 or 1000 or something else I should put here to make my point).

The "best practices" aren't a bias. They're the result of people thinking long and hard as to how to make commit lines as clear as possible.

I look at how multi-million lines codebase like Linux or Emacs are doing it and use that as the authority. If it works well enough for these projects, it works well enough for smaller projects.

[+] IshKebab|2 years ago|reply
I prefer the first style. It reads way less weirdly. I spend way more time reading commit messages than "manually searching" by scanning them. I either arrive at the commit from git blame, or I use this really cool feature of computers called "ctrl-F".
[+] graypegg|2 years ago|reply
Past commit messages have 2 uses to me.

- Blame view, where I usually want to know the ticket number so I can track down a story which has the business rules the dev was implementing.

- Bisect, where I want to know if this works or not without having to do a test run every time.

This format is alright I guess, but the best thing you can do honestly is just

TICKET: [done/works but wip/does not work] [describe what you’re thinking]

AB-1234: does not work, trying to rewrite this controller to remove all the duplication, /api removed temporarily

To me, that doesn’t have to be set in stone. No one likes working with a commit message stickler when standardized formats are only useful ~20 times a year to save 10 minutes each. (In my experience)

[+] ikari_pl|2 years ago|reply
Error messages. Get to the meat of it in the first 3 words, there's a high chance that's the only part that will get read, or even displayed. It must be the most meaningful part of the error message.
[+] cmgriffing|2 years ago|reply
I find commit messages are more useful when we consider machines to be the primary consumers of them.

Using conventional commit style messages allows us to generate changelogs and modify our semver versions automatically. Generating changelogs by hand is extremely tedious. Modifying semver by hand leads to caring too much about the number.

Like others have pointed out, the context and "why" can easily be tracked by linking out to the ticket/task that the work is associated with.

[+] alberth|2 years ago|reply
I go the other extreme and do below as commit messages.

https://keepachangelog.com

Makes creating change logs super easy & clear.

Which is prefacing each commit with either:

  ADDED for new features.
  CHANGED for changes in existing functionality.
  DEPRECATED for soon-to-be removed features.
  REMOVED for now removed features.
  FIXED for any bug fixes.
  SECURITY in case of vulnerabilities.
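To show why this makes changelogs easy, here is a rough Python sketch that groups prefixed subjects into Keep a Changelog style sections. It assumes each subject starts with one of the six prefixes followed by a space; `build_changelog` is a name invented for this example, and anything without a known prefix is simply skipped.

```python
from collections import defaultdict

# Section order as listed above.
PREFIXES = ("ADDED", "CHANGED", "DEPRECATED", "REMOVED", "FIXED", "SECURITY")


def build_changelog(subjects):
    """Group prefixed commit subjects under '### Added', '### Fixed', etc."""
    sections = defaultdict(list)
    for subject in subjects:
        prefix, _, rest = subject.partition(" ")
        if prefix in PREFIXES and rest.strip():
            sections[prefix].append(rest.strip())
    lines = []
    for prefix in PREFIXES:
        if sections[prefix]:
            lines.append(f"### {prefix.capitalize()}")
            lines.extend(f"- {item}" for item in sections[prefix])
            lines.append("")  # blank line between sections
    return "\n".join(lines).rstrip("\n")
```

The version heading and release date would still be added by hand or by a release script.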