
Formatting code should be unnecessary

354 points | MaxLeiter | 6 months ago | maxleiter.com

476 comments

[+] automatoney|6 months ago|reply
I've never understood why people care so much about the linter settings. It's so obviously bikeshedding, just make a choice, run the linter automatically and be done with it. I'm too busy doing actual software engineering to care about where exactly everything goes - I promise after a week you'll just get used to whatever format your team lands on.
[+] AdieuToLogic|6 months ago|reply
> I've never understood why people care so much about the linter settings.

Source code formatting programs are not the same as lint[0] programs. The former rewrites source code files such that the output is conformant with a set of layout rules without altering existing logic. The latter is a category of idempotent source code analysis programs typically used to identify potential implementation errors within otherwise valid constructs.

Some language tools support both formatting and source code analysis, but this is an implementation detail.

0 - https://en.wikipedia.org/wiki/Lint_(software)

[+] vidarh|6 months ago|reply
Because I spend the vast majority of the time I spend on code reading it, and the layout matters to me in terms of how much time it takes for me to read code.

Yes, I can get used to other layouts, but that by no means means all layouts are equal to me in terms of how readable they are, and how well things stand out when they should, or blend in when they should.

I recognise this isn't the case for everyone - some people read code beginning to end and it doesn't matter how it's laid out. But I pattern match visually, and read fragments based on layout, and I remember code based on visual patterns.

Ironically, because I have aphantasia, and don't visualise things with my "mind's eye", but I still remember things by visual appearance and spatial cues better than by text.

[+] socalgal2|6 months ago|reply
Some settings have advantages. For example, trailing commas on tables:

    [
      'apple',
      'banana',
      'orange',
    ]
has an advantage over

    [
      'apple',
      'banana',
      'orange'
    ]
because adding a new line at the end of the table (1) requires editing 1 line instead of 2, and (2) makes the diffs in code review smaller and easier to read and review. So a bad choice makes my life harder. The same applies to local variable declarations.
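
To put a number on the diff-size claim, here is a minimal Python sketch using the standard `difflib` module. The helper name `changed_lines` and the sample lists are made up for illustration; the count is of added and removed diff lines, so a single edited line shows up as one removal plus one addition.

```python
import difflib


def changed_lines(before: str, after: str) -> int:
    """Count added/removed lines in a unified diff, skipping headers and context."""
    diff = difflib.unified_diff(before.splitlines(), after.splitlines(), lineterm="")
    return sum(
        1
        for line in diff
        # Keep "+"/"-" content lines, drop the "+++"/"---" file headers.
        # (A toy filter: content that itself starts with "--" would confuse it.)
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    )


# Style with a trailing comma: appending an item touches one line.
with_comma = "[\n  'apple',\n  'banana',\n  'orange',\n]"
with_comma_added = "[\n  'apple',\n  'banana',\n  'orange',\n  'pear',\n]"

# Style without a trailing comma: the old last line must be edited too.
no_comma = "[\n  'apple',\n  'banana',\n  'orange'\n]"
no_comma_added = "[\n  'apple',\n  'banana',\n  'orange',\n  'pear'\n]"

print(changed_lines(with_comma, with_comma_added))  # 1
print(changed_lines(no_comma, no_comma_added))      # 3
```

The second case reports 3 because the diff removes `'orange'`, re-adds it with a comma, and adds the new item.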

Sorted lists (or sorted includes) are also something that makes my life easier. If they're not sorted then everyone adds their new things to the end, which means there are many times more merge conflicts. Sorted doesn't mean there are zero, but it does mean there are fewer than with "append to the end". So, just like an auto-formatter is there to save time, don't waste my time by not sorting where possible.

Also, my OCD hates inconsistency. So

    [1, 2, 3]
    {a, b, c}
Is ok and

    [ 1, 2, 3 ]
    [ a, b, c ]
Is ok but

    [1, 2, 3]
    { a, b, c }
Is not. I don't care which but pick ONE style, not two styles!
[+] psychoslave|6 months ago|reply
I don't care that much about the specific retained options (though my own gusts of the day are obviously the best taste ever in the whole existence of universe) but having a common linter setting to prevent the noise in every damn PR is a must have.

Yes, both git and all these PLs are actually damn stupid to take lines at face value instead of something more elegant like Ada does. In my 20+ year career I've only once been offered a project that involved Ada.

It's hard to come up with something elegant and efficient. It's even harder to make it reach top-tier global presence, all the more when the ecological niche is already filled with good enough stuff.

[+] jupp0r|6 months ago|reply
I generally agree, but max line length being so high you have to horizontally scroll while reading code is very detrimental to productivity.
[+] smokel|6 months ago|reply
I've never understood why we still look at the plain text representation of code, and not a visualization of the code that makes more sense.

Note that, in my mind, this visualization is not automatically generated, but lovingly created by humans who wish their code to be understood by others. It is not separate from the code, as typical design documentation is, but an integral part of it, stored in metadata. Consider it an extension of variable and function naming.

There is of course "literate programming" [1], but somehow (improvements of) that never took off in larger systems.

[1] https://en.wikipedia.org/wiki/Literate_programming

[+] kolme|6 months ago|reply
I did that when I was young and naive. I'll tell you why I did it.

I thought I was very smart. Like, really really smart, maybe the smartest programmer in the team.

And as such my opinion was very important. Maybe the most important opinion in the team. Everyone had to listen to it!

That is all. Also, I was wrong.

[+] schneems|6 months ago|reply
I learned to love rustfmt but there’s one thing that bothers me: there are a few cases where there are two ways to do something. For example, a one-line closure can omit the curly brackets, but a multi-line closure cannot. Rustfmt prefers to remove those brackets when it can, but I prefer to keep them, which makes editing the code faster since I don’t have a syntax error if I suddenly need a second line.

I can still live with it. And I like the clean, minimal version when I don’t have to edit. Just adding that “style” can have impact beyond how it looks involving ease of editing. And it stinks when your preferences clash with the community.

[+] torginus|6 months ago|reply
The problem is that tools like ESLint often come with highly opinionated rules that might not even be applicable all of the time (leading to me having to manually turn them off via annotations).

And there's no centralized idea on best practices.

[+] rs186|6 months ago|reply
That is true if a set of good linting rules is set up, those that help discover errors or other code smells which are valid issues in 99% of cases, or pure formatting rules when there is no "correct" thing to do. Linting becomes a problem when it is opinionated and has questionable rationale to begin with, and stands in your way instead of helping you catch issues. Nobody should be fighting linting rules, but sadly that's what often happens.

See my other comment: https://news.ycombinator.com/item?id=45166670

[+] forrestthewoods|6 months ago|reply
I’ll go a step further.

I’ve never understood why people care so much about the linter. Just let people write code and don’t worry about the linter. I don’t need to fight a linter which makes my code worse when I could just write it in a way that doesn’t suck. I promise it’ll be fine. I’m too busy doing actual software engineering to care if code is not perfectly formatted to some arbitrary style specification.

I feel like style linters are horseshoe theory. Use them enough and eventually you wrap back around to just living without them.

[+] Cthulhu_|6 months ago|reply
But (at least for a long time), "run the linter automatically" wasn't available, not until Go's gofmt put the idea into people's heads that they could leave it to a tool. I think there were some formatting tools before then, but e.g. jslint/eslint had a lot of gaps which I unfortunately ended up pointing out in code reviews a lot. Which was nitpicking / bikeshedding, in hindsight.
[+] kelseyfrog|6 months ago|reply
The tradeoff here is not being able to use a universal set of tooling to interact with source files. Anything but text makes grep, diff, sed, and version control less effective. You end up locked into specialized tools, formats, or IDE extensions, while the Unix philosophy thrives on composability with plain text.

There's a scissor that cuts through the formatting debate: If initial space width was configurable in their editor of choice, would those who prefer tabs have any other arguments?

[+] gr__or|6 months ago|reply
Text surely is a hill, but I believe it's a local one that we got stuck on due to our short-sighted inability to go into a valley for a few miles until we find the (projectional) mountain.

All of your examples work better for code with structural knowledge:

- grep: symbol search (I use it about 100x as often as a text grep) or https://github.com/ast-grep/ast-grep

- diff: https://semanticdiff.com (and others), i.e.: hide noisy syntax-only changes, attempt to capture moved code. I say attempt, because with projectional programming we could have a more expressive notion of code being moved

- sed: https://npmjs.com/package/@codemod/cli

- version control: I'd look towards languages like Unison to see what funky things we could do here, especially for libraries. A general example: no conflicts due to non-semantic changes (re-orderings, irrelevant whitespaces, etc.)
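
As a rough illustration of the grep point in the list above, here is a small Python sketch of structural search using only the standard `ast` module. It is not how ast-grep works internally; `find_calls` and the sample source are hypothetical names invented for this example.

```python
import ast


def find_calls(source: str, func_name: str) -> list[int]:
    """Return the line numbers of calls to a bare function name."""
    tree = ast.parse(source)
    return sorted(
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == func_name
    )


# Line 1 of this sample is blank; "print" appears as text on lines 2, 3 and 4.
sample = '''
log = "print(1)"   # a string that merely mentions print
# print(2) inside a comment
print(
    log
)
'''

# Only the real call is reported, even though it spans several lines.
print(find_calls(sample, "print"))  # [4]
```

A plain-text search for `print` would also hit the string literal and the comment; the structural search only sees the actual call node.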

[+] jsharpe|6 months ago|reply
Exactly. This idea comes up time and time again, but the cost/benefit just doesn't make sense at all. You're adding an unbelievable amount of complex tooling just to avoid running a simple formatter.

The goal of having every developer viewing the code with their own preferences just isn't that important. On every team I've been on, we just use a standard style guide, enforced by formatter, and while not everyone agrees with every rule, it just doesn't matter. You get used to it.

Arguing and obsessing about code formatting is simply useless bikeshedding.

[+] accelbred|6 months ago|reply
What if the common intermediate encoding is text, not binary? Then grep/diff/sed all still work.

If we had a formatting tool that operated solely on the AST, checked-in code could be in a canonical form for a given AST. Editors could then parse the AST and display the source with a different formatting of the user's choice, and convert to canonical form when writing the file to disk.
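
A minimal sketch of the "canonical form for a given AST" idea, assuming Python 3.9+ and its standard `ast.unparse`. The function name `canonical` is made up here, and this is only a toy: `ast` discards comments, so a real tool would need a parser that keeps them.

```python
import ast


def canonical(source: str) -> str:
    """Parse the source and re-emit it in the single layout ast.unparse produces."""
    return ast.unparse(ast.parse(source)) + "\n"


# Two layouts of the same program.
messy = "x = [ 1,2,\n   3 ]\nif x: print( x )\n"
tidy = "x = [1, 2, 3]\nif x:\n    print(x)\n"

# Both collapse to the same text, which is what would be written to disk.
print(canonical(messy) == canonical(tidy))
print(canonical(messy))
```

The editor-side half (showing each user their preferred layout) would be a second formatter run over the same tree; it is not shown here.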

[+] rendaw|6 months ago|reply
Grep, diff, sed, and line-based non-semantic merge are all terrible tools for manipulating code... rather than digging ourselves in even further with those, maybe this is a reason to come up with something better.
[+] Avshalom|6 months ago|reply
The entire OS was built around these source files.

The Unix philosophy, on the other hand, only "thrives" if every other tool is designed around (and contains code to parse) "plain text".

[+] MyOutfitIsVague|6 months ago|reply
The way I envision this working is with something like git filters. Checking out from version control converts it all into text in your preferred formatting, which you then work with as expected. Staging it converts it into the stored representation. In git, this would be done with smudge and clean filters, like how git LFS works. You'd also have viewers for forges and the like that are built to interpret all the stored representations as needed.

You still work with text, the text just isn't the canonical stored representation. You get diffs to resolve only when structure is changed.

You get most of the same benefit with a pre-commit linter hook, though.
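
A toy Python sketch of the smudge/clean pair described above. As a stand-in for a real structural representation it only translates indentation: the stored form uses one tab per level, and the working copy uses the developer's preferred number of spaces. The function names mirror git's terms but are otherwise made up, and the git wiring itself (the `filter.<name>.smudge` and `filter.<name>.clean` settings plus a `.gitattributes` entry) is left out.

```python
import re


def smudge(stored: str, width: int) -> str:
    """Checkout direction: turn each leading tab into `width` spaces."""
    return re.sub(
        r"^\t+",
        lambda m: " " * (width * len(m.group())),
        stored,
        flags=re.MULTILINE,
    )


def clean(working: str, width: int) -> str:
    """Staging direction: turn each full group of `width` leading spaces into a tab."""

    def to_tabs(m: re.Match) -> str:
        levels, leftover = divmod(len(m.group()), width)
        # Leftover spaces (alignment narrower than one level) are kept as spaces.
        return "\t" * levels + " " * leftover

    return re.sub(r"^ +", to_tabs, working, flags=re.MULTILINE)


stored = "def f():\n\tif x:\n\t\treturn 1\n"

print(smudge(stored, 2))                       # what a 2-space developer sees
print(clean(smudge(stored, 4), 4) == stored)   # round trip for a 4-space developer
```

The round trip only holds for purely tab-indented stored text; tabs followed by a wide run of alignment spaces would not survive, which is the kind of instability a real filter would have to handle.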

[+] danielheath|6 months ago|reply
If you’re going to store the source in a canonical format and unpack that to suit each developer… why shouldn't the canonical format just be regular source code?

All the same tools can exist with a text backend, and you get grep/sed support for free too!

[+] eviks|6 months ago|reply
> If initial space width was configurable in their editor of choice, would those who prefer tabs have any other arguments?

Yes, of course, because tab width is *dynamically* flexible, so initial space width isn't enough.

[+] charcircuit|6 months ago|reply
In practice how many tools do you really need to handle the custom format? Probably single digits and they could all use a common library to handle the formatting aspect of things.
[+] aleph_minus_one|6 months ago|reply
> Anything but text makes grep, diff, sed, and version control less effective.

Perhaps this is rather a design mistake in how UNIX handles things and is so focused on text.

[+] bee_rider|6 months ago|reply
Is it possible to convert from the DIANA IR back to something that looks like source code? Then the result of the conversion backward could be grepped, etc…
[+] cowsandmilk|6 months ago|reply
How is diff less effective? I see the diff in the formatting I prefer. With sed, I can project the source into a formatting most convenient for what I’m trying to do with sed. And I have no idea what you’re on about with version control. It ruins sending around patch files that require a line number, but most places don’t do that any more.

What I would be curious about is tracing from errors back to the source code. Nearly every language I’ve used prints line number and offset on the line for the error. How that worked in the Diana world would be interesting to learn.

[+] Ygg2|6 months ago|reply
> would those who prefer tabs have any other arguments?

Yes. Because YAML exists. And mixing tabs and spaces is horrible in it. And the rules are very finicky.

Optimal tab usage is to emit 2-4 spaces.

[+] froh|6 months ago|reply
yes, contemporary editors and tools like treesitter have decided this debate in favor of plain text file representation, exactly for the reasons you give: universal accessibility by general purpose tools.

xslt was a Diana-like pre-parsed representation of dsssl. oh how I miss dsssl (a scheme based sgml transformation language) but no. dsssl was a lisp! with hygienic macros! "yikes", they went, and invented XSLT.

the "logic" escapes me to this day.

no. plain text it is. human readable. and grep/sed/diff able.

[+] davetron5000|6 months ago|reply
There’s also a typography element to formatting source code. The notion that all code formatting is mere personal preference isn’t true. Formatting code a certain way can help to communicate meaning and structure. This is lost when the minimal tokens are serialized and re-constituted using an automated tool.

https://naildrivin5.com/blog/2013/05/17/source-code-typograp...

[+] chowells|6 months ago|reply
I have to disagree with the premise. Formatting code is a critical communication channel. Well-formatted code should tell you:

1. The developer has enough experience to understand that formatting matters.

2. The developer has enough discipline to stick with their chosen formatting rules.

3. The developer has the taste necessary to choose good formatting rules.

4. The developer has the judgement necessary to identify when other concerns justify one-off violations of the rules.

These are really important attributes for a developer to have. They affect every aspect of the code, not just formatting. Formatting is just a very quick proxy to measure those by.

Unfortunately, things like autoformatting and linter rules are destroying the signal. Goodhart's law strikes again.

[+] aleph_minus_one|6 months ago|reply
Some (sometimes) desirable source code formatting cannot be deduced from the abstract syntax tree alone:

Consider the following (pseudo-)code example:

  bar.glob = 1;
  bar.plu.a1 = 21;
  bar.plu.coza = fol;
Should this code be formatted this way? Or should it be formatted

  bar.glob     = 1;
  bar.plu.a1   = 21;
  bar.plu.coza = fol;
to emphasize that three assignments are done?

Or should this code be formatted

  bar.glob      = 1;
  bar.plu .a1   = 21;
  bar.plu .coza = fol;
to make the "depth" of the structure variables more tabular so that you can immediately see by the tabular shape which "depth" a member variable has?

We can go even further like

  bar.glob     =   1;
  bar.plu.a1   =  21;
  bar.plu.coza = fol;
which emphasizes that the author considers it to be very important that the reader can easily grasp the magnitudes of the numbers involved (which is why in Excel or LibreOffice Calc, numbers are right-aligned by default). Or combining this with making the depth "tabular":

  bar.glob      =   1;
  bar.plu .a1   =  21;
  bar.plu .coza = fol;
Each of these formattings emphasizes different aspects of the code that the author wants to emphasize. This information cannot be deduced from some abstract syntax tree alone. Rather, it needs additional information from the programmer about the sense in which the intended structure behind the code is to be "interpreted".
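
The claim can be checked with a short Python sketch. The pseudo-code above is adapted to Python syntax here (semicolons dropped), and `same_tree` is a helper name made up for the example; it relies on `ast.dump` leaving out line and column positions by default.

```python
import ast

# The unaligned version and the fully "tabular" version from above.
plain = "bar.glob = 1\nbar.plu.a1 = 21\nbar.plu.coza = fol\n"
aligned = "bar.glob      =   1\nbar.plu .a1   =  21\nbar.plu .coza = fol\n"


def same_tree(a: str, b: str) -> bool:
    """True if both sources parse to the same syntax tree, positions ignored."""
    return ast.dump(ast.parse(a)) == ast.dump(ast.parse(b))


# The parser treats both layouts as the same program ...
print(same_tree(plain, aligned))

# ... so regenerating text from the tree cannot bring the alignment back.
print(ast.unparse(ast.parse(aligned)))
```

Whatever the alignment was meant to say, it is gone once only the tree is kept, so a tree-based store would need to carry that intent as extra metadata.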
[+] jillesvangurp|6 months ago|reply
There was a movement towards working with syntax trees directly and treating source code as a generated serialization of those syntax trees about 20-25 years ago. This probably started with refactoring as it was pioneered in the nineties. Things like Visual Age actually stored code in a database instead of on the file system. Later intentional programming (Charles Simonyi was pushing that) tried to also do things with this. And of course model driven development was a thing around the same time.

Refactorings (when done right) are syntax tree transformations that preserve things like referential integrity, etc. that ensure code does the same thing before and after applying a refactoring.

A rename becomes trivial if you are simply working on the symbol directly. For that to work with file based source trees, you need to parse the whole thing, keep track of where symbols are referred to in files, rename the symbol and then update all the places in the source tree. That stuff becomes a lot easier when the code representation isn't a bunch of files but the syntax tree. The symbol just gets a different name. Anything that uses the symbol will still use the same symbol.
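
A toy version of a tree-level rename, sketched with Python's standard `ast.NodeTransformer` (Python 3.9+ for `ast.unparse`). The class and function names are invented for the example, and it is deliberately naive: it ignores scopes, function parameters, and attributes, which a real refactoring tool must handle.

```python
import ast


class RenameSymbol(ast.NodeTransformer):
    """Rename every plain-name reference `old` to `new` (no scope analysis)."""

    def __init__(self, old: str, new: str) -> None:
        self.old = old
        self.new = new

    def visit_Name(self, node: ast.Name) -> ast.Name:
        if node.id == self.old:
            node.id = self.new
        return node


def rename(source: str, old: str, new: str) -> str:
    tree = RenameSymbol(old, new).visit(ast.parse(source))
    return ast.unparse(tree)


src = 'total = 1\nlabel = "total"\nprint(total + subtotal)\n'

# The variable is renamed; the string "total" and the name `subtotal` are not,
# which a textual search-and-replace would get wrong.
print(rename(src, "total", "count"))
```

Because the rename works on name nodes rather than characters, there is nothing to "find" in the text; the cost moves to parsing and regenerating the source.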

People like editing files of course and that has resulted in a lot of friction developing richer tools that don't store text but something that preserves more structure. The fact that we're still going on about formatting issues a quarter century later maybe shows that this is something to revisit. For many languages and editors, robust symbol renames are still somewhat science fiction. And that's just the most basic refactoring.

[+] crq-yml|6 months ago|reply
I think the problem can be defined equally as: we can't invest in something more abstract than "plain text" at this time. When we try, it gets downgraded to a plain text projection of the syntax.

The plain text encoding itself exists in a process of incremental, path-dependent development from Morse Code signals to Unicode resulting in a "Gigantic Lookup Table" (GLUT, my coining) approach to symbolic comprehension. The assumption is useful - lots of features can "just work" by knowing that a particular bit pattern is always a particular symbol.

If we push up the abstraction level, we get a different set of symbols that are better suited to the app, but not equivalent GLUT tooling. Instead we usually get parsing of plain text as a transport. For example, CSV parsing. It is sloppy; it is also good enough.

Edit: XML is also a key example. It goes out of its way to respect the text transport approach. There are dedicated XML editors. But people want to edit it as plain text and they can't quite get there because funny-business with character encodings gets in the way, adding a bunch of ampersands and semicolons onto the symbols they want to edit. Thus we have ended up with "the CSV of hypertext documents", Markdown.

[+] banashark|6 months ago|reply
Interesting read. I’ve often wondered why the projection we see needs to be the same as the stored artifact. Even something like a git diff should be viewable via a projection of the source IR.

With things like treesitter and the like, I sometimes daydream about what an efficient and effective HCI for an AST or IR would look like.

Things like F#'s ordered compilation often make code reviews simpler for me, but that’s because a piece of the intermediate form (dependency order) is exposed to me as a first class item. I find it much simpler to reason about compared to small changes in code with more lax ordering requirements, where I often find myself jumping up and down and back and forth in a diff and all the related interfaces and abstract classes and implementations to understand what effect the delta is having on the program as a whole.

[+] PaulKeeble|6 months ago|reply
In theory we could have an IDE apply a reformatting to any piece of code we looked at and format any changes back to the standard for the code base on updates. One of the things I dislike is that sometimes autoformatting does a poor job and loses some information that manual formatting provides, but honestly go fmt is mostly fine; it just works.

All of this seems doable, I just think for the most part we don't care very much about our preferences; it has very little impact on readability. It's definitely doable, however: we could view the code however we most wanted it and have it stored in a different formatting. Might not be 100% round trip stable but it probably doesn't matter.

There is always a better version where the defaults can be overridden and formatting forced, and where we only format new and changed lines to reduce potential instability, but again go fmt doesn't really suffer from this, so it's possible to make things pretty reliable. It's simple really: there is a default formatting and the code is stored that way, and we can then have our view of choice reformat the code as we want it; when it's stored, it's stored in the default.

[+] lisper|6 months ago|reply
It never ceases to amaze me how many times people can essentially re-invent S-expressions without realizing that's what they are doing.
[+] rs186|6 months ago|reply
Ah, eslint-config-airbnb. My favorite airbnb config issues:

https://github.com/airbnb/javascript/issues/1271

https://github.com/airbnb/javascript/issues/1122

I literally spent over an hour when adapting an existing project to use the airbnb config, when code was perfectly correct, clear and maintainable. I ended up disabling those specific rules locally. I never used it in another project. (Looks like the whole project is no longer maintained. Good riddance.)

The airbnb config is, in my view, the perfect example of unnecessarily wasting people's productivity when linting is done badly.

[+] oftenwrong|6 months ago|reply
Storing an IR also means we can create languages beyond the limits of syntactical practicality. Imagine, for example, an entire comment/documentation dimension of the code. Instead of commenting on a line near some code, you could attach comments semantically to an expression, or to a variable, or to any unit of code.
[+] kesor|6 months ago|reply
This is how Chrome Dev Tools shows source code. The original is often minified or in whatever format the author left it. And when you check the "pretty" checkbox in dev tools, it shows up using whichever format Chrome developers decided it should look like.
[+] shmerl|6 months ago|reply
You can't easily search / grep etc. an IR, unless you use some kind of reverse translator. Readable source files have their benefits in being simple in that sense.
[+] __MatrixMan__|6 months ago|reply
Unison doesn't move the formatting choices further than the machine on which the code was written. The codebase only contains the AST.

It's such a cool idea, though I haven't spent much time using it in anger, so it's hard to say if it's a useful idea.

[+] wonger_|6 months ago|reply
Yeah, if any language has potential for AST source of truth instead of textual source of truth, it's Unison.

I'm just waiting for a breakthrough project to show that it's ready for wider adoption. Leaving text-based tooling is a big ask.

The principles behind Unison, for those who haven't read them yet: https://www.unison-lang.org/docs/the-big-idea/#richer-codeba...

> Each Unison definition is identified by a hash of its syntax tree.
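
A loose Python analogy for that quote, not Unison's actual algorithm (which, as far as the linked docs describe it, also abstracts away names): hash a dump of the syntax tree instead of the text. `definition_hash` is a made-up helper using the standard `ast` and `hashlib` modules.

```python
import ast
import hashlib


def definition_hash(source: str) -> str:
    """Hash the parsed tree, so layout differences do not change the identity."""
    dumped = ast.dump(ast.parse(source))
    return hashlib.sha256(dumped.encode("utf-8")).hexdigest()


one_line = "def add(a, b): return a + b\n"
spread_out = "def add(a,\n        b):\n\n    return a  +  b\n"
different = "def add(a, b): return a - b\n"

print(definition_hash(one_line) == definition_hash(spread_out))  # same tree
print(definition_hash(one_line) == definition_hash(different))   # different tree
```

With identities like this, a reformat is a no-op as far as storage and diffing are concerned, which is the property the comments above are after.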

[+] oftenwrong|6 months ago|reply
Unison's immutable definitions also enable a bunch of compelling capabilities. No merge conflicts. Incremental everything: build, test, lint, distribution, rendering as formatted text, et cetera. Trivial to apply "hot" updates to running systems.
[+] TheAlchemist|6 months ago|reply
I like that. We should have something like this for python.

Black is great, but maybe it's just me since it aligns with how I like the code formatted.

Would there be any downsides for Python (or git?) to define a standard way of formatting to save a valid file, so that all the formatting necessary to read a file happens in the IDE showing the file?

That would very much fit with the Python ethos 'There should be one-- and preferably only one --obvious way to do it.'

[+] lordnacho|6 months ago|reply
Aren't most projects these days written in a mix of languages, most of them text? You'd have to get them to change to use the same tools we currently use, or else you'd have to use special tools. The beauty of the modern stack is the base tools are near universal.

If you want everyone to see their own preference of format, either write a script or get AI to format it for you.

[+] ChrisMarshallNY|6 months ago|reply
I've heard that Google works [sort of] that way (don't know, myself). They have a lot of tools that allow devs to use what formatting they want, and it's made standard, during checkin.

I heard this, many years ago, when we used Perforce. The Perforce consultant that we dealt with, told us this, as an example of triggers. Back then, I was told that Google was a big Perforce shop (maybe just a part of Google. I dunno).

I have heard that this was one of the goals of developing IDLs. I think the vision was, that you could have a dozen different programmers, working in multiple languages (for example, C for the drivers, Haskell for the engine, and Lua for the UI). They would be converted to a common IDL, when submitted to configuration management, and then extracted from that, when the user looks at it.

I can't see that working, but a lot of stuff that I used to think was crazy, has happened, so, who knows?

[+] yojo|6 months ago|reply
I can confirm that Google was using Perforce for version control extensively, at least through 2008. I think it was somehow customized, but I definitely have lingering muscle memory around “p4 sync” and “p4 submit”.

I was on an internal tools team doing distinctly unsexy LAMP-stack work, but all the documentation I ever saw talked about perforce/p4.

[+] __loam|6 months ago|reply
Go was designed at Google with a built-in style checker to explicitly address this and prevent bikeshedding.
[+] laserbeam|6 months ago|reply
Reminds me of dion systems. A few years ago a group of devs was working on a programming environment that feels very close to what DIANA is describing.

The project is dead enough that they no longer own the TLD for the company. As far as I know, the only remnants of the project are youtube recordings of demos held at conferences.