
Hacking ls -l

199 points| drp4929 | 13 years ago |lemis.com

88 comments

[+] memset|13 years ago|reply
Guys! The point of this article is not to prescribe the only method of displaying human-readable file sizes. Obviously one could use `ls -lh`; the author clearly demonstrates that he is willing and able to read man pages to find answers.

Rather, this is a pretty interesting look into what it actually entails to make what ought to be a very simple and straightforward change.

It turns out that these simple changes are hard! Not just in identifying the piece of code to modify, but that man pages are often incomplete or unclear. It also illustrates the complexities behind making software portable - in this case, using the nation-neutral place separator. It also reminds us that solving what is on the surface a simple problem lets one uncover all sorts of interesting and messy details underneath - including more problems to solve!

These are steps that he'd have to take no matter what the code or feature. This article is not "complexity for complexity's sake", it's illustrating the complexity of making changes to any piece of code - and that it is surprisingly difficult for something that one would think is very easy!

[+] cheald|13 years ago|reply
On the other hand, you could consider it a cautionary tale about not reinventing wheels, because a problem that may seem trivial at first often turns out to be far more complex than expected.

This is why you try to re-use work when possible, rather than endlessly reinventing things, because while sure, adding a comma to the printf string is easy enough, your assumptions (English locale, compiler not trying to be clever) are going to quickly become visible as things fall apart because your assumptions aren't in line with the system's assumptions.

What this story really demonstrates is that without a clear understanding of how a system is designed and the basic assumptions it makes, just "hacking on the code" is just as likely to break things as it is to fix them.

[+] guylhem|13 years ago|reply
It is not very easy because it is an unusual request.

But I still wonder if this is easier than ls -l | sed -e :a -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/;ta' ??
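
To make the one-liner above easier to follow, here is a minimal sketch that wraps the same sed loop in a function. The name `group_thousands` is made up for illustration. Note one caveat of the approach: it groups any run of four or more digits on the line, so a year such as 2012 in the date column would also gain a comma.

```shell
# group_thousands is a hypothetical helper name, not part of any tool.
# It reads lines on stdin and inserts a comma every three digits.
group_thousands() {
    # ":a" defines a label. Each pass of the substitution inserts one
    # comma, working from the right, and "ta" jumps back to the label
    # for as long as a substitution was made.
    sed -e :a -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/;ta'
}

printf '%s\n' 5145416 | group_thousands   # prints 5,145,416
```

It could then be used as `ls -l | group_thousands`, with the caveat above about other digit runs.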

The investigation would be more interesting if it were more complete, i.e. if the actual result were a change in the locale that could be applied to other tools printing numbers besides ls (in the author's TODO).

I mean, will it work with bc?

At the moment it's not better than a shell script alias piping the output to sed, but it is more complex - you have to recompile a binary for every OS you use.

[+] DanBC|13 years ago|reply
> It turns out that these simple changes are hard! Not just in identifying the piece of code to modify, but that man pages are often incomplete or unclear.

This is a great shame. I like OpenBSD's approach to man pages - incorrect documentation is a bug and can be as severe as a bug in code; correct documentation is important.

Fixing up man pages is something that non-technical volunteers could help with, except when it's hard to grok what the code actually does vs what it should do.

[+] pixelbeat|13 years ago|reply
I enable this for GNU ls like:

    alias ls="BLOCK_SIZE=\'1 ls --color=auto"
The above is a bit hacky and not very UNIXy as it's lumping more logic into ls, rather than splitting out into functional units.

Number formatting being a very common requirement, I've proposed a design for a new numfmt GNU coreutil

http://lists.gnu.org/archive/html/coreutils/2012-02/msg00085...

which would be used like:

    ls -l | numfmt --field=5 --format="%'d"
[+] Aardwolf|13 years ago|reply
I really don't understand why block size is 512 by default. It should really be 1 by default.

Except for someone with an ancient hard disk who thinks in blocks instead of (mega, giga, etc...)bytes, who ever needs or wants that?

[+] fafner|13 years ago|reply
Why not

    alias ls="ls --block-size=\'1 --color=auto"
[+] osteele|13 years ago|reply
Mr. Lehey managed to improve the system in such a way that it will make subsequent changes easier for him and others, independently of whether the specific change to `ls` is ever adopted. It's “five whys” applied to “why is this hard” and “how can I make it easier”. It's more effort, with a greater chance that much of it will survive the current context and requirements.

Some people improve the area they travel through, others leave debris, and many are noops who make no difference to those who come after. If there aren't enough entropy fighters like Mr. Lehey working on a system, it turns to kipple.

[+] jrockway|13 years ago|reply
Most annoying is that gcc warns about perfectly valid and logical code. That causes people to ignore warnings, and before you know it, you have a piece of software that has more warnings than lines of code.

Alternatively, when you cleverly figure out how to work around the warning, like the author does, you now prevent that rule from triggering even when it's right. Clearly a better unit test is needed.

[+] TwoBit|13 years ago|reply
The printf ' format specifier is not Standard C. It's in neither the C99 Standard nor the new C11 Standard. So it's not actually valid, and it's a coincidence if it happens to work with your Standard Library.

Consider that the compiler generating that warning knows only Standard C, and in fact you could be pairing it with any C library, including those that are strictly conforming and don't support the ' extension.

[+] lelf|13 years ago|reply
> Alternatively, when you cleverly figure out how to work around the warning,

Or just read the docs (it should be "%'*jd "). Then no warnings. (IIRC ' is in C99 and -std=gnu99 targets c99 + gnu extensions.)

The same story with the rest. Two ways of doing things — learn & think and just do it right or twiddle until it seems to likely maybe work (possibly). The article is about the latter. Plus "blame the compiler".

[+] Evbn|13 years ago|reply
That warning seems like a nasty hack anyway, if the compiler can't inline a local when running safety checks.

It is super scary that the compiler appears to be using a different constant from printf for its format checker, that shows it probably isn't using a pattern supplied by printf.

[+] cheald|13 years ago|reply
While I appreciate the story, what's wrong with `ls -lh`?
[+] paddyoloughlin|13 years ago|reply
For me, -h makes it more difficult to quickly compare the sizes of files in one list at a glance. This is something that I have to do often enough that it has prevented me from adding -h to my ls alias. I'd have to use it for a bit to be sure, but the post's suggestion seems a pretty good 'best of both worlds' solution to me.
[+] pepve|13 years ago|reply
I guess people can be very different in this respect. I really need the full number (or at least all of the numbers in the same unit) to be displayed. I don't find the 'human' format helpful at all when looking at ls output.
[+] ghostfish|13 years ago|reply
My thoughts exactly. This is just complexity for complexity's sake. Useful as an exercise, but the -h flag already does this in an even more readable manner.
[+] al1x|13 years ago|reply
[email protected] made a similar post to the freebsd-questions mailing list a month ago. In his case the question was how to print an md5sum along with the file names in a given directory. I saved it because I thought it was a clever hack.

http://lists.freebsd.org/pipermail/freebsd-questions/2012-Se...

A lot of times I catch myself in the mindset of taking a step back and saying "here are the set of tools I have at hand to accomplish a task" without realizing that I should simultaneously be taking a step "in"--so to speak--and acknowledging that the tools I have to work with are not immutable tools cast of iron; they are malleable and can be re-tooled to suit my purposes.. and that sometimes going that route can be the simplest--and in fact "best"--solution.

[+] zapman449|13 years ago|reply
Or, you could use 'ls -h'...

(that said, I do see the utility, since it gives a more obvious visual cue as to the order of size differences... but if you're doing anything with the sizes programmatically, you have to remove the commas afterwards... Short version: if you're going to do this, make it a unique flag, or a new flag modifier to the -l flag... don't overload the -l flag without recourse...)
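
As a rough illustration of the "remove the commas afterwards" step, here is a small sketch. The helper name `strip_grouping` is invented for this example, and it assumes the grouping character is a comma, which will not hold in every locale.

```shell
# strip_grouping is a hypothetical helper name.
# Assumption: the grouping character is a comma, as in an en_US locale.
strip_grouping() {
    tr -d ','
}

# Turn a grouped size back into a plain integer before doing arithmetic.
size=$(printf '%s\n' '5,145,416' | strip_grouping)
echo "$((size + 1))"   # prints 5145417
```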

[+] Evbn|13 years ago|reply
Ideally a parser should respect locale, and use a sane format (not commas) for multiple numbers in a list.

Even better if the _ separator used by programming languages were a supported locale LC=C_FOR_HUMANS :-)

[+] meyering|13 years ago|reply
FYI, there is no need to change GNU ls to get that behavior. You can make it use your locale's separator with either the --block-size="'1" option or by setting the LS_BLOCK_SIZE envvar to that same string:

    $ LC_ALL=en_US.UTF8 ls -og --block-size="'1" .
    -rw-------. 1 5,145,416 Oct  5 16:44 A
    -rw-------. 1 5,137,692 Oct  4 14:37 B
    -rw-------. 1 5,147,168 Oct  8 07:52 C
This feature is documented in the "Block size" section of the coreutils manual: i.e., you can type this to see it:

    info coreutils 'block size'
[+] dsr_|13 years ago|reply
Now let's consider software lifecycle in a large context: longevity of forks.

If he doesn't send the changes off to upstream, and make a case good enough for them to be approved, then all this dooms him to maintaining his fork on all the platforms where he wants it until he gets sick of it or convinces someone else to do it for him.

[+] cjg_|13 years ago|reply
FYI, Greg Lehey is a longtime FreeBSD committer.
[+] Aardwolf|13 years ago|reply
Man, such fragile stuff. Why not code a function yourself that turns a number into a string representing it decimally with the commas every three digits. I normally like and use good library functions and standards, but if they're that fragile and depend on your environment then no thanks.
[+] barrkel|13 years ago|reply
Because that would not be correct in Germany or other locales which use . for digit group separation and , for the decimal separator.
[+] michael_h|13 years ago|reply
not everyone uses commas to separate their number groupings. his solution will work for any locale.
[+] jtgeibel|13 years ago|reply
For such large numbers, would it make more sense to use groups of 6 instead of 3? This would allow you to easily identify the megabyte position with the next separator at the terabyte position.
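
For a quick look at what groups of 6 would look like, here is a sketch using a sed loop with a configurable group width. The function name `group_digits` is hypothetical, and the comma separator is hard-coded rather than taken from the locale.

```shell
# group_digits is a hypothetical helper name.
# Usage: group_digits WIDTH   (reads lines on stdin)
# Assumption: a comma is used as the separator regardless of locale.
group_digits() {
    width=$1
    # Insert one comma per pass, WIDTH digits from the right of each
    # digit run, looping with "ta" until nothing more can be grouped.
    sed -e :a -e "s/\(.*[0-9]\)\([0-9]\{${width}\}\)/\1,\2/;ta"
}

printf '%s\n' 1234567890123 | group_digits 6   # prints 1,234567,890123
printf '%s\n' 1234567890123 | group_digits 3   # prints 1,234,567,890,123
```

With a width of 6, the first separator from the right sits at the megabyte position and the second at the terabyte position.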
[+] njharman|13 years ago|reply
Man, this sounds like every change I try to make to "legacy" code. There's so much debt and smell. I find it very, very hard to leave alone.
[+] thaumasiotes|13 years ago|reply
Surely the appropriate option character for this new, human-readable output is "-h".

Makes you wonder whether anyone ever considered the problem before...

[+] chris_wot|13 years ago|reply
Human readable form in bytes? I thought if you had a file greater than a megabyte then it shows in MB? What if you want to read it in bytes with the thousands separator?
[+] codegeek|13 years ago|reply
I always use one hack for ls: alias lsd="ls -ltrF | grep ^d"

This way, I quickly run lsd to only look for directories.

[+] andreasvc|13 years ago|reply
That's funny, I also have an alias for lsd which does this. But you can do it without grep: lsd='ls -d */'
[+] Evbn|13 years ago|reply
But lsd makes you see things that aren't even there.