Guys! The point of this article is not to prescribe the only method of displaying human-readable file sizes. Obviously one could use `ls -lh`; the author clearly demonstrates that he is willing and able to read man pages to find answers.
Rather, this is a pretty interesting look into what it actually entails to make what ought to be a very simple and straightforward change.
It turns out that these simple changes are hard! Not just in identifying the piece of code to modify, but that man pages are often incomplete or unclear. It also illustrates the complexities behind making software portable - in this case, using the nation-neutral place separator. It also reminds us that solving what is on the surface a simple problem lets one uncover all sorts of interesting and messy details underneath - including more problems to solve!
These are steps that he'd have to take no matter what the code or feature. This article is not "complexity for complexity's sake", it's illustrating the complexity of making changes to any piece of code - and that it is surprisingly difficult for something that one would think is very easy!
On the other hand, you could consider it a cautionary tale about not reinventing wheels, because a problem that may seem trivial at first often turns out to be far more complex than expected.
This is why you try to reuse work when possible rather than endlessly reinventing things. Sure, adding a comma to the printf string is easy enough, but your assumptions (English locale, compiler not trying to be clever) quickly become visible as things fall apart, because they aren't in line with the system's assumptions.
What this story really demonstrates is that without a clear understanding of how a system is designed and the basic assumptions it makes, just "hacking on the code" is just as likely to break things as it is to fix them.
It is not very easy because it is an unusual request.
But I still wonder if this is easier than `ls -l | sed -e :a -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/;ta'`?
The investigations would be interesting if they were more complete, i.e. if the actual result was a change in the locale which could be applicable to other tools printing numbers besides ls (in the author's TODO).
I mean, will it work with bc?
At the moment it's not better than a shell script alias piping the output to sed, but it is more complex - you have to recompile a binary for every OS you use.
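For what it's worth, here is a minimal sketch of that sed loop wrapped in a function so the behavior is easy to try. The name `group_digits` is made up for illustration. One caveat follows from the expression itself: it groups every run of four or more digits on the line, so years and digits in file names get commas too, not just the size column.

```sh
#!/bin/sh
# group_digits: hypothetical helper, not part of any standard tool.
# Reads lines on stdin and keeps inserting a comma in front of the last
# three digits of any digit run that is still longer than three digits.
group_digits() {
    sed -e :a -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/;ta'
}

# A bare size is grouped as expected.
printf '%s\n' 5145416 | group_digits            # prints 5,145,416

# The same loop also rewrites digits that are not sizes.
printf '%s\n' 'report2012.txt' | group_digits   # prints report2,012.txt
```

So the pipe is less work than patching ls, but it needs some way of limiting the substitution to the size field before it is safe on real `ls -l` output.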
> It turns out that these simple changes are hard! Not just in identifying the piece of code to modify, but that man pages are often incomplete or unclear.
This is a great shame. I like OpenBSD's approach to man pages - incorrect documentation is a bug and can be as severe as a bug in code; correct documentation is important.
Fixing up man pages is something that non-technical volunteers could help with, except when it's hard to grok what the code actually does vs what it should do.
Mr. Lehey managed to improve the system in such a way that it will make subsequent changes easier for him and others, independently of whether the specific change to `ls` is ever adopted. It's “five whys” applied to “why is this hard” and “how can I make it easier”. It's more effort, with a greater chance that much of it will survive the current context and requirements.
Some people improve the area they travel through, others leave debris, and many are noops who make no difference to those who come after. If there aren't enough entropy fighters like Mr. Lehey working a system, it turns to kipple.
Most annoying is that gcc warns about perfectly valid and logical code. That causes people to ignore warnings, and before you know it, you have a piece of software that has more warnings than lines of code.
Alternatively, when you cleverly figure out how to work around the warning, like the author does, you now prevent that rule from triggering even when it's right. Clearly a better unit test is needed.
The printf ' format specifier is not Standard C. It's in neither the C99 Standard nor the new C11 Standard. So it's not actually valid, and it's a coincidence if it happens to work with your Standard Library.
Consider that the compiler generating that warning knows only Standard C, and in fact you could be pairing it with any C library, including those that are strictly conforming and don't support the ' extension.
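A quick way to see both points (the flag is an extension, and its effect depends on the locale) is the `printf` command rather than C. This is only a sketch: it assumes an external `printf`, reached through `env`, that accepts the `'` flag, which may not be true on every system, so it falls back to a plain `%d` when the flag is rejected. `print_grouped` is a hypothetical name.

```sh
#!/bin/sh
# print_grouped LOCALE NUMBER: hypothetical helper.
# Tries the non-standard ' flag under the given locale; if this printf
# rejects the flag, falls back to an ungrouped %d.
print_grouped() {
    LC_ALL=$1 env printf "%'d\n" "$2" 2>/dev/null || printf '%d\n' "$2"
}

# The C locale defines no thousands separator, so nothing is inserted.
print_grouped C 5145416

# Grouped only if this locale is installed and defines a separator.
print_grouped en_US.UTF-8 5145416
```

The same format string gives different output depending on the environment, which is the portability cost being discussed here.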
> Alternatively, when you cleverly figure out how to work around the warning,
Or just read the docs (it should be "%'*jd "). Then no warnings. (IIRC ' is in C99 and -std=gnu99 targets c99 + gnu extensions.)
The same story with the rest. Two ways of doing things — learn & think and just do it right or twiddle until it seems to likely maybe work (possibly). The article is about the latter. Plus "blame the compiler".
That warning seems like a nasty hack anyway, if the compiler can't inline a local when running safety checks.
It is super scary that the compiler appears to be using a different constant from printf for its format checker; that shows it probably isn't using a pattern supplied by printf.
For me, -h makes it more difficult to quickly compare the sizes of files in one list at a glance.
This is something that I have to do often enough that it has prevented me from adding -h to my ls alias.
I'd have to use it for a bit to be sure, but the post's suggestion seems a pretty good 'best of both worlds' solution to me.
I guess people can be very different in this respect. I really need the full number (or at least all of the numbers in the same unit) to be displayed. I don't find the 'human' format helpful at all when looking at ls output.
My thoughts exactly. This is just complexity for complexity's sake. Useful as an exercise, but the -h flag already does this in an even more readable manner.
[email protected] made a similar post to the freebsd-questions mailing list a month ago. In his case the question was how to print an md5sum along with the file names in a given directory. I saved it because I thought it was a clever hack.
A lot of times I catch myself in the mindset of taking a step back and saying "here is the set of tools I have at hand to accomplish a task" without realizing that I should simultaneously be taking a step "in"--so to speak--and acknowledging that the tools I have to work with are not immutable tools cast of iron; they are malleable and can be re-tooled to suit my purposes... and that sometimes going that route can be the simplest--and in fact "best"--solution.
(that said, I do see the utility, since it gives a more obvious visual cue as to the order of size differences... but if you're doing anything with the sizes programmatically, you have to remove the commas afterwards... Short version: if you're going to do this, make it a unique flag, or a new flag modifier to the -l flag... don't overload the -l flag without recourse...)
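As an illustration of the "remove the commas afterwards" step, a minimal sketch. It assumes the separator is a comma, which only holds in some locales, and the three sizes are sample values in grouped form.

```sh
#!/bin/sh
# Sum sizes that were printed with comma grouping.
# Assumption: the separator is ','; other locales use '.' or a space.
total=0
for size in 5,145,416 5,137,692 5,147,168; do
    plain=$(printf '%s\n' "$size" | tr -d ',')   # strip the separators
    total=$((total + plain))
done
printf '%s\n' "$total"   # prints 15430276
```

Every script that consumes the output needs this extra step, and it has to know which separator the locale used, which is a fair argument for keeping the grouped output behind its own flag.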
FYI, there is no need to change GNU ls to get that behavior. You can make it use your locale's separator with either the --block-size="'1" option or by setting the LS_BLOCK_SIZE envvar to that same string:
$ LC_ALL=en_US.UTF8 ls -og --block-size="'1" .
-rw-------. 1 5,145,416 Oct 5 16:44 A
-rw-------. 1 5,137,692 Oct 4 14:37 B
-rw-------. 1 5,147,168 Oct 8 07:52 C
This feature is documented in the "Block size" section of the coreutils manual: i.e., you can type this to see it:
Now let's consider software lifecycle in a large context: longevity of forks.
If he doesn't send the changes off to upstream, and make a case good enough for them to be approved, then all this dooms him to maintaining his fork on all the platforms where he wants it until he gets sick of it or convinces someone else to do it for him.
Man, such fragile stuff. Why not code a function yourself that turns a number into its decimal string with a comma every three digits? I normally like and use good library functions and standards, but if they're that fragile and depend on your environment, then no thanks.
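A hand-rolled version is indeed short. Below is a sketch in plain POSIX shell, with no locale lookup and no `printf` extension. The name `commify` and the hard-coded comma are illustrative choices, not anything from the article.

```sh
#!/bin/sh
# commify NUMBER: hypothetical helper.
# Prints an integer with a comma every three digits, counting from the
# right. Plain POSIX sh has no local variables, so n, sign, rest and out
# are globals here.
commify() {
    n=$1
    sign=
    case $n in
        -*) sign=-; n=${n#-} ;;    # remember and strip a leading minus
    esac
    out=
    while [ "${#n}" -gt 3 ]; do
        rest=${n%???}              # everything except the last three digits
        out=,${n#"$rest"}$out      # prepend those three digits and a comma
        n=$rest
    done
    printf '%s\n' "$sign$n$out"
}

commify 5145416   # prints 5,145,416
commify -1234     # prints -1,234
```

The price is exactly what the locale machinery was meant to avoid: the separator is fixed, so the output is wrong for users who expect `.` or a space.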
For such large numbers, would it make more sense to use groups of 6 instead of 3? This would allow you to easily identify the megabyte position with the next separator at the terabyte position.
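To see what that would look like, a sketch that takes the group size as a parameter. `group_by` is a hypothetical helper built on the same kind of sed loop quoted elsewhere in this thread, and the comma is assumed as the separator.

```sh
#!/bin/sh
# group_by SIZE NUMBER: hypothetical helper.
# Inserts a comma every SIZE digits, counting from the right.
group_by() {
    printf '%s\n' "$2" |
        sed -e :a -e "s/\(.*[0-9]\)\([0-9]\{$1\}\)/\1,\2/;ta"
}

group_by 3 5145416123456   # prints 5,145,416,123,456
group_by 6 5145416123456   # prints 5,145416,123456
```

With groups of six, the first separator from the right sits at the 10^6 (mega) boundary and the second at 10^12 (tera), at the cost of six-digit runs that are harder to read at a glance.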
Human readable form in bytes? I thought if you had a file greater than a megabyte then it shows in MB? What if you want to read it in bytes with the thousands separator?
[+] [-] memset|13 years ago|reply
[+] [-] cheald|13 years ago|reply
[+] [-] guylhem|13 years ago|reply
[+] [-] DanBC|13 years ago|reply
[+] [-] pixelbeat|13 years ago|reply
Number formatting being a very common requirement, I've proposed a design for a new numfmt GNU coreutil
http://lists.gnu.org/archive/html/coreutils/2012-02/msg00085...
which would be used like:
[+] [-] Aardwolf|13 years ago|reply
Except for someone with an ancient hard disk who thinks in blocks instead of (mega, giga, etc...)bytes, who ever needs or wants that?
[+] [-] fafner|13 years ago|reply
[+] [-] osteele|13 years ago|reply
[+] [-] secure|13 years ago|reply
By the way, by looking at http://www.lemis.com/grog/index.php you can see that the author uses FreeBSD, just in case you were wondering about /usr/src
[+] [-] jrockway|13 years ago|reply
[+] [-] TwoBit|13 years ago|reply
[+] [-] lelf|13 years ago|reply
[+] [-] Evbn|13 years ago|reply
[+] [-] joeyh|13 years ago|reply
http://joeyh.name/~joey/blog/entry/ls:_the_missing_options/
(Well, actually, I never got around to writing -z, but it's clear what it should do, and any ls hackers are encouraged to finish that up.)
[+] [-] haldean|13 years ago|reply
[+] [-] cheald|13 years ago|reply
[+] [-] paddyoloughlin|13 years ago|reply
[+] [-] pepve|13 years ago|reply
[+] [-] ghostfish|13 years ago|reply
[+] [-] al1x|13 years ago|reply
http://lists.freebsd.org/pipermail/freebsd-questions/2012-Se...
[+] [-] rcthompson|13 years ago|reply
http://ubuntuforums.org/showthread.php?t=684239
[+] [-] zapman449|13 years ago|reply
[+] [-] Evbn|13 years ago|reply
Even better if the _ separator used by programming languages were a supported locale LC=C_FOR_HUMANS :-)
[+] [-] meyering|13 years ago|reply
[+] [-] dsr_|13 years ago|reply
[+] [-] cjg_|13 years ago|reply
[+] [-] Aardwolf|13 years ago|reply
[+] [-] barrkel|13 years ago|reply
[+] [-] michael_h|13 years ago|reply
[+] [-] jtgeibel|13 years ago|reply
[+] [-] njharman|13 years ago|reply
[+] [-] thaumasiotes|13 years ago|reply
Makes you wonder whether anyone ever considered the problem before...
[+] [-] chris_wot|13 years ago|reply
[+] [-] codegeek|13 years ago|reply
This way, I quickly run lsd to only look for directories.
[+] [-] andreasvc|13 years ago|reply
[+] [-] Evbn|13 years ago|reply