As another user mentioned, many POSIX and/or GNU utilities haven't aged well.
I respect trying to stay portable, but people's needs change over time and these
tools simply haven't kept up. Like the other user, I use fd now instead:
On end-user systems with fast I/O, it is usually a better use of resources to run an indexing system that is battery-, suspend-, and reboot-aware, has path and extension whitelists and blacklists, and prioritizes files whose changes it monitors. Searches then run against an optimized text-search DBMS, which is much faster than waiting on zillions of IOPS at query time. Those IOPS are spent ahead of time instead, in the background or when idle, indexing text files, metadata, and structured data once when they change, so the index already knows exactly where all occurrences of kWhateverCondition_PP3V42_G3H or \A[AB]{1,3}c+d\z live under ~/Projects without reading any actual files.
> As another user mentioned, many POSIX and/or GNU utilities haven't aged well
What do you mean by that? Have users gotten more stupid and uneducated over time? I've seen "when in doubt, escape it" in several books about UNIX and shell, and escaping and wildcard expansion are among the first lessons for beginners in a lot of other materials. However, the Internet has let everyone have a voice, for better or worse, and as a result anyone who barely knows something can now write a misguided "tutorial" about it.
In other words: don't blame the tools, nor advocate writing dumbed-down replacements which lack the composability and generality the originals had; blame the proliferation of barely-correct educational material that has spread cancerously over the Internet.
In this case the shell hasn’t aged well. Besides the globbing feature, the bash language is arcane, and few can reliably write correct bash despite it being ubiquitous. Personally I would like to see a completely reimagined take on the shell.
I would argue that this case shows that the shell(s) have a problem with a feature being enabled by default. In hindsight it might have been smarter to put globbing expressions in special quotes rather than the other way round.
For me, Silver Searcher often fails to find matches in files that 'grep' finds easily. I've not been able to determine why this is, but it happens often enough that I consider SS to be highly unreliable, even if it's fast.
So, it seems that there are still some issues to be sorted here.
There's a note at the end about how with nullglob set, ls on a glob with no matches does something surprising. This is a great illustration of how an empty list and the absence of a list are different. Sadly it's rather hard to make that distinction in shells!
I do wish that either shells had a more explicit syntax for globbing, or other commands didn't use the same syntax for patterns. Then confusion like this couldn't occur. An example of the former would be if you had to write:
ls $(glob *.txt)
Here, the shell would not treat * specially, but rather you would have to explicitly expand it. This would be a pain, but at least you wouldn't do it by mistake!
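A rough approximation of this idea is already possible in POSIX-style shells: `set -f` turns off implicit pathname expansion, and a helper can expand patterns on request. The sketch below assumes a hypothetical helper named `glob` (not a standard command) and filenames without whitespace or newlines:

```shell
# Hypothetical `glob` helper: expansion happens only when asked for.
set -f   # noglob: the shell now passes * and ? through literally

glob() {
  set +f                       # re-enable expansion just inside the helper
  for pattern in "$@"; do
    for match in $pattern; do  # unquoted on purpose: this is the explicit expansion
      # An unmatched pattern stays literal; skip it instead of printing it.
      [ -e "$match" ] && printf '%s\n' "$match"
    done
  done
  set -f                       # back to "no implicit globbing"
}

# Usage, mirroring the comment above:
#   ls $(glob *.txt)
```

If nothing matches, `ls` still runs with no arguments at all, so this does not solve the empty-list problem mentioned earlier; it only makes the expansion step explicit.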
I set failglob: `shopt -s failglob`. It makes the whole command fail if there are no matches. That, combined with `set -e`, which aborts the script in the event of any command failing, makes me feel somewhat safe.
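A minimal sketch of the failglob behaviour, assuming bash is installed as `bash`. The helper name `try_glob`, the scratch directory, and the `*.txt` pattern are only for illustration; a child shell is used so the option does not leak into the current session:

```shell
# Run `echo *.txt` in an empty scratch directory under a child bash.
# $1 is bash code to run first (for example a shopt call); may be empty.
try_glob() {
  scratch=$(mktemp -d) || return 2
  if (cd "$scratch" && bash -c "$1 echo *.txt"); then
    status=0
  else
    status=$?          # exit status of the child bash
  fi
  rm -rf "$scratch"
  return "$status"
}

# Default bash: the unmatched pattern is passed through and echo prints it.
try_glob '' && echo "default: echo ran"

# failglob: bash reports the failed match on stderr and does not run echo.
try_glob 'shopt -s failglob;' || echo "failglob: command was not run"
```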
Indeed I add the following two lines to every bash script I write:
Details: I had it on when you set 'shopt -s strict:all', but I neglected to turn it on for 'oil:basic' and 'oil:all'. Those are oil-specific option groups so you don't have to remember all the option names.
But I just fixed that and it will be out with the next Oil release.
If anyone wants to add more strictness options to Oil to avoid these types of mistakes, let us know! (e.g. on https://github.com/oilshell/oil, or there's more contact info on the home page).
I don't know if it's the default or if it's something I set in my config a decade or two ago, but my zsh behaves like this (i.e. I get an error for globs that match nothing instead of silently passing the * along). That seems much saner to me:
$ find . -name *.txt
zsh: no matches found: *.txt
That'll teach you to quote your `find` patterns real fast...
Tangential: another safeguard you can adopt is avoiding "hard delete" commands like rm and find -delete. Untrain yourself from these commands by never using them. On Mac systems, the "trash" program (brew install trash) sends files to your system trash. You can use `trash [file]` and `find .. -print0 | xargs -0 trash --`. rm is a dangerous command you should only very rarely be using.
I fish something out of the trash a few times a year and lemme tell ya; it's worth the investment.
For a long time (probably since the first time I forgot to quote something and got burned, so around 40 years), I've thought that there should be some mechanism for the shell to pass in information about how each argument came about.
For each argument, it would tell the program if it was supplied directly, or came from wildcard expansion. For those from wildcard expansion, it would tell the program what the wildcard was.
Most programs would not care, but some programs could use this to catch common quoting errors.
> For those from wildcard expansion, it would tell the program what the wildcard was.
Different shells have different globbing mechanisms. Why should all programs tie themselves to the mechanisms of any one particular shell?
The simpler the calling mechanism for executables, the simpler it is to write them in any existing or future language. This also gives more flexibility to future shells.
UNIX is pretty much designed thinking of users as programmers. Making it easy to write programs building on other programs is as important as being able to call them. With that in mind, I don't think it's a good compromise to complicate the writing of executables in order to protect users from their own mistakes.
I'm not sure why people are saying it would make executables harder to write; it could most easily be done with environment variables as opposed to modifying the signature of `int main()`. Something along the lines of: `GLOBLESS_ARGC=5` means that `GLOBLESS_ARG{0..4}` have the original arguments as supplied by the shell user.
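To make the proposal concrete, here is a sketch of the program side only. No real shell exports these variables; `GLOBLESS_ARGC` and `GLOBLESS_ARG<n>` follow the naming suggested above, and the function name `args_were_expanded` is made up for illustration. It compares what was typed with what actually arrived:

```shell
# Hypothetical convention: the shell records the words as typed in
# GLOBLESS_ARGC and GLOBLESS_ARG0..N (ARG0 being the command name).
# Returns 0 if the real arguments differ from what was typed, 1 otherwise.
args_were_expanded() {
  # Without the (hypothetical) record there is nothing to compare against.
  [ -n "${GLOBLESS_ARGC-}" ] || return 1
  # A different argument count means some word expanded to several.
  [ "$#" -eq "$(( GLOBLESS_ARGC - 1 ))" ] || return 0
  i=1
  for actual in "$@"; do
    eval "typed=\${GLOBLESS_ARG$i-}"   # indirect lookup of GLOBLESS_ARG<i>
    [ "$actual" = "$typed" ] || return 0
    i=$((i + 1))
  done
  return 1
}

# A find-like program could then warn before doing anything surprising:
#   args_were_expanded "$@" && echo "warning: the shell expanded a pattern" >&2
```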
Shell (because this is technically a shell, not a find issue) is the worst language that everyone should learn. It's a language you'll actually encounter, and it's one that's hard to avoid (unlike PHP).
This was obvious to me, but one version of this that surprised me is when using scp. If you glob a remote path, as in "scp myserver:*.jpg ./", it will probably work! But how? Because the remote path will likely not match any local files, the path with the asterisk is passed to scp unchanged, and scp does the globbing on the remote side.
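The mechanism can be seen without a server. A sketch, assuming a POSIX-style shell with default globbing options (a shell configured to fail on unmatched globs would raise an error instead); the host name `myserver` and the helper `remote_word` are made up:

```shell
# Print the word(s) the shell would hand to scp for an unquoted remote pattern,
# evaluated in the current directory.
remote_word() {
  printf '%s\n' myserver:*.jpg
}

scratch=$(mktemp -d) || exit 1
(
  cd "$scratch" || exit 1
  remote_word                 # no local match, so the pattern survives intact
  : > 'myserver:oops.jpg'     # a local file that happens to match
  remote_word                 # now the shell expands it before scp would run
)
rm -rf "$scratch"
```

Quoting the argument, as in `scp 'myserver:*.jpg' ./`, removes the dependence on what happens to exist locally.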
I believe that programming languages should never make the meaning of a program depend on the context in which it is executed. So many obscure bugs are directly caused by such behaviors. There should be exactly one possible interpretation for a given statement, and if that cannot be executed, then the program should abort. In this case, the glob should never have been passed on to find. It should either have expanded to the empty array, or failed.
One could argue that it is merely a side effect that a shell constitutes a programming language. And one could also note that find should follow the same tradition as the shell (i.e. use /*.jpg to recurse into folders).
My instant reaction to the example was “that won’t work; your shell will say something like ‘no matches’”.
Using an unescaped star in a find command never works for me, which is a lot better than it sometimes working and sometimes breaking!
Reading the article and the comments, it seems like bash doesn’t do this? I suppose it’s one nice thing about oh-my-zsh, whose default config I use almost unchanged.
Putting ‘shellcheck’ in your CI pipeline is a must for me now, after one too many mistakes.
I just finished cleaning away all existing ‘error’ and ‘warning’ level issues in our codebase so that the ‘shellcheck’ CI step can be really strict on code quality.
Common? Yes. Simple enough to stop making this mistake after two times? Also yes. Once you internalize in which cases the shell is responsible for globbing and in which the command itself is, it is pretty clear-cut.
Isn't this in The UNIX-HATERS Handbook? I never use -name without ''. I guess this is just muscle memory from early on, when I ran into the issue that in Unix *.py can mean very different things depending on where it gets resolved.
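A small reproduction of that difference, as a sketch assuming a POSIX-style shell with default globbing options; the directory layout and the function names `unquoted_find` and `quoted_find` are invented for the demonstration:

```shell
# Build a small tree, then compare the unquoted and quoted forms of -name.
demo=$(mktemp -d) || exit 1
mkdir "$demo/sub"
: > "$demo/top.txt"
: > "$demo/sub/nested.txt"

# Each helper runs find from inside the directory given as $1.
unquoted_find() { (cd "$1" && find . -name *.txt); }    # the shell expands *.txt first
quoted_find()   { (cd "$1" && find . -name '*.txt'); }  # find receives the pattern itself

unquoted_find "$demo"   # the shell rewrote this to: find . -name top.txt
quoted_find "$demo"     # finds top.txt and sub/nested.txt
rm -rf "$demo"
```

With exactly one match in the current directory the unquoted form silently searches for that one name. With several matches, find receives extra arguments after `-name` and typically stops with a usage error.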
> * Most importantly, the 'find' command uses a different algorithm than shell globbing does when matching wildcard characters. More specifically, the find command will apply the search pattern against the base of the file name with all leading directories removed. This is contrasted from shell globbing which will expand the wildcard between each path component separately. When no path components are specified, the wildcard will match only files in the current directory.*
So there does seem to be a `find` specific issue here
[+] [-] svnpenn|6 years ago|reply
https://github.com/sharkdp/fd
As well as Silver Searcher:
https://github.com/ggreer/the_silver_searcher
While grep's performance is pretty good, it's also gotten pretty stale with regard to its defaults and options.
[+] [-] anonsivalley652|6 years ago|reply
https://github.com/BurntSushi/ripgrep
[+] [-] kragen|6 years ago|reply
[+] [-] userbinator|6 years ago|reply
[+] [-] BeeOnRope|6 years ago|reply
[+] [-] mikenew|6 years ago|reply
This is what I want as a default for pretty much any text search.
[+] [-] Aardwolf|6 years ago|reply
Sure, disks may have sectors and there may be some use cases, but we use bytes, kilobytes, kibibytes, etc now for data sizes.
[+] [-] weberc2|6 years ago|reply
[+] [-] choeger|6 years ago|reply
[+] [-] fit2rule|6 years ago|reply
[+] [-] twic|6 years ago|reply
http://bash.cumulonim.biz/NullGlob.html
[+] [-] johncs|6 years ago|reply
[+] [-] chubot|6 years ago|reply
So you get this:
https://github.com/oilshell/oil/commit/ddac119254f9a7045dca7...
[+] [-] simias|6 years ago|reply
[+] [-] nothrabannosir|6 years ago|reply
Another tip if you fancy debugging shells is using
this provides the same functionality as the C program in TFA, without needing gcc.
[+] [-] Cpoll|6 years ago|reply
Other tips include colouring your prompt highlighted red on production boxes so you never accidentally think you're somewhere safe.
Nowadays I try to avoid SSHing into mission critical machines though.
[+] [-] icebraining|6 years ago|reply
[+] [-] rezonant|6 years ago|reply
or just using bash:
function showargs() { for x in "$@"; do echo "arg: $x"; done }
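A short usage sketch of the same idea, restated in portable form with `printf`; the scratch directory and file names are made up for the demonstration:

```shell
# Print each argument on its own line, exactly as the command received it.
showargs() {
  for x in "$@"; do
    printf 'arg: %s\n' "$x"
  done
}

work=$(mktemp -d) || exit 1
(
  cd "$work" || exit 1
  : > a.txt; : > b.txt
  showargs *.txt      # two arguments: the shell expanded the pattern
  showargs '*.txt'    # one argument: the literal pattern
)
rm -rf "$work"
```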
[+] [-] heinrichhartman|6 years ago|reply
[+] [-] tzs|6 years ago|reply
[+] [-] kalium_xyz|6 years ago|reply
[+] [-] jolmg|6 years ago|reply
[+] [-] coolreader18|6 years ago|reply
[+] [-] asdff|6 years ago|reply
[+] [-] dehrmann|6 years ago|reply
[+] [-] seiferteric|6 years ago|reply
[+] [-] tyingq|6 years ago|reply
Like, "ssh remote 'cd /whatever; tar -cf - someglobpattern' | tar -xvf -"
[+] [-] ulrikrasmussen|6 years ago|reply
[+] [-] madsbuch|6 years ago|reply
[+] [-] umanwizard|6 years ago|reply
[+] [-] CGamesPlay|6 years ago|reply
[+] [-] erikbye|6 years ago|reply
[+] [-] mehrdada|6 years ago|reply
[+] [-] fctorial|6 years ago|reply
[+] [-] thundergolfer|6 years ago|reply
[+] [-] arcade79|6 years ago|reply
The find command works exactly as expected.
[+] [-] thunderbong|6 years ago|reply
Very useful article. And very informative.
Summary:
Instead of -
Use quotes around pattern i.e.
Edit: Oops, the double-quotes should have been single quotes! Thanks, @lucd. Happy case, like I said!
[+] [-] unknown|6 years ago|reply
[deleted]
[+] [-] lucd|6 years ago|reply
[+] [-] mynegation|6 years ago|reply
[+] [-] vikinghckr|6 years ago|reply
[+] [-] StreamBright|6 years ago|reply
[+] [-] madsbuch|6 years ago|reply
Thanks! I would have jumped right in!
[+] [-] joana035|6 years ago|reply
Both features are also often covered in entry level material for introduction to shell.
[+] [-] shmageggy|6 years ago|reply
[+] [-] umanwizard|6 years ago|reply
[+] [-] siddharthgoel88|6 years ago|reply
[+] [-] sys_64738|6 years ago|reply
This is pretty elementary, something any seasoned Linux person should know.
[+] [-] jancsika|6 years ago|reply
Translation: you cannot type "man" followed by an asterisk because that would have required forethought in how one learns a programming language.
Argle: Hey Bargle, is that new bridge built to spec?
Bargle: It's like I always say, man: good enough for shell script manual operator discoverability.
Argle: Yeah, you're always saying that...