A common mistake involving wildcards and the find command

237 points| robertelder | 6 years ago |blog.robertelder.org | reply

143 comments

[+] svnpenn|6 years ago|reply
As another user mentioned, many POSIX and/or GNU utilities haven't aged well. I respect trying to stay portable, but people's needs change over time and these tools simply haven't kept up. Like the other user, I use fd now instead:

https://github.com/sharkdp/fd

As well as Silver Searcher:

https://github.com/ggreer/the_silver_searcher

While grep's performance is pretty good, it's also gotten pretty stale with regard to its defaults and options.

[+] anonsivalley652|6 years ago|reply
For grepping, rg is faster and saner.

https://github.com/BurntSushi/ripgrep

On end-user systems with fast I/O, it is usually a better use of resources to have a battery-/suspend-/reboot-aware indexing system that has path/extension white/blacklists and prioritizes monitored file changes. That way, searching happens against an optimized text-search DBMS instead of waiting on zillions of IOPS. Those IOPS are effectively spent ahead of time, indexing text files/metadata/structured data in the background or at idle, once per change, so the system already knows exactly where all occurrences of kWhateverCondition_PP3V42_G3H or \A[AB]{1,3}c+d\z live under ~/Projects without reading any actual files.

[+] kragen|6 years ago|reply
Wouldn't you have the same problem with fd if you invoke it on unquoted globs? The problem is the semantics of the shell, not find.
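You can see where the expansion happens with any program that just prints its arguments (a sketch using printf as a stand-in for fd or find, in a scratch directory):

```shell
cd "$(mktemp -d)"              # empty scratch directory
touch a.jpg b.jpg
# The shell expands *.jpg before the command runs, so the program
# receives the two filenames, never the pattern itself:
printf 'arg: %s\n' *.jpg
# arg: a.jpg
# arg: b.jpg
```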
[+] userbinator|6 years ago|reply
> As another user mentioned, many POSIX and/or GNU utilities haven't aged well

What do you mean by that? That users have gotten more stupid and uneducated over time? I've seen "when in doubt, escape it" in several books about UNIX and shell, and escaping and wildcard expansion are among the first lessons in a lot of other materials. However, the Internet has let everyone have a voice, for better or worse, and as a result anyone who barely knows something can now write a misguided "tutorial" about it.

In other words: don't blame the tools, nor advocate writing dumbed-down replacements which lack the composability and generality the originals had; blame the proliferation of barely-correct educational material that has spread cancerously over the Internet.

[+] BeeOnRope|6 years ago|reply
That's not going to directly solve this problem since the globs are expanded before the invoked process is even called with its arguments.
[+] mikenew|6 years ago|reply
> Smart case: the search is case-insensitive by default. It switches to case-sensitive if the pattern contains an uppercase character.

This is what I want as a default for pretty much any text search.

[+] Aardwolf|6 years ago|reply
Imho also anything that outputs or uses multiples of 512 bytes (blocks) by default.

Sure, disks may have sectors and there may be some use cases, but we use bytes, kilobytes, kibibytes, etc. nowadays for data sizes.

[+] weberc2|6 years ago|reply
In this case the shell hasn't aged well. Besides the globbing feature, the bash language is arcane, and few can actually write correct bash reliably despite its ubiquity. Personally I would like to see a completely reimagined take on the shell.
[+] choeger|6 years ago|reply
I would argue that this case shows that the shell(s) have a problem with a feature being enabled by default. In hindsight it might have been smarter to require globbing expressions to be specially quoted, rather than the other way round.
[+] fit2rule|6 years ago|reply
For me, Silver Searcher often fails to find matches in files that 'grep' will, easily. I've not been able to determine why this is, but it happens often enough that I consider SS to be highly unreliable, even if it's fast.

So, it seems that there are still some issues to be sorted here.

[+] twic|6 years ago|reply
I often set the nullglob option in scripts, because it makes the handling of globs which don't match anything a bit more predictable:

http://bash.cumulonim.biz/NullGlob.html

There's a note at the end about how with nullglob set, ls on a glob with no matches does something surprising. This is a great illustration of how an empty list and the absence of a list are different. Sadly it's rather hard to make that distinction in shells!

I do wish that either shells had a more explicit syntax for globbing, or other commands didn't use the same syntax for patterns. Then confusion like this couldn't occur. An example of the former would be if you had to write:

  ls $(glob *.txt)
Here, the shell would not treat * specially, but rather you would have to explicitly expand it. This would be a pain, but at least you wouldn't do it by mistake!
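To make the nullglob behavior concrete, a quick bash sketch (scratch directory with no .txt files):

```shell
cd "$(mktemp -d)"            # empty scratch directory
shopt -s nullglob
files=(*.txt)                # unmatched glob expands to nothing
echo "with nullglob: ${#files[@]} files"
shopt -u nullglob
files=(*.txt)                # default: the literal pattern is kept
echo "without nullglob: ${files[0]}"
# with nullglob: 0 files
# without nullglob: *.txt
```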
[+] johncs|6 years ago|reply
I set failglob: `shopt -s failglob`. Makes the whole command fail if there's no matches. That combined with `set -e` which aborts the script in the event of any command failing makes me feel somewhat safe.

Indeed I add the following two lines to every bash script I write:

    set -exu
    shopt -s failglob
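To see the effect (a minimal check in a scratch directory where the pattern matches nothing):

```shell
cd "$(mktemp -d)"            # nothing here matches *.jpg
# failglob turns the unmatched pattern into an expansion error,
# so find is never invoked and bash exits nonzero:
bash -c 'shopt -s failglob; find . -name *.jpg'
echo "exit status: $?"
```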
[+] chubot|6 years ago|reply
Yes, Oil (https://oilshell.org/) now has nullglob on when you run bin/oil instead of bin/osh.

So you get this:

    oil$ find . -name *.jpg
    find: missing argument to `-name'

    oil$ find . -name '*.jpg'
    (works)
Details: I had it on when you set 'shopt -s strict:all', but I neglected to turn it on for 'oil:basic' and 'oil:all'. Those are oil-specific option groups so you don't have to remember all the option names.

But I just fixed that and it will be out with the next Oil release.

https://github.com/oilshell/oil/commit/ddac119254f9a7045dca7...

If anyone wants to add more strictness options to Oil to avoid these types of mistakes, let us know! (e.g. on https://github.com/oilshell/oil, or there's more contact info on the home page).

[+] simias|6 years ago|reply
I don't know if it's the default or something I set in my config a decade or two ago, but my zsh behaves like this (i.e. I get an error for globs that match nothing instead of silently passing the * along). That seems much saner to me:

    $ find . -name *.txt
    zsh: no matches found: *.txt
That'll teach you to quote your `find` patterns real fast...
[+] nothrabannosir|6 years ago|reply
Tangential: another safeguard you can adopt is avoiding "hard delete" commands like rm and find -delete. Untrain yourself from these commands by never using them. On Mac systems, the "trash" program (brew install trash) sends files to your system trash. You can use `trash [file]` and `find .. -print0 | xargs -0 trash --`. rm is a dangerous command you should only very rarely be using.

I fish something out of the trash a few times a year and lemme tell ya; it's worth the investment.

Another tip if you fancy debugging shells is using

  python -c "print(__import__('sys').argv[1:])" sample "arg here" * foo
This provides the same functionality as the C program in TFA, without needing gcc.
[+] Cpoll|6 years ago|reply
I learned from a sysadmin to use `mv` instead of `rm` when you're removing potentially critical files.

Other tips include colouring your prompt highlighted red on production boxes so you never accidentally think you're somewhere safe.

Nowadays I try to avoid SSHing into mission critical machines though.

[+] icebraining|6 years ago|reply
Trash is the poor man's backup system.
[+] rezonant|6 years ago|reply
> this provides the same functionality as the C program in TFA, without needing gcc.

or just using bash:

    function showargs() { for x in "$@"; do echo "arg: $x"; done }

[+] heinrichhartman|6 years ago|reply
No need for Python; this works just as well:

    printf ":%s:" *foo
[+] tzs|6 years ago|reply
For a long time (probably since the first time I forgot to quote something and got burned, so around 40 years), I've thought that there should be some mechanism for the shell to pass in information about how each argument came about.

For each argument, it would tell the program if it was supplied directly, or came from wildcard expansion. For those from wildcard expansion, it would tell the program what the wildcard was.

Most programs would not care, but some programs could use this to catch common quoting errors.

[+] kalium_xyz|6 years ago|reply
The Unix-Haters Handbook lists the shell's handling of wildcards as a major flaw of Unix shells.
[+] jolmg|6 years ago|reply
> For those from wildcard expansion, it would tell the program what the wildcard was.

Different shells have different globbing mechanisms. Why should all programs tie themselves to the mechanisms of any one particular shell?

The simpler the calling mechanism for executables, the simpler it is to write them in any existing or future language. This also gives more flexibility to future shells.

UNIX was pretty much designed with users-as-programmers in mind. Making it easy to write programs that build on other programs is as important as being able to call them. With that in mind, I don't think it's a good compromise to complicate the writing of executables in order to protect users from their own mistakes.

[+] coolreader18|6 years ago|reply
I'm not sure why people are saying it would make executables harder to write; it could most easily be done with environment variables as opposed to modifying the signature of `int main()`. Something along the lines of: `GLOBLESS_ARGC=5` means that `GLOBLESS_ARG{0..4}` have the original arguments as supplied by the shell user.
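For instance (pure sketch: GLOBLESS_ARGC is hypothetical, no real shell sets it, and the warning text is made up), a program could compare the pre-expansion count against its actual argc:

```shell
#!/bin/sh
# Hypothetical convention: GLOBLESS_ARGC holds the argument count as the
# user typed it, before the shell performed wildcard expansion.
check_glob_expansion() {
    actual_argc=$1
    # A mismatch means at least one argument came from wildcard expansion.
    if [ "${GLOBLESS_ARGC:-$actual_argc}" -ne "$actual_argc" ]; then
        echo "note: argument list was altered by wildcard expansion" >&2
    fi
}

# Simulate: the user typed one argument; the shell expanded it to three.
GLOBLESS_ARGC=1
set -- a.jpg b.jpg c.jpg
check_glob_expansion "$#"
```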
[+] asdff|6 years ago|reply
It's trivial to set this up yourself by echoing expanded commands into a log file as your program runs.
[+] dehrmann|6 years ago|reply
Shell (because this is technically a shell, not a find issue) is the worst language that everyone should learn. It's a language you'll actually encounter, and it's one that's hard to avoid (unlike PHP).
[+] seiferteric|6 years ago|reply
This was obvious to me, but one version of this that surprised me is when using scp. If you glob a remote source like "scp myserver:*.jpg ./" it will probably work! But how? Because the remote path will likely not match any local files, so the pattern with the asterisk is passed through to scp, and scp does the globbing on the remote side.
[+] tyingq|6 years ago|reply
I mostly use ssh and tar instead of scp, because I'm usually after more than one file.

Like, "ssh remote 'cd /whatever; tar -cf - someglobpattern' | tar -xvf -"

[+] ulrikrasmussen|6 years ago|reply
I believe that programming languages should never make the meaning of a program depend on the context in which it is executed. So many obscure bugs are directly caused by such behaviors. There should be exactly one possible interpretation for a given statement, and if that cannot be executed, then the program should abort. In this case, the glob should never have been passed on to find. It should either have expanded to the empty array, or failed.
[+] madsbuch|6 years ago|reply
One could argue that it is merely a side effect that a shell constitutes a programming language. And one could also note that find should employ the same convention as the shell (i.e. use **/*.jpg to recurse into folders).
[+] umanwizard|6 years ago|reply
My instant reaction to the example was "that won't work; your shell will say something like 'no matches'".

Using an unescaped star in a find command never works for me, which is a lot better than it sometimes working and sometimes breaking!

Reading the article and the comments, it seems like bash doesn't do this? I suppose it's one nice thing about oh-my-zsh, whose default config I use almost unchanged.

[+] CGamesPlay|6 years ago|reply
Ditto, fish shell also provides this behavior in the default config.
[+] erikbye|6 years ago|reply
It's a default in zsh, nevermind oh-my-zsh.
[+] thundergolfer|6 years ago|reply
Putting ‘shellcheck’ in your CI pipeline is a must for me now, after one too many mistakes.

I just finished cleaning away all existing ‘error’ and ‘warning’ level issues in our codebase so that the ‘shellcheck’ CI step can be really strict on code quality.

[+] arcade79|6 years ago|reply
The subject of the post is wrong. This is a common mistake between the user and whichever shell the user is using, not between the user and the command itself.

The find command works exactly as expected.

[+] thunderbong|6 years ago|reply
Phew! I'm glad I've been hitting the "Happy Case" scenario all these years!

Very useful article. And very informative.

Summary:

Instead of -

  find . -name *.jpg

Use quotes around pattern i.e.

  find . -name '*.jpg'

Edit: Oops, the double-quotes should have been single quotes! Thanks, @lucd. Happy case, like I said!
[+] lucd|6 years ago|reply
Variables and command substitutions are still expanded inside double quotes. You have to use single quotes to keep the pattern fully literal.
[+] mynegation|6 years ago|reply
Common? Yes. Simple enough to stop making this mistake after two times? Also yes. Once you internalize in which cases the shell is responsible for globbing and in which the command itself is, it's pretty clear-cut.
[+] vikinghckr|6 years ago|reply
`find` has one of the worst user experiences out of UNIX tools. I prefer to use `find . | grep foo` to find files.
[+] StreamBright|6 years ago|reply
Isn't this in the Unix-Haters Handbook? I never use -name without ''. I guess this is just muscle memory from early on, when I ran into this issue, that in Unix *.py can mean very different things depending on where it gets resolved.
[+] madsbuch|6 years ago|reply
That was a lot of text to explain that one should be cautious of the wildcard expansion some shells provide.

Thanks! I would have jumped right in!

[+] joana035|6 years ago|reply
The title seems a bit off, since shell expansions and arguments have nothing to do with the find command.

Both features are also often covered in entry level material for introduction to shell.

[+] shmageggy|6 years ago|reply
Well it seems the root of the problem is

> Most importantly, the 'find' command uses a different algorithm than shell globbing does when matching wildcard characters. More specifically, the find command will apply the search pattern against the base of the file name with all leading directories removed. This is contrasted with shell globbing, which will expand the wildcard between each path component separately. When no path components are specified, the wildcard will match only files in the current directory.

So there does seem to be a `find`-specific issue here.
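That difference is easy to see side by side (sketch; a scratch directory with one jpg at the top level and one in a subdirectory):

```shell
cd "$(mktemp -d)"
mkdir sub
touch top.jpg sub/deep.jpg
# Shell glob: expanded per path component, so with no path prefix
# it only matches the current directory:
printf 'glob: %s\n' *.jpg          # glob: top.jpg
# find -name: matched against the basename at every depth:
find . -name '*.jpg' | sort
# ./sub/deep.jpg
# ./top.jpg
```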

[+] umanwizard|6 years ago|reply
Most commands don’t accept the shell-like wildcard `*` as part of their command syntax; find does. That’s the connection.
[+] siddharthgoel88|6 years ago|reply
It is just awesome that I stumbled upon this post. I remember I had previously faced a similar issue while running a command like

  find . -name *.gradle | blah blah 
Instead of finding the root cause, I bypassed it with

  find . | grep "\.gradle" | blah blah
It just feels great to now connect the dots and know the real reason for the issue.
[+] sys_64738|6 years ago|reply
find . -name \*.jpg

This is pretty elementary; any seasoned Linux person should know it.

[+] jancsika|6 years ago|reply
> You can type: man glob

Translation: you cannot type "man" followed by an asterisk because that would have required forethought in how one learns a programming language.

Argle: Hey Bargle, is that new bridge built to spec?

Bargle: It's like I always say, man: good enough for shell script manual operator discoverability.

Argle: Yeah, you're always saying that...