
An Opinionated Guide to Xargs

402 points| todsacerdoti | 4 years ago |oilshell.org | reply

130 comments

[+] pwg|4 years ago|reply
Since the blog author is commenting here, you have this statement part way down your blog:

> That is, grep doesn't support an analogous -0 flag.

However, the GNU grep variant does have an analogous flag:

-z, --null-data

Treat the input as a set of lines, each terminated by a zero byte (the ASCII NUL character) instead of a newline. Like the -Z or --null option, this option can be used with commands like sort -z to process arbitrary file names.
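A minimal sketch of how that flag fits into a NUL-delimited pipeline, assuming GNU find, grep, and xargs. The temp directory and file names are made up for illustration.

```sh
# Assumes GNU find, grep, and xargs; the directory and file names are invented.
tmp=$(mktemp -d)
touch "$tmp/keep me.txt" "$tmp/skip.log" "$tmp/also keep.txt"

# find -print0 emits NUL-terminated names, grep -z filters NUL-terminated
# records, and xargs -0 splits on NUL, so names with spaces stay intact.
matches=$(find "$tmp" -type f -print0 | grep -z '\.txt$' | xargs -0 -n 1 basename | sort)
printf '%s\n' "$matches"

rm -rf "$tmp"
```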

[+] chubot|4 years ago|reply
Ah cool, I didn't know that! I'll update the blog post. (What a cacophony of flags)

Edit: It seems that grep -0 isn't taken for something else and they should have used it for consistency? The man page says it's meant to be used with find -print0, xargs -0, perl -0, and sort -z (another inconsistency)

[+] kazinator|4 years ago|reply
In 2002, I implemented xargs in Lisp, in the Meta-CVS project.

It is quite necessary, because you cannot pass an arbitrarily large command line or environment in exec system calls.

Of course, this doesn't have the problem requiring -0 because we're not reading textual lines from standard input, but working with lists of strings.

  ;;; This source file is part of the Meta-CVS program,
  ;;; which is distributed under the GNU license.
  ;;; Copyright 2002 Kaz Kylheku

  (in-package :meta-cvs)

  (defconstant *argument-limit* (* 64 1024))

  (defun execute-program-xargs (fixed-args &optional extra-args fixed-trail-args)
    (let* ((fixed-size (reduce #'(lambda (x y)
                                   (+ x (length y) 1))
                               (append fixed-args fixed-trail-args)
                               :initial-value 0))
           (size fixed-size))
      (if extra-args
        (let ((chopped-arg ())
              (combined-status t))
          (dolist (arg extra-args)
            (push arg chopped-arg)
            (when (> (incf size (1+ (length arg))) *argument-limit*)
              (setf combined-status
                    (and combined-status
                         (execute-program (append fixed-args
                                                  (nreverse chopped-arg)
                                                  fixed-trail-args))))
              (setf chopped-arg nil)
              (setf size fixed-size)))
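          (when chopped-arg
            ;; Fold the final partial batch into the combined status as well.
            (setf combined-status
                  (and combined-status
                       (execute-program (append fixed-args (nreverse chopped-arg)
                                                fixed-trail-args)))))
          combined-status)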
        (execute-program (append fixed-args fixed-trail-args)))))
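For comparison, the same limit can be observed from the shell. This is only a sketch: `getconf ARG_MAX` reports the system's limit, and the exact number of batches xargs chooses depends on the implementation and its defaults, so the example only checks that the input was split.

```sh
# The kernel's limit on the combined size of arguments and environment.
limit=$(getconf ARG_MAX)
echo "ARG_MAX is $limit bytes"

# 200000 numbers do not fit on one command line under xargs' default size
# budget, so xargs runs echo several times; each run prints one line.
runs=$(seq 1 200000 | xargs echo | wc -l | tr -d ' ')
echo "echo was invoked $runs times"
```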
[+] fiddlerwoaroof|4 years ago|reply
I frequently find myself reaching for this pattern instead of xargs:

    do_something | ( while read -r v; do
    . . .
    done )
I’ve found that it has fewer edge cases (except it creates a subshell, which can be avoided in some shells by using braces instead of parens)
[+] aaaaaaaaaaab|4 years ago|reply
Some additional tips:

1. You don't need the parentheses.

2. If you use process substitution [1] instead of a pipe, you will stay in the same process and can modify variables of the enclosing scope:

    i=0
    while read -r v; do
        ...
        i=$(( i + 1))
    done < <(do_something)
The drawback is that this way `do_something` has to come after `done`, but that's bash for you ¯\_(ツ)_/¯

[1] https://www.gnu.org/software/bash/manual/html_node/Process-S...

[+] aaaaaaaaaaab|4 years ago|reply
Also for the `while` enthusiasts, here's how you zip the output of two processes in bash:

    paste -d \\n <(do_something1) <(do_something2) | while read -r var1 && read -r var2; do
        ... # var1 comes from do_something1, var2 comes from do_something2
    done
[+] ptspts|4 years ago|reply
For thousands of arguments this solution is much slower (high CPU usage) than xargs, because either it implements the logic as a shell script (slow) or it runs an external program for each argument (slow).
[+] tomcam|4 years ago|reply
Thank you. Your comment coalesced a number of things in my mind that I hadn’t grasped properly as a UNIX midwit, especially the braces thing.
[+] thayne|4 years ago|reply
Creating a subshell can lead to some surprising behavior if you aren't careful, though.
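One such surprise, sketched under the assumption of bash or dash (zsh and ksh run the last pipeline stage in the current shell and behave differently): a variable updated inside a piped loop is lost when the loop ends. The variable names are made up.

```sh
# Assumes bash or dash, where each side of a pipe runs in a subshell.
piped=0
printf 'a\nb\nc\n' | while read -r v; do piped=$((piped + 1)); done

# Feeding the loop from a here-document keeps it in the current shell.
kept=0
while read -r v; do kept=$((kept + 1)); done <<EOF
$(printf 'a\nb\nc\n')
EOF

echo "piped=$piped kept=$kept"
```

The here-document is the POSIX-friendly cousin of the process substitution trick shown earlier in the thread.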
[+] WhatIsDukkha|4 years ago|reply
I tend to reach for gnu parallel instead of xargs -

https://www.gnu.org/software/parallel/parallel_alternatives....

parallel is probably on the complex side, but it's also been actively developed and bug-fixed, and it has had a lot of road miles from large computing users.

[+] orhmeh09|4 years ago|reply
The nagware prompts of parallel are so objectionable that I will do a lot of things to avoid using it at all. So pretentious!
[+] cormacrelf|4 years ago|reply
If you need more visibility into long running processes, pueue is another alternative. You can of course use `xargs -P1 pueue add ./process_file.sh` to add the jobs in the first place. Sends a job to pueued, returns immediately. Great for re-encoding dozens of videos. For jobs that aren’t already multi-core, set the queue parallelism with pueue, after you’ve seen your cpu is under-utilised.

The obvious downside to the visibility and dynamism is that it redirects stdout. You can read it back later, in order, but it's not there for continued processing immediately.

[+] senkora|4 years ago|reply
I always think of xargs as the inverse of echo. echo converts arguments to text streams, and xargs converts text streams to arguments.
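A tiny round trip illustrating that mental model; the words and the bracket format are arbitrary.

```sh
# echo turns its arguments into a text stream...
stream=$(echo one two three)

# ...and xargs turns the stream back into separate arguments. printf reuses
# its format once per argument, so each argument lands in its own brackets.
args=$(printf '%s\n' "$stream" | xargs printf '[%s]\n')
printf '%s\n' "$args"
```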
[+] 0xdeadb00f|4 years ago|reply
That's a pretty neat way of thinking about it!
[+] jordemort|4 years ago|reply
I appreciate this. If I wrote my own opinionated guide to xargs, it would be a single profane sentence.
[+] thrwyexecbrain|4 years ago|reply
In Bash (not every shell supports this) functions can be exported, which enables this nice pattern with xargs:

    myfunc() {
        printf " %s" "I got these arguments:" "$@" $'\n'
    }
    export -f myfunc
    seq 6 | xargs -n2 bash -c 'myfunc "$@"' "$0"
[+] westurner|4 years ago|reply
Wanting verbose logging from xargs, years ago I wrote a script called `el` (edit lines) that basically does `xargs -0` with logging. https://github.com/westurner/dotfiles/blob/develop/scripts/e...

It turns out that e.g. -print0 and -0 are the only safe way: line endings aren't escaped:

    find . -type f -print0 | el -0 --each -x echo
GNU Parallel is a much better tool: https://en.wikipedia.org/wiki/GNU_parallel
[+] chubot|4 years ago|reply
(author here) Hm I don't see either of these points because:

GNU xargs has --verbose which logs every command. Does that not do what you want? (Maybe I should mention its existence in the post)

xargs -P can do everything GNU parallel does, which I mention in the post. Any counterexamples? GNU parallel is a very ugly DSL IMO, and I don't see what it adds.

--

edit: Logging can also be done by recursively invoking shell functions that log with the $0 Dispatch Pattern, explained in the post. I don't see a need for another tool; this is the Unix philosophy and compositionality of shell at work :)
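A small sketch of the `-P` usage under discussion, assuming an xargs that supports `-P` (GNU and BSD both do). The inputs and the `done` message are invented; output order is not guaranteed with parallel jobs, so the example sorts it.

```sh
# Run up to 3 jobs at once, one input per job. The trailing "sh" fills $0
# so that the input arrives as $1 inside the inline script.
out=$(printf '%s\n' 3 1 2 | xargs -P 3 -n 1 sh -c 'echo "done $1"' sh | sort)
printf '%s\n' "$out"
```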

[+] LeoPanthera|4 years ago|reply
Yeah but xargs doesn't refuse to run until I have agreed to a EULA stating I will cite it in my next academic paper.
[+] legobmw99|4 years ago|reply
This is only tangentially related, but after all the posts here the last few days about thought terminating cliches, I can’t help but reflect on the “X considered harmful” title cliche
[+] JadeNB|4 years ago|reply
Is it thought terminating, though? "X considered harmful" seems more intended to spark discussion in an intentionally inflammatory way than to stifle it.

(In any case, this surely is tangential, since the title is not "X considered harmful" for any value of X—at best it comments on a post by that title, as, indeed, you are doing.)

[+] Zababa|4 years ago|reply
I've been thinking about titles, and it's hard to make a good one that doesn't look like a total cliché. "X considered harmful", "an opinionated guide to X", some kind of joke or reference, what could be a collection of tags (X, Y and Z), "things I have learned doing X", etc.
[+] phone8675309|4 years ago|reply
What every X should know about Y, an opinionated take on Z considered harmful
[+] MichaelGroves|4 years ago|reply
Would you say the title terminated your consideration of the article?
[+] yudlejoza|4 years ago|reply
Of xargs, for, and while, I have limited myself to while. It's more typing every time but saves me from having to remember so many quirks of each command.

    cat input.file | ... | while read -r unit; do <cmd> "${unit}"; done | ...
between 'while read -r unit' and 'while IFS= read -r unit' I can probably handle 90% of the cases. (maybe I should always use IFS since I tend to forget the proper way to use it).
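To make the difference between the two forms concrete, here is a small sketch with a made-up input line. With the default IFS, `read` strips leading and trailing whitespace; with `IFS=` it keeps the line as is. In both cases a line such as "foo bar" stays a single value, because only one variable is named.

```sh
input='  indented line'

# Default IFS: surrounding whitespace is trimmed before assignment.
trimmed=$(printf '%s\n' "$input" | { read -r unit; printf '<%s>' "$unit"; })

# Empty IFS: the line is stored exactly as read.
exact=$(printf '%s\n' "$input" | { IFS= read -r unit; printf '<%s>' "$unit"; })

printf '%s\n%s\n' "$trimmed" "$exact"
```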
[+] scottlamb|4 years ago|reply
That way will bite you when the tasks in question are cheaper than fork+exec. There was a thread just the other day in which folks were creating 8 million empty files with a bash loop over touch. But it's 60X faster (really, I measured) to use xargs, which will do batches (and parallelism if you tell it to).

https://news.ycombinator.com/item?id=28192946
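The batching is easy to see on a small scale. In this sketch `-n 4` caps each batch at four arguments, and each line of output corresponds to one invocation of echo; the numbers are arbitrary.

```sh
# One process per batch of up to four arguments.
batched=$(seq 1 10 | xargs -n 4 echo)
printf '%s\n' "$batched"

# Count the invocations: 10 inputs in batches of 4 means 3 runs, where a
# shell loop calling the command once per line would need 10.
runs=$(printf '%s\n' "$batched" | wc -l | tr -d ' ')
echo "invocations: $runs"
```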

[+] patrickdavey|4 years ago|reply
Would you mind expanding with a couple of examples? (E.g. using "foo bar" as a single line or split by whitespace).

I suspect I'll really like your way of doing things, but an example would be very handy.

[+] HMH|4 years ago|reply
I always wonder why something like xargs is not a shell built-in. It's such a common pattern, but I dread formulating the correct incantation every time.

I was happy to read that the author comes to the same conclusion and proposes an `each` builtin (albeit only for the Oil shell)! That way there is no need to learn another mini language, as pointed out.

[+] JNRowe|4 years ago|reply
If you're a zsh user it offers a version of something like xargs in zargs¹. As the documentation shows it can be really quite powerful in part because of zsh's excellent globbing facilities, and I think without that support it wouldn't be all that useful as a built-in.

I'd also perhaps argue that the reason we don't want xargs to be a built-in is precisely because of zargs and the point in your second paragraph. If it was built-in it would no doubt be obscenely different in each shell, and five decades later a standard that no one follows would eventually specify its behaviour ;)

¹ https://zsh.sourceforge.io/Doc/Release/User-Contributions.ht... - search for "zargs", it has no anchor. Sorry.

[+] masklinn|4 years ago|reply
> Shell functions and $1, instead of xargs -I {}

> -n instead of -L (to avoid an ad hoc data language)

Apparently GNU xargs is missing it, but BSD xargs has -J, which is a `-I` which works with `-n`: with `-I` each replstr gets replaced by one of the inputs, with `-J` the replstr gets replaced by the entire batch (as determined by `-n`).
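Where `-J` is unavailable, a similar effect can be approximated with `sh -c`, which receives the whole batch as `"$@"` and can place it anywhere in the command. This is a workaround sketch, not an equivalent of `-J`; the `begin` and `end` words are placeholders.

```sh
# Each batch of two inputs is spliced into the middle of the command line.
# The trailing "sh" fills $0 so the batch starts at $1.
out=$(printf '%s\n' a b c d | xargs -n 2 sh -c 'echo begin "$@" end' sh)
printf '%s\n' "$out"
```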

[+] reilly3000|4 years ago|reply
I’m unconvinced by the post OP was responding to. It’s a utility; it provides some means to get things done. *nix provides many means of parsing text and running commands, and each has its idioms based on its own axioms. It seems as if a composer is lambasting the clarinet because they don’t care for its fingerings. I’ve only used xargs sparingly. Can somebody enlighten me as to why it’s bad, aside from the fact that there are other ways to do some things it does?
[+] michaelcampbell|4 years ago|reply
> I've used -P 32 to make day-long jobs take an hour! You can't do that with a for loop.

    for file in *; do
      command_using_file &
    done
    wait
?

I use variations on this all the time; pause while load is high, pause while 'x' or more things are running, sleep between invocations, etc.

It may not be as convenient for some cases, but "can't do that..." is not quite correct either.

The post is starting to feel like a hammer/nail argument, IMO.
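One of the variations mentioned above, capping how many jobs run at once, can be sketched in plain POSIX sh. The `work` function, the item list, and the limit of 2 are all hypothetical. Note that this waits for a whole batch before starting the next one, so it is coarser than `xargs -P`, which refills slots as jobs finish.

```sh
# Hypothetical stand-in for real work: record that the item was processed.
tmp=$(mktemp -d)
work() { : > "$tmp/done.$1"; }

max_jobs=2   # assumed limit; pick whatever suits the machine
running=0
for item in 1 2 3 4 5; do
  work "$item" &
  running=$((running + 1))
  # Once the batch is full, wait for all of it before starting more.
  if [ "$running" -ge "$max_jobs" ]; then
    wait
    running=0
  fi
done
wait   # collect the final partial batch

finished=$(ls "$tmp" | wc -l | tr -d ' ')
echo "finished=$finished"
rm -rf "$tmp"
```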

[+] aaaaaaaaaaab|4 years ago|reply
I would recommend using -0 instead of -d, as the latter is not supported on BSD (and macOS) xargs:

    do_something | tr \\n \\0 | xargs -0 ...
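A quick comparison of the two splitting behaviours, using a made-up two-line input in which the first line contains a space:

```sh
# Default xargs splits on spaces as well as newlines: three arguments.
default_split=$(printf 'one two\nthree\n' | xargs -n 1 echo | wc -l | tr -d ' ')

# Converting newlines to NUL and using -0 splits on lines only: two arguments.
line_split=$(printf 'one two\nthree\n' | tr '\n' '\0' | xargs -0 -n 1 echo | wc -l | tr -d ' ')

echo "default=$default_split lines-only=$line_split"
```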
[+] derriz|4 years ago|reply
I wish this was the default behavior of xargs (the 'tr \\n \\0 | xargs -0' bit). I don't know why xargs splits on spaces and tabs as well as newlines by default and doesn't even have a flag to just split on lines.

Ok filenames can theoretically have newlines in them but I'd be happy to deal with that weird case. I can't recall ever having encountered it in years of using bash on various systems.

Shell pipes would then orthogonally provide the stuff like substitution that xargs does in its own unique way (that I just can't be bothered learning) - instead you'd just pipe the find output through sed or 'grep -v' or whatever you wanted before piping into xargs.

I guess that's what aliases are for, but I'm too lazy these days to bother with configuring often short-lived systems all the time.

[+] thayne|4 years ago|reply
I'm not sure I like the `$1` and shell function pattern. It might avoid the -I minilanguage, but at the cost of "being clever" in a way that takes a minute to wrap your head around. It's a neat trick, but I don't think it would be easy to understand if you are reading the code for the first time.
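For readers who have not seen it, here is a rough sketch of what such a pattern can look like. It is an illustration under assumptions, not the post's own code: the script name `demo.sh` and the functions `do_one` and `all` are invented, and the script is written to a temp directory only so the example is self-contained.

```sh
tmp=$(mktemp -d)
cat > "$tmp/demo.sh" <<'EOF'
# Hypothetical task: handles one input, received as $1.
do_one() { echo "processing $1"; }

# Entry point: xargs re-invokes this same script ($0) with the function
# name first, followed by one input per invocation.
all() { printf '%s\n' a b c | xargs -n 1 sh "$0" do_one; }

# Dispatch: treat the command-line arguments as a function call.
"$@"
EOF

out=$(sh "$tmp/demo.sh" all)
printf '%s\n' "$out"
rm -rf "$tmp"
```

The last line of the script, `"$@"`, is the part that takes a minute to absorb: it runs whichever function is named on the command line, which is what lets xargs call back into the script.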
[+] Karellen|4 years ago|reply
I find using the example of `rm` to discuss whether to pick `find -exec` or `find | xargs` rather strange, given the existence of `find -delete`. Maybe pick a different example operation to automate.