Since the blog author is commenting here, you have this statement part way down your blog:
> That is, grep doesn't support an analogous -0 flag.
However, the GNU grep variant does have an analogous flag:
-z, --null-data
Treat the input as a set of lines, each terminated by a zero byte (the ASCII NUL character) instead of a newline. Like the -Z or --null option, this option can be used with commands like sort -z to process arbitrary file names.
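For anyone wanting to try the flag, here is a minimal sketch of a NUL-delimited pipeline, assuming GNU grep. The scratch directory and filenames are made up for illustration.

```shell
# Hypothetical scratch directory; one filename contains a newline.
demo_dir=$(mktemp -d)
touch "$demo_dir/plain.txt" "$demo_dir/notes.md" "$demo_dir/odd
name.txt"

# find emits NUL-terminated paths, grep -z filters NUL-terminated records,
# and xargs -0 hands each surviving path to the command as one argument.
txt_count=$(find "$demo_dir" -type f -print0 \
  | grep -z '\.txt$' \
  | xargs -0 sh -c 'echo "$#"' sh)

rm -rf "$demo_dir"
echo "$txt_count"
```

The file with the embedded newline is counted once, not twice, because nothing in the pipeline splits on newlines.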
Ah cool, I didn't know that! I'll update the blog post. (What a cacophony of flags)
Edit: It seems that grep -0 isn't taken for something else and they should have used it for consistency? The man page says it's meant to be used with find -print0, xargs -0, perl -0, and sort -z (another inconsistency)
Also for the `while` enthusiasts, here's how you zip the output of two processes in bash:
paste -d \\n <(do_something1) <(do_something2) | while read -r var1 && read -r var2; do
... # var1 comes from do_something1, var2 comes from do_something2
done
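A runnable version of the same idea, as a sketch: the two functions are hypothetical stand-ins for `do_something1` and `do_something2`, and process substitution means this needs bash rather than plain sh.

```shell
# Hypothetical stand-ins for the two producers.
do_something1() { printf '%s\n' a b c; }
do_something2() { printf '%s\n' 1 2 3; }

# paste -d '\n' interleaves the two streams line by line, so each loop
# iteration reads one line from each producer.
zipped=$(paste -d '\n' <(do_something1) <(do_something2) |
  while read -r var1 && read -r var2; do
    printf '%s=%s\n' "$var1" "$var2"
  done)
echo "$zipped"
```

This assumes both producers emit the same number of lines and that no line contains an embedded newline.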
For thousands of arguments this solution is much slower (high CPU usage) than xargs, because it either implements the logic as a shell script (slow) or runs an external program for each argument (slow).
If you need more visibility into long-running processes, pueue is another alternative. You can of course use `xargs -P1 pueue add ./process_file.sh` to add the jobs in the first place; each call sends a job to pueued and returns immediately. Great for re-encoding dozens of videos. For jobs that aren’t already multi-core, raise the queue parallelism in pueue once you’ve seen that your CPU is under-utilised.
The obvious downside to the visibility and dynamism is that it redirects stdout. You can read it back later, in order, but it’s not there for continued processing immediately.
(author here) Hm I don't see either of these points because:
GNU xargs has --verbose which logs every command. Does that not do what you want? (Maybe I should mention its existence in the post)
xargs -P can do everything GNU parallel does, which I mention in the post. Any counterexamples? GNU parallel is a very ugly DSL IMO, and I don't see what it adds.
--
edit: Logging can also be done by recursively invoking shell functions that log with the $0 Dispatch Pattern, explained in the post. I don't see a need for another tool; this is the Unix philosophy and compositionality of shell at work :)
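To see what the command logging looks like, here is a small sketch. It uses `-t`, the POSIX spelling of the option; GNU xargs accepts `--verbose` as the long form.

```shell
# The trace of each command goes to stderr, so 2>&1 folds it into the
# captured output alongside what echo itself prints.
trace=$(printf '%s\n' a b | xargs -t -n 1 echo 2>&1)
echo "$trace"
```

Each command line is written before the command runs, so the trace for an item precedes that item's own output.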
I'm surprised the links don't mention find. The -print0 flag makes it safe for crazy filenames, which pairs with the xargs -0 flag, or the perl -0 flag, etc. And you have -maxdepth if you don't want it to trawl.
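A minimal sketch of that pairing, with a made-up directory layout:

```shell
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/sub/deeper"
touch "$demo_dir/top file.txt" "$demo_dir/sub/mid.txt" "$demo_dir/sub/deeper/deep.txt"

# -maxdepth 2 stops find from descending into sub/deeper;
# -print0 and -0 keep "top file.txt" as a single argument despite the space.
shallow_count=$(find "$demo_dir" -maxdepth 2 -type f -print0 \
  | xargs -0 sh -c 'echo "$#"' sh)

rm -rf "$demo_dir"
echo "$shallow_count"
```

Only the two files at depth 1 and 2 are passed along; `deep.txt` sits one level too far down.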
This is only tangentially related, but after all the posts here the last few days about thought terminating cliches, I can’t help but reflect on the “X considered harmful” title cliche
Is it thought terminating, though? "X considered harmful" seems more intended to spark discussion in an intentionally inflammatory way than to stifle it.
(In any case, this surely is tangential, since the title is not "X considered harmful" for any value of X—at best it comments on a post by that title, as, indeed, you are doing.)
I've been thinking about titles, and it's hard to make a good one that doesn't look like a total cliché. "X considered harmful", "an opinionated guide to X", some kind of joke or reference, what could be a collection of tags (X, Y and Z), "things I have learned doing X", etc.
Of xargs, for, and while, I have limited myself to while. It's more typing every time but saves me from having to remember so many quirks of each command.
cat input.file | ... | while read -r unit; do <cmd> ${unit}; done | ...
Between 'while read -r unit' and 'while IFS= read -r unit' I can probably handle 90% of the cases. (Maybe I should always use IFS=, since I tend to forget the proper way to use it.)
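The difference between the two forms shows up with leading or trailing whitespace. A small sketch, using a made-up input line:

```shell
line='  padded\value  '

# With the default IFS, read trims leading and trailing whitespace.
trimmed=$(printf '%s\n' "$line" | { read -r unit; printf '[%s]' "$unit"; })

# With IFS= (empty), the line is kept exactly as it arrived.
exact=$(printf '%s\n' "$line" | { IFS= read -r unit; printf '[%s]' "$unit"; })

printf '%s\n%s\n' "$trimmed" "$exact"
```

In both cases `-r` keeps the backslash literal; only the whitespace handling differs.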
That way will bite you when the tasks in question are cheaper than fork+exec. There was a thread just the other day in which folks were creating 8 million empty files with a bash loop over touch. But it's 60X faster (really, I measured) to use xargs, which will do batches (and parallelism if you tell it to).
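The batching is easy to see by counting how many processes get started. This is only a sketch, with a small input and an explicit `-n` so the numbers are predictable; it makes no attempt to reproduce the 60X measurement.

```shell
# One fork+exec per line: the loop starts 200 separate sh processes.
loop_runs=$(seq 1 200 | while read -r n; do sh -c 'echo run' sh "$n"; done | wc -l)

# xargs packs many arguments into each invocation; -n 50 caps the batch size
# here (without -n the batches are usually much larger).
xargs_runs=$(seq 1 200 | xargs -n 50 sh -c 'echo run' sh | wc -l)

echo "loop: $loop_runs runs, xargs: $xargs_runs runs"
```

Adding `-P` to the xargs call would additionally run several batches at once.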
I always wonder why something like xargs is not a shell built-in. It's such a common pattern, but I dread formulating the correct incantation every time.
I was happy to read that the author comes to the same conclusion and proposes an `each` builtin (albeit only for the Oil shell)! Like that there is no need to learn another mini language as pointed out.
If you're a zsh user it offers a version of something like xargs in zargs¹. As the documentation shows it can be really quite powerful in part because of zsh's excellent globbing facilities, and I think without that support it wouldn't be all that useful as a built-in.
I'd also perhaps argue that the reason we don't want xargs to be a built-in is precisely because of zargs and the point in your second paragraph. If it was built-in it would no doubt be obscenely different in each shell, and five decades later a standard that no one follows would eventually specify its behaviour ;)
> -n instead of -L (to avoid an ad hoc data language)
Apparently GNU xargs is missing it, but BSD xargs has -J, which is a `-I` which works with `-n`: with `-I` each replstr gets replaced by one of the inputs, with `-J` the replstr gets replaced by the entire batch (as determined by `-n`).
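For comparison, a sketch of the two behaviours using only options GNU xargs has. The second command is an approximation of `-J` via `sh -c`, not the BSD flag itself; the BSD form appears only in a comment.

```shell
# -I: the command runs once per input line, with {} replaced by that line.
per_item=$(printf '%s\n' a b c d | xargs -I {} echo "start {} end")

# Batch placement in the middle of the command, in the spirit of BSD -J:
# sh -c receives up to two inputs per run (-n 2) and expands them as "$@".
# On BSD xargs the rough equivalent would be: xargs -n 2 -J % echo start % end
per_batch=$(printf '%s\n' a b c d | xargs -n 2 sh -c 'echo start "$@" end' sh)

printf '%s\n---\n%s\n' "$per_item" "$per_batch"
```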
I’m unconvinced by the post OP was responding to. It’s a utility; it provides some means to get things done. *nix provides many means of parsing text and running commands, and each has its idioms based on its own axioms. It seems as if a composer is lambasting the clarinet because they don’t care for its fingerings. I’ve only used xargs sparingly; can somebody enlighten me as to why it’s bad, aside from the fact that there are other ways to do some things it does?
I wish this was the default behavior of xargs (the 'tr \\n \\0 | xargs -0' bit). I don't know why xargs splits on spaces and tabs as well as newlines by default and doesn't even have a flag to just split on lines.
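A short sketch of the difference. As far as I know, GNU xargs does also accept `-d '\n'` to split on lines only, but that is a GNU extension, so the `tr` form is the portable one shown here.

```shell
# Default xargs splits on blanks too, so "two words" becomes two arguments.
default_args=$(printf '%s\n' 'two words' 'one' | xargs sh -c 'echo "$#"' sh)

# Converting newlines to NULs makes each input line exactly one argument.
line_args=$(printf '%s\n' 'two words' 'one' | tr '\n' '\0' | xargs -0 sh -c 'echo "$#"' sh)

echo "default: $default_args args, line-based: $line_args args"
```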
Ok filenames can theoretically have newlines in them but I'd be happy to deal with that weird case. I can't recall ever having encountered it in years of using bash on various systems.
Shell pipes would then orthogonally provide the stuff like substitution that xargs does in its own unique way (that I just can't be bothered learning) - instead you'd just pipe the find output through sed or 'grep -v' or whatever you wanted before piping into xargs.
I guess that's what aliases are for, but these days I'm too lazy to bother configuring often short-lived systems all the time.
I'm not sure I like the `$1` and shell function pattern. It might avoid the -I minilanguage, but at the cost of "being clever" in a way that takes a minute to wrap your head around. It's a neat trick, but I don't think it would be easy to understand if you are reading the code for the first time.
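For readers who haven't seen the pattern, here is a rough sketch of how I understand it, not the post's exact code. The function names are hypothetical, and the script is written to a temp file only so the example is self-contained.

```shell
script=$(mktemp)
cat > "$script" <<'EOF'
# Hypothetical worker: handles one argument per invocation.
process_one() {
  echo "processing: $1"
}

# Re-invokes this same script ($0) once per input line, naming the
# function to run as the first argument.
process_all() {
  printf '%s\n' alpha 'beta gamma' | tr '\n' '\0' | xargs -0 -n 1 sh "$0" process_one
}

# Dispatch: run the function named on the command line with its arguments.
"$@"
EOF

dispatch_out=$(sh "$script" process_all)
rm -f "$script"
echo "$dispatch_out"
```

The final `"$@"` line is the whole trick: the script's first argument picks the function, so xargs can call back into shell code without any `-I` substitution. With an executable script, plain `"$0"` can replace `sh "$0"`.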
I find that using the example of `rm` to discuss whether to pick `find -exec` or `find | xargs` rather strange, given the existence of `find -delete`. Maybe pick a different example operation to automate.
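For reference, a sketch of the `-delete` form on a throwaway directory. The flag is available in GNU and BSD find.

```shell
demo_dir=$(mktemp -d)
touch "$demo_dir/a.tmp" "$demo_dir/b.tmp" "$demo_dir/keep.txt"

# -delete removes each match inside find itself: no rm process, no xargs,
# and no filename quoting to get wrong.
find "$demo_dir" -type f -name '*.tmp' -delete

remaining=$(ls "$demo_dir")
rm -rf "$demo_dir"
echo "$remaining"
```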
[+] [-] pwg|4 years ago|reply
[+] [-] chubot|4 years ago|reply
[+] [-] kazinator|4 years ago|reply
It is quite necessary, because you cannot pass an arbitrarily large command line or environment in exec system calls.
Of course, this doesn't have the problem requiring -0 because we're not reading textual lines from standard input, but working with lists of strings.
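The exec limit mentioned above can be inspected directly. A sketch, where the `-s 2048` value is arbitrary and chosen only to force visible batching on a small input:

```shell
# Upper bound, in bytes, on the combined size of arguments and environment
# that one exec call accepts on this system.
arg_max=$(getconf ARG_MAX)
echo "ARG_MAX: $arg_max bytes"

# xargs stays under that bound by splitting a long input into several
# invocations; -s lowers the per-command size cap to force that here.
batches=$(seq 1 2000 | xargs -s 2048 sh -c 'echo "$#"' sh | wc -l)
echo "batches: $batches"
```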
[+] [-] fiddlerwoaroof|4 years ago|reply
[+] [-] aaaaaaaaaaab|4 years ago|reply
1. You don't need the parentheses.
2. If you use process substitution [1] instead of a pipe, you will stay in the same process and can modify variables of the enclosing scope:
The drawback is that this way `do_something` has to come after `done`, but that's bash for you ¯\_(ツ)_/¯
[1] https://www.gnu.org/software/bash/manual/html_node/Process-S...
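The snippet for point 2 did not survive here, so what follows is a sketch of the general technique rather than the original code. `do_something` is a hypothetical stand-in, and bash is assumed.

```shell
do_something() { printf '%s\n' a b c; }

# Pipe form: in bash the loop runs in a subshell, so its count is lost.
piped_count=0
do_something | while read -r line; do piped_count=$((piped_count + 1)); done

# Process substitution: the loop runs in the current shell, so the
# variable survives after `done`.
subst_count=0
while read -r line; do subst_count=$((subst_count + 1)); done < <(do_something)

echo "piped: $piped_count, process substitution: $subst_count"
```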
[+] [-] aaaaaaaaaaab|4 years ago|reply
[+] [-] ptspts|4 years ago|reply
[+] [-] tomcam|4 years ago|reply
[+] [-] thayne|4 years ago|reply
[+] [-] WhatIsDukkha|4 years ago|reply
https://www.gnu.org/software/parallel/parallel_alternatives....
parallel is probably on the complex side, but it's also been actively developed and bugfixed, and has had a lot of road miles from large computing users.
[+] [-] chubot|4 years ago|reply
What does it do that xargs and shell can't? (honest question)
[+] [-] orhmeh09|4 years ago|reply
[+] [-] cormacrelf|4 years ago|reply
[+] [-] senkora|4 years ago|reply
[+] [-] 0xdeadb00f|4 years ago|reply
[+] [-] jordemort|4 years ago|reply
[+] [-] thrwyexecbrain|4 years ago|reply
[+] [-] westurner|4 years ago|reply
It turns out that e.g. -print0 and -0 are the only safe way: line endings aren't escaped:
GNU Parallel is a much better tool: https://en.wikipedia.org/wiki/GNU_parallel
[+] [-] chubot|4 years ago|reply
[+] [-] LeoPanthera|4 years ago|reply
[+] [-] l0b0|4 years ago|reply
[1] https://mywiki.wooledge.org/ParsingLs
[2] https://unix.stackexchange.com/q/128985/3645
[+] [-] tyingq|4 years ago|reply
[+] [-] legobmw99|4 years ago|reply
[+] [-] abetusk|4 years ago|reply
[0] https://meyerweb.com/eric/comment/chech.html
[+] [-] JadeNB|4 years ago|reply
[+] [-] Zababa|4 years ago|reply
[+] [-] phone8675309|4 years ago|reply
[+] [-] MichaelGroves|4 years ago|reply
[+] [-] yudlejoza|4 years ago|reply
[+] [-] scottlamb|4 years ago|reply
https://news.ycombinator.com/item?id=28192946
[+] [-] patrickdavey|4 years ago|reply
I suspect I'll really like your way of doing things, but an example would be very handy.
[+] [-] HMH|4 years ago|reply
[+] [-] JNRowe|4 years ago|reply
¹ https://zsh.sourceforge.io/Doc/Release/User-Contributions.ht... - search for "zargs", it has no anchor. Sorry.
[+] [-] masklinn|4 years ago|reply
[+] [-] pgtan|4 years ago|reply
https://www.ibm.com/docs/en/aix/7.2?topic=apply-command
[+] [-] 2OEH8eoCRo0|4 years ago|reply
[+] [-] reilly3000|4 years ago|reply
[+] [-] michaelcampbell|4 years ago|reply
I use variations on this all the time; pause while load is high, pause while 'x' or more things are running, sleep between invocations, etc.
It may not be as convenient for some cases, but "can't do that..." is not quite correct either.
The post is starting to feel like a hammer/nail argument, IMO.
[+] [-] aaaaaaaaaaab|4 years ago|reply
[+] [-] derriz|4 years ago|reply
[+] [-] thayne|4 years ago|reply
[+] [-] Karellen|4 years ago|reply