
Why “process substitution” is a late feature in Unix shells

168 points | r4um | 4 years ago | utcc.utoronto.ca

82 comments

[+] chubot|4 years ago|reply
Fun fact, in bash you lose the exit codes of process subs. It doesn't even wait() on those processes in many cases.

So there is no way to abort a bash script if something like <(sort nonexistent) fails.

OSH lets you opt into stricter behavior:

    $ osh -c '
    set -e
    shopt -s oil:basic

    diff <(sort left) <(sort nonexistent)
    echo "should not get here"
    '
    sort: cannot read: nonexistent: No such file or directory

    diff <(sort left) <(sort nonexistent)
                      ^~
    [ -c flag ]:1: fatal: Exiting with status 2 (command in PID 29359)

In contrast, bash will just keep going and ignore failure.

You can also get all the exit codes with @_process_sub_status, which is analogous to PIPESTATUS.

(I should probably write a blog post about this: http://www.oilshell.org/blog/)
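A minimal bash sketch of the lost status (assuming no file named nonexistent exists):

```shell
#!/usr/bin/env bash
set -e                    # should abort on failure, but won't here
cat <(sort nonexistent)   # sort fails; cat just sees an empty pipe
echo "status: $?"         # prints "status: 0" -- sort's failure is invisible
```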

[+] andreyv|4 years ago|reply
> So there is no way to abort a bash script if something like <(sort nonexistent) fails.

The process ID of the last executed background command in Bash is available as $!.

  cat <(sort nonexistent)
  wait $! || echo fail
gives

  sort: cannot read: nonexistent: No such file or directory
  fail
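One caveat, though: $! only names the most recently created substitution, so with two of them only the last failure can be caught this way:

```shell
#!/usr/bin/env bash
# $! holds the PID of the *last* process substitution only,
# so with two substitutions this wait sees just the second one
diff <(echo a) <(sort nonexistent) >/dev/null 2>&1
wait $! || echo "last substitution failed"
```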
[+] vimax|4 years ago|reply
A fun trick is using the `paste` command with process substitution to combine lines of output from different commands.

  ./cmd1
outputs:

  a
  b
  c

  ./cmd2
outputs:

  1
  2
  3

  paste <(./cmd1) <(./cmd2)
outputs:

  a 1
  b 2
  c 3

You can then use xargs to use each line as arguments for another command:

  paste <(./cmd1) <(./cmd2) | xargs -L1 ./cmd3
calls:

  ./cmd3 a 1
  ./cmd3 b 2
  ./cmd3 c 3
[+] sillysaurusx|4 years ago|reply
Should’ve been named zip. Is there an unzip?

But of course it can’t be named zip because that’s the compressor, so I’m really asking whether there’s an unpaste.

I guess the closest is awk {print $1;}, but it makes me wince every time I write it. (I already know that would give syntax errors as written here.)
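There's no standard unpaste, but cut on paste's tab delimiter gets close, one column at a time:

```shell
# cut -f splits paste's tab-separated columns back apart
printf 'a\t1\nb\t2\nc\t3\n' | cut -f1   # column 1: a, b, c
printf 'a\t1\nb\t2\nc\t3\n' | cut -f2   # column 2: 1, 2, 3
```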

[+] throwawayboise|4 years ago|reply
How do you know that the outputs a, b, and c correspond to the outputs 1, 2, and 3, and that they will always occur in the same order? This technique seems full of possibly invalid assumptions.
[+] sillysaurusx|4 years ago|reply
I love zsh’s take on process substitution:

  cat =(echo foo)
It’s almost identical to <(echo foo), but crucially it drains the output before performing the substitution. It’s like <(... | sponge), but somehow much more reliable.

I’ve used it in a few situations where process substitution failed, though being old and feeble I can’t remember why. But for example you can copy the file safely, knowing that it will contain the full output of the shell pipeline, unlike with plain process substitution.

I don’t even know the name of =(...). Hmm. Sponge substitution?

[+] sillysaurusx|4 years ago|reply
Thought of an example.

  cp <( echo foo; sleep 1; echo bar ) foobar
vs

  cp =( ( echo foo; sleep 1; echo bar ) | sponge ) foobar
The first will complete instantly and result in a file named foobar whose contents are either empty or just foo, depending on how unlucky you are.

The second takes one second to complete and results in a file named foobar containing foobar.

It’s strange that vanilla process substitution can’t seem to solve this at all.
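In vanilla bash the closest workaround is draining into a temporary file yourself; a sketch of that:

```shell
#!/usr/bin/env bash
# drain the pipeline into a temp file first, then hand cp a regular file
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
{ echo foo; sleep 1; echo bar; } > "$tmp"   # fully written before cp starts
cp "$tmp" foobar                            # foobar reliably holds both lines
```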

[+] iamevn|4 years ago|reply
I quite like how fish does process substitution.

  foo (bar | psub)
Instead of special syntax, it's just a plain function.

docs: https://fishshell.com/docs/current/cmds/psub.html source of psub.fish: https://github.com/fish-shell/fish-shell/blob/master/share/f...

[+] limoce|4 years ago|reply
But fish's psub only supports a file or a named pipe, not the one mentioned in the post, which is described as "the best way".
[+] zokier|4 years ago|reply
The usability of unixy shells generally falls off a cliff when you need to deal with more than one input and one output. The awkwardness of trying to shoehorn in process substitution is just one example of that.
[+] matheusmoreira|4 years ago|reply
Yeah. It's caused by this single standard input/output model. Even using standard error is unergonomic and leads to magic like 2>&1.

What if programs could have any number of input and output file descriptors and the numbers/names along with their contents and data types were documented in the manual? Could eliminate the need to parse data altogether. I remember Common Lisp had something similar to this.
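bash can already route numbered descriptors beyond 0/1/2; a small sketch (the report function and file names are made up):

```shell
#!/usr/bin/env bash
# a sketch: fd 1 carries the data, fd 3 carries a separate status stream
report() {
  echo "row1"          # data -> stdout
  echo "done" >&3      # status -> file descriptor 3
}
# the caller decides where each stream goes
report > data.txt 3> status.txt
```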

[+] rscnt|4 years ago|reply
Do you know of alternatives to that? I assume PowerShell, but I don't know if there's anything beyond that.
[+] TorKlingberg|4 years ago|reply
Great read, but worth noting from the end of the article that "late feature" here means it was added in the early '90s. The late addition to Unix that surprises me is ssh. It was only invented in the late '90s. Before that everyone used unencrypted remote shells like telnet.

Encryption in general was in a pretty bad state in the '90s: original http was unencrypted and early mobile phone standards that are still in use have very weak encryption.

[+] zokier|4 years ago|reply
Everything was unencrypted until the late '90s (and in many cases until the late '00s). Email (both SMTP and POP3/IMAP), IRC, web, gopher, telnet, FTP, local disks, removable storage, network storage (SMB/NFS, etc.), everything. Computing and the internet were a much nicer place; there wasn't such an adversarial attitude where everything gets broken just because it's out there, like today.
[+] mprovost|4 years ago|reply
Encryption is CPU-heavy and CPUs weren't nearly as fast then as they are now. Unix was developed on systems like a VAX which could do 1 MIPS (millions of instructions per second). For comparison an M1 chip can do about 10 trillion instructions per second. It just wasn't possible to encrypt data in real time like it is now.
[+] usr1106|4 years ago|reply
Before Eternal September, the Internet was a trustworthy place. And criminals hadn't found it yet.
[+] teddyh|4 years ago|reply
Telnet has encryption options; they just weren't widely implemented.
[+] shuffel|4 years ago|reply
For whatever reason I can never remember the syntax of <(command) and end up "rediscovering" it every year. It's seldom used but when it's needed it's rather elegant.

Another somewhat related useful bash feature is using this form:

    wc <<< 'a b c' 
instead of:

    echo 'a b c' | wc
[+] JNRowe|4 years ago|reply
With the caveat that the here string causes a tempfile to be written¹, so they're not quite equivalent. How much that matters for your use cases though is a different question, but it is worth thinking about if you're doing lots of repeated calls.

¹ With Bash v5 it may use a pipe if the data is small enough, but you can't guarantee people will have that because of GPLv3 phobia. I believe it is always a tempfile with zsh.

[+] veltas|4 years ago|reply
Didn't even know about process substitution; I had been using FIFOs to achieve this!
[+] ogogmad|4 years ago|reply
Fish tries to use FIFOs to emulate process substitution, and it leads to deadlock. Not sure why.

By default, Fish actually runs the processes in a strict sequence. But this is to avoid the above deadlock situation. And it therefore isn't process substitution.

[+] kazinator|4 years ago|reply
The claim is false; process substitution can be cobbled together with named FIFOs,* and those are "ancient".

The only problem is that those are temporary objects that have to be created in the file system, and cleaned up.
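A hand-rolled FIFO version of diff <(sort a.txt) <(sort b.txt) looks roughly like this (file names a.txt/b.txt assumed):

```shell
#!/usr/bin/env bash
# hand-rolled equivalent of: diff <(sort a.txt) <(sort b.txt)
dir=$(mktemp -d)
mkfifo "$dir/f1" "$dir/f2"
sort a.txt > "$dir/f1" &    # writers block until diff opens each FIFO
sort b.txt > "$dir/f2" &
diff "$dir/f1" "$dir/f2"
rm -r "$dir"                # the cleanup step <(...) does for you
```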

However, temporary objects (files, not FIFOs) are also used in here-doc implementations.

Process substitution is a late feature simply because the creativity juice in Unix (tm) dried up sometime before the mid-1990s, leaving the FOSS reimplementations of Unix to carry the development torch.

Those projects had to balance among other goals like quality/robustness and compatibility.

(If we look at the quality of the FOSS tools compared to the Unix originals, we could also remark that "quality and robustness was late in coming to Unix". But we equivocate on Unix, because GNU stands for GNU is Not Unix!)

Features appearing in FOSS utilities like GNU Bash take time to make into Unix (tm).

Process substitution is not yet in the standard, therefore it is in fact not in Unix (tm).

Shell scripting is a conservative activity. The language isn't very good, so improving it is like beating a dead horse in some ways; the most important thing in any new shell release is that old scripts keep working. (Like configuration scripts for the build systems of nicer languages.)

---

* See GNU Bash manual: https://www.gnu.org/savannah-checkouts/gnu/bash/manual/bash....: " Process substitution is supported on systems that support named pipes (FIFOs) or the /dev/fd method of naming open files. "

[+] renewiltord|4 years ago|reply
In practice I end up caching the output often. I have used process substitution but the iteration process feels more useful to me if I've slowly built up data and I can inspect the internal pieces each time and reuse them in different ways.

But I can see if it's relatively fast. I like it. I just don't end up using it often.

[+] errcorrectcode|4 years ago|reply
0. Process substitution is a potential DoS vector as it could take up all of RAM and/or disk space.

1. Also, not all commands are compatible with it, especially if they need to rewind or reopen their input. diff often has issues when it's used for both arguments. It's likely due to the use of memory-mapped files, but I could be wrong.

2. Shells ought to implement a flag for process substitution to allow temporary files to reside on disk for the lifetime of the command line. This way, it can operate on extremely large files.

[+] jakub_g|4 years ago|reply
An unfortunate thing is that process substitution does not work in Git Bash on Windows. (At least that was the case last time I tested; googling around, I found a random comment in a random GitHub repo saying it's been fixed in 2.25, but I don't have a laptop handy to test now.)
[+] FedericoRazzoli|4 years ago|reply
This is a great explanation! I've wondered many times why we had to play with obscure xargs invocations instead of being able to pipe a command's output to an argument.
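For the argument case, plain command substitution already does this; a sketch contrasting the two forms:

```shell
# $(...) interpolates a command's output into the argument list
echo "first word: $(printf 'hello world' | cut -d' ' -f1)"   # first word: hello
# <(...) instead passes a readable file name like /dev/fd/63
wc -l < <(printf 'a\nb\n')
```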
[+] unixhero|4 years ago|reply
Which is the best shell, fish or zsh?
[+] yjftsjthsd-h|4 years ago|reply
There is no possible way to usefully answer that question as given; every shell has its own advantages and disadvantages. Plain Bourne is universal but bare-bones, bash is mostly ubiquitous on Linux, zsh is powerful but just different enough to occasionally bite you, fish is very user friendly but doesn't even try to be compatible with anything else, ksh is a nice option and is built in on BSDs, dash sucks for interactive work but is great for running scripts...
[+] errcorrectcode|4 years ago|reply
When everything is both nails and screws, a hammer isn't the best tool for everything (unless it has a screwdriver on the end and the hammer part is big enough to be actually useful as a hammer).
[+] nmz|4 years ago|reply
tcl