There is general wisdom about bash pipelines here that I think most people will miss simply because of the title. Interestingly though, my mental model of bash piping was wrong too.
There were several reasons why pipes were added to Unix, and the ability to run producer/consumer processes concurrently was one of them. Before that (and for many years after, on non-Unix systems) the most prevalent paradigm was indeed to run multi-stage pipelines with the moral equivalent of the following:
stage1.exe /in:input.dat /out:stage1.dat
stage2.exe /in:stage1.dat /out:stage2.dat
del stage1.dat
stage3.exe /in:stage2.dat /out:result.dat
del stage2.dat
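On Unix the same shape collapses into a single pipeline with no intermediate files, all stages running at once. A sketch, with real commands (sort, uniq, tr) standing in for the hypothetical stage1/stage2/stage3 tools above:

```shell
# Hypothetical Unix equivalent of the temp-file pipeline above.
# sort/uniq/tr stand in for stage1/stage2/stage3: no stage1.dat or
# stage2.dat is ever written, and all three stages run concurrently.
printf 'c\na\nb\n' > input.dat
sort < input.dat | uniq | tr a-z A-Z > result.dat
cat result.dat   # → A, B, C on separate lines
```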
Pipes are so useful. I find myself more and more using shell script and pipes for complex multi-stage tasks. This also simplifies any non-shell code I must write, as there are already high quality, performant implementations of hashing and compression algorithms I can just pipe to.
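For example (gzip and sha256sum here are just stand-ins for "compression and hashing implementations I can pipe to"; the filenames are made up):

```shell
# Compress a stream and checksum the compressed bytes in one pass.
# tee duplicates the stream so we keep both the archive and its
# hash, with no temporary file between the stages.
printf 'some payload\n' | gzip | tee payload.gz | sha256sum > payload.gz.sha256
gunzip < payload.gz   # round-trips to the original payload
```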
Sometimes you want the intermediate files as well, though. For example, if doing some kind of exploratory analysis of the different output stages of the pipeline, or even just for debugging.
tee can be useful for that. Maybe pv (pipe viewer) too, though I have not tried it yet.
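A sketch of that with tee (the stage names and files are hypothetical; sort/uniq/wc just stand in for real pipeline stages):

```shell
# Stream through the pipeline as usual, but let tee snapshot each
# intermediate stage to a file for later inspection or debugging.
printf '3\n1\n2\n1\n' \
  | sort | tee stage1.dat \
  | uniq | tee stage2.dat \
  | wc -l > result.dat
cat stage2.dat   # deduplicated intermediate: 1, 2, 3
```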
SPONGE(1)                          moreutils                          SPONGE(1)

NAME
       sponge - soak up standard input and write to a file

SYNOPSIS
       sed '...' file | grep '...' | sponge [-a] file

DESCRIPTION
       sponge reads standard input and writes it out to the specified file.
       Unlike a shell redirect, sponge soaks up all its input before writing
       the output file. This allows constructing pipelines that read from and
       write to the same file.
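A concrete example of the pattern the man page describes. The commented-out line shows why a plain redirect fails here: the shell truncates the output file before grep ever opens it. sponge buffers the whole stream first, then rewrites the file (the guard is only there because moreutils may not be installed everywhere; the filename is made up):

```shell
printf 'keep\ndrop\nkeep\n' > notes.txt

# BROKEN: the shell truncates notes.txt *before* grep reads it,
# so grep sees an empty file and notes.txt ends up empty:
# grep -v drop notes.txt > notes.txt

# Works: sponge soaks up all of grep's output, then writes the file.
if command -v sponge >/dev/null 2>&1; then
    grep -v drop notes.txt | sponge notes.txt
    cat notes.txt   # → keep, keep
fi
```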
Usually mental models develop "organically" from when one was a n00b, without much thought, and sometimes it can take a long time for them to be unseated, even though in hindsight it's kind of obvious that the model is wrong (e.g. one can see it from "slow-program | less", and things like that).
Can’t speak for OP, but one might reasonably expect later stages to only start execution once at least some data is available—rather than immediately, before any data is available for them to consume.
Of course, there are many reasons you wouldn’t want this—processes can take time to start up, for example—but it’s not an unreasonable mental model.
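One quick way to see the actual behavior, that every stage starts immediately and runs concurrently (the classic yes/head demo, not something from the comments above):

```shell
# yes writes 'y' forever. If head only started once yes had finished,
# this pipeline would never terminate. Instead it returns instantly:
# head exits after 3 lines, and yes is killed by SIGPIPE on its next
# write to the now-closed pipe.
yes | head -n 3   # → y, y, y
```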
I have known this about Unix pipes for a very long time. It is always mentioned whenever pipes are introduced, but I guess people can miss it.
Now I will break your mind, as mine was broken not long ago: PowerShell, which is often said to be a better shell, works like that. It doesn't run things in parallel. I think the same is true of Windows cmd/batch, but don't quote me on that. That one thing makes PowerShell insufficient to ever fully replace a proper shell.