There is general wisdom about bash pipelines here that I think most people will miss simply because of the title. Interestingly though, my mental model of bash piping was wrong too.
There were several reasons why pipes were added to Unix, and the ability to run producer/consumer processes concurrently was one of them. Before that (and for many years after, on non-Unix systems) the most prevalent paradigm was indeed to run multi-stage pipelines with the moral equivalent of the following:
stage1.exe /in:input.dat /out:stage1.dat
stage2.exe /in:stage1.dat /out:stage2.dat
del stage1.dat
stage3.exe /in:stage2.dat /out:result.dat
del stage2.dat
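On Unix the same shape collapses into a single pipeline with no intermediate files, all stages running at once. A sketch, with real commands (sort, uniq, tr) standing in for the hypothetical stage1/stage2/stage3 tools above:

```shell
# Hypothetical Unix equivalent of the temp-file pipeline above.
# sort/uniq/tr stand in for stage1/stage2/stage3: no stage1.dat or
# stage2.dat is ever written, and all three stages run concurrently.
printf 'c\na\nb\n' > input.dat
sort < input.dat | uniq | tr a-z A-Z > result.dat
cat result.dat   # → A, B, C on separate lines
```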
Pipes are so useful. I find myself more and more using shell script and pipes for complex multi-stage tasks. This also simplifies any non-shell code I must write, as there are already high quality, performant implementations of hashing and compression algorithms I can just pipe to.
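For example (gzip and sha256sum here are just stand-ins for "compression and hashing implementations I can pipe to"; the filenames are made up):

```shell
# Compress a stream and checksum the compressed bytes in one pass.
# tee duplicates the stream so we keep both the archive and its
# hash, with no temporary file between the stages.
printf 'some payload\n' | gzip | tee payload.gz | sha256sum > payload.gz.sha256
gunzip < payload.gz   # round-trips to the original payload
```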
Sometimes you want the intermediate files as well, though. For example, if doing some kind of exploratory analysis of the different output stages of the pipeline, or even just for debugging.
tee can be useful for that. Maybe pv (pipe viewer) too, though I have not tried it yet.
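A sketch of that with tee (the stage names and files are hypothetical; sort/uniq/wc just stand in for real pipeline stages):

```shell
# Stream through the pipeline as usual, but let tee snapshot each
# intermediate stage to a file for later inspection or debugging.
printf '3\n1\n2\n1\n' \
  | sort | tee stage1.dat \
  | uniq | tee stage2.dat \
  | wc -l > result.dat
cat stage2.dat   # deduplicated intermediate: 1, 2, 3
```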
SPONGE(1)                          moreutils                          SPONGE(1)

NAME
       sponge - soak up standard input and write to a file

SYNOPSIS
       sed '...' file | grep '...' | sponge [-a] file

DESCRIPTION
       sponge reads standard input and writes it out to the specified file.
       Unlike a shell redirect, sponge soaks up all its input before writing
       the output file. This allows constructing pipelines that read from and
       write to the same file.
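A concrete example of the pattern the man page describes. The commented-out line shows why a plain redirect fails here: the shell truncates the output file before grep ever opens it. sponge buffers the whole stream first, then rewrites the file (the guard is only there because moreutils may not be installed everywhere; the filename is made up):

```shell
printf 'keep\ndrop\nkeep\n' > notes.txt

# BROKEN: the shell truncates notes.txt *before* grep reads it,
# so grep sees an empty file and notes.txt ends up empty:
# grep -v drop notes.txt > notes.txt

# Works: sponge soaks up all of grep's output, then writes the file.
if command -v sponge >/dev/null 2>&1; then
    grep -v drop notes.txt | sponge notes.txt
    cat notes.txt   # → keep, keep
fi
```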
Usually mental models develop "organically" from when one was a n00b, without much thought, and sometimes it can take a long time for them to be unseated, even though in hindsight it's kind of obvious that the model is wrong (e.g. one can see it from "slow-program | less", and things like that).
Can’t speak for OP, but one might reasonably expect later stages to only start execution once at least some data is available—rather than immediately, before any data is available for them to consume.
Of course, there are many reasons you wouldn’t want this—processes can take time to start up, for example—but it’s not an unreasonable mental model.
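One quick way to see the actual behavior, that every stage starts immediately and runs concurrently (the classic yes/head demo, not something from the comments above):

```shell
# yes writes 'y' forever. If head only started once yes had finished,
# this pipeline would never terminate. Instead it returns instantly:
# head exits after 3 lines, and yes is killed by SIGPIPE on its next
# write to the now-closed pipe.
yes | head -n 3   # → y, y, y
```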
I have known this about Unix pipes for a very long time. It is always mentioned whenever pipes are introduced, but I guess people can miss it.
Now I will break your mind, as mine was broken not long ago: PowerShell, which is often said to be a better shell, works like that. It doesn't run things in parallel. I think the same is true of Windows cmd/batch, but don't quote me on that. That one thing makes PowerShell insufficient to ever fully replace a proper shell.