The Unix philosophy is documented by Doug McIlroy as:
> Make each program do one thing well. To do a new job, build afresh rather than complicate old programs by adding new “features”.
> Expect the output of every program to become the input to another, as yet unknown, program. Don’t clutter output with extraneous information. Avoid stringently columnar or binary input formats. Don’t insist on interactive input.
> Design and build software, even operating systems, to be tried early, ideally within weeks. Don’t hesitate to throw away the clumsy parts and rebuild them.
> Use tools in preference to unskilled help to lighten a programming task, even if you have to detour to build the tools and expect to throw some of them out after you’ve finished using them.
I really like the last two; if you can do them in development, then you have a great dev culture.
I’m surprised JessFraz, who is employed by Microsoft, doesn’t talk about PowerShell pipes at all.
PowerShell pipes are an extension of Unix pipes. Rather than just being able to pipe a stream of bytes, PowerShell can pipe a stream of objects.
It makes working with pipes so much fun. In Unix you have to cut, awk, and do all sorts of parsing to get a field out of `ls`. In PowerShell, `ls` outputs a stream of file objects, and you can select the field you want by piping to `Select-Object`, sum the file sizes, or filter for directories only. It’s very expressive once you’re manipulating streams of objects with properties.
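To make the contrast concrete, here's a minimal sketch (the temp directory and file names are invented for illustration): the Unix side has to parse `ls -l` text by field position, while the PowerShell line, shown as a comment, works on objects directly.

```shell
#!/bin/sh
# Unix pipes carry raw bytes, so summing file sizes means parsing ls's text output.
d=$(mktemp -d)
printf 'aaaa' > "$d/a.txt"      # 4 bytes
printf 'bbbbbb' > "$d/b.txt"    # 6 bytes
ls -l "$d" | awk '/^-/ { total += $5 } END { print total }'
# The PowerShell object pipeline expresses the same idea with no text parsing:
#   Get-ChildItem | Measure-Object -Property Length -Sum
```

The awk line prints 10 here; the reliance on field position (`$5`) is exactly the fragility being discussed.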
I'm probably nitpicking, but if you're using cat to pipe a single file into the stdin of another program, you most likely don't need the cat in the first place; you can just redirect the file to the process's stdin. Unless, of course, you're actually concatenating multiple files, or maybe a file and stdin together.
Disclaimer: I do cat-piping myself quite a bit out of habit, so I'm not trying to look down at the author or anything like that! :)
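For what it's worth, the distinction looks like this (log file invented for the sketch):

```shell
#!/bin/sh
# A hypothetical input file:
printf 'GET /\nPOST /\nGET /about\n' > /tmp/access.log
# "Useless use of cat": an extra process just to feed one file into grep's stdin
cat /tmp/access.log | grep -c GET
# Same result with a plain input redirect, no cat process needed:
grep -c GET < /tmp/access.log
```

Both lines print 2; cat only earns its keep when genuinely concatenating multiple inputs.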
I love the idea of simple things that can be connected in any way. I'm not so much a fan of "everything is a soup of bytes with unspecified encoding and unknown formatting".
It's an abstraction that has held up quite well, but it's starting to show its age.
I fully agree... and yet... everyone who has tried to "fix" this has failed, at least in the sense of "attaining anything like shell's size and reach". Many have succeeded in the sense of producing working code that fixes this in some sense.
Powershell's probably the closest to success, because it could be pushed out unilaterally. Without that I'm not sure it would have gotten very far, not because it's bad, but again because nobody else seems to have gotten very far....
100% agree.
Having to extract information with regular expressions is a waste of time.
If the structure of the data was available, you would have type safety / auto-completion. You could even have GUIs to compose programs.
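A tiny illustration of the difference (the jq line is a sketch that assumes jq is installed):

```shell
#!/bin/sh
# Scraping a field out of structured data with a regex: fragile and positional
echo '{"name":"pipe","size":42}' | grep -o '"size":[0-9]*' | cut -d: -f2
# With a structure-aware tool, you ask for the field by name instead:
#   echo '{"name":"pipe","size":42}' | jq .size
```

The regex line prints 42 but breaks the moment the formatting changes; the structured version survives reordering and whitespace.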
Sometimes they work great -- being able to dump from MySQL into gzip sending across the wire via ssh into gunzip and into my local MySQL without ever touching a file feels nothing short of magic... although the command/incantation to do so took quite a while to finally get right.
But far too often they inexplicably fail. For example, I had an issue last year where piping curl to bunzip2 would just stop after about 1GB, but at a different exact spot every time (between 1GB and 1.5GB). No error message, no exit, my network connection was fine, just an infinite timeout. (While curl by itself worked flawlessly every time.)
And I've got another 10 stories like this (I do a lot of data processing). Any given combination of pipe tools, there's a kind of random chance they'll actually work in the end or not. And even more frustrating, they'll often work on your local machine but not on your server, or vice-versa. And I'm just running basic commodity macOS locally and out-of-the-box Ubuntu on my servers.
I don't know why, but many times I've had to rewrite a piped command as streams in a Python script to get it to work reliably.
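For reference, the MySQL-over-ssh relay described above looks roughly like the commented line below (host and database names are hypothetical); the runnable part demonstrates the same no-intermediate-file pattern with plain text.

```shell
#!/bin/sh
# Rough shape of the remote-dump-to-local-load incantation (hypothetical names):
#   ssh user@remotehost 'mysqldump mydb | gzip -c' | gunzip -c | mysql mydb
# The same compress-then-decompress relay, demonstrated locally:
printf 'CREATE TABLE t (id INT);\n' | gzip -c | gunzip -c
```

The data never touches disk between the two ends of the pipeline.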
I recently came across Ramda CLI's interactive mode [1]
It essentially hijacks the pipe's input and output into the browser, where you can play with the Ramda command. Then you just close the browser tab and Ramda CLI applies your changed code in the pipe, resuming its operation.
Now I'm thinking of all kinds of ways I use pipes that I could "tee" through a browser app. I can use the browser for interactive JSON manipulation, visualization and all-around playing. I'm now looking for ways to generalize Ramda CLI's approach. Pipes, Unix files and HTTP don't seem directly compatible, but the promise is there. The Unix tee command doesn't "pause" the pipe, but one could probably introduce a pause/resume passthrough command into the pipe after it. Then a web server tool could send the tee'd file to the browser and catch output from there.
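The plain tee behavior being discussed, as a sketch:

```shell
#!/bin/sh
# tee copies the stream to a file while passing it through downstream unchanged;
# it never pauses the pipe, which is the limitation noted above.
seq 1 5 | tee /tmp/copy.txt | wc -l
```

This counts the five lines downstream while leaving a copy in /tmp/copy.txt for later inspection.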
One aspect is that the coordinating entity hooks up the pipeline and then gets out of the way, the pieces communicate amongst themselves, unlike FP simulations, which tend to have to come back to the coordinator.
This is very useful in "scripted-components" settings where you use a flexible/dynamic/slow scripting language to orchestrate fixed/fast components, without the slowness of the scripting language getting in the way. See sh :-)
Another aspect is error handling. Since results are actively passed on to the next filter, the error case is simply to not pass anything. Therefore the "happy path" simply doesn't have to deal with error cases at all, and you can deal with errors separately.
In call/return architectures (so: mostly everything), you have to return something, even in the error case. So we have nil, Maybe, Either, tuples or exceptions to get us out of Dodge. None of these is particularly good.
And of course | is such a perfect combinator because it is so sparse. It is obvious what each end does, and all the components are forced to be uniform and at least syntactically composable/compatible. Yay pipes.
Thinking about it, Clojure advocates having small functions (similar to unix's "small programs / do one thing well") that you compose together to build bigger things.
It has, in the form of function composition, as other replies show. However, the Unix pipe demonstrates a more interesting idea: composable programs on the level of the OS.
Nowadays, most of the user-facing desktop programs have GUIs, so the 'pipe' operator that composes programs is the user himself. Users compose programs by saving files from one program and opening them in another. The data being 'piped' through such program composition is sort-of typed, with the file types (PNG, TXT, etc) being the types and the loading modules of the programs being 'runtime typecheckers' that reject files with invalid format.
At first sight, GUIs prevent program composition by requiring the user to serve as the 'pipe'. However, if GUIs were reflections / manifestations of some rich typed data (expressible in some really powerful type system, such as that of Idris), one could imagine the possibility of directly composing the programs together, bypassing the GUI or file-saving stages.
Many functional languages have |> for piping, but chained method calls are also a lot like pipelines: data goes from left to right.
It actually somewhat changes the way you write code, because it enables chaining of calls.
It's worth noting there's nothing preventing this being done before the pipe operator using function calls.
x |> f |> g is by definition the same as (g (f x)).
In non-performance-sensitive code, I've found that what would be quite a complicated monolithic function in an imperative language often ends up as a composition of more modular functions piped together. As others have mentioned, there are similarities with the method chaining style in OO languages.
Also, I believe Clojure has piping in the form of the -> thread-first macro.
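The `x |> f |> g` equivalence above maps directly onto a shell pipeline; a toy sketch with invented filters:

```shell
#!/bin/sh
f() { awk '{ print $1 * $1 }'; }   # f: square the input
g() { awk '{ print $1 + 1 }'; }    # g: add one
# g (f 3) written left to right, like x |> f |> g:
echo 3 | f | g
```

This prints 10: each stage reads the previous stage's output, so the composition reads in execution order instead of inside out.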
To take any significant advantage of it you need a data-driven, transformational approach to solving problems. But the funny thing is, once you have that, it's not really a big deal even if you don't have a pipe operator.
Monads are effectively pipes; the monad controls how data flows through the functions you put into the monad, but the functions individually are like individual programs in a pipe.
It doesn't look the same, but Go's io.Reader and io.Writer are the interfaces you implement if you want the equivalent of "reading from stdin" / "writing to stdout". Once they're implemented, io.Copy is the actual piping operation.
It has: D's uniform function call syntax makes this as easy as `auto foo = some_array.filter!(predicate).array.sort.uniq;` for the unique elements of the sorted array that satisfy the predicate.
awk, grep, sort, and pipe. I'm always amazed at how well thought out, simple, functional, and fast the unix tools are. I still prefer to sift through and validate data using these tools rather than use excel or any full-fledged language.
Edit: Also "column" to format your output into a table.
Although I probably use it multiple times everyday, I hate column. At least the implementation I use has issues with empty fields and a fixed maximum line length.
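A typical example of the kind of quick sift-and-validate pipeline praised above (data invented for the sketch):

```shell
#!/bin/sh
# Most frequent value in a column, no spreadsheet required:
printf 'cat\ndog\ncat\nbird\ncat\n' | sort | uniq -c | sort -rn | head -n 1
```

Here it prints the count and value (3 cat); swap in cut or awk upstream to pick the column first.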
The biggest issue is that pipes are unidirectional, while not all data flow is unidirectional.
Some functional programming styles are pipe-like in the sense that data-flow is unidirectional:
Foo(Bar(Baz(Bif(x))))
is analogous to:
cat x | Bif | Baz | Bar | Foo
Obviously the order of evaluation will depend on the semantics of the language used; most eager languages will fully evaluate each step before the next. (Actually this is one issue with Unix pipes; the flow-control semantics are tied to the concept of blocking I/O using a fixed-size buffer)
The idea of dataflow programming[1] is closely related to pipes and has existed for a long time, but it has mostly remained a niche, at least outside of hardware-design languages
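That flow-control point can be seen directly: pipeline stages run concurrently, and a writer blocks once the fixed-size pipe buffer fills, until the reader catches up (or exits).

```shell
#!/bin/sh
# `yes` would write forever, but it blocks when the pipe buffer fills,
# and is terminated by SIGPIPE once `head` exits:
yes | head -n 3
```

This prints three lines of "y" and terminates, even though `yes` alone never would.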
I built an entire prototype ML system/pipeline using shell scripts that glued together two python scripts that did some heavy lifting not easily reproduced.
I got the whole thing working, from training to prediction, in about 3 weeks. What I love about Unix shell commands is that you simply can't abstract beyond the input/output paradigm. You aren't going to create classes, type classes, tests, etc. It's not possible, or not worth it.
I'd like to see more devs use this approach, because it's a really nice way to get a project going in order to poke holes in it or see a general structure. I consider it a sketchpad of sorts.
Fully agree, pipes are awesome; the only downside is the potential duplicate serialization/deserialization overhead.
Streams in most decent languages closely adhere to this idea.
I especially like how Node does it; in my opinion it's one of the best things in Node. You can simply create CLI programs that have backpressure, just as when working with binary/file streams, while also supporting object streams.
Node streams are excellent, but unfortunately don't get as much fanfare as Promises/async+await. A number of times I have gotten asked "how come my node script runs out of memory" -- due to the dev using await and storing the entirety of what is essentially streaming data in memory in between processing steps.
Pipes have been a game-changer for me in R with the tidyverse suite of packages. Base R doesn't have pipes, requiring a bit more saving of objects or a compromise on code readability.
One criticism would be that ggplot2 uses the "+" to add more graph features, whereas the rest of tidyverse uses "%>%" as its pipe, when ideally ggplot2 would also use it. One of my most common errors with ggplot2 is not utilizing the + or the %>% in the right places.
Unix's philosophy of “do one thing well” and “expect the output of every program to become the input to another” lives on nowadays in “microservices”.
The Art of Unix Programming has a longer discussion[1] of the Unix philosophy, which includes Rob Pike's and Ken Thompson's comments on it.
[1] http://www.catb.org/esr/writings/taoup/html/ch01s06.html
"Those who don't understand Unix are condemned to reinvent it, poorly." (Henry Spencer)
But of course that's probably not the author's intended context.
You could direct, or redirect the flow to /dev/null. Or pipe to /dev/null. Or redirect the pipe to /dev/null?
So from a metaphor point of view either would fit.
Although of course you don't use the pipe construct to direct to a file. Which would suggest piping is wrong?
And then on the third hand, we all know what it means so what's the problem.
So I would say: there's war, famine and injustice in the world. Don't worry about POSIX shell semantics. :)
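For the record, the spellings under discussion (all of them discard the output):

```shell
#!/bin/sh
echo "discarded" > /dev/null         # redirection straight to /dev/null
echo "discarded" | cat > /dev/null   # a pipe into a process, then a redirect
echo "kept"                          # ordinary output, untouched
```

Only "kept" reaches the terminal; the pipe construct itself never targets a file, which is the terminology point above.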
[1] https://github.com/raine/ramda-cli#interactive-mode
* vipe (part of https://joeyh.name/code/moreutils/ - lets you edit text part way through a complex series of piped commands)
* pv (http://www.ivarch.com/programs/pv.shtml - lets you visualise the flow of data through a pipe)
I would have loved to see some awesome pipe examples though.
As an example, take a nested hash map from which we want to get the "You got me!" string.
We can either use `:a` (a keyword) as a function to get the value, which forces us to nest the function calls a bit unnaturally.
Or we can use the thread-first macro `->`, which is basically a Unix pipe.
And many programming languages have libraries for something similar: iterators in Rust/C++, streams in Java/C#, things like reactive extensions.
The shell has the (weird?) behavior of ONE input and TWO outputs (stdout, stderr).
Also, you can redirect in both directions. I think for a language to be pipe-like, each function needs that same shape, with the option of chaining not only the OK side but also the ERR side. Does something like this exist?
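Shell redirection can in fact chain either side, which roughly approximates the OK/ERR idea (the `produce` function is a toy invented for the sketch):

```shell
#!/bin/sh
produce() { echo "ok-data"; echo "err-data" >&2; }   # one stage, two outputs
# Chain the OK side (stderr is discarded here):
produce 2>/dev/null | tr 'a-z' 'A-Z'
# Swap the streams to chain the ERR side instead:
produce 2>&1 >/dev/null | tr 'a-z' 'A-Z'
```

The first pipeline prints OK-DATA, the second ERR-DATA; the `2>&1 >/dev/null` trick sends stderr into the pipe while dropping stdout.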
A small additional bit of information: this style is called "tacit" (or "point-free") programming. See https://en.wikipedia.org/wiki/Tacit_programming
(Unix pipes are even explicitly mentioned in the article as an example)
julia> 1:5 |> x->x.^2 |> x->2x
5-element Array{Int64,1}:
  2
  8
 18
 32
 50
I have seen plenty of pipe use in bash scripts, especially for build and ETL purposes. Other languages have ways to do pipes as well (e.g. Python https://docs.python.org/2/library/subprocess.html#replacing-...) but I have seen much less use of it.
It appears to me that for more complex applications one rather opts for TCP-/UDP-/Unix-domain sockets for IPC.
- Has anyone here tried to plumb together large applications with pipes?
- Was it successful?
- Which problems did you run into?
1: https://en.wikipedia.org/wiki/Dataflow_programming
Of course, Hadley admitted it was because he wrote ggplot2 before adopting pipes into his packages.