The Unix philosophy is documented by Doug McIlroy as:
> Make each program do one thing well. To do a new job, build afresh rather than complicate old programs by adding new “features”.
> Expect the output of every program to become the input to another, as yet unknown, program. Don’t clutter output with extraneous information. Avoid stringently columnar or binary input formats. Don’t insist on interactive input.
> Design and build software, even operating systems, to be tried early, ideally within weeks. Don’t hesitate to throw away the clumsy parts and rebuild them.
> Use tools in preference to unskilled help to lighten a programming task, even if you have to detour to build the tools and expect to throw some of them out after you’ve finished using them.
I really like the last two; if you can do them in development, then you have a great dev culture.
I’m surprised JessFraz, who is employed by Microsoft, doesn’t talk about PowerShell pipes at all.
PowerShell pipes are an extension of Unix pipes. Rather than just being able to pipe a stream of bytes, PowerShell can pipe a stream of objects.
It makes working with pipes so much fun. In Unix you have to cut, awk, and do all sorts of parsing to get a field out of `ls`. In PowerShell, `ls` outputs a stream of file objects, and you can select the field you want by piping to `Select-Object`, sum the file sizes, or filter for directories only. It’s very expressive once you’re manipulating streams of objects with properties.
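To make the contrast concrete, here's a minimal sketch (the temp directory and file names are invented for illustration): the Unix side has to parse `ls -l` text by field position, while the PowerShell line, shown as a comment, works on objects directly.

```shell
#!/bin/sh
# Unix pipes carry raw bytes, so summing file sizes means parsing ls's text output.
d=$(mktemp -d)
printf 'aaaa' > "$d/a.txt"      # 4 bytes
printf 'bbbbbb' > "$d/b.txt"    # 6 bytes
ls -l "$d" | awk '/^-/ { total += $5 } END { print total }'
# The PowerShell object pipeline expresses the same idea with no text parsing:
#   Get-ChildItem | Measure-Object -Property Length -Sum
```

The awk line prints 10 here; the reliance on field position (`$5`) is exactly the fragility being discussed.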
I'm probably nitpicking, but if you're using cat to pipe a single file into the stdin of another program, you most likely don't need the cat in the first place; you can just redirect the file to the process's stdin. Unless, of course, you're actually concatenating multiple files, or maybe a file and stdin together.
Disclaimer: I do cat-piping myself quite a bit out of habit, so I'm not trying to look down at the author or anything like that! :)
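For what it's worth, the distinction looks like this (log file invented for the sketch):

```shell
#!/bin/sh
# A hypothetical input file:
printf 'GET /\nPOST /\nGET /about\n' > /tmp/access.log
# "Useless use of cat": an extra process just to feed one file into grep's stdin
cat /tmp/access.log | grep -c GET
# Same result with a plain input redirect, no cat process needed:
grep -c GET < /tmp/access.log
```

Both lines print 2; cat only earns its keep when genuinely concatenating multiple inputs.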
I love the idea of simple things that can be connected in any way. I'm not so much a fan of "everything is a soup of bytes with unspecified encoding and unknown formatting".
It's an abstraction that has held up quite well, but it's starting to show its age.
I fully agree... and yet... everyone who has tried to "fix" this has failed, at least in the sense of "attaining anything like shell's size and reach". Many have succeeded in the sense of producing working code that fixes this in some sense.
Powershell's probably the closest to success, because it could be pushed out unilaterally. Without that I'm not sure it would have gotten very far, not because it's bad, but again because nobody else seems to have gotten very far....
100% agree.
Having to extract information with regular expressions is a waste of time.
If the structure of the data was available, you would have type safety / auto-completion. You could even have GUIs to compose programs.
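A tiny illustration of the difference (the jq line is a sketch that assumes jq is installed):

```shell
#!/bin/sh
# Scraping a field out of structured data with a regex: fragile and positional
echo '{"name":"pipe","size":42}' | grep -o '"size":[0-9]*' | cut -d: -f2
# With a structure-aware tool, you ask for the field by name instead:
#   echo '{"name":"pipe","size":42}' | jq .size
```

The regex line prints 42 but breaks the moment the formatting changes; the structured version survives reordering and whitespace.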
Sometimes they work great -- being able to dump from MySQL into gzip sending across the wire via ssh into gunzip and into my local MySQL without ever touching a file feels nothing short of magic... although the command/incantation to do so took quite a while to finally get right.
But far too often they inexplicably fail. For example, I had an issue last year where piping curl to bunzip2 would just stop after about 1GB, but at a different exact spot every time (between 1GB and 1.5GB). No error message, no exit, my network connection was fine, just an infinite timeout. (While curl by itself worked flawlessly every time.)
And I've got another 10 stories like this (I do a lot of data processing). Any given combination of pipe tools, there's a kind of random chance they'll actually work in the end or not. And even more frustrating, they'll often work on your local machine but not on your server, or vice-versa. And I'm just running basic commodity macOS locally and out-of-the-box Ubuntu on my servers.
I don't know why, but many times I've had to rewrite a piped command as streams in a Python script to get it to work reliably.
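For reference, the MySQL-over-ssh relay described above looks roughly like the commented line below (host and database names are hypothetical); the runnable part demonstrates the same no-intermediate-file pattern with plain text.

```shell
#!/bin/sh
# Rough shape of the remote-dump-to-local-load incantation (hypothetical names):
#   ssh user@remotehost 'mysqldump mydb | gzip -c' | gunzip -c | mysql mydb
# The same compress-then-decompress relay, demonstrated locally:
printf 'CREATE TABLE t (id INT);\n' | gzip -c | gunzip -c
```

The data never touches disk between the two ends of the pipeline.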
I recently came across Ramda CLI's interactive mode [1]
It essentially hijacks the pipe's input and output into the browser, where you can play with the Ramda command. Then you just close the browser tab and Ramda CLI applies your changed code in the pipe, resuming its operation.
Now I'm thinking of all kinds of ways I use pipes that I could "tee" through a browser app. I can use the browser for interactive JSON manipulation, visualization and all-around playing. I'm now looking for ways to generalize Ramda CLI's approach. Pipes, Unix files and HTTP don't seem directly compatible, but the promise is there. The Unix tee command doesn't "pause" the pipe, but one could probably introduce a pause/resume passthrough command into the pipe after it. Then a web server tool could send the tee'd file to the browser and catch output from there.
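The plain tee behavior being discussed, as a sketch:

```shell
#!/bin/sh
# tee copies the stream to a file while passing it through downstream unchanged;
# it never pauses the pipe, which is the limitation noted above.
seq 1 5 | tee /tmp/copy.txt | wc -l
```

This counts the five lines downstream while leaving a copy in /tmp/copy.txt for later inspection.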
One aspect is that the coordinating entity hooks up the pipeline and then gets out of the way, the pieces communicate amongst themselves, unlike FP simulations, which tend to have to come back to the coordinator.
This is very useful in "scripted-components" settings where you use a flexible/dynamic/slow scripting language to orchestrate fixed/fast components, without the slowness of the scripting language getting in the way. See sh :-)
Another aspect is error handling. Since results are actively passed on to the next filter, the error case is simply to not pass anything. Therefore the "happy path" simply doesn't have to deal with error cases at all, and you can deal with errors separately.
In call/return architectures (so: mostly everything), you have to return something, even in the error case. So we have nil, Maybe, Either, tuples or exceptions to get us out of Dodge. None of these is particularly good.
And of course | is such a perfect combinator because it is so sparse. It is obvious what each end does, and all the components are forced to be uniform and at least syntactically composable/compatible. Yay pipes.
Thinking about it, Clojure advocates having small functions (similar to unix's "small programs / do one thing well") that you compose together to build bigger things.
It has, in the form of function composition, as other replies show. However, the Unix pipe demonstrates a more interesting idea: composable programs on the level of the OS.
Nowadays, most of the user-facing desktop programs have GUIs, so the 'pipe' operator that composes programs is the user himself. Users compose programs by saving files from one program and opening them in another. The data being 'piped' through such program composition is sort-of typed, with the file types (PNG, TXT, etc) being the types and the loading modules of the programs being 'runtime typecheckers' that reject files with invalid format.
At first sight, GUIs prevent program composition by requiring the user to serve as the 'pipe'. However, if GUIs were reflections / manifestations of some rich typed data (expressible in some really powerful type system, such as that of Idris), one could imagine the possibility of directly composing the programs together, bypassing the GUI or file-saving stages.
Many functional languages have |> for piping, but chained method calls are also a lot like pipelines: data goes from left to right.
It actually somewhat changes the way you write code, because it enables chaining of calls.
It's worth noting there's nothing preventing this being done before the pipe operator using function calls.
x |> f |> g is by definition the same as (g (f x)).
In non-performance-sensitive code, I've found that what would be quite a complicated monolithic function in an imperative language often ends up as a composition of more modular functions piped together. As others have mentioned, there are similarities with the method chaining style in OO languages.
Also, I believe Clojure has piping in the form of the -> thread-first macro.
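The `x |> f |> g` equivalence above maps directly onto a shell pipeline; a toy sketch with invented filters:

```shell
#!/bin/sh
f() { awk '{ print $1 * $1 }'; }   # f: square the input
g() { awk '{ print $1 + 1 }'; }    # g: add one
# g (f 3) written left to right, like x |> f |> g:
echo 3 | f | g
```

This prints 10: each stage reads the previous stage's output, so the composition reads in execution order instead of inside out.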
To take any significant advantage of it you need a data-driven, transformational approach to solving problems. But the funny thing is, once you have that, it's not really a big deal even if you don't have a pipe operator.
Monads are effectively pipes; the monad controls how data flows through the functions you put into the monad, but the functions individually are like individual programs in a pipe.
It doesn't look the same, but Go's io.Reader and io.Writer are the interfaces you implement if you want the equivalent of "reading from stdin" / "writing to stdout". Once they're implemented, io.Copy is the actual piping operation.
It has: D's uniform function call syntax makes this as easy as `auto foo = some_array.filter!(predicate).array.sort.uniq;` for the unique elements of the sorted array that satisfy the predicate.
awk, grep, sort, and pipe. I'm always amazed at how well thought out, simple, functional, and fast the unix tools are. I still prefer to sift through and validate data using these tools rather than use excel or any full-fledged language.
Edit: Also "column" to format your output into a table.
Although I probably use it multiple times everyday, I hate column. At least the implementation I use has issues with empty fields and a fixed maximum line length.
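A typical example of the kind of quick sift-and-validate pipeline praised above (data invented for the sketch):

```shell
#!/bin/sh
# Most frequent value in a column, no spreadsheet required:
printf 'cat\ndog\ncat\nbird\ncat\n' | sort | uniq -c | sort -rn | head -n 1
```

Here it prints the count and value (3 cat); swap in cut or awk upstream to pick the column first.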
The biggest issue is that pipes are unidirectional, while not all data flow is unidirectional.
Some functional programming styles are pipe-like in the sense that data-flow is unidirectional:
Foo(Bar(Baz(Bif(x))))
is analogous to:
cat x | Bif | Baz | Bar | Foo
Obviously the order of evaluation will depend on the semantics of the language used; most eager languages will fully evaluate each step before the next. (Actually this is one issue with Unix pipes; the flow-control semantics are tied to the concept of blocking I/O using a fixed-size buffer)
The idea of dataflow programming[1] is closely related to pipes and has existed for a long time, but it has mostly remained a niche, at least outside of hardware-design languages
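That flow-control point can be seen directly: pipeline stages run concurrently, and a writer blocks once the fixed-size pipe buffer fills, until the reader catches up (or exits).

```shell
#!/bin/sh
# `yes` would write forever, but it blocks when the pipe buffer fills,
# and is terminated by SIGPIPE once `head` exits:
yes | head -n 3
```

This prints three lines of "y" and terminates, even though `yes` alone never would.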
I built an entire prototype ML system/pipeline using shell scripts that glued together two python scripts that did some heavy lifting not easily reproduced.
I got the whole thing working, from training to prediction, in about 3 weeks. What I love about Unix shell commands is that you simply can't abstract beyond the input/output paradigm. You aren't going to create classes, type classes, tests, etc. It's not possible, or not worth it.
I'd like to see more devs use this approach, because it's a really nice way to get a project going in order to poke holes in it or see a general structure. I consider it a sketchpad of sorts.
Fully agree, pipes are awesome; the only downside is the potential duplicate serialization/deserialization overhead.
Streams in most decent languages closely adhere to this idea.
I especially like how Node does it; in my opinion it's one of the best things in Node. You can simply create CLI programs that have backpressure, just as when working with binary/file streams, while also supporting object streams.
Node streams are excellent, but unfortunately don't get as much fanfare as Promises/async+await. A number of times I have gotten asked "how come my node script runs out of memory" -- due to the dev using await and storing the entirety of what is essentially streaming data in memory in between processing steps.
Pipes have been a game-changer for me in R with the tidyverse suite of packages. Base R doesn't have pipes, requiring a bit more saving of objects or a compromise on code readability.
One criticism would be that ggplot2 uses the "+" to add more graph features, whereas the rest of tidyverse uses "%>%" as its pipe, when ideally ggplot2 would also use it. One of my most common errors with ggplot2 is not utilizing the + or the %>% in the right places.
Unix's philosophy of “do one thing well” and “expect the output of every program to become the input to another” lives on nowadays in “microservices”.
The Art of Unix Programming has a longer discussion[1] of the Unix philosophy, which includes Rob Pike's and Ken Thompson's comments on it.
[1] http://www.catb.org/esr/writings/taoup/html/ch01s06.html
"Those who don't understand Unix are condemned to reinvent it, poorly." (Henry Spencer)
But of course that's probably not the author's intended context.
You could direct, or redirect the flow to /dev/null. Or pipe to /dev/null. Or redirect the pipe to /dev/null?
So from a metaphor point of view either would fit.
Although of course you don't use the pipe construct to direct to a file. Which would suggest piping is wrong?
And then on the third hand, we all know what it means so what's the problem.
So I would say: there's war, famine and injustice in the world. Don't worry about POSIX shell semantics. :)
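For the record, the spellings under discussion (all of them discard the output):

```shell
#!/bin/sh
echo "discarded" > /dev/null         # redirection straight to /dev/null
echo "discarded" | cat > /dev/null   # a pipe into a process, then a redirect
echo "kept"                          # ordinary output, untouched
```

Only "kept" reaches the terminal; the pipe construct itself never targets a file, which is the terminology point above.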
[1] https://github.com/raine/ramda-cli#interactive-mode
* vipe (part of https://joeyh.name/code/moreutils/ - lets you edit text part way through a complex series of piped commands)
* pv (http://www.ivarch.com/programs/pv.shtml - lets you visualise the flow of data through a pipe)
I would have loved to see some awesome pipe examples though.
As an example, take a nested hash map from which we want to get the "You got me!" string.
We can either use `:a` (a keyword) as a function to get the value, which forces us to nest the function calls a bit unnaturally.
Or we can use the thread-first macro `->`, which is basically a Unix pipe.
And many programming languages have libraries for something similar: iterators in Rust/C++, streams in Java/C#, things like reactive extensions.
The shell has the (weird?) behavior of ONE input and TWO outputs (stdout, stderr).
Also, you can redirect in both directions. I think for a language to be pipe-like, each function needs that same shape, with the option of chaining not only the OK side but also the ERR side. Does something like this exist?
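Shell redirection can in fact chain either side, which roughly approximates the OK/ERR idea (the `produce` function is a toy invented for the sketch):

```shell
#!/bin/sh
produce() { echo "ok-data"; echo "err-data" >&2; }   # one stage, two outputs
# Chain the OK side (stderr is discarded here):
produce 2>/dev/null | tr 'a-z' 'A-Z'
# Swap the streams to chain the ERR side instead:
produce 2>&1 >/dev/null | tr 'a-z' 'A-Z'
```

The first pipeline prints OK-DATA, the second ERR-DATA; the `2>&1 >/dev/null` trick sends stderr into the pipe while dropping stdout.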
A small additional bit of information: this style is called "tacit" (or "point-free") programming. See https://en.wikipedia.org/wiki/Tacit_programming
(Unix pipes are even explicitly mentioned in the article as an example)
julia> 1:5 |> x->x.^2 |> x->2x
5-element Array{Int64,1}:
  2
  8
 18
 32
 50
I have seen plenty of pipe use in bash scripts, especially for build and ETL purposes. Other languages have ways to do pipes as well (e.g. Python https://docs.python.org/2/library/subprocess.html#replacing-...) but I have seen much less use of it.
It appears to me that for more complex applications one rather opts for TCP-/UDP-/Unix-domain sockets for IPC.
- Has anyone here tried to plumb together large applications with pipes?
- Was it successful?
- Which problems did you run into?
1: https://en.wikipedia.org/wiki/Dataflow_programming
Of course, Hadley admitted it was because he wrote ggplot2 before adopting pipes into his packages.