
Show HN: Pypipe – A Python command-line tool for pipeline processing

213 points| bugen | 2 years ago |github.com | reply

pypipe is a command-line tool for writing data pipelines in Python. When working with data processing in the terminal, I often find myself wanting to pass the output of commands to Python for further processing. In such cases, one can either write one-liners or create regular Python scripts and connect them through pipes. However, using pypipe makes this process more convenient and efficient.

45 comments

[+] robertlagrant|2 years ago|reply

  $ echo "pypipe" | ppp "line[::2]"
  ppp
This is an incredibly confusing first example!
[+] js2|2 years ago|reply
It's pretty clear if you're intimately familiar with Python's slice syntax, but too clever by half otherwise. I've been coding Python since 2000 and can count on one hand the number of times I've used the step parameter in a slice.
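For readers less used to the step parameter, here is a minimal sketch of what the expression in that first example evaluates to in plain Python. Only the string and the slice come from the example; the variable names are made up for illustration.

```python
# Slice syntax is text[start:stop:step]. Leaving start and stop empty
# means "the whole string", so only the step of 2 has any effect.
text = "pypipe"
every_second = text[::2]  # characters at indices 0, 2 and 4

print(every_second)  # ppp
```

So the command name `ppp` is simply every second character of "pypipe".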
[+] polyrand|2 years ago|reply
A bit unrelated, but one thing I absolutely love is the fact you can install it by copying a single file to a folder in your PATH. I have been trying to follow this approach for my Python scripts (standard library only, everything in one file) and I really enjoy the experience. Most of the features I need only require Python 3.8, and Ubuntu comes with Python pre-installed, so

  rsync
  chmod +x
"just works".
[+] BiteCode_dev|2 years ago|reply
You can use shiv to make any script with deps a single file
[+] linsomniac|2 years ago|reply
If you do that, you may want to know about this magic to test code that's in a single file Python CLI program: https://linsomniac.com/post/2023-03-21-python_testing_a_cli_...

I like to set up at least some tests on my scripts so that I can reduce the number of times I push something out that is obviously broken. pre-commit can also help with preventing shipping things with syntax errors if you enable the "ast" check, which does a simple syntax check on the code.
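As a rough idea of what a "does it even parse" check amounts to, here is a small standard-library sketch. It assumes the check boils down to parsing the file with Python's `ast` module; the helper name `syntax_error_in` is hypothetical and is not taken from pre-commit.

```python
import ast


def syntax_error_in(source, filename="<script>"):
    """Return a short description of the first syntax error, or None if the source parses."""
    try:
        ast.parse(source, filename=filename)
    except SyntaxError as exc:
        return f"{filename}:{exc.lineno}: {exc.msg}"
    return None


good = "print('hello')\n"
bad = "print('hello'\n"  # missing closing parenthesis

print(syntax_error_in(good))  # None
print(syntax_error_in(bad))   # a message describing the syntax error
```

This only catches code that cannot be parsed at all; it says nothing about whether the script behaves correctly, which is where real tests come in.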

[+] bugen|2 years ago|reply
Thank you for all the comments and advice. I'm truly surprised by the response, it's beyond what I expected. It was here that I learned about other projects similar to pypipe for the first time. After checking them out, I now understand that pypipe's strength is in its simplicity. I plan to improve pypipe while keeping it simple, so anyone can easily understand how it works by reading the source code and make their own customizations.
[+] kunley|2 years ago|reply
Cool!

My tool of choice for such things is awk; still, it's good to have more alternatives.

[+] smithza|2 years ago|reply
I couldn't dream of using awk for JSON data (ubiquitous nowadays). Of course there is jq and others. As The Pragmatic Programmer puts it, we have to take care to curate and master our tools, like a woodworker with theirs.
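To illustrate why Python is comfortable for this kind of data, here is a minimal sketch that pulls one field out of line-delimited JSON using only the standard library. The sample records and the helper name `field_values` are invented for the example; nothing here is pypipe's own API.

```python
import json


def field_values(lines, key):
    """Yield record[key] for every JSON object line that has that key."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if key in record:
            yield record[key]


# Made-up sample input: one JSON object per line.
sample = [
    '{"name": "Simba", "age": 10}',
    '{"name": "Nala", "age": 9}',
    '',
    '{"name": "Zazu"}',
]

ages = list(field_values(sample, "age"))
print(ages)  # [10, 9]
```

In a real pipeline the `lines` argument would be `sys.stdin` instead of a list.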
[+] dbragdon|2 years ago|reply
This is an awesome tool, I love cmd tools that make it easier to manipulate and work with tabular data. I work with a lot of tabular data, mainly in s3, and I put together "s3head" for easily streaming s3 data into stdout:

https://github.com/dbragdon1/s3head

and I'm gonna have a good time piping the output from s3head into pypipe.

[+] leandot|2 years ago|reply
Can't you just use:

  aws s3 cp s3://YOUR_FILE - |
[+] paiute|2 years ago|reply
I’ve had dreams about making this sort of tool. I’m so thrilled to see this!!!!
[+] Difwif|2 years ago|reply
Nice! My go-to for system scripting is bash that calls python for the things that just suck doing in bash. I didn't see a way to do it, but it would be great if this could clean up bash/python interop by giving an ergonomic interface to define custom python functions and call them.

Also since you really want to think of this as an extension of coreutils it would be great to offer this as a brew/apt package even if it's this simple. I just want to add it to my system package list and be able to depend on the command.

[+] shellmachine|2 years ago|reply
"To make it easier to type, it's recommended to create a symbolic link. ln -s pypipe.py ppp" Glad to see you didn't name it ppp.py
[+] marcyb5st|2 years ago|reply
Pretty cool! I feel that this would be extremely helpful for me since at times I struggle remembering the incantations for xargs, awk, ... .
[+] fwungy|2 years ago|reply
Why didn't I think of this? Very cool.
[+] AndyKluger|2 years ago|reply
Cool!

I was going to ask how this differs in broad strokes from pz, but when I went to get the reference link found that pz hasn't been updated in two years, so that's one big difference.

https://github.com/CZ-NIC/pz

[+] izoow|2 years ago|reply
I was looking for something like this, will definitely try! I was always envious of perl being able to be easily incorporated into shell pipelines and wished python would support something like that.
[+] syrusakbary|2 years ago|reply
This is awesome, great work bugen!

I've created a package in Wasmer [1] to showcase this tool (also, it will do the processing fully sandboxed thanks to Wasm!)... hope you all like it! (here's the PR [2])

  # Install Wasmer
  curl https://get.wasmer.io -sSfL | sh
  # Add ppp alias
  alias ppp="wasmer run syrusakbary/[email protected] -- "

And then, run it normally:

  $ cat staff.txt |ppp 'i, line.upper()'

[1] https://wasmer.io/

[2] https://github.com/bugen/pypipe/pull/2

[+] theamk|2 years ago|reply
wow that wasmer thing is _SLOW_.. we are talking about 57x slower! (granted most of this is likely startup delay). Here is a random benchmark with warmed-up cache:

    $ time cat /var/lib/dpkg/status | wasmer run syrusakbary/[email protected] -- 'i, line.upper()'  | wc -l
    39175
    real    0m5.761s
    user    1m15.071s
    sys     0m4.838s
vs regular python:

    $ time cat /var/lib/dpkg/status | python3 pypipe.py 'i, line.upper()'  | wc -l
    39175
    real    0m0.107s
    user    0m0.096s
    sys     0m0.026s
and the wasmer install procedure.. not a deb file in sight, adds itself to ~/.bashrc (of course...) and apparently requires two environment variables to even work.

Compare this to OP's instructions: (1) check out the repo (2) execute the file directly.

Not sure why anyone would want wasmer for simple command-line tools like these.

[+] enoch2090|2 years ago|reply
I have a feeling that in most use cases this is replacing grep and awk in a familiar way to Python programmers, especially the latter with its own grammar. Fun stuff!
[+] hk__2|2 years ago|reply
Nice! How does it compare performance-wise with AWK?
[+] imglorp|2 years ago|reply
A better analog would be perl -ne. It was only a matter of time before python got this.
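For anyone who hasn't used perl -ne, the idea is "run this expression once per input line". Below is a minimal Python sketch of that loop. It is not pypipe's actual implementation: the function name `run_per_line` is hypothetical, and binding `i` as a 1-based line counter is an assumption made for the example.

```python
def run_per_line(expression, lines):
    """Evaluate `expression` once per line with `i` and `line` bound, like a tiny perl -ne."""
    code = compile(expression, "<expr>", "eval")  # compile once, reuse for every line
    results = []
    for i, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")  # drop the trailing newline, as perl -l would
        results.append(eval(code, {}, {"i": i, "line": line}))
    return results


# Made-up input lines standing in for stdin.
staff = ["alice\n", "bob\n"]

print(run_per_line("i, line.upper()", staff))  # [(1, 'ALICE'), (2, 'BOB')]
print(run_per_line("line[::2]", ["pypipe\n"]))  # ['ppp']
```

Using `eval` is reasonable for one-liners you type yourself, but it runs arbitrary code, so it should never be fed expressions from an untrusted source.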
[+] williamcotton|2 years ago|reply
This is great!

I've been making a lot of tools in this similar vein. I've been keeping them in my dotfiles.

I've got plt [0], a simple matplotlib templating language built with Python Lex Yacc for making quick plots from CSVs, e.g.,

  cat data.csv | plt '[a_version_count, b_version_count], date { plot 1px [solid blue, solid red] }' > plot.png
There's a plugin format so you can make extensions like bleep [1]:

  plt 'a_version_count, date { bleep blop blip green 10 } --py' > bleep_plotter.py
  cat data.csv | python3 bleep_plotter.py > bleep_plot.png
To create a plugin xyz, just call it "xyz_template.py" and put it in ~/dotfiles/plt. Outputs to Python code are optional but useful for minor adjustments.

(Does plt look familiar? Can you tell I just read the latest edition of The AWK Programming Language?)

Or I was reading The Unix Programming Environment (1982) and being inspired by the pick command, wired up electron to allow for STDIN/OUT/ARGV in the browser context, for what I'm calling elec [2]:

  elec textarea -x 300 -y 0 | elec pick -x 300 -y 600 | awk '{ print $0 " " $0 }'
Again, to create a plugin xyz, and in this case all elec commands are plugins, add "xyz.html" to ~/dotfiles/elec, as seen with the pick [3] plugin.

ANYWAYS, where I'm going with this instead of,

  cat staff.xml | ppp custom -N xpath -O path='./Animal/Age'
How about?

  cat staff.xml | ppp xpath -O path='./Animal/Age'
Convention over configuration!

Again, this tool is great, it's already in my dotfiles and I've already used it at work this morning, so thank you!

[0] https://github.com/williamcotton/dotfiles/blob/master/bin/pl...

[1] https://github.com/williamcotton/dotfiles/blob/master/plt/bl...

[2] https://github.com/williamcotton/dotfiles/blob/master/bin/el...

[3] https://github.com/williamcotton/dotfiles/blob/master/elec/p...

[+] bugen|2 years ago|reply
Thank you for using pypipe!

  How about?  
  
  cat staff.xml | ppp xpath -O path='./Animal/Age'
I also wanted to allow custom commands like this, but I decided on the current format for a few reasons, including the ability to omit the default 'line' command from the arguments. For frequently used commands, please consider setting up aliases in your configuration files (e.g. ~/.profile).

  alias xpath='ppp custom -N xpath'
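For context on what a path like './Animal/Age' selects, here is a small standard-library sketch. The XML below is invented (the real staff.xml is not shown in the thread), the helper name `xpath_texts` is hypothetical, and this is not how pypipe's xpath custom command is necessarily implemented; it only shows the lookup with `xml.etree.ElementTree`, which supports a limited XPath subset.

```python
import xml.etree.ElementTree as ET

# Made-up document standing in for staff.xml.
SAMPLE_XML = """\
<Staff>
  <Animal><Name>Simba</Name><Age>10</Age></Animal>
  <Animal><Name>Nala</Name><Age>9</Age></Animal>
</Staff>
"""


def xpath_texts(xml_text, path):
    """Return the text of every element matching `path`, relative to the root element."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.findall(path)]


ages = xpath_texts(SAMPLE_XML, "./Animal/Age")
print(ages)  # ['10', '9']
```

The path is evaluated relative to the root element, so './Animal/Age' means "every Age child of every Animal child of the root".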