
Bash patterns I use weekly

278 points | gcmeplz | 4 years ago | will-keleher.com | reply

112 comments

[+] jph|4 years ago|reply
> git bisect is the "real" way to do this, but it's not something I've ever needed

git bisect is great and worth trying; it does what you're doing in your bash loop, but faster and with more capabilities, such as logging, visualizing, and skipping.

The syntax is: $ git bisect run <command> [arguments]

https://git-scm.com/docs/git-bisect

[+] OskarS|4 years ago|reply
Yes, git bisect is the way to go: in addition to the stuff you mentioned, his method only dives into one parent branch of merge commits. git bisect handles that correctly. A gem of a tool, git bisect.
[+] stewartbutler|4 years ago|reply
I've always had trouble getting `for` loops to work predictably, so my common loop pattern is this:

    grep -l -r pattern /path/to/files | while read -r x; do echo "$x"; done
or the like.

This uses bash's read to pull in each line of input, which is then available in the loop as `$x`. Pipe friendly, and it doesn't require futzing around with arrays or the like.

One place I do use bash for loops is when iterating over args, e.g. if you create a bash function:

    function my_func() {
        for arg; do
        echo "$arg"
        done
    }
This'll take a list of arguments and echo each on a separate line. Useful if you need a function that does some operation against a list of files, for example.

Also, bash expansions (https://www.gnu.org/software/bash/manual/html_node/Shell-Par...) can save you a ton of time for various common operations on variables.
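A few of those expansions in action (a small illustrative sample; see the manual for the full list):

```shell
# Common parameter expansions -- no external tools needed:
path="/home/user/report.final.txt"
echo "${path##*/}"   # strip longest */ prefix (basename): report.final.txt
echo "${path%/*}"    # strip shortest /* suffix (dirname): /home/user
echo "${path%.*}"    # strip one extension: /home/user/report.final
name=""
echo "${name:-anonymous}"   # fall back to a default when unset or empty
```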

[+] adityaathalye|4 years ago|reply
Once I discovered functions are pipeline-friendly, I started pipelining all the things. Almost any `for arg` can be re-designed as a `while read arg` with a function.

Here's what it can look like. Some functions I wrote to bulk-process git repos. Notice they accept arguments and stdin:

  # ID all git repos anywhere under this directory
  ls_git_projects ~/src/bitbucket |
      # filter ones not updated for pre-set num days    
      take_stale |
      # proc repo applies the given op. (a function in this case) to each repo
      proc_repos git_fetch
Source: https://github.com/adityaathalye/bash-toolkit/blob/master/bu...

The best part is sourcing pipeline-friendly functions into a shell session allows me to mix-and-match them with regular unix tools.

Overall, I believe (and my code will betray it) functional programming style is a pretty fine way to live in shell!
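A minimal sketch of the pattern (the function here is illustrative, not from the linked toolkit): make the function operate on its arguments when given any, and fall back to stdin otherwise, so it drops into a pipeline anywhere.

```shell
# Pipeline-friendly: `shout foo` and `echo foo | shout` both work.
shout() {
    if [ "$#" -gt 0 ]; then
        printf '%s\n' "$@" | tr '[:lower:]' '[:upper:]'
    else
        tr '[:lower:]' '[:upper:]'
    fi
}
```

Once sourced into a session, it composes with regular unix tools, e.g. `ls | shout | sort`.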

[+] hnlmorg|4 years ago|reply
> I've always had trouble getting `for` loops to work predictably, so my common loop pattern is this:

for loops were exactly the pain point that led me to write my own shell > 6 years ago.

I can now iterate through structured data (be it JSON, YAML, CSV, `ps` output, log file entries, or whatever) and each item is pulled intelligently, rather than having to consciously consider a ton of dumb edge cases like "what if my file names have spaces in them".

eg

    » open https://api.github.com/repos/lmorg/murex/issues -> foreach issue { out "$issue[number]: $issue[title]" }
    380: Fail if variable is missing
    379: Backslashes and code comments
    378: Improve testing facility documentation
    377: v2.4 release
    361: Deprecate `swivel-table` and `swivel-datatype`
    360: `sort` converts everything to a string
    340: `append` and `prepend` should `ReadArrayWithType`

Github repo: https://github.com/lmorg/murex

Docs on `foreach`: https://murex.rocks/docs/commands/foreach.html

[+] jolmg|4 years ago|reply
> I've always had trouble getting `for` loops to work predictably, so my common loop pattern is this:

The problem with the pipe-while-read pattern is that you can't modify variables in the loop, since it runs in a subshell.
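A quick demonstration, plus one portable fix: feed the loop through a redirection (here a here-document; `done < file` or bash's process substitution also work), so it runs in the current shell.

```shell
# Piped loop: the while body runs in a subshell, so the outer
# variable never changes (prints 0 in bash):
count=0
printf 'a\nb\nc\n' | while read -r line; do
    count=$((count + 1))
done
echo "$count"

# Redirected loop: runs in the current shell, so assignments stick:
count=0
while read -r line; do
    count=$((count + 1))
done <<EOF
a
b
c
EOF
echo "$count"   # 3
```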

[+] edgyquant|4 years ago|reply
I always used for loops, and only recently (after a decade of using Linux daily) have I learned about the power of piped loops. It's strange to me that you're more comfortable with those than with for loops, but I think it does make sense, as you're letting a program generate the list to iterate over. A pain point of for loops is getting that list right; e.g. there isn't a good way to iterate over files with spaces in their names using a for loop (which is why I learned about piped loops recently).
[+] michaelhoffman|4 years ago|reply
I have something like this in my bashrc:

   preexec ()
   {
       # shellcheck disable=2034
       _CMD_START="$(date +%s)"
   }

   trap 'preexec; trap - DEBUG' DEBUG

   PROMPT_COMMAND="_CMD_STOP=\$(date +%s)
       let _CMD_ELAPSED=_CMD_STOP-_CMD_START

       if [ \$_CMD_ELAPSED -gt 5 ]; then
           _TIME_STR=\" (\${_CMD_ELAPSED}s)\"
       else
           _TIME_STR=''
       fi; "

    PS1="\n\u@\h \w\$_TIME_STR\n\\$ "

    PROMPT_COMMAND+="trap 'preexec; trap - DEBUG' DEBUG"
Whenever a command takes more than 5 s, it tells me exactly how long it took at the next prompt.

I didn't know about `$SECONDS` so I'm going to change it to use that.
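For reference, a minimal sketch of the `$SECONDS` version (bash-specific: assigning to SECONDS resets the counter, which then ticks up once per second; the 5-second threshold mirrors the snippet above):

```shell
# Reset the built-in timer before the "command", then read it after:
SECONDS=0
sleep 2                 # stand-in for the real command
elapsed=$SECONDS
if [ "$elapsed" -gt 5 ]; then
    _TIME_STR=" (${elapsed}s)"
else
    _TIME_STR=""
fi
echo "prompt suffix: '$_TIME_STR'"
```

In the real setup, the reset goes in the DEBUG-trap preexec and the readout in PROMPT_COMMAND, as above.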

[+] caymanjim|4 years ago|reply
Installing GNU stuff with the 'g' prefix (gsed instead of sed) means having to remember to include the 'g' when you're on a Mac and leave it off when you're on Linux, or use aliases, or some other confusing and inconvenient thing, and then if you're writing a script meant for multi-platform use, it still won't work. I find it's a much better idea to install the entire GNU suite without the 'g' prefix and use PATH to control which is used. I use MacPorts to do this (/opt/local/libexec/gnubin), and even Homebrew finally supports this, although it does it in a stupid way that requires adding a PATH element for each individual GNU utility (e.g. /usr/local/opt/gnu-sed/libexec/gnubin).
[+] R0flcopt3r|4 years ago|reply
You can use `wait` to wait for jobs to finish.

    some_command &
    some_other_command &
    wait
[+] masklinn|4 years ago|reply
One issue with that is it won't reflect the failure of those commands.

In bash you can fix that by looping around, checking for status 127, and using `-n` (which waits for the first job of the set to complete), but not all shells have `-n`.
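One portable alternative is to record each job's PID and `wait` on the PIDs individually, which preserves each job's exit status (the backgrounded commands below are stand-ins):

```shell
# Launch jobs, remembering their PIDs:
pids=""
true &            # a job that succeeds
pids="$pids $!"
( exit 3 ) &      # a job that fails with status 3
pids="$pids $!"

# Waiting on an explicit PID returns that job's exit status,
# so failures propagate instead of being silently dropped:
status=0
for pid in $pids; do
    wait "$pid" || status=$?
done
echo "overall status: $status"   # 3
```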

[+] l0b0|4 years ago|reply

  > 1. Find and replace a pattern in a codebase with capture groups
  > git grep -l pattern | xargs gsed -ri 's|pat(tern)|\1s are birds|g'
Or, in IDEA, Ctrl-Shift-r, put "pat(tern)" in the first box and "$1s are birds" in the second box, Alt-a, boom. Infinitely easier to remember, and no chance of having to deal with any double escaping.
[+] DocTomoe|4 years ago|reply
What you are doing here is proposing a very specialist approach ("Why not just use a SpaceX Merlin engine?") when a slightly more cumbersome general approach ("This is how you get from A to B") was described.

IDEA is nice if you have IDEA.

"But not everyone uses Bash" - very correct (more fond of zsh, personally), but this article is specifically about Bash.

[+] turbocon|4 years ago|reply
Yea, I've yet to come across a regex replacement tool as easy to use as jetbrains find & replace. Invaluable for certain tasks.
[+] marginalia_nu|4 years ago|reply
Using an IDE kind of handicaps you to only working with your IDE though. The shell works everywhere for every use case.
[+] barbazoo|4 years ago|reply
> git bisect is the "real" way to do this, but it's not something I've ever needed

uh, yeah, you did need it, that's why you came up with "2. Track down a commit when a command started failing". Seriously though, git bisect is really useful to track down that bug in O(log n) rather than O(n).

[+] deckard1|4 years ago|reply
For the given command, if the assumption is that the command started failing recently, the loop is likely faster than bisect. You can start it and go grab a coffee. It's automatic.

I wish my usage of bisect were that trivial, though. Usually I need to find a bug in a giant web app. Which means finding a good commit, doing the npm install/start dance, etc. for each round.

[+] masklinn|4 years ago|reply
Also provides super useful commands like skipping commits because some of your colleagues are assholes and commit non-working code.
[+] bloopernova|4 years ago|reply
This thread seems like a good place to ask this:

When you're running a script, what is the expected behaviour if you just run it with no arguments? I think it shouldn't make any changes to your system, and it should print out a help message with common options. Is there anything else you expect a script to do?

Do you prefer a script that has a set of default assumptions about how it's going to work? If you need to modify that, you pass in parameters.

Do you expect that a script will lay out the changes it's about to make, then ask for confirmation? Or should it just get out of your way and do what it was written to do?

I'm asking all these fairly basic questions because I'm trying to put together a list of things everyone expects from a script. Not exactly patterns per se, more conventions or standard behaviours.

[+] gjulianm|4 years ago|reply
> When you're running a script, what is the expected behaviour if you just run it with no arguments? I think it shouldn't make any changes to your system, and it should print out a help message with common options. Is there anything else you expect a script to do?

Most scripts I use, custom-made or not, make it clear enough from their name what they do. If in doubt, always call with --help/-h. But, for example, it wouldn't make sense for something like 'update-ca-certificates' to require arguments to execute: it's clear from the name that it's going to change something.

> Do you prefer a script that has a set of default assumptions about how it's going to work? If you need to modify that, you pass in parameters.

It depends. If there's a "default" way to call the script, then yes. For example, in the 'update-ca-certificates' example, just use some defaults so I don't need to read more documentation about where the certificates are stored or how to do things.

> Do you expect that a script will lay out the changes it's about to make, then ask for confirmation? Or should it just get out of your way and do what it was written to do?

I don't care too much, but give me options to switch. If it does everything without asking, give me a "--dry-run" option or something that lets me check before doing anything. On the other hand, if it's asking a lot, let me specify "--yes" as apt does so that it doesn't ask me anything in automated installs or things like that.
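A hypothetical skeleton wiring those conventions together (all names and messages here are purely illustrative, not from any real tool):

```shell
# cleanup: no target -> print usage, change nothing;
# --dry-run -> preview only; --yes -> skip the confirmation prompt.
cleanup() {
    dry_run=0 assume_yes=0 target=""
    for arg in "$@"; do
        case $arg in
            -h|--help) echo "usage: cleanup [--dry-run] [--yes] <dir>"; return 0 ;;
            --dry-run) dry_run=1 ;;
            --yes)     assume_yes=1 ;;
            *)         target=$arg ;;
        esac
    done
    if [ -z "$target" ]; then
        echo "usage: cleanup [--dry-run] [--yes] <dir>" >&2
        return 1
    fi
    if [ "$dry_run" -eq 1 ]; then
        echo "would clean $target"
        return 0
    fi
    if [ "$assume_yes" -ne 1 ]; then
        printf 'clean %s? [y/N] ' "$target"
        read -r reply
        [ "$reply" = y ] || return 1
    fi
    echo "cleaning $target"
}
```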

[+] mason55|4 years ago|reply
IMO any script that makes any real changes (either to the local system or remotely) should take some kind of input.

It's one thing if your script reads some stuff and prints output. Defaulting to the current working directory (or whatever makes sense) is fine.

If the script is reading config from a config file or envvars then it should still probably get some kind of confirmation if it's going to make any kind of change (of course with an option to auto-confirm via a flag like --yes).

For really destructive changes it should default to dry run and require an explicit `--execute` flag, but for less destructive changes I think a path as input on the command line is enough confirmation.

That being said, if it's an unknown script I'd just read it. And if it's a binary I'd pass `--help`.

[+] scbrg|4 years ago|reply
A script is just another command, the only difference in this case is that you wrote it and not someone else. If its purpose is to make changes, and it's obvious what changes it should make without any arguments, I'd say it can do so without further ado. poweroff doesn't ask me what I want to do - I already told it by executing it - and that's a pretty drastic change.

Commands that halt halfway through and expect user confirmation should definitely have an option to skip that behavior. I want to be able to use anything in a script of my own.

[+] auno|4 years ago|reply
What you're getting at seems to be more about CLI conventions as opposed to script conventions specifically. As such, you might want to have a look at https://clig.dev/ which is a really comprehensive document describing CLI guidelines. I can't say I've read the whole thing yet, but everything I _have_ read very much made sense.

It's been discussed here on HN before.

https://news.ycombinator.com/item?id=25304257

[+] gmuslera|4 years ago|reply
Who is your intended audience? Is it you? A batch job or another program calling it? Or a person who may or may not be able to read bash and will call it manually?

A good and descriptive name comes first; the action and the people who may have to run it come next.

[+] guruparan18|4 years ago|reply
I am confused about how this works. I would assume `SECONDS` would just be a shell variable: it was first assigned `0`, so it should stay the same. Why did it keep counting the seconds?

    > SECONDS
    bash: SECONDS: command not found
    > SECONDS=0; sleep 5; echo $SECONDS;
    5
    > echo "Your command completed after $SECONDS seconds";
    Your command completed after 41 seconds
    > echo "Your command completed after $SECONDS seconds";
    Your command completed after 51 seconds
    > echo "Your command completed after $SECONDS seconds";
    Your command completed after 53 seconds
[+] notatoad|4 years ago|reply
I really love this style of blog post. Short, practical, no backstory, and not trying to claim one correct way to do anything. Just an unopinionated share that was useful to the author.

It seems like a throwback to a previous time, though I honestly can't remember when that was. Maybe back to a time when I hoped this was what blogging could be.

[+] marcodiego|4 years ago|reply
We need a simpler regex format, one that allows easy searching and replacing in source code. Of course, some IDEs already do that pretty well, but I'd like to be able to do it from the command line with a standalone tool I can easily use in scripts.

The simplest thing I know that is able to do that is coccinelle, but even coccinelle is not handy enough.

[+] marginalia_nu|4 years ago|reply
I think what we really need is one regex format. You have POSIX, PCRE, and also various degrees of needing to double-escape the slashes to get past whatever language you're using the regex in. Always adds a large element of guesswork even when you are familiar with regular expressions.
[+] f0e4c2f7|4 years ago|reply
It's not perfect but I like sed for this.
[+] 1vuio0pswjnm7|4 years ago|reply
To conserve host resources RFC 2616 recommends making multiple HTTP requests over a single TCP connection ("HTTP/1.1 pipelining").

The cURL project said it never properly supported HTTP/1.1 pipelining, and in 2019 it said the support was removed once and for all.

https://daniel.haxx.se/blog/2019/04/06/curl-says-bye-bye-to-...

Anyway, curl is not needed. One can write a small program in their language of choice to generate HTTP/1.1 requests, but even a simple shell script will work. Better yet, we get easy control over SNI, which the curl binary does not offer.

There are different and more concise ways, but below is an example, using the IFS technique.

This also shows the use of sed's "P" and "D" commands (credit: Eric Pement's sed one-liners).

Assumes valid, non-malicious URLs, all with same host.

Usage: 1.sh < URLs.txt

       #!/bin/sh
       (IFS=/;while read w x y z;do
       case $w in http:|https:);;*)exit;esac;
       case $x in "");;*)exit;esac;
       echo $y > .host
       printf '%s\r\n' "GET /$z HTTP/1.1";
       printf '%s\r\n' "Host: $y";
       # add more headers here if desired;
       printf 'Connection: keep-alive\r\n\r\n';done|sed 'N;$!P;$!D;$d';
       printf 'Connection: close\r\n\r\n';
       ) >.http
       read x < .host;
       # SNI;
       #openssl s_client -connect $x:443 -ign_eof -servername $x < .http;
       # no SNI;
       openssl s_client -connect $x:443 -ign_eof -noservername < .http;
       exec rm .host .http;
[+] 1vuio0pswjnm7|4 years ago|reply
Here's another way to do it without the subshell, using tr.

       #!/bin/sh
       IFS=/;while read w x y z;do
       v=$(echo x|tr x '\34');
       case $w in http:|https:);;*)exit;esac;
       case $x in "");;*)exit;esac;
       echo $y > .host
       printf '%s\r\n' "GET /$z HTTP/1.1";
       printf '%s\r\n' "Host: $y";
       printf 'Connection: keep-alive'$v$v;done \
       |sed '$s/keep-alive/close/'|tr '\34\34' '\r\n' > .http;
       read x < .host;
       # SNI;
       #openssl s_client -connect $x:443 -ign_eof -servername $x < .http;
       # no SNI;
       openssl s_client -connect $x:443 -ign_eof -noservername < .http;
       exec rm .host .http;
[+] themk|4 years ago|reply
The curl binary will reuse the TCP connection when fed multiple URLs. In fact, it can even use HTTP/2 and make the requests in parallel over a single TCP connection. A common pattern I use is to construct URLs with a script and use xargs to feed them to curl.
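A sketch of that pattern (example.com is a placeholder; the actual fetch is left commented out to keep this self-contained):

```shell
# Generate the URL list with ordinary tools:
printf '%s\n' 1 2 3 |
    sed 's|^|https://example.com/page/|' > urls.txt

# One curl process fetches them all over a reused connection;
# --parallel (curl >= 7.66) multiplexes requests over HTTP/2:
#   xargs curl -s --parallel < urls.txt
cat urls.txt
```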
[+] geocrasher|4 years ago|reply
I want to like this, but the for loop is unnecessarily messy, and not correct.

   for route in foo bar baz do
   curl localhost:8080/$route
   done
That's just begging to go wonky. Should be

  stuff="foo bar baz"
  for route in $stuff; do
  echo curl localhost:8080/$route
  done
Some might say that it's not absolutely necessary to abstract the list into a variable, and that's true, but it sure does make edits a lot easier. And the original is missing a semicolon before the 'do'.

I think it's one reason I dislike lists like this: a newb might look at these and stuff them into their toolkit without really knowing why they don't work. It slows down learning. Plus, faulty tooling can be unnecessarily destructive.

[+] lambic|4 years ago|reply
Even more correct would be to use an array:

  stuff=("foo foo" "bar" "baz")
  for route in "${stuff[@]}"; do
    curl localhost:8080/"$route"
  done
[+] marginalia_nu|4 years ago|reply
This is the type of stuff you typically run on the fly at the command line, not full-blown scripts you intend to reuse and share.

Here's a few examples from my command history:

  for ((i=0;i<49;i++)); do wget https://neocities.org/sitemap/sites-$i.xml.gz ; done

  for f in img*.png; do echo $f; convert $f -dither Riemersma -colors 12 -remap netscape: dit_$f; done
Hell if I know what they do now, they made sense when I ran them. If I need them again, I'll type them up again.
[+] masklinn|4 years ago|reply
Seems like you could say the same thing about every snippet, e.g. running jobs is the original purpose of a shell, and (3) can be achieved using job specifications (%n) and the `jobs` command for oversight.
[+] gcmeplz|4 years ago|reply
Thanks for the note about the semicolon! Added it
[+] penguin_booze|4 years ago|reply
If I find myself running a set of commands in parallel, I'd keep a cheap Makefile around: individual commands I want to run in parallel will be written as phony targets:

  .PHONY: all cmd1 cmd2

  all: cmd1 cmd2
  
  cmd1:
      sleep 5
  
  cmd2:
      sleep 10

And then

  make -j[n]
[+] usefulcat|4 years ago|reply
> Use for to iterate over simple lists

I definitely use this all the time. Also, generating the list of things over which to iterate using the output of a command:

    for thing in $(cat file_with_one_thing_per_line) ; do ...
[+] phone8675309|4 years ago|reply
Why not?

    xargs -n1 command < file_with_one_thing_per_line
[+] make3|4 years ago|reply
that's like bash 101, not sure why it's in there
[+] gnubison|4 years ago|reply
It's simple enough to use macOS sed here instead of needing gsed: just pass `-i ''` instead of `-i`.
[+] harvie|4 years ago|reply
curl localhost:8080/{foo,bar,baz}