I think whitespace handling is a problem, but not the only one. Shell data structures are awful and confusing, so best avoided. Error handling is also subpar, requiring boilerplate for every reasonable error situation. And there’s a constant need to be careful about stdout vs stderr and how exactly a function returns data.
I find moving to even Python to be an inadequate answer, because the shell does two crucial things very very well - it manipulates the filesystem and runs commands in their most native format. And even Python is very cumbersome at those two tasks.
But sometimes you need good data structures/execution flow control, and good filesystem/command control, at the same time.
Intuitive, consistent, predictable whitespace handling would fix a lot of shell scripting problems, though.
(I haven’t given Powershell a serious shot, maybe I should.)
This is the #1 reason I enjoy using the plan 9 rc shell[1] for scripts. There's exactly one place where word splitting happens: at the point where command output is evaluated. And there, it's trivial to use any character you want:
x = `{echo hi there: $user} # evaluates to list ('hi' 'there:' 'ori')
y = `:{echo hi there: $user} # evaluates to list ('hi there' ' ori')
There's no other word splitting, so:
args = ('a' 'b c' 'd e f')
echo $#args
echo $args(3)
will print:
3
d e f
The shell itself is pleasantly simple; there's not much to learn[2]. And while it's not fun for interactive use on unix because it offloads too much of the interactive pleasantness to the plan 9 window system (rio), it's still great for scripting.
> I find moving to even Python to be an inadequate answer, because the shell does two crucial things very very well - it manipulates the filesystem and runs commands in their most native format. And even Python is very cumbersome at those two tasks.
Powershell is worth a deeper look if you have time. It can be odd "at first glance", especially when trying things you know from other shells, but it has a lot more data structures available for use in the shell, a different and mostly predictable approach to error handling (including try { } catch { }, just like a "real" language), and a standardized argument-parsing model for its own commands and cmdlets (though you'll still likely be using lots of DOS or Unix commands with their own bespoke argument parsing, as usual). And now that Powershell is open source and cross-platform, it is more useful than ever.
I use declare in a lot of my bash scripts for associative arrays and some other stuff. It can make scripts easier to read/reason about IMO. Something useful to learn if you’ve never heard of it.
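For anyone who hasn't seen it, a minimal sketch of `declare -A` (the keys and values here are made up; requires bash 4+):

```shell
# declare -A creates an associative array (string keys)
declare -A ports=([http]=80 [https]=443)
ports[ssh]=22                      # add a key after the fact
echo "${ports[https]}"             # look up one value: 443
for name in "${!ports[@]}"; do     # iterate keys (order is unspecified)
    echo "$name -> ${ports[$name]}"
done
```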
> I think what bugs me most about this problem in the shell is that it's so uncharacteristic of the Bell Labs people to have made such an unforced error. They got so many things right, why not this?
All it takes to understand it is that it took a while for anyone to consider that spaces in filenames were (or might be) a thing.
I don't know that this is true, but having started on Unix about the same time as the author of TFA, I know that I found it quite disorienting when I first started interacting with Windows/macOS users who regularly used spaces in their filenames and thought nothing of it.
I suspect this wasn't an "error" as much as just a failure to grok the pretty basic idea that "bite me.jpg" is an entirely valid filename, and maybe more useful than "biteme.jpg" or "bite_me.jpg"
and steve's & edy's taxes.xls and 10" deck.doc and Surprise!.mov and... all the other shell active characters which are otherwise just normal characters that normal people will want to use.
If it's on the keyboard, they expect to be able to use it.
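One conventional way scripts cope with such names is a NUL-delimited pipeline; a sketch, reusing the example filenames from above:

```shell
dir=$(mktemp -d)
touch "$dir/bite me.jpg" "$dir/Surprise!.mov"
# -print0/-0 delimit on NUL, the one byte a Unix filename cannot contain,
# so spaces, quotes, and '!' all pass through untouched
find "$dir" -type f -print0 | xargs -0 -I{} echo "got: {}"
```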
> I know that I found it quite disorienting when I first started interacting with Windows/macOS users who regularly used spaces in their filenames and thought nothing of it.
At least most Windows users are "trained" by Windows that asterisk, question mark, vertical bar, double quote, and leading or trailing spaces "aren't valid characters for files" (in a weird way, it's a Windows limitation, not an NTFS one). I expect only the worst (case-insensitive stuff) naming schemes when the files come from a macOS user.
The author claims to have 35 years of shell usage (that, I believe), but these are the arguments he uses? I'll summarize for anyone who doesn't want to waste their time: "Quote your sh variables and expansions". That's the first thing I learned and the first thing I teach about shell. Using shell wrong and then complaining about it supposedly not working is a weak argument.
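That rule fits in a few lines, for anyone who wants to see it rather than take it on faith:

```shell
f="two words"
printf '%s\n' $f      # unquoted: split into two arguments, prints two lines
printf '%s\n' "$f"    # quoted: one argument, the space preserved
```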
Let's see what he says in this article:
- "$* is literally useless": no, it's used to convert arguments to a string separated by the first char in IFS. Useful pretty much only for logging, but sometimes joining strings. $@ is POSIX and should be used in all other cases. A niche feature that generally shouldn't be used isn't an issue in shell.
- $* and $@ are the same except when $@ is quoted: true, except in more modern shells like bash, where $@ is an array, not a specially-treated variable. I don't know who told him to use $* for 35 years, but the author should be mad at them instead
- To make it work properly you have to say `for i in *; do cp "$i" /tmp; done`: wrong. cp can't distinguish arguments and flags. You must either use `./*` or `cp -t /tmp -- "$i"` (or some variation). It's correct in the glob/wordsplit sense, but calling this proper is incorrect.
- "And the shell doesn't have this behavior for any other sort of special character": He doesn't even mention why spaces are magic. Hint: they aren't. It's entirely controlled by IFS, which you can change; the default is space, tab, and newline. He also doesn't mention globbing, which can arguably be more disastrous.
An article about word splitting, and he doesn't even mention it once? This is at best a rant.
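The `$*`/`$@`/IFS behavior described above can be checked in a few lines (a sketch; requires bash for `local`):

```shell
demo() {
    local IFS=','
    joined="$*"              # "$*" joins the args with the first char of IFS
    printf '%s\n' "$@"       # "$@" keeps each argument intact, one per line
}
demo "a b" "c"
echo "$joined"               # a b,c
```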
I know all the quoting and escaping rules there are to know, and still: considering that working with text is a shell's job, the way it is designed is just a major pain.
I am grateful for all those who historically came up with the concepts and put in the work, but if anybody were to design a text interface to computers where things can be piped into other commands etc. today, they could heavily improve the usability of the thing and save thousands of collective hours wasted. And probably, like when trying to create a successor to email, nobody would use it.
To be clear, I think shell has problems too. But this article is poorly written. I don't think it makes sense to incorrectly use a tool and then complain about how bad it is. And to qualify your article with 35 years of experience? This just reflects that the author didn't take time to learn shell for 35 years.
Do yourself a favor and read Greg's entire wiki: https://mywiki.wooledge.org/BashFAQ and just learn how to use it properly, and then you can complain about how painful it is to learn or how easy it is to use incorrectly rather than how bad it is if you use it wrong.
Zsh does not split words by default, so you don't need to quote everything. This is the main reason I switch to Zsh instead of Bash when I need a bit more than the base shell.
Allowing spaces in filenames where command invocations are pure text streams⁰ is the problem IMO, though one made several distinct times¹, and one that would be made again if all past occurrences were somehow removed from history, so suggesting we fix things that way is pointless as a way forward.
Requiring filenames to be quoted if they contain spaces, or optionally otherwise, would help – similar to CSV values with commas in them. Though this opens up other issues: what about filenames containing quotes? And when nesting calls², the question of which part does the unpacking becomes complex without explicit structure. And to be honest, we have enough trouble with clients sending us malformed CSV with unquoted commas that I can confidently state this wouldn't work either.
And you can't trust users to follow instructions, even internal users, where the limitation might be an issue. If you put a simple helper tool together and say “it won't work with spaces in filenames”, either it becomes a high-priority ticket that the thing you jammed together as a favour doesn't work with spaces, or you get so sick of responding to problem reports that turn out to be due to the matter that you spend time fixing it up anyway. </rant>
--
[0] for systems that pass structured data, such as powershell, spaces are handled fine, though that would be a lot of work to retrofit into shells & utilities with a generally useful level of conformance
[1] like the convergent evolution of wings in pterodactyls/birds/bats or crab-like features all over the place
[2] for instance ssh to run a complex command line involving sudo
Note: this is an issue with sh/bash/fish/zsh, not shells in general. rc, from plan9 (http://doc.cat-v.org/plan_9/4th_edition/papers/rc), correctly has lists as a first-class data structure and none of the problems in the article happen; a list is a list, not a string split on whitespaces.
> Note: this is an issue with sh/bash/fish/zsh, not shells in general. rc (...) has lists as a first-class data structure and none of the problems in the article happen; a list is a list, not a string split on whitespaces.
Not really, bash, fish and zsh all have arrays as "first-class data structures"; it's only really a problem if you really want to limit yourself to POSIX shell syntax.
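For comparison with the rc example above, here is a sketch of the same list in bash's native arrays, which sidestep the splitting entirely:

```shell
args=('a' 'b c' 'd e f')
echo "${#args[@]}"     # 3
echo "${args[2]}"      # d e f  (bash indexes from 0 where rc indexes from 1)
for a in "${args[@]}"; do echo "<$a>"; done   # each element stays one word
```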
This isn’t related to pipes. This is related to how command lines and variables are tokenised and expanded. Zsh doesn’t have this issue, nor does most, if not all, other shells written in the last 10+ years.
I don't think I've ever had this kind of stupid error in a script that Shellcheck accepts, by the way, which is a more principled way to pick up the problems.
I was once asked in an Amazon interview a famous problem of theirs: find all phone numbers on their website. This was a real-world problem they ran into. The first engineer there went about it in Java; it took days and pages of code (and still, various corner cases were issues, it was slow, etc.). Another came along, threw down a few lines of shell, and was done.
If only all UI applications replaced whitespace with underscores when making files. I know... that will never happen and won't help with existing files.
No, please no! Let's first figure out what it means to work with whitespace properly, then update our tools to do it. Behind-the-scenes mangling to work around flawed tools seems to be the macOS favored solution, with .DS_Store files dropped silently everywhere, and "case-insensitive but case-respecting" by default file systems, and increasingly byzantine approaches to file management so that it's harder and harder to figure out where anything is actually stored, which defeats the whole point of a Unix underlayer ….
That’s a typical software engineer solution, though. In the real world, putting spaces in file names is much more natural than avoiding it at all costs. We find it distasteful only because we have PTSD from using such dumb tools.
The real solution is something that does not split a variable on spaces, or that offers some control about how it splits things. These exist, there is no excuse to stick to bash in this day and age.
We really don’t need another layer of obfuscation between GUIs and the underlying file system.
As a MacOS user, I have long wished that Apple would make it possible to configure a custom file naming policy for the dialog boxes presented when saving files.
In my dream world it would enable changing "The shell and its crappy handling of whitespace _ Hacker News.html" to "The-shell-and-its-crappy-handling-of-whitespace_Hacker-News.html"
> Even if [parsing command arguments via whitespace boundaries] was a simple or reasonable choice to make in the beginning, at some point around 1979 Steve Bourne had a clear opportunity to realize he had made a mistake. He introduced $* and must shortly thereafter have discovered that it wasn't useful. This should have gotten him thinking.
First off, no, that's not a bug. The alternative is some variant of "here's a data structure to hold the list of arguments you need to manipulate manually". And that's how program execution works in "real programming language" environments (even things like PowerShell).
And it sucks, which is why we all continue to use the Bourne shell after four decades to solve these problems. What we really want is "here's a string to execute just like I typed it; go!". Sure, that's not what we think we want. And we might convince ourselves that mishandling based on that metaphor is a bad thing and write too-smart-for-our-own-good blog posts about it to show how smart we are.
But no, we don't want that. We want shell. If we didn't want shell, we wouldn't write shell scripts. And yet, here we are.
Again, four decades outweighs any amount of pontification in a blog post. If you aren't willing to start from a perspective of why this system remains so successful, you're probably not bringing as much insight to the problem as you think you are.
>Again, four decades outweighs any amount of pontification in a blog post.
Many more people used MS-DOS/Windows during that time.
>And it sucks, which is why we all continue to use the Bourne shell after four decades to solve these problems.
>But no, we don't want that. We want shell.
People continue to use Linux and Unix and interactive shell for good reasons. Shell as scripting shell is there simply because it's available, a form of Stockholm Syndrome I guess.
Ah, double-quote handling can be another nightmare as well; I can't make this work:
function backup {
    local SOURCE=$1
    local DESTINATION=$2
    local ID_RSA=$3
    # VALID_SSH is hard-coded for testing.
    local VALID_SSH="yes"
    local REMOTE_SHELL=""
    if [[ "${VALID_SSH}" == "yes" ]]; then
        REMOTE_SHELL='-e "ssh -i '${ID_RSA}'"'
    fi
    rsync -av \
        --no-perms \
        --links \
        $(if [[ "${VALID_SSH}" == "yes" ]]; then echo "${REMOTE_SHELL}"; fi) \
        "${SOURCE}" "${DESTINATION}"
}
Been fiddling yesterday and today with no luck; it must be something very subtle. Tried a bunch of different things from SO suggestions, but none seem to work. Meh.
As a rule, `printf %q` or `${var@Q}` are very useful when building up quoted command strings.
But your main problem is that `REMOTE_SHELL` is a string, when you need an array. Though I suppose you could make it a string if you used `eval` around its whole use.
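A sketch of the `printf %q` round trip the parent describes (the path here is made up, chosen to contain a space):

```shell
id_rsa="/home/me/my keys/id_rsa"        # hypothetical path with a space
quoted=$(printf '%q' "$id_rsa")
echo "$quoted"                          # the shell-escaped form, e.g. /home/me/my\ keys/id_rsa
# the escaped form survives a later eval intact:
eval "restored=$quoted"
echo "$restored"                        # back to the original value
```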
Auto-splitting variables is for passing arguments to a command you'll be using multiple times, or that you need to change under some condition, for example:
RSYNC_ARGS='--progress --human-readable'
if some_condition; then RSYNC_ARGS="$RSYNC_ARGS --exclude-from=some_file"; fi
rsync $RSYNC_ARGS foo bar
I can't think of a use for $* though and would guess it probably existed before $@ and is still there just for backwards compatibility.
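The same pattern with an array instead of a string also survives option values containing spaces; a sketch (`true` stands in for some_condition, and printf stands in for rsync to show the words it would receive):

```shell
rsync_args=(--progress --human-readable)
if true; then                                   # stand-in for some_condition
    rsync_args+=(--exclude-from="some file")    # element with a space stays one word
fi
printf '<%s>\n' "${rsync_args[@]}"              # rsync "${rsync_args[@]}" gets the same words
```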
this reminds me that zsh auto-escaping \$ and \& on basically all my shells messes up pasting curl commands and other payloads... or doesn't and still looks them up in weird subshells.
> zsh auto-escaping \$ and \& on basically all my shells messes up pasting curl commands
This must be something you configured, or added in a plugin or something, because it's not the default behaviour, and AFAIK it's also not a simple setting.
You choose to expand a variable and then complain about not wanting the expansion?
Or about the behaviour of clearly documented special variables doing exactly what they're supposed to instead of something another special variable does? What are they unhappy about exactly, they don't like the symbol that was chosen??
These are all things you find out within minutes of reading the manpage. (one of the best written manpages out there, in fact).
Heck, this isn't even a "cannot whitespace" problem. It's expanded vs unexpanded tokens. Which is a feature of the language. Because it's text/stream based (not "despite"). 35 years of shell experience? I call bait.
Funny, I was just discussing migrating Plex files from Windows to Linux, and if it requires persisting whitespace in the filenames. My position was that "Linux" handles spaces fine, but if I can avoid dealing with escapes in filenames that would be nice.
Curious to read this article when it recovers from the HN load.
> I think what bugs me most about this problem in the shell is that it's so uncharacteristic of the Bell Labs people to have made such an unforced error.
Bell Labs fixed it with the shell called “rc”. They can’t be blamed that you ignored it.
[1] http://shithub.us/cinap_lenrek/rc/HEAD/info.html
[2] http://man.9front.org/1/rc
Thankfully Perl exists.
In Perl, you can use backticks to execute commands for example
https://www.oilshell.org/
You can call pure Python, plus it has the simple running of commands like other shells.
https://linuxhint.com/bash_declare_command/
By the way, csh/tcsh double-quote quoting rules are atrocious; avoid them.
Of course, not having a hard separation between paths and file names is also not good. The above would be better.
`vidir [path]` will open an editor with the given directory as buffer contents. Editing and saving will translate to a sequence of `mv` invocations.
See https://drewdevault.com/2023/07/31/The-rc-shell-and-whitespa... to see the examples work out of the box as you would naturally write them.
Powershell's structured data can cause issues if you expect it to be similar to Unix-like shells.
For example, Powershell mangles piped binary data by default:
https://brianreiter.org/2010/01/29/powershells-object-pipeli...
Correct handling of piped binary data requires additional flags:
https://stackoverflow.com/questions/54086430/how-to-pipe-bin...
Get-Process | Out-Host -Paging | Format-List
Case-sensitive, hyphenated, compound-word commands?
There are some things shell does really well.
no such problems in bash.
Joking aside, as a relative newbie to the *nix world and liking it: shell scripting in Linux/Unix is broken.
The object-handling approach that Powershell provides just seems better.
https://www.scs.stanford.edu/nyu/04fa/sched/readings/rc.pdf