lapsed_lisper | 5 years ago
The superficial problems are everywhere, easy to spot, and fun to complain about! The names of commands are obscure, the flags are inconsistent. Any utility's feature set is inherently arbitrary and its limitations are equally so. Just how to accomplish any particular task is often a puzzle, the trade-offs between using different utilities for the same task are inscrutable, and the time spent contemplating alternative approaches within the tool kit is utterly worthless. Utilities' inputs, outputs, and messages aren't for humans, they're either for coprocesses or for the control layer running the utility; and so users are supposed to learn to conform to the software, rather than vice versa. There's a hodgepodge of minilanguages (sed, find, test, bc, dc, expr, etc.), but they're unnecessary and inefficient if you're working in an even halfway capable language (let's say anything at about the level of awk is potentially "halfway capable"); and so "shelling out" to such utilities is a code smell. The canonical shells are somewhat expressive for process control, but terrible at handling data: consequently, safe, correct, and robust use of the shell layer of Unix is hard, maybe impossible; so nowadays most any use of the Unix shell in any "real" application is also a bad code smell.
I say these are skin-deep in the sense that in theory any particular utility, minilanguage, or shell can be supplanted by something better. Some have tried, but uptake is slow/rare. The conventional rationale for why this doesn't happen is economic: it's either not worth anyone's time to learn new tools that replace old ones, or the network-effect-induced value of the old ones is so high (because every installation has the old ones) that any prospective replacement has to be loads better to get market traction. I have a different theory, which I'll get to below.
But I also think there's a deeper set of problems in the "genetics" of Unix, in that it supports a "reductive" form of problem solving, but doesn't help at all if you want to build abstractions. Let's say one of the core ideas in Unix is "everything is a file" (i.e., read/write/seek/etc. is the universal interface across devices, files, pipes, and so on). "Everything is a file" insulates a program from some (but not all!) irrelevant details of the mechanics of moving bytes into and out of RAM... by forcing all programs to contend with even more profoundly irrelevant details about how those bytes in RAM should be interpreted as data in the program! While it is sometimes useful to be able to peek or poke at bits in stray spots, most programs implicitly or explicitly traffic in data relevant to that program. While every such datum must be /realized/ as bytes somewhere, operating on some datum's realization /as bytes/ (or by convention, as text) is mostly a place to make mistakes.
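A toy illustration of that last point in Python (my own sketch, not the commenter's): the byte interface gives you the bytes, but nothing about which interpretation of them is the intended one.

```python
import struct

# The same four bytes, read under two different (both legal) conventions.
raw = b"\x01\x00\x00\x00"

as_le = struct.unpack("<I", raw)[0]  # little-endian unsigned int: 1
as_be = struct.unpack(">I", raw)[0]  # big-endian unsigned int: 16777216

# Nothing in `raw` itself tells you which of these the producer meant.
```

Neither reading is "wrong" at the byte level; the error only exists relative to a convention the byte interface doesn't carry.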
Here's an example: consider the question "who uses bash as their login shell?" A classical "Unixy" methodology for attacking such a problem is supposed to be to (a) figure out how to get a byte stream containing the information you want, and then (b) figure out how to apply some tools to extract and perhaps transform that stream into the desired stream. So maybe you know that /etc/passwd is one way to get that stream on your system, and you decide to use awk for this problem, and type
awk -F: '$6 ~ "bash$" { print $1 }' /etc/passwd
That's a nicely compact expression! Sadly, it's an incorrect one to apply to /etc/passwd to get the desired answer (at least on my hosts), because the login shell is in the 7th field, not the 6th. Now, this is just a trivial little error, but that's why I like it as an example. Even in the most trivial cases, reducing anything to a byte stream does mean you can apply any general-purpose tool to a problem, but it also means that any such usage is going to reinvent the wheel in exact proportion to how directly it's using that byte stream; and that reinvention is a source of needless error.
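The pitfall survives translation into any language once you're working at the text level. A hypothetical Python rendering of the same split-on-colons approach (the entry below is made up for illustration):

```python
# Splitting a /etc/passwd-style line on ":" and picking fields by position.
# This sample line is a made-up entry, not from any real system.
line = "alice:x:1000:1000:Alice:/home/alice:/bin/bash"
fields = line.split(":")

shell_wrong = fields[5]  # fields[5] is the SIXTH field: the home directory
shell_right = fields[6]  # fields[6] is the seventh field: the login shell
```

Every consumer of the stream re-encodes the field layout by hand, and every re-encoding is a fresh chance to be off by one.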
Of course the sensible thing to do in all but the most contrived cases is to perform your handling of byte-level representations with a dedicated library that provides at least some abstraction over the representation details; even thin and unsafe abstractions like C structs are better than nothing. (Anything less than a library is imperfect: if all you've got is a separate process on a pipe, you've just traded one byte stream problem for another. Maybe the one you get is easier than the one you started with, but still admits the same kinds of incorrect byte interpretation errors.) And so "everything is a file", which was supposed to be a great facility for putting things together, is usually just an utterly irrelevant implementation detail beneath libraries.
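For this particular question, Python's standard `pwd` module is one such library: it hands back structured password-database entries, so there's no delimiter or field index for you to get wrong. (A sketch of my own, assuming a Unix host; the `endswith` test is one arbitrary way to match "bash".)

```python
import pwd

def bash_users():
    """Login names whose shell is bash, via the structured pwd interface."""
    return [entry.pw_name
            for entry in pwd.getpwall()          # structured entries, not lines
            if entry.pw_shell.endswith("/bash")]  # no field numbers involved

print(bash_users())
```

The representation of /etc/passwd as colon-delimited text is now someone else's problem, solved once, behind the library.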
And this gets me back around to why I think the superficial stuff hasn't changed all that much: I doubt that the "Unix way" of putting things together has really truly mattered enough to bother making the tools or shells substantially better. I got started on Unix in 1999, by which time it was already customary for most people I knew to solve problems inside a capable language for which libraries existed, rather than to use pipelines of Unix tools. (Back then there was lots of Perl, Java, TCL, Python, et al.; nowadays less Perl and TCL, more Ruby and JavaScript.) Sure, you've needed a kernel to host your language and make your hard drive get warm, but once you have a halfway capable language (defined above), if it also has libraries and some way to call C functions (which awk didn't), you don't need the Unix toolkit, or a wide range of the original features of Unix itself (pipes, fork, job control, hierarchical process structure, separate address spaces, etc.).
And that's just stuff related to I/O and pipes. One could look at the relative merits of Unix's take on the file namespace, Plan 9's revision of the idea, and then observe that "logical devices" addressed much of that set of problems as early as the early-to-mid '70s on TOPS-20 and VMS, without (AFAICT) accompanying propaganda about how simple and orthogonal and critical it is that there be a tree-shaped namespace (except that it's a DAG) and everything in the namespace works like a file (except when it doesn't).
My point is that people have said about Unix that it's good because it's got a small number of orthogonal ideas, and look how those ideas can hang together to produce some results! That's all fine, though in practice the attempt to combine the small number of ideas ends up giving fragile, inefficient, and unmaintainable solutions; and what you need to do to build more robust solutions on Unix is to ignore Unix, and just treat it as a host for an ecology of your own invention or selection, which ecology will probably make little use of Unix's Unix-y-ness.
(As to why Unix-like systems are widespread, it's hard not to observe some accidents of history: it was of no commercial value to its owner at a moment when hardware vendors needed a cheap operating system. Commercial circumstances later changed so that it made sense for some hardware vendors to subsidize free Unix knockoffs. Commercial circumstances have changed again, and it still makes sense for some vendors to continue subsidizing Unix knockoffs. But being good for a vendor to sell and being good for someone to use can very often be different things...)