
Language for Unix command line utilities?

6 points| gn | 15 years ago

I code for a small biocomputing company. We download nucleotide sequence and taxonomy information in a number of unrelated formats from a number of public repositories and run various kinds of translation and analysis on it. Much of what we write consists of small command-line tools that search, summarize, or transform certain types of (large trees of) text files. These programs look and feel a lot like traditional core Unix utilities; our two most widely used programs are essentially just a specialized version of diff and a specialized version of grep. We used to prototype most of our utilities as shell scripts or in Perl; we redid shell scripts in Perl or C (or sometimes Java) if they became performance bottlenecks.

Some years ago we decided to move from Perl to Python for new projects because Perl programs had a way of always ending up as maintainability nightmares and because Perl seemed on the way out anyway. It largely worked, but we were never really, truly happy with our Python code. I suspect part of the reason is that Python can be (or at least feel) less succinct than even C if you do lots of low-level file system stuff with close error checking. The true reason is probably largely aesthetic. We can't explain what's wrong, we're just vaguely uneasy.

What other alternative to C should we be looking at? Ruby? Haskell? Is Go there yet? We have very open minds and are willing to consider pretty much anything that gives us reasonably easy and unmolested access to syscalls and their return values.

10 comments


gaius|15 years ago

I write a lot of command-line tools. My language at the moment is Haskell; tho' I still count myself as a Haskell beginner, it's proving to be very productive. The old adage that if Haskell code compiles it works is mostly true; bugs are caught up-front rather than after running in the wild for a bit; type inference, explicit pure/IO and functional composition are real boons. I need to build more familiarity with the libraries before I can be as productive in the short term (e.g. for "one offs") as I am in Python, but already I believe that in the long term (because one-offs never are!) I'm pulling ahead.

OCaml would be a good choice too. Both of these languages work very naturally with tree-like structures. Profiling/code coverage in both is very easy. IMHO there's no need to go to C for any but the most performance-critical code (and remember that your I/O etc. is already in C in the kernel). The C approach of checking the return value of every syscall (i.e. no exceptions) is very cumbersome.

Case in point today: rather than persuade our Unix guys to roll out Expect across a bunch of new machines, I rewrote a ~200 line Expect script I had in ~60 lines of Haskell and deployed a binary instead of a script.

aidenn0|15 years ago

Only thing I can think of is AWK, but that's only slightly more readable than perl and is probably less maintainable since perl has vastly superior profiling and debugging tools.

I mostly use Python for the sorts of things you are mentioning. And from what you're saying you don't like about Python, I suspect that going to Ruby or Haskell or such is going to be worse. Python can more easily call the underlying C routines than either of those.

It would be nice if you could provide an example of something you think is inelegant and/or awkward in Python so that we could figure out which direction to point you.

I do a lot of coding in common lisp and some programming in haskell, but wouldn't recommend either of those based on what I've heard from you so far. There's a few dataflow style languages I've seen that would probably allow very succinct code, but they were all toys and performed quite poorly.

gn|15 years ago

> It would be nice if you could provide an example

For me personally the main source of unhappiness is error messages. In C I can say

if (!(f = fopen(name, "r"))) die(name);

where die is a tiny function that prints name, followed by whatever strerror has to say on the subject, formatted in the usual fashion. One line, done with it. The obvious, conventional Python equivalent is four lines long because both try: and except: insist on a line of their own. Since I cannot tell Python to produce succinct unixy error messages instead of rambling stack traces, I have to catch and examine more or less every plausible exception. Some exceptions I can deal with close to the base level of my call stack in a butt-ugly forty-line catch-all clause, but a large proportion of my syscalls end up taking three extra lines each. I know it's a trivial problem, but I agree with pg that you tend to be more productive the more of your actual application logic you can see.
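For comparison, one possible way to get short Unix-style messages out of Python without wrapping each call in try/except is to replace sys.excepthook. This is only a sketch and not something posted in the thread: format_oserror, unixy_excepthook and install_unixy_errors are hypothetical names, and it assumes every uncaught OSError should be reported as "prog: path: reason".

```python
import os
import sys


def format_oserror(prog, err):
    """Build a perror-style message: 'prog: path: reason'."""
    parts = [prog]
    # open(), os.stat() and friends record the offending path in err.filename.
    if err.filename is not None:
        parts.append(str(err.filename))
    # err.strerror is the strerror() text; fall back to str(err) if it is unset.
    parts.append(err.strerror or str(err))
    return ": ".join(parts)


def unixy_excepthook(exc_type, exc, tb):
    """Print one line for uncaught OSErrors, the normal traceback otherwise."""
    if isinstance(exc, OSError):
        prog = os.path.basename(sys.argv[0]) or "python"
        sys.stderr.write(format_oserror(prog, exc) + "\n")
    else:
        # Anything else is probably a bug, so keep the default traceback.
        sys.__excepthook__(exc_type, exc, tb)


def install_unixy_errors():
    """Call once at program start (hypothetical helper)."""
    sys.excepthook = unixy_excepthook
```

With the hook installed, a bare open(name) that fails ends the program with a single line on stderr, and the interpreter still exits with a non-zero status because the exception was uncaught. Errors that need real handling still need an explicit try/except.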

> I do a lot of coding in common lisp

We did experiment with clisp a while back; it turned out not to be a natural fit for problems that involve a lot of pathname, datetime, and stat info manipulation. If there was a reasonably modern Lisp that let me say things like (localtime (nth 9 (stat "/foo"))) I would go looking for it this very afternoon.
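For reference, a rough Python counterpart of that expression might look like the sketch below. It assumes that index 9 follows Perl's stat() ordering, where element 9 is the modification time; mtime_localtime is a hypothetical helper name.

```python
import os
import time


def mtime_localtime(path):
    """Return the file's modification time as a local-time struct_time."""
    # st_mtime plays the role of element 9 in Perl's stat() list (assumption).
    return time.localtime(os.stat(path).st_mtime)
```

The one-liner form is time.localtime(os.stat("/foo").st_mtime); os.stat raises OSError if the path cannot be examined.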

ggchappell|15 years ago

> ... Python can be (or at least feel) less succinct than even C ....

That's an interesting statement. Certainly, Python can be less succinct than Perl, particularly for small scripts where a quick "while(<>) {" and a regexp get most of your work done. But C??
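To make the comparison concrete, here is roughly what the Perl "while(<>)" plus regexp idiom looks like in Python, using the standard fileinput module. It is a minimal sketch; grep and main are hypothetical names, and the pattern in the usage note is only an illustration.

```python
import fileinput
import re
import sys


def grep(pattern, lines):
    """Yield the lines that match the regular expression."""
    rx = re.compile(pattern)
    for line in lines:
        if rx.search(line):
            yield line


def main(argv):
    """Usage: prog PATTERN [FILE...]; reads stdin when no files are given."""
    pattern, files = argv[0], argv[1:]
    # fileinput.input() walks the named files in order, like Perl's <>.
    for line in grep(pattern, fileinput.input(files=files)):
        sys.stdout.write(line)
```

Calling main(sys.argv[1:]) from the usual `if __name__ == "__main__":` guard turns this into a small filter, for example one that prints only lines starting with ">" when given the pattern "^>". It is a few lines longer than the Perl version, which is the point being made above.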

> ... if you do lots of low-level file system stuff with close error checking.

Hmmm. In my experience, C's I/O libraries tend to make error checking something we leave by the wayside. Is there any chance that the real reason your Python scripts are longer is that you actually check for, and properly handle, the errors there, while in C you often don't?

In any case, I'll echo a comment from aidenn0:

> It would be nice if you could provide an example of something you think is inelegant and/or awkward in Python so that we could figure out which direction to point you.

chromatic|15 years ago

With a modern version of Perl and the autodie pragma active (in core since Perl 5.10.1), your dissatisfaction with verbose error handling can often simply disappear.

CyberFonic|15 years ago

Python is just fine. O'Reilly have a great book "Python for Unix and Linux System Administration" if you'd like some great suggestions and ideas.