Python, Python, Python

August 31, 2011 at 7:46 pm
filed under Coding

Oh, god. I still have a bunch of rant left in me. So here we go, Internet: yet another angry rant to add to the pile.

Sometimes, in the course of one’s life, one is left with a one-off task. In this case, I needed to call a binary a whole lot of times, and do something with the output each time. The details aren’t important; I just needed to write a wrapper script for this binary, do a modest amount of processing on its output, and then write the result to a file. This is a pretty common task, one to which the command line in general and *nix in particular are well suited.

Now, for various reasons, I prefer not to do this via shell scripts. I don’t have a hard and fast rule for when or why. Unless you routinely write shell scripts (I don’t), you’ll inevitably spend a bunch of time looking stuff up that you looked up, oh, six months ago. Or at least I do. I don’t enjoy it, so if I can’t do it with a few commands, maybe one loop or so, then I prefer to use a scripting language. I have the advantage of better/clearer failure modes and simpler syntax and I feel like I’m learning something more powerful and more widely applicable in the process.

One of my “favorite” things to forget and look up again is the difference between [[ ]] and [ ] in tests. Another one is how you loop over a file or output from a command (while read line; do bar --baz $line; done <foo and foo | while read line; do bar $line; done, respectively).

If you have no trouble remembering this stuff or you write scripts often enough, I suppose this is moot. There was a time when I was scripting enough that I didn’t have to look all this stuff up every time, but that time is past. YMMV and all that.

Anyway, the point is that I probably could’ve done this with something like this (which is simplified somewhat):

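ARGS="--bar=baz"
CMD="$HOME/bin/foo"   # a quoted ~ is not expanded, so use $HOME
KEY="quux"
MIN=0
MAX=10000

for num in $(seq "$MIN" "$MAX"); do
  # $ARGS is left unquoted on purpose so that several flags split into words
  "$CMD" "$num" $ARGS | while read -r line; do
    if grep -Eq "$KEY" <<<"$line"; then
      # pull the trailing number off the matching line (<<< is a bashism)
      value=$(grep -Eo '[0-9]+$' <<<"$line")
      echo "$num,$value"
    fi
  done
done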

I’m sure I have a bajillion bugs in there. For example, I am not at all confident about the outer for loop syntax. And while it was easy to write, I’m not doing anything with command line args, doing any error handling, or anything like that. (On the other other hand, I uh underestimated how easy this was. That’s partially why I’m convinced there are bugs in there.)

I still don’t know how to write idiomatic (typo: idiotmatic) Ruby, so this is going to be very rough. Still, it feels natural to me, minus one or two hitches:

ARGS = "--bar=baz"
CMD = "~/bin/foo"  # %x goes through the shell, which expands the ~
KEY = /quux/
MIN = 0
MAX = 10000

def main
  (MIN..MAX).each do |i|
    out = %x[ #{CMD} #{i} #{ARGS} ]
    # double quotes matter here: '\n' is a literal backslash followed by n
    out.split("\n").each do |line|
      KEY.match(line) do |m|
        value = line.split(' ')[1]
        print "#{i},#{value}\n"
      end
    end
  end
end

main

I’m not confident about the outer loop once again, and I’m not sure the call to match() will do what I want, let alone whether it’s elegant. Still, I feel pretty good about it. I love Ruby’s pattern of passing in blocks.

Fortunately or unfortunately, I had to use Python. And don’t get me wrong: I love Python. It is through Python that I learned to love scripting languages. Processing a file line by line was, I think, the real epiphany. And list comprehensions are wonderfully expressive.

But man alive is it awful for this sort of thing. I’m not even going to bother writing it out in Python. I’ll just excerpt from the 2.7 subprocess docs. Here’s what they say should replace the shell’s backticks, output=`mycmd myarg`:

output = Popen(["mycmd", "myarg"], stdout=PIPE).communicate()[0]

Mind you, this is if you did from subprocess import *. Generally I don’t, which means it ends up looking like this:

import subprocess

output = \
  subprocess.Popen(["mycmd", "myarg"], stdout=subprocess.PIPE).\
  communicate()[0]

Yes, once you have it written, sequestered in its own function that you never touch again, it’s not so bad. However, this is firmly in the category of something I will not be able to (and have not been able to) remember months later.
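For what it’s worth, the sequestered version might look something like the sketch below. The name backticks is made up for illustration, and decoding the output as UTF-8 is an assumption about what the binary prints. The demo at the bottom runs the current Python interpreter as a stand-in for mycmd, so that the snippet works without any particular binary installed.

```python
import subprocess
import sys


def backticks(*argv):
    """Run argv without a shell and return whatever it wrote to stdout."""
    proc = subprocess.Popen(list(argv), stdout=subprocess.PIPE)
    out = proc.communicate()[0]  # raw bytes on Python 3
    # Assumption: the output is UTF-8 text; undecodable bytes are replaced.
    return out.decode("utf-8", "replace")


# Stand-in for `mycmd myarg`: have the current interpreter print one line.
greeting = backticks(sys.executable, "-c", "print('hello')")
```

After that, the call site is just backticks("mycmd", "myarg"), which is at least something I might remember six months from now.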

It is also not discoverable in the sense that it’s highly particular: stdout=PIPE? Really? Compare and contrast opening a file (for line in open('somefile'): print foo(line) or [foo(line) for line in open('somefile')]) with this monster. Even my Ruby example could have used backticks if I hadn’t remembered the %x syntax. The best I can say about subprocess is that it’s a) better than Popen in Python 2.4 (or whatever), and b) easier to search the web for than %x.

The kicker of course is that the rest of the Python script was very easy to write! Since calling the binary was changing some external state, though, I had to make sure it was doing the right thing. In the end, building and testing the call to subprocess.Popen() took longer than the rest of the script. In an otherwise elegant, no-bullshit, batteries-included language, the subprocess module is a terrible blemish. It doesn’t look any better in Python 3.0, either, unfortunately.
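For the curious, here is a rough sketch of how the whole wrapper could be laid out in Python so that the easy part can be checked without touching that external state. It is not the script I actually wrote. The names run, rows and fake_runner are invented for illustration, the quux pattern and --bar=baz flag are the same placeholders as in the shell and Ruby versions, and taking the second whitespace-separated field as the value is an assumption borrowed from the Ruby example.

```python
import re
import subprocess

KEY = re.compile(r"quux")  # placeholder pattern, as in the examples above


def run(argv):
    """Default runner: really executes argv and returns its stdout as text."""
    out = subprocess.Popen(argv, stdout=subprocess.PIPE).communicate()[0]
    return out.decode("utf-8", "replace")  # assumes UTF-8 output


def rows(cmd, args, lo, hi, runner=run):
    """Yield "num,value" for every output line that matches KEY."""
    for num in range(lo, hi + 1):
        for line in runner([cmd, str(num)] + list(args)).splitlines():
            fields = line.split()
            # Assumption: the value is the second whitespace-separated field.
            if KEY.search(line) and len(fields) > 1:
                yield "%d,%s" % (num, fields[1])


def fake_runner(argv):
    """Hypothetical stand-in for the binary: runs nothing, echoes its number."""
    return "noise\nquux %s\n" % argv[1]


# Dry run: exercises the loop and the parsing without calling any binary.
preview = list(rows("foo", ["--bar=baz"], 0, 2, runner=fake_runner))
```

Passing the runner in as a parameter means the parsing can be exercised with fake output first, and only the small run function ever calls the real binary.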

Even more unfortunately, this is ultimately why it would be my preference to use either Ruby or shell. Ruby seems to work well for a variety of tasks, from writing a full-fledged webapp to some grungy text manipulation. You don’t have to compromise because it’s quite easy to use it for Python-y things as well as Perl-y things. It’s just a shame that Python seems to treat this case with a bizarre kind of fussiness incongruent with the rest of the language and standard library.
