Wednesday, August 06, 2008


Does your shell script use temporary files? Does it clean up the temporaries after it dies? What if the user aborts your script early by using ^C? What if he feels mean and uses kill -9 instead?

Not using temporary files is the easiest way out of this mess, but unfortunately there are situations where this is unavoidable. Let's say, for example, that you want to diff the output of two programs. Using temporary filenames, this can be done as follows.

    progA > outA.tmp
    progB > outB.tmp
    diff outA.tmp outB.tmp | progC
    rm outA.tmp
    rm outB.tmp

It's possible to use pipes to eliminate outA.tmp

    progB > outB.tmp
    progA | diff - outB.tmp | progC
    rm outB.tmp

It's also possible to eliminate outB.tmp

    progA > outA.tmp
    progB | diff outA.tmp - | progC
    rm outA.tmp

But with this technique it isn't possible to eliminate both temporaries at once. To see why this is a problem, consider the case where progB is stuck in an infinite loop. In that case, the user is justified in killing your script using the magic CTRL-C combination; and even if the action isn't totally justified, the user can do whatever he wants and it's your job to keep your program from fouling up even when he does. In any case, ^C will kill both progB and your script, which won't have the opportunity to delete its temporary file. It's not a very critical flaw, but I'd feel better if the script were robust enough to withstand this.

A naive solution could be to disable ^C.

    trap "" SIGINT
    progA > outA.tmp
    progB > outB.tmp
    diff outA.tmp outB.tmp | progC
    rm outA.tmp
    rm outB.tmp

But this will only annoy the user, who will just kill your script using kill -9 instead of ^C. And you can't disable kill -9. So what would be a better solution?

The solution I've been advocating until today is as follows.

    trap "rm -f outA.tmp outB.tmp" EXIT
    progA > outA.tmp
    progB > outB.tmp
    diff outA.tmp outB.tmp | progC

This will erase the temporaries on exit, whether the exit is normal or due to ^C. It still doesn't work against kill -9, but what does?
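The same idea carries over to ruby, for the curious. Here is an illustrative sketch of mine (not from the post, and the file name is invented): an at_exit handler, like bash's EXIT trap, runs on normal termination and on ^C, but is powerless against kill -9.

```ruby
require 'tmpdir'

# A child process registers an at_exit cleanup for its temporary file;
# the handler runs when the child exits normally (or is interrupted).
path = File.join(Dir.tmpdir, "demo_#{Process.pid}.tmp")

pid = fork do
  File.write(path, "data")
  at_exit { File.delete(path) if File.exist?(path) }
  exit                   # ^C (Interrupt) would also unwind into at_exit
end
Process.wait(pid)

puts File.exist?(path)   # prints "false": the handler removed the file
```

As with the bash trap, a SIGKILL would skip the handler entirely, since the process never gets a chance to run any of its own code.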

Well, I'll tell you what does. If user-mode code can't clean up on SIGKILL (the signal which -9 selects), then we'll use kernel-mode code. Incidentally, the kernel already contains code to clean up unclosed file descriptors, and we can use this to our advantage.

When we used "-" as an argument for diff, it really was just a short name for /proc/self/fd/0 (that is, standard input). This path names a virtual file which the kernel cleans up when diff terminates. These virtual files correspond to file descriptors, so if we can create additional file descriptors, then we can create auto-cleaning temporary files.
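To convince yourself that these /proc paths really do designate live file descriptors, here is a small ruby experiment (mine, not part of the post): opening /proc/self/fd/N for a pipe's read end does not give you a copy of anything, it reattaches you to the very same pipe.

```ruby
r, w = IO.pipe
w.write "hello"
w.close   # all write handles closed, so end-of-file is ready

# Opening the /proc path yields a fresh descriptor on the same pipe,
# so reading through it drains the data written above.
puts File.read("/proc/self/fd/#{r.fileno}")   # prints "hello"
```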

    # read_fd3.rb
    fd3, fd4 = IO.pipe
    fd3.getc   # wait until fd4 is opened by an outside process
    fd4.close
    exec(*ARGV)

    # write_fd3.rb
    $stdout.close
    $stdout = File.open("/proc/#{ARGV.shift}/fd/4", 'w')
    $stdout.write 'x'
    $stdout.flush
    exec(*ARGV)

    (
      progA | ruby read_fd3.rb diff /proc/self/fd/0 /proc/self/fd/3 &
      ruby write_fd3.rb $! progB
      wait $!
    ) | progC

The first ruby program creates two linked descriptors and leaves them open, letting the exec'd program use them. In this case, the exec'd program is diff, and it is told to compare /proc/self/fd/0 against /proc/self/fd/3. The first is just standard input, which comes from progA. The second is one of the linked file descriptors which were left open.

The second ruby program writes to the other linked file descriptor. This effectively sends the output of progB to the virtual file which diff is reading, as needed.

write_fd3.rb also writes an additional byte before transferring control to progB. This byte is needed because end-of-file only occurs once all programs have closed all of their handles on the virtual file. read_fd3.rb has one of those handles, but it cannot close it until write_fd3.rb has secured its own handle, or the end-of-file will come too soon. So write_fd3.rb sends one byte to tell read_fd3.rb that it is now safe to close its handle.
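The underlying rule is worth seeing in isolation. Here is a stand-alone ruby illustration (my own, not one of the scripts above) of why read_fd3.rb must hold on to its write handle until the byte arrives: a reader only sees end-of-file once every handle on the write end has been closed.

```ruby
r, w = IO.pipe
w2 = w.dup            # a second handle on the write end (like read_fd3.rb's fd4)
w.close               # closing one handle is not enough...
w2.write "x"
w2.close              # ...EOF appears only after the last write handle closes

puts r.read.inspect   # prints "x": one byte, then end-of-file
```

Had w2 been closed before the outside writer opened its own handle, r.read would have returned an empty string immediately, which is exactly the premature end-of-file the handshake byte prevents.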

You'll notice that this virtual file strategy is a bit awkward to use. And now that you have learned how to do it by hand, I can let you in on a secret: this strategy was built into bash all along! The syntax is as follows.

    diff <(progA) <(progB) | progC

Concise, isn't it? However, this isn't the end of the story. There are some programs, unfortunately, which need to read their input multiple times. With a pipe, you can't do that. Reading from /proc/self/fd/3 is just as irreversible as reading from stdin! And you can't use fseek(), either. So sorry, virtual files freed by the kernel. I tried to make this work, I really did, but... you're ordered, I'm random. We're just not made for each other.
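If you want to see the refusal for yourself, here is a quick ruby check (an illustration of the general rule, not something from the post): seeking on a pipe-backed descriptor fails with ESPIPE, the same errno behind fseek()'s refusal.

```ruby
r, w = IO.pipe
begin
  r.seek(0, IO::SEEK_SET)   # a pipe has no file position to rewind to
rescue Errno::ESPIPE => e
  puts "seek refused: #{e.class}"
end
```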

Here's a more flexible solution. If the dying process cannot clean up, then another process should take care of it. Preferably, a process which always runs: a daemon process whose job is to allocate and delete temporary files. A given client would reveal its process-id to the daemon, which would allocate a file for the client and delete it once the process with that process-id has disappeared from the system. It could have disappeared for various reasons, including normal termination, ^C, and even kill -9.
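The cleanup half of such a daemon could be sketched in ruby like this. To be clear, this is my guess at the design, not the actual filenamed source, and the names alive? and watch are invented for the illustration.

```ruby
# Probe whether a process still exists; signal 0 checks without killing.
def alive?(pid)
  Process.kill(0, pid)
  true
rescue Errno::ESRCH   # no such process
  false
rescue Errno::EPERM   # exists, but belongs to another user
  true
end

# Delete the client's file once its process has disappeared, whether it
# exited normally, was ^C'd, or was killed with -9.
def watch(pid, path)
  sleep 1 until !alive?(pid)
  File.delete(path) if File.exist?(path)
end
```

The point of the kill-with-signal-0 trick is that it works no matter how the client died: from the daemon's vantage point, a process that was kill -9'd is just as gone as one that exited politely.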

If you think implementing a daemon would be a lot of trouble just for a few temporary files, then you'll be happy to hear that I've already gone through this trouble for you. I called my daemon filenamed, and the corresponding client, filename, can be used as follows.

    diff $(progA | filename) $(filename $(progB)) | progC

I also provide a wrapper called "<", which can be used as a drop-in replacement for the bash syntax <(...) which I introduced earlier. My version, of course, allows fseek().

    diff $(\< progA) $(\< progB) | progC

There are a few aspects of my daemon which I'm not that proud of, but which I don't intend to fix. First, it depends on ruby, even though its use is not ruby-specific. I'm just too lazy to reimplement it in C. Second, it's not very secure. Clients shouldn't be able to take control of the daemon, but they could easily request it to keep files until process 1 dies, that is, forever. Clients shouldn't be able to modify the files of other clients, but they could easily read them.

If you can live with that, enjoy!
