Bloggy Badger: shell

Introducing linediff, a quick ruby wrapper around diff to highlight changed words instead of changed lines.

#!/usr/bin/ruby
pre, post = $stdin.read.split("\n=======\n")
post[-1..-1] = ""

class String
  def bash
    `bash -c #{inspect.gsub("$", "\\$")}`
  end

  def binspect
    split("\n").map {|s|
      "echo #{s.inspect.gsub("$", "\\$")}"
    }.join(";")
  end

  def to_lines
    gsub("\n", "\\n").gsub(" ", "\n")
  end
  def to_lines!
    gsub!("\n", "\\n")
    gsub!(" ", "\n")
  end

  def to_words
    gsub("\n", " ").gsub("\\n", "\n")
  end
  def to_words!
    gsub!("\n", " ")
    gsub!("\\n", "\n")
  end
end

pre.to_lines!
post.to_lines!
len = pre.length+post.length

both = "diff -U#{len} <(#{pre.binspect}) <(#{post.binspect})".bash
both = both.split("\n")[3..-1]

pre = both.select {|s| s[0..0] != "+"}.join("\n")
post = both.select {|s| s[0..0] != "-"}.join("\n")

pre.gsub!(/^-/, "+")
pre.gsub!(/^ /, "")
post.gsub!(/^ /, "")

pre.to_words!
post.to_words!

puts "#{pre}\n=======\n#{post}"

I use it mostly to handle git conflicts which, in my current setup, show up more often than they should (I repeatedly sync with both a git and an svn repository, which I should not be doing according to git-svn). Here's an example conflict. Can you spot the difference?

<<<<<<< HEAD:thesis.tex
    Once we are done with binary functions, we turn to unary functions, shifting our focus from the aspect weavers to the aspects themselves. Our goal is to help programmers to build aquariums containing the aspects they need, while minimizing the number of proofs needed to achieve this goal. To this end, we provide several generic proofs that create, extend, and translate aquariums with aspects of specific forms. Note, however, that not every pair of functions that commute can be shown to do so using our proofs only, so the programmers are expected to complete our collection with proofs of their own whenever the aspects they need are not of the specific forms covered by our theorems.
    \begin{enumerate}
    \item For every associative and commutative binary function, such as those created using our unordered pair strategy, we can create an aquarium of unary functions corresponding to the elements of the algebraic datatype.
    \item Given an aquarium containing a function $f$, the aquarium can be extended with function $f^n$, for any natural number $n$. The function $f^n$ is the function which repeatedly applies, $n$ times, the function $f$. If $f$ is invertible, the theorem extends to negative integers.
=======
    Once we are done with binary functions, we turn to unary functions, shifting our focus from the aspect weavers to the aspects themselves. Our goal is to help programmers to build aquariums containing the aspects they need, while minimizing the number of proofs needed to achieve this goal. To this end, we provide several generic proofs that create, extend, and translate aquariums with aspects of specific forms. Note, however, that not every pair of functions that commute can be shown to do so using our proofs only, so the programmers are expected to complete our collection with proofs of their own whenever the aspects they need are of the specific forms covered by our theorems.
    \begin{enumerate}
    \item For every associative and commutative binary function, such as those created using our unordered pair strategy, we can create an aquarium of unary functions corresponding to the elements of the algebraic datatype.
    \item Given an aquarium containing a function $f$, the aquarium can be extended with function $f^n$, for any natural number $n$. The function $f^n$ is the function which repeatedly applies, $n$ times in total, the function $f$. If $f$ is invertible, the theorem extends to negative integers.
>>>>>>> df2a5b6eb97ded25a34412d6ff946e0c513da6f3:thesis.tex
    \item If $M$ is a functor, then applying $M$ to all the members of an aquarium transforms the aquarium into a new one, while preserving the pairwise commutativity of its elements.
    %\item We show how parametricity \cite{free_theorems} ensures that structure-manipulating aspects cannot interfere with element-manipulating aspects, thereby reducing the programmer's remaining burden of proof.
    \end{enumerate}

When I encounter such a conflict, I select the code between the "<<<<<<< ... >>>>>>>" and type "!linediff" to tell vim to filter the lines through my ruby script.

<<<<<<< HEAD:thesis.tex
    Once we are done with binary functions, we turn to unary functions, shifting our focus from the aspect weavers to the aspects themselves. Our goal is to help programmers to build aquariums containing the aspects they need, while minimizing the number of proofs needed to achieve this goal. To this end, we provide several generic proofs that create, extend, and translate aquariums with aspects of specific forms. Note, however, that not every pair of functions that commute can be shown to do so using our proofs only, so the programmers are expected to complete our collection with proofs of their own whenever the aspects they need are not of the specific forms covered by our theorems.
    \begin{enumerate}
    \item For every associative and commutative binary function, such as those created using our unordered pair strategy, we can create an aquarium of unary functions corresponding to the elements of the algebraic datatype.
    \item Given an aquarium containing a function $f$, the aquarium can be extended with function $f^n$, for any natural number $n$. The function $f^n$ is the function which repeatedly applies, $n$ times, the function $f$. If $f$ is invertible, the theorem extends to negative integers.
=======
    Once we are done with binary functions, we turn to unary functions, shifting our focus from the aspect weavers to the aspects themselves. Our goal is to help programmers to build aquariums containing the aspects they need, while minimizing the number of proofs needed to achieve this goal. To this end, we provide several generic proofs that create, extend, and translate aquariums with aspects of specific forms. Note, however, that not every pair of functions that commute can be shown to do so using our proofs only, so the programmers are expected to complete our collection with proofs of their own whenever the aspects they need are of the specific forms covered by our theorems.
    \begin{enumerate}
    \item For every associative and commutative binary function, such as those created using our unordered pair strategy, we can create an aquarium of unary functions corresponding to the elements of the algebraic datatype.
    \item Given an aquarium containing a function $f$, the aquarium can be extended with function $f^n$, for any natural number $n$. The function $f^n$ is the function which repeatedly applies, $n$ times in total, the function $f$. If $f$ is invertible, the theorem extends to negative integers.
>>>>>>> df2a5b6eb97ded25a34412d6ff946e0c513da6f3:thesis.tex
    \item If $M$ is a functor, then applying $M$ to all the members of an aquarium transforms the aquarium into a new one, while preserving the pairwise commutativity of its elements.
    %\item We show how parametricity \cite{free_theorems} ensures that structure-manipulating aspects cannot interfere with element-manipulating aspects, thereby reducing the programmer's remaining burden of proof.
    \end{enumerate}

Note that my script detects the "=======" separator and diffs between the lines appearing before and after it. It should be trivial to adapt it to examine different files, like the original diff does. Anyway, the result is that the changed words get prefixed with a "+", which I highlight using vim's ":hlsearch" option.

<<<<<<< HEAD:thesis.tex
    Once we are done with binary functions, we turn to unary functions, shifting our focus from the aspect weavers to the aspects themselves. Our goal is to help programmers to build aquariums containing the aspects they need, while minimizing the number of proofs needed to achieve this goal. To this end, we provide several generic proofs that create, extend, and translate aquariums with aspects of specific forms. Note, however, that not every pair of functions that commute can be shown to do so using our proofs only, so the programmers are expected to complete our collection with proofs of their own whenever the aspects they need are +not of the specific forms covered by our theorems.
    \begin{enumerate}
    \item For every associative and commutative binary function, such as those created using our unordered pair strategy, we can create an aquarium of unary functions corresponding to the elements of the algebraic datatype.
    \item Given an aquarium containing a function $f$, the aquarium can be extended with function $f^n$, for any natural number $n$. The function $f^n$ is the function which repeatedly applies, $n$ +times, the function $f$. If $f$ is invertible, the theorem extends to negative integers.
=======
    Once we are done with binary functions, we turn to unary functions, shifting our focus from the aspect weavers to the aspects themselves. Our goal is to help programmers to build aquariums containing the aspects they need, while minimizing the number of proofs needed to achieve this goal. To this end, we provide several generic proofs that create, extend, and translate aquariums with aspects of specific forms. Note, however, that not every pair of functions that commute can be shown to do so using our proofs only, so the programmers are expected to complete our collection with proofs of their own whenever the aspects they need are of the specific forms covered by our theorems.
    \begin{enumerate}
    \item For every associative and commutative binary function, such as those created using our unordered pair strategy, we can create an aquarium of unary functions corresponding to the elements of the algebraic datatype.
    \item Given an aquarium containing a function $f$, the aquarium can be extended with function $f^n$, for any natural number $n$. The function $f^n$ is the function which repeatedly applies, $n$ +times +in +total, the function $f$. If $f$ is invertible, the theorem extends to negative integers.
>>>>>>> df2a5b6eb97ded25a34412d6ff946e0c513da6f3:thesis.tex
    \item If $M$ is a functor, then applying $M$ to all the members of an aquarium transforms the aquarium into a new one, while preserving the pairwise commutativity of its elements.
    %\item We show how parametricity \cite{free_theorems} ensures that structure-manipulating aspects cannot interfere with element-manipulating aspects, thereby reducing the programmer's remaining burden of proof.
    \end{enumerate}

Tada! Now I know which words have changed since my last unfortunate sync-and-fix-conflicts cycle.

Does your shell script use temporary files? Does it clean up the temporaries after it dies? What if the user aborts your script early by using ^C? What if he feels mean and uses kill -9 instead?

Not using temporary files is the easiest way out of this mess, but unfortunately there are situations where this is unavoidable. Let's say, for example, that you want to diff the output of two programs. Using temporary filenames, this can be done as follows.

bash progA > outA.tmp bash progB > outB.tmp bash diff outA.tmp outB.tmp | progC bash rm outA.tmp bash rm outB.tmp

It's possible to use pipes to eliminate outA.tmp

bash progB > outB.tmp bash progA | diff - outB.tmp | progC bash rm outB.tmp

It's also possible to eliminate outB.tmp

bash progA > outA.tmp bash progB | diff outA.tmp - | progC bash rm outA.tmp

But with this technique it isn't possible to eliminate both temporaries at once. To see why this is is a problem, consider the case where progB is stuck in an infinite loop. In that case, the user is justified to kill your script using the magic CTRL-C combination; and even if the action isn't totally justified, the user can do whatever he wants and it's your job to keep your program from fooling up even when the user does. In any case, ^C will kill both progB and your script, which won't have the opportunity to delete its temporary file. It's not a very critical flaw, but I'd feel better if the script was robust enough to withstand this.

A naive solution could be to disable ^C.

bash trap "" SIGINT bash progA > outA.tmp bash progB > outB.tmp bash diff outA.tmp outB.tmp | progC bash rm outA.tmp bash rm outB.tmp

But this will only annoy the user, who will just kill your script using kill -9 instead of ^C. And you can't disable kill -9. So what would be a better solution?

The solution I've been advocating until today is as follows.

bash trap "rm -f outA.tmp outB.tmp" EXIT bash progA > outA.tmp bash progB > outB.tmp bash diff outA.tmp outB.tmp | progC

This will erase the temporaries on exit, whether the exit is normal or due to ^C. It still doesn't work against kill -9, but what does?

Well, I'll tell you what does. If user-mode code can't clean up on SIGKILL (the signal which -9 selects), then we'll use kernel-mode code. Incidentally, the kernel already contains code to clean up unclosed file descriptors, and we can use this to our advantage.

When we used "-" as an argument for diff, it really was just a short name for /proc/self/fd/0 (that is, standard input). This path names a virtual file which is deleted by the kernel when grep terminates. These virtual files correspond to file descriptors, so if we can create additional file descriptors, then we can create auto-cleaning temporary files.

read_fd3.rb fd3, fd4 = IO.pipe read_fd3.rb fd3.getc # wait until fd4 is opened by an outside process read_fd3.rb fd4.close read_fd3.rb exec(*ARGV) write_fd3.rb $stdout.close write_fd3.rb $stdout = File.new "/proc/#{ARGV.shift}/fd/4", 'w' write_fd3.rb $stdout.write 'x' write_fd3.rb $stdout.flush write_fd3.rb exec(*ARGV) bash ( bash progA | ruby read_fd3.rb diff /proc/self/fd/0 /proc/self/fd/3 & bash ruby write_fd3.rb $! progB bash wait $! bash ) | progC

The first ruby program creates two linked descriptors and leaves them open, letting the exec'd program use them. In this case, the exec'd program is diff, and it is told to compare /proc/self/fd/0 against /proc/self/fd/3. The first is just standard input, which comes from progA. The second is one of the linked file descriptors which were left open.

The second ruby program writes to the other linked file descriptor. This effectively sends the output of progB to the virtual file which diff is reading, as needed.

write_fd3.rb also writes an additional byte before the transfering the control to progB. This byte is needed because the end-of-file only occurs once all programs have closed all of their handles on the virtual file. read_fd3.rb has one of those handles, but it cannot close it until write_fd3.rb has secured its own handle, or the end-of-file will come too soon. So write_fd3.rb sends one byte to tell read_fd3.rb that it is now safe to close its handle.

You'll notice that this virtual file strategy is a bit awkward to use. And now that you have learned how to do it by hand, I can reveal you a secret: this strategy was built into bash all along! The syntax is as follows.

bash diff <(progA) <(progB) | progC

Concise, isn't it? However, this isn't the end of the story. There are some programs, unfortunately, which look at their input multiple times. You can't do that. Closing /proc/self/fd/3 is just as irreversible as closing stdin! And you can't use fseek(), either. So sorry, virtual files freed by the kernel. I tried to make this work, I really did, but... you're ordered, I'm random. We're just not made for each other.

Here's a more flexible solution. If the dying process cannot cleanup, then another process should take care of it. Preferably, a process which always runs, a daemon process. Whose job is to allocate and delete temporary files. A given client would reveal its process-id to the daemon, which will allocate a file for the client and delete it once the process with the given process-id has disappeared from the system. It could have disappeared for different reasons, including normal termination, ^C, and even kill -9.

If you think implementing a daemon would be going through a lot of trouble just for a few temporary files, then you'll be happy to hear that I've already went through this trouble for you. I called my daemon filenamed, and the corresponding client, filename, can be used as follows.

bash diff $(progA | filename) $(filename $(progB)) | progC

I also provide a wrapper called "<", which can be used as a drop-in replacement for the bash syntax <(...) which I introduced earlier. My version, of course, allows fseek().

bash diff $(\< progA) $(\< progB) | progC

There are a few aspects of my daemon which I'm not that proud of, but which I don't intend to fix. First, it depends on ruby, even though it's use is not ruby-specific. I'm just too lazy to reimplement it in C. Second, it's not very secure. Clients shouldn't be able to take control of the daemon, but they could easily request it to keep files until process 1 dies, that is, forever. Clients shouldn't be able to modify the files of other clients, but they could easily read them.

If you can live with that, enjoy!

Bloggy Badger

Pages

Wednesday, December 09, 2009

intra-line diff

Wednesday, August 06, 2008

filenamed

Popular Posts

Blog Archive

Links

gelisam's shared items