Wednesday, June 04, 2008

Rant against .gitkeep (and solution)

Git claims to be miles ahead of Subversion, looking down on it as "only" fixing CVS's flaws. Well, one annoying CVS flaw is its failure to archive the existence of directories on the server side, and it seems that git is still behind Subversion in this respect.

Of course, it would be incorrect to state that CVS doesn't track directories at all. It tracks changes to individual files, where a "file" is really a path relative to the root of the CVS checkout. On the server side, a CVS archive is simply a mirror of the working copy where each file is replaced by a "<file>,v" version, archiving diffs in RCS format. So each RCS file on the server side is located in a directory which corresponds to a directory in the working copy.

In addition to the text-based changes which current diff formats convey, RCS can also distinguish between an empty file and a deleted file. It would have been really simple to use a "<dir>,v" file to distinguish between empty and deleted directories, but for some reason, CVS didn't. And it's really annoying.

When you check out a CVS project, the client will create all the directories it knows of, including the ones which were once needed a long time ago to store some obsolete file in the project. To avoid polluting your working copy in this manner, the CVS client offers a "-P" option, which prunes the directories that are empty because all of their files have been deleted. But this is annoying too (albeit less so), since it also prunes the "bin" directory, which is empty on purpose, and the build process will fail. Assuming you weren't foolish enough to archive your binaries into CVS, that is.

The standard fix is to archive a "bin/.keepme" file whose sole purpose is to exist whenever "bin" is supposed to exist. It's very similar to the "<dir>,v" idea I suggested above, except that it's called "<dir>/.keepme,v", and that it's visible to users. So it's still annoying, but vanishingly so.
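To make the placeholder trick concrete, here is a minimal sketch in Python. It is not part of any CVS or git tooling; the function name add_placeholders and the choice to skip "CVS" and ".git" metadata directories are my own assumptions for illustration. It walks a working copy and drops an empty ".keepme" file into every directory that would otherwise have nothing to archive.

```python
import os
import tempfile

PLACEHOLDER = ".keepme"  # name taken from the post; any agreed-upon name would do


def add_placeholders(root):
    """Create an empty placeholder file in every empty directory below root.

    Hypothetical helper: returns the sorted list of placeholder paths
    (relative to root) that were created.
    """
    created = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Assumption: version-control metadata directories do not count as content.
        dirnames[:] = [d for d in dirnames if d not in ("CVS", ".git")]
        if dirpath != root and not dirnames and not filenames:
            path = os.path.join(dirpath, PLACEHOLDER)
            open(path, "w").close()  # an empty file is enough
            created.append(os.path.relpath(path, root))
    return sorted(created)


if __name__ == "__main__":
    # Throwaway demo tree: an empty "bin" and a "src" that already has a file.
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "bin"))
        os.makedirs(os.path.join(root, "src"))
        open(os.path.join(root, "src", "main.c"), "w").close()
        print(add_placeholders(root))
```

Running it a second time creates nothing, since "bin" now contains its placeholder. You would still have to add and commit the new files yourself.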

The Subversion team decided that they could get rid of the last few specks of annoyingness in the obvious way, by keeping track of the directory proper.

The git team decided to stick with the CVS way of doing things. They even have an FAQ advising you to add the ".keepme" files (which they call ".gitignore").

Anyway, I wrote these wrapper scripts to hide those ".keepme" files from the users. I name them ".gitkeep", and they're still slightly visible in that you have to create and add them manually, but once you commit them, they're gone. And my version of git-status reminds you to add them, instead of pretending that it added everything when in fact it ignored empty directories.
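The reminder part is easy to sketch. The following Python is not the code of my wrappers, only an illustration of the idea under stated assumptions: the names untracked_empty_dirs and status_reminder are made up, the message format is invented, and it looks only at the filesystem rather than asking git anything. It lists the directories that contain nothing at all, which are exactly the ones git would skip without a word.

```python
import os
import tempfile

KEEP_NAME = ".gitkeep"  # placeholder name used in this post


def untracked_empty_dirs(root):
    """Return the directories below root that contain no entries at all."""
    empty = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into git's own metadata.
        if ".git" in dirnames:
            dirnames.remove(".git")
        if dirpath != root and not dirnames and not filenames:
            empty.append(os.path.relpath(dirpath, root))
    return sorted(empty)


def status_reminder(root):
    """Build one reminder line per empty directory (invented message format)."""
    return [
        "# empty directory: %s (add %s to keep it)" % (d, os.path.join(d, KEEP_NAME))
        for d in untracked_empty_dirs(root)
    ]


if __name__ == "__main__":
    # Throwaway demo tree: "bin" is empty, "lib" already has its placeholder.
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, ".git", "objects"))
        os.makedirs(os.path.join(root, "bin"))
        os.makedirs(os.path.join(root, "lib"))
        open(os.path.join(root, "lib", KEEP_NAME), "w").close()
        for line in status_reminder(root):
            print(line)
```

A directory that already holds a ".gitkeep" is not reported, because it is no longer empty from the filesystem's point of view.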


Jakub Narebski said...

This issue (usually under the umbrella of "tracking empty directories") comes up now and then on the git mailing list. There were even proposals for how to solve this "in core". Please search the git mailing list archives.

Why is it not included in Git, even as an option? Evidently the patches (adding entries for directories in the 'index') were not good enough to be accepted, even though they could have sped up git by keeping track of which directories didn't change. One of the reasons was that it goes against the stated goal of Git to be a "content tracker"; empty directories, one can argue, are not 'contents'.

By the way, if some directories are a required part of the build, their existence should be checked by the build system, and they should be created if necessary. I mean that the problem is in the build system, not in the SCM...

gelisam said...

Oh, I've read them. In fact, the very reason I'm posting a set of wrappers instead of a patch is because I was inspired by this message, which argued against adding the feature to the core. I wish there was a common framework for customizing the relationship between the git repository and the filesystem.

I think it's up to users to decide whether they want to fix their build scripts or their checkout scripts. "mkdir -p bin" was also an obvious solution back when CVS was popular, yet many people chose to use .keepme files instead.

gelisam said...

Pol Llovet is maintaining a version of my wrappers here. Thanks Pol!

gelisam said...

> I wish there was a common framework
> for customizing the relationship
> between the git repository and the filesystem.

Turns out there is! Check out "smudge" and "clean" filters in