Sunday, October 19, 2008

Rant against merge conflicts (and a solution strategy)

Version control used to be all about versioning. I never used RCS, but I bet the killer command was something like "rcs commit", to create a new version.

Projects soon grew bigger and version control began to be all about collaboration. The killer command was then "cvs checkout", to get the source and begin collaborating.

Then projects grew even more complicated and it was no longer sufficient just to version text, it was now important to version binaries and directories and symlinks and permissions and ancestry. As version control began to resemble filesystems, the magic occurred in commands like "svn add".

More recently, distributed version control systems made forking much easier, and the killer command is now "git branch". But the future of version control, I believe, lies in the one command which never worked quite right: "git merge".

Because of merge conflicts, merging is the only version control command which requires human supervision. Removing this constraint would allow other programs to perform merges, a seemingly small advantage which, in the long run, could actually lead to some very useful and unexpected applications... I'm thinking about unionfs extensions, but there could be many others. Never underestimate the creativity of a million strangers!

I happen to be somewhat of an expert on conflicts, or at least I wish I were. For my master's thesis, I'm currently tackling weaving conflicts in aspect-oriented applications. My conclusion so far is that conflicts are inevitable, unless we work with very restricted structures which are guaranteed to merge properly. I'm trying to find suitably restricted structures which would be expressive enough to program with.

I think that weaving and merging are very closely related operations. The first attempts to combine code, while the second attempts to combine filesystems. In both cases, unfortunately, the parts to be combined could turn out to be utterly incompatible. Current aspect-oriented languages tend to proceed with weaving anyway, postponing the disaster until runtime. Current version control systems take the opposite approach, refusing to merge until the user has fixed all of the conflicts. But I think it should be the other way around! Incorrect programs ought to fail as early as possible, possibly during the compilation or weaving phase. And for scripting purposes, as I've mentioned, I'd rather have merging succeed all the time.
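To make the "refuse to merge" behaviour concrete, here is a minimal sketch of a three-way merge over filesystem-like snapshots. It is only an illustration, not how git or any other tool implements merging: the snapshot representation (a dict from path to content) and the names `three_way_merge` and `MergeConflict` are assumptions made up for this example.

```python
class MergeConflict(Exception):
    """Raised when both sides changed the same path in different ways."""


def three_way_merge(base, ours, theirs):
    """Merge two snapshots (dicts of path -> content) against a common base.

    A missing path is treated as None, so deletions are handled like edits.
    """
    merged = {}
    for path in set(base) | set(ours) | set(theirs):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:
            result = o              # both sides agree
        elif o == b:
            result = t              # only theirs changed this path
        elif t == b:
            result = o              # only ours changed this path
        else:
            raise MergeConflict(path)  # both changed it, differently
        if result is not None:      # None means the path was deleted
            merged[path] = result
    return merged


base = {"a": "1", "b": "2"}
ours = {"a": "1x", "b": "2"}
theirs = {"a": "1", "b": "2y", "c": "3"}
clean = three_way_merge(base, ours, theirs)
```

As soon as both sides touch the same path in different ways, the function gives up and a human has to step in, which is exactly what keeps other programs from calling it unattended.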

For my aspects-related research, I've found several restricted structures which combine fine all the time, but filesystems can't be among them. Thus if we do insist on combining filesystems, I claim that conflicts are inevitable. That's why I don't insist on combining filesystems. In fact, you might remember how a few months back, I wished for a framework customizing the relationship between the git repository and the filesystem. This is precisely what I'm arguing for once again! Let's use one of my magic restricted structures inside the repository, so that merges succeed all the time. Then if users want to use the repository to track their filesystem, their logs, their cats, or whatever, let them write a translation between the repository and the thing they want to track. Now that would be doing one thing and doing it right, rather than letting filesystem concerns creep into the repository's code base.
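As a rough sketch of that split, suppose the repository stored nothing but a grow-only set of facts. This is not one of the restricted structures from my research, just the simplest stand-in I can think of that always merges: set union is commutative, associative, and idempotent, so it cannot fail. The names `merge` and `to_filesystem`, and the choice of (path, content) pairs as facts, are hypothetical.

```python
from collections import defaultdict


def merge(repo_a, repo_b):
    """Merge two repositories (frozensets of facts); union cannot fail."""
    return repo_a | repo_b


def to_filesystem(repo):
    """Hypothetical translation from repository facts to a file mapping.

    Returns (files, ambiguous): paths with exactly one candidate content,
    and paths with several candidates, which the translation layer (not
    the repository) must decide how to present.
    """
    candidates = defaultdict(set)
    for path, content in repo:
        candidates[path].add(content)
    files = {p: next(iter(c)) for p, c in candidates.items() if len(c) == 1}
    ambiguous = {p: sorted(c) for p, c in candidates.items() if len(c) > 1}
    return files, ambiguous


alice = frozenset({("README", "hello"), ("a.txt", "1")})
bob = frozenset({("README", "hello"), ("a.txt", "2")})
merged = merge(alice, bob)          # always succeeds
files, ambiguous = to_filesystem(merged)
```

The merge itself never needs supervision. Any disagreement between Alice and Bob only shows up in the translation step, where a filesystem-specific tool can pick a policy without the repository knowing anything about files.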

What do you think? Am I being crazier than usual?

4 comments:

Martijn said...

You've made me very curious now; can you give us some examples of and insight into these restricted structures?

gelisam said...

My reply grew large enough to become a post of its own. Thanks for asking!

Johannes Schindelin said...

Hey, just stumbled over your blog. I think you are not crazy, but I think that you are using the wrong forum to post the question: post an example to the Git mailing list, and you will get usable feedback.

Ciao,
Dscho "who always misses replies to his comments"

gelisam said...

I used git as an example because it is representative of the newest wave of version control systems. However, I was really talking about the SCM of the future here, whereas git is the SCM of the present. My suggestion is probably too radical for any particular existing SCM team to accept, but you're right: I should check my idea with the experts, not just with the few readers who happen to have stumbled upon my blog.