Tuesday, July 12, 2011

CVS and (no) Atomic Commits

Every project I have worked on so far has used a different version control system. My history with them runs as follows: Rational ClearCase in my very first job, then Visual SourceSafe, then Subversion for a brief period (about 3 months), then Perforce for the longest stretch yet (a little over 4 years), and now CVS.

Using Perforce was a pleasant experience, and its atomic commits combined with its changelist/changeset feature are something I miss while using CVS. In CVS, each file committed to the repository has its own independent version number and history, which is certainly a limitation. I do not remember much about Rational ClearCase, except that its config specs allowed fairly efficient branching and merging in a manner that could emulate atomic commits. So this blog entry is about atomic commits and changesets, and how they reduce the frequency of build breaks, which tend to happen every once in a while on a team with heavy commit traffic.

A typical reason for a broken build is that a developer fails to commit all of the files that belong to a task or bug fix. As a consequence, one or more of the checked-in files try to access constructs (classes, methods, constants, etc.) newly introduced in a file that was never checked in. And that's the build breaking right in your face!

In my opinion, the kind of SCM used as the source code repository (read: CVS, Visual SourceSafe) can also be a contributing factor. Ideally, when a developer commits files to the repository, the files are grouped together as a single atomic change for the bug fix, new feature or task. Developers then start to think of the files as one group, which reduces the probability of a build break. Also, in the event of a network failure, atomic commits save the build by ensuring that either all or none of the files are submitted to the repository.
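
To make the all-or-none idea concrete, here is a minimal sketch in Python. It is not how CVS or Perforce are implemented; the "repository" is just a dictionary, and the file names, the NetworkError class and the fail_after parameter are all made up for illustration.

```python
class NetworkError(Exception):
    """Simulated connection failure during a commit (hypothetical)."""


def commit_per_file(repo, changes, fail_after=None):
    """Per-file sketch: each file lands in the repository on its own."""
    for count, (path, content) in enumerate(changes.items()):
        if fail_after is not None and count == fail_after:
            raise NetworkError("connection lost mid-commit")
        repo[path] = content  # visible to everyone immediately


def commit_atomic(repo, changes, fail_after=None):
    """Atomic sketch: stage every file first, publish only if all arrived."""
    staged = dict(repo)  # work on a copy, leave the live repository untouched
    for count, (path, content) in enumerate(changes.items()):
        if fail_after is not None and count == fail_after:
            raise NetworkError("connection lost mid-commit")
        staged[path] = content
    repo.clear()
    repo.update(staged)  # reached only when every file was staged


# Hypothetical bug fix touching two files that depend on each other.
changes = {
    "Report.java": "v2 (calls getBalance)",
    "Account.java": "v2 (adds getBalance)",
}

per_file_repo = {"Report.java": "v1", "Account.java": "v1"}
try:
    commit_per_file(per_file_repo, changes, fail_after=1)
except NetworkError:
    pass  # Report.java is already at v2, Account.java is still at v1

atomic_repo = {"Report.java": "v1", "Account.java": "v1"}
try:
    commit_atomic(atomic_repo, changes, fail_after=1)
except NetworkError:
    pass  # nothing was published, the repository still builds
```

After the simulated failure, the per-file repository holds a Report.java that calls a method Account.java does not have yet, which is exactly the broken-build situation described above. The atomic repository is left as it was.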

Perforce goes one step beyond atomic commits: when a developer opens a file in the workspace for edit, add or delete, Perforce places it in a pending 'changelist' (the default one, unless another is chosen). Submitting the changelist checks in everything it contains, so there is far less chance of a developer forgetting to check in a file.
Changelists serve two purposes:
- to organize your work into logical units by grouping related changes to files together
- to guarantee the integrity of your work by ensuring that related changes to files are checked in together.

Now, different version control systems record an atomic commit differently in their history. In Subversion, a changeset is just a collection of changes with a unique name. Each commit creates a new revision number, which can forever be used as the "name" of the change. As per the Subversion documentation:
    "In Subversion, a global revision number N names a tree in the repository: it's the way the repository looked after the Nth commit. It's also the name of an implicit changeset: if you compare tree N with tree N-1, you can derive the exact patch that was committed. For this reason, it's easy to think of “revision N” as not just a tree, but a changeset as well. If you use an issue tracker to manage bugs, you can use the revision numbers to refer to particular patches that fix bugs—for example, “this issue was fixed by revision 9238”. Somebody can then run svn log -r9238 to read about the exact changeset which fixed the bug, and run svn diff -r9237:9238 to see the patch itself."

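The "compare tree N with tree N-1" idea from the quote can be sketched in a few lines of Python. This is only an illustration of the concept, not Subversion's storage format: the revisions list, the file names and the changeset function are hypothetical, and each tree is modelled as a plain dictionary of path to content.

```python
def changeset(revisions, n):
    """Derive the implicit changeset of revision n by comparing it with n-1."""
    old, new = revisions[n - 1], revisions[n]
    return {
        "added": sorted(p for p in new if p not in old),
        "removed": sorted(p for p in old if p not in new),
        "modified": sorted(p for p in new if p in old and new[p] != old[p]),
    }


# Hypothetical history: revisions[N] is the whole tree after the Nth commit.
revisions = [
    {},  # revision 0: empty repository
    {"Account.java": "v1", "Report.java": "v1"},
    {"Account.java": "v2", "Report.java": "v2", "Audit.java": "v1"},
]

fix = changeset(revisions, 2)
print(fix)
```

Because the revision number names the whole tree, a single number is enough to recover every file that changed together in that commit.
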
Perforce keeps track of each file's independent revision history as well as changelist numbers. Again, you can associate changelist numbers with a bug/task tracking database and go back and forth between the SCM and the tracker.
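
As a rough picture of what keeping both kinds of history means, here is a small Python sketch. It is a toy model and not Perforce's actual data structures or API: the Depot class, its fields, the file names and the BUG-101/BUG-102 identifiers are all assumptions made for the example.

```python
from collections import defaultdict


class Depot:
    """Toy model: per-file revision numbers plus global changelist numbers."""

    def __init__(self):
        self.file_revs = defaultdict(list)  # path -> changelist number of each revision
        self.changelists = {}               # changelist number -> description and files
        self.next_change = 1

    def submit(self, description, paths):
        """Record one changelist and bump the revision of every file in it."""
        number = self.next_change
        self.next_change += 1
        files = {}
        for path in paths:
            self.file_revs[path].append(number)
            files[path] = len(self.file_revs[path])  # the file's own revision
        self.changelists[number] = {"description": description, "files": files}
        return number


depot = Depot()
first = depot.submit("BUG-101: add getBalance", ["Account.java", "Report.java"])
second = depot.submit("BUG-102: audit reports", ["Report.java", "Audit.java"])

# Hypothetical bug tracker entries pointing back at changelist numbers.
bug_to_change = {"BUG-101": first, "BUG-102": second}

# From a bug to the files that changed together, with their own revisions.
print(depot.changelists[bug_to_change["BUG-102"]]["files"])
# From a file to the changelists that touched it.
print(depot.file_revs["Report.java"])
```

The point of the sketch is that either number gets you to the other: a changelist lists the per-file revisions it created, and a file's history lists the changelists that produced each of its revisions.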

Searching the web, I did find some solutions and forum threads that discuss ways to get around this CVS limitation; exploring them is the next task on my agenda.
