Saturday, July 30, 2011

Maven's assumptions

Maven both makes assumptions about, and enforces, many standards and good practices for enterprise-level project development and management. It also provides features such as resource filtering which, though not enabled by default, encourage good programming practice. I list them below as they come to mind, from the very obvious to the subtle:

  1. Projects have a standard directory layout, relative to the project's home directory, for source code, compiled binaries, test source code, test binaries, resources, etc., as described in the Introduction to the Standard Directory Layout.
  2. Projects adopt unit testing as a standard practice, because the test-compile and test phases are part of the default build lifecycle. Maven even goes a step further: by default it fails the build if the test source does not compile or if any unit test fails.
  3. Projects keep a clean directory separation between application source code and the resources it requires (such as Spring applicationContext files, Hibernate configuration files, log4j properties files, etc.) instead of putting everything together in the project home folder.
  4. Projects need not hardcode application properties inside configuration resources and can instead use resource filtering, i.e. a property like jdbc.driverName can be externalized in a properties file under src/main/filters, and its value is substituted wherever it is referenced. Any change to it is thus localized to one file under src/main/filters, although it can be referenced in several places in the filtered resources.
  5. Modern-day applications (represented by an aggregate project model / POM in Maven) are divided into separate modules - app-domain-model, app-DAO, app-business-logic, app-web-modules, app-webservices-modules, app-utils, etc. - instead of being old-school monolithic applications. Maven multi-module projects and the Reactor plugin help achieve this.
  6. Projects need to be built for various platforms (Windows, Unix, Linux, etc.) and deployment environments (dev, staging, test, production), and builds differ based on the platform and environment. E.g. each environment may require a different database server and therefore a different jdbc.url, jdbc.username and jdbc.password. Maven Profiles are a great feature that, if leveraged well, can make the build process seamless across environments.
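To make items 4 and 6 above more concrete, here is a minimal Java sketch of the idea behind resource filtering: placeholders of the form ${name} in a resource are replaced with values from a per-environment property set. This is only an illustration of the concept, not Maven's actual implementation; the class name ResourceFilterSketch, the filter method and the sample property values are all hypothetical.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of placeholder substitution; not Maven's own code.
public class ResourceFilterSketch {

    // Matches placeholders written as ${some.property.name}.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    /** Replaces each ${key} with its value; placeholders with no matching key are left as-is. */
    public static String filter(String text, Map<String, String> properties) {
        Matcher matcher = PLACEHOLDER.matcher(text);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String key = matcher.group(1);
            String value = properties.getOrDefault(key, matcher.group(0));
            // quoteReplacement keeps '$' and '\' in the value from being treated specially.
            matcher.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String resource = "driver=${jdbc.driverName}\nurl=${jdbc.url}";

        // Assumed sample values standing in for one filter file per environment.
        Map<String, String> dev = Map.of(
                "jdbc.driverName", "org.hsqldb.jdbcDriver",
                "jdbc.url", "jdbc:hsqldb:mem:devdb");
        Map<String, String> production = Map.of(
                "jdbc.driverName", "oracle.jdbc.OracleDriver",
                "jdbc.url", "jdbc:oracle:thin:@dbhost:1521:prod");

        System.out.println(filter(resource, dev));
        System.out.println(filter(resource, production));
    }
}
```

The same resource text yields a different result per environment, which is roughly what a profile that selects a different filter file achieves at build time.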

Maven references

Recently I have been learning Maven, as it is widely deployed and used at an organizational level in a very clever and effective manner. However, the Maven documentation, although quite extensive for an open-source project, does not flow logically enough to educate a newbie. After scouring most of the available online information in no particular order and then trying to connect the pieces, I came to the conclusion that the following would be the ideal order in which to learn Maven.

Online reference books made available by Sonatype.

All about Maven settings.

Core of Maven - the Project Object Model and all its intricacies - multi-module builds, profiles, reactors, etc.

Tuesday, July 12, 2011

CVS and (no) Atomic Commits

Every project that I have worked on so far has used a different version control system. My history with version control systems goes like this: Rational ClearCase in my very first job/project, then Visual SourceSafe, then Subversion for a very brief period (about 3 months), then Perforce for the longest time so far (a little over 4 years), and now CVS.

Using Perforce was a pleasant experience, and its atomic commits combined with its changelist/changeset feature are something I miss while using CVS. In CVS, each file committed to the repository has its own independent version number and history, which is certainly a limitation. I do not remember enough about Rational ClearCase, except that its config specs allowed much more efficient branching and merging, in a manner that could emulate atomic commits. So this blog entry is about atomic commits and changesets, and how they reduce the frequency of the build breaks that happen every once in a while on a team with heavy commit traffic.

A typical reason for a broken build is that a developer fails to commit all of the files that belong to a task or a bug fix. As a consequence, one or more of the checked-in files try to access constructs (classes, methods, constants, etc.) newly introduced in a file that was never checked in. And that's the build breaking right in your face!

In my opinion, the kind of SCM used as the source code repository (read: CVS, Visual SourceSafe) can also be a contributing factor. Ideally, when a developer commits files to the repository, the files are grouped together as a single atomic change for the bug fix, new feature or new task. Developers then start to group all the files together mentally as well, which reduces the probability of a build break. Also, in the event of a network failure, atomic commits save the build by ensuring that either all or none of the files are submitted to the repository.
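The all-or-nothing behaviour described above can be sketched in a few lines of Java. This is a toy in-memory model under my own assumptions, not how Perforce or Subversion is implemented: the class AtomicCommitSketch and its methods are hypothetical, and a null file content stands in for a transfer that failed midway.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory repository illustrating all-or-nothing commits.
public class AtomicCommitSketch {

    private Map<String, String> files = new HashMap<>();
    private int revision = 0;

    /**
     * Applies every change or none of them. Changes are staged on a copy,
     * and the copy replaces the repository state only if all of them succeed.
     */
    public int commit(Map<String, String> changes) {
        Map<String, String> staged = new HashMap<>(files);
        for (Map.Entry<String, String> change : changes.entrySet()) {
            if (change.getValue() == null) {
                // Assumption: null content simulates a file that failed to transfer.
                throw new IllegalStateException("transfer failed for " + change.getKey());
            }
            staged.put(change.getKey(), change.getValue());
        }
        files = staged;      // Reached only when every file was staged successfully.
        return ++revision;   // One new revision number for the whole group of files.
    }

    public int revision() {
        return revision;
    }

    public String read(String path) {
        return files.get(path);
    }

    public static void main(String[] args) {
        AtomicCommitSketch repo = new AtomicCommitSketch();

        Map<String, String> bugFix = new HashMap<>();
        bugFix.put("Order.java", "class Order { int total() { return 0; } }");
        bugFix.put("OrderTest.java", "class OrderTest { }");
        System.out.println("committed revision " + repo.commit(bugFix));

        Map<String, String> interrupted = new HashMap<>();
        interrupted.put("Order.java", "class Order { int total() { return Tax.apply(0); } }");
        interrupted.put("Tax.java", null); // this file never arrives
        try {
            repo.commit(interrupted);
        } catch (IllegalStateException e) {
            System.out.println("commit rejected, still at revision " + repo.revision());
        }
    }
}
```

After the rejected commit the repository still holds the earlier version of Order.java, so nobody can check out a tree that refers to a Tax class that was never submitted.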

Perforce goes one step beyond atomic commits: when a developer opens a file in the workspace for editing, Perforce automatically places it in a pool called a 'changelist'. That way there is little chance that the developer forgets to check in a modified file.
Changelists serve two purposes:
- to organize your work into logical units by grouping related changes to files together
- to guarantee the integrity of your work by ensuring that related changes to files are checked in together.

Different version control systems record this atomic commit differently in their history. In Subversion, a changeset/changelist is just a collection of changes with a unique name. Each commit creates a new revision number, which can be used forever after as a "name" for the change. As per the Subversion documentation:
    "In Subversion, a global revision number N names a tree in the repository: it's the way the repository looked after the Nth commit. It's also the name of an implicit changeset: if you compare tree N with tree N-1, you can derive the exact patch that was committed. For this reason, it's easy to think of “revision N” as not just a tree, but a changeset as well. If you use an issue tracker to manage bugs, you can use the revision numbers to refer to particular patches that fix bugs—for example, “this issue was fixed by revision 9238”. Somebody can then run svn log -r9238 to read about the exact changeset which fixed the bug, and run svn diff -r9237:9238 to see the patch itself."
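The idea in the quoted passage, that comparing tree N with tree N-1 gives you the changeset, can be sketched in Java as follows. This is a simplified model of my own, not Subversion code: each tree is assumed to be a map from file path to file content, and the class ChangesetSketch and its changedPaths method are hypothetical names.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: derive the implicit changeset between two repository trees.
public class ChangesetSketch {

    /** Returns the paths that were added, removed or modified between the two trees. */
    public static Set<String> changedPaths(Map<String, String> before, Map<String, String> after) {
        // Consider every path that exists in either tree.
        Set<String> allPaths = new TreeSet<>(before.keySet());
        allPaths.addAll(after.keySet());

        Set<String> changed = new TreeSet<>();
        for (String path : allPaths) {
            // A missing path reads as null, so additions and deletions count as changes too.
            if (!Objects.equals(before.get(path), after.get(path))) {
                changed.add(path);
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        // Assumed sample trees standing in for revisions N-1 and N.
        Map<String, String> previousTree = Map.of(
                "Order.java", "v1",
                "Invoice.java", "v1");
        Map<String, String> currentTree = Map.of(
                "Order.java", "v2",
                "Invoice.java", "v1",
                "Tax.java", "v1");

        System.out.println("changeset: " + changedPaths(previousTree, currentTree));
    }
}
```

Because the revision number names the whole tree, a single number is enough to recover every file that was part of the fix.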

Perforce keeps track of each file's independent revision history as well as changelist numbers. Again, you can associate changelist numbers with entries in a bug/task tracking database and go back and forth between the SCM and the tracker.

As I searched the web, I found some solutions and forum threads that discuss ways to get around this limitation of CVS. Exploring them is the next task on my agenda.