Friday, June 10, 2011

Practical MS Build - Avoiding Spaghetti Builds

A spaghetti build, much like spaghetti code, is a hair-pulling experience. Most developers (though not nearly enough) know how to avoid spaghetti code: refactoring, extracting classes and methods from tight loops, removing unexpected side effects. Quite frequently, however, these same principles don't get applied to our build scripts. Here I want to show an example of a spaghetti build and then some techniques for avoiding one.

Here is an example of what I would consider a spaghetti build. Regrettably, some of it comes straight from the MS Build documentation:

<Project DefaultTargets="localBuild" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">

  <Choose>
    <When Condition=" '$(Configuration)'=='Server' ">
      <PropertyGroup>
        <databaseConnection>Data Source=staging;Initial Catalog=myDataBase;Integrated Security=SSPI;</databaseConnection>
        <nunitPath>c:\nunit</nunitPath>
        <projectVersion>1.1.0.0</projectVersion>
        <runIntegrationTests>True</runIntegrationTests>
      </PropertyGroup>
    </When>
    <When Condition=" '$(Configuration)'=='Local' ">
      <PropertyGroup>
        <databaseConnection>Data Source=localhost;Initial Catalog=myDataBase;Integrated Security=SSPI;</databaseConnection>
        <nunitPath>c:\nunit</nunitPath>
        <projectVersion>1.1.0.0</projectVersion>
        <runIntegrationTests>False</runIntegrationTests>
      </PropertyGroup>
    </When>
  </Choose>

  <Target Name="serverBuild">
    <CallTarget Targets="init" />
    <CallTarget Targets="compile"  />
    <CallTarget Targets="migrateDb" />
    <CallTarget Targets="test" />
    <CallTarget Targets="package" />
    <CallTarget Targets="deploy" />
  </Target>

  <Target Name="localBuild">
    <CallTarget Targets="init" />
    <CallTarget Targets="compile" />
    <CallTarget Targets="migrateDb" />
    <CallTarget Targets="test" />
  </Target>

  ...

</Project>


Anti-pattern: One Build to rule them all

At first glance a build like this might look like a good idea. The entire build process has been reduced to two targets, all configuration is handled by a single property, and life seems pretty simple. The trade-off, however, is flexibility. The above approach has in essence created "one build to rule them all", an anti-pattern that can leave individual targets unable to run in isolation.

Another problem is that it violates the repeatability principle: it is impossible to run a "server" build without either logging in to the server or checking in code changes. A solid build system should instead be repeatable anywhere; no special rules should exist for a CI build. I addressed how to handle properties in this post.
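
A minimal sketch of one way to remove the special case, assuming the same settings as the Choose block above: each property gets a local default, and a CI server (or anyone else) overrides it on the command line instead of switching on a "Server" configuration. DatabaseServer and the showConfig target are hypothetical names introduced here for illustration. Note that nunitPath and projectVersion were identical in both branches of the Choose block, so they never needed special handling.

```xml
<Project DefaultTargets="showConfig" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">

  <PropertyGroup>
    <!-- Hypothetical property. The default applies only when no value was
         supplied on the command line (/p:) or through an environment variable -->
    <DatabaseServer Condition=" '$(DatabaseServer)' == '' ">localhost</DatabaseServer>
    <nunitPath Condition=" '$(nunitPath)' == '' ">c:\nunit</nunitPath>
    <projectVersion Condition=" '$(projectVersion)' == '' ">1.1.0.0</projectVersion>
    <!-- Built from DatabaseServer, so it follows any override above -->
    <databaseConnection>Data Source=$(DatabaseServer);Initial Catalog=myDataBase;Integrated Security=SSPI;</databaseConnection>
  </PropertyGroup>

  <!-- Hypothetical target that only echoes the resolved setting -->
  <Target Name="showConfig">
    <Message Text="Connection: $(databaseConnection)" />
  </Target>

</Project>
```

With this in place, `msbuild build.proj /p:DatabaseServer=staging` would produce the staging connection string from the same script a developer runs locally, with no "Server" branch.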


Targets should declare dependencies

In the above example the targets didn't declare their dependencies because the meta targets (localBuild, serverBuild) called them in an explicit order. A cleaner and more flexible approach is to have each target declare its dependencies and let MS Build work out the order in which to run them. The testUnit target, for example, depends on the code having been compiled first, so it should be declared like this:

<Target Name="testUnit" DependsOnTargets="compile" >
  <Message Text="testUnit" />
</Target>

The testIntegration target also needs an up-to-date database, so it depends on the migrateDb target as well:

<Target Name="testIntegration" DependsOnTargets="compile;migrateDb" >
  <Message Text="testIntegration" />
</Target>


A general rule of thumb: if a target depends on the output of another target, that dependency should be declared explicitly.
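
Putting the rule together, a minimal sketch of the dependency graph might look like this. The Message tasks are placeholders for real work. I've also assumed that migrateDb does not need compiled output; if your migrations ship in an assembly, it would declare compile as well. Running `msbuild build.proj /target:testIntegration` runs compile, then migrateDb, then testIntegration. Because MSBuild runs each target at most once per build, `/target:testUnit;testIntegration` compiles only once.

```xml
<Project DefaultTargets="compile" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">

  <!-- Placeholder bodies: only the dependency declarations matter here -->
  <Target Name="compile">
    <Message Text="compile" />
  </Target>

  <!-- Assumption: migrations run without compiled output -->
  <Target Name="migrateDb">
    <Message Text="migrateDb" />
  </Target>

  <!-- Unit tests consume the compiled assemblies -->
  <Target Name="testUnit" DependsOnTargets="compile">
    <Message Text="testUnit" />
  </Target>

  <!-- Integration tests consume compiled assemblies and a migrated database -->
  <Target Name="testIntegration" DependsOnTargets="compile;migrateDb">
    <Message Text="testIntegration" />
  </Target>

</Project>
```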


Users should declare targets

The package target is an example of where the reverse is true. The unit and integration tests need to pass before packaging, so it would seem natural to make them dependencies. However, package doesn't depend on the output of testUnit or testIntegration, so it should be left up to the user to declare that ordering instead:
>msbuild build.proj /target:testUnit;testIntegration;package
This way the build still fails if the tests fail, but someone working on the installer can run package on its own without waiting for long test runs.
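
In the project file, this means package lists only what it actually consumes. The following is a minimal sketch, assuming the installer is built from compiled output. The build\package folder and the MakeDir/Message steps are hypothetical placeholders, not a real packaging process.

```xml
<Project DefaultTargets="package" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">

  <Target Name="compile">
    <Message Text="compile" />
  </Target>

  <!-- package consumes compiled output, so compile is a real dependency.
       testUnit and testIntegration are deliberately NOT listed; callers who
       want them run /target:testUnit;testIntegration;package -->
  <Target Name="package" DependsOnTargets="compile">
    <!-- Hypothetical packaging steps; replace with your installer tooling -->
    <MakeDir Directories="build\package" />
    <Message Text="packaging into build\package" />
  </Target>

</Project>
```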

Another common example is code versioning: generating an AssemblyInfo file with the appropriate version information. Here is a hypothetical version target to do this:

<Target Name="version" >
  <AssemblyInfoGen Version="$(version)" Output="build\src\AssemblyInfo.cs" />
</Target>

It would be tempting to make the compile target depend on this, but that would trigger a full recompile every time. MS Build only recompiles when the source files have changed, and regenerating AssemblyInfo.cs counts as a change. Release builds are usually the only ones that care about the correct version, so it doesn't make sense to slow down every build for one relatively rare occurrence. The compile target uses the output of the version target but does not depend on it; it simply doesn't care one way or the other.

Anyone who does care about the correct AssemblyInfo being generated should make this explicit:
>msbuild build.proj /target:version;compile
Letting users declare their targets keeps the dependency chain simple, shallow and, above all, understandable. Once again, I want to stress that the CI server should be treated as just another user, no more and no less important than the developers.
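
If you are on MSBuild 4.0 or later, the built-in WriteCodeFragment task can replace the hypothetical AssemblyInfoGen task above. This is a swapped-in technique for illustration. The 0.0.0.0 default and the build\src output path are placeholders. As in the text, compile does not list version in DependsOnTargets, so AssemblyInfo.cs is regenerated only when someone asks for it with `/target:version;compile`.

```xml
<Project DefaultTargets="compile" ToolsVersion="4.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">

  <PropertyGroup>
    <!-- Placeholder default; a release build would pass /p:version=... -->
    <version Condition=" '$(version)' == '' ">0.0.0.0</version>
  </PropertyGroup>

  <Target Name="version">
    <ItemGroup>
      <!-- Each item becomes one assembly-level attribute;
           _Parameter1 is the attribute's first constructor argument -->
      <VersionAttribute Include="System.Reflection.AssemblyVersionAttribute">
        <_Parameter1>$(version)</_Parameter1>
      </VersionAttribute>
      <VersionAttribute Include="System.Reflection.AssemblyFileVersionAttribute">
        <_Parameter1>$(version)</_Parameter1>
      </VersionAttribute>
    </ItemGroup>
    <!-- Make sure the output folder exists before writing the file -->
    <MakeDir Directories="build\src" />
    <WriteCodeFragment Language="C#"
                       AssemblyAttributes="@(VersionAttribute)"
                       OutputFile="build\src\AssemblyInfo.cs" />
  </Target>

  <!-- No DependsOnTargets="version": compile uses the file if it is present
       but never forces it to be regenerated -->
  <Target Name="compile">
    <Message Text="compile" />
  </Target>

</Project>
```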


Don't call other targets

MS Build comes equipped with a CallTarget task, which should be avoided whenever possible. Many times I've seen people try to refactor build scripts to conditionally call other targets. The above example had a "Server" build and a "Local" build; other variations include "trunk" and "stable" as well as "test" and "release". What they all have in common is that they violate the repeatability principle, and the points outlined above give you all you need to avoid this anti-pattern.

Another quite common example is a test target that runs the unit and integration test targets (to save keystrokes):

<Target Name="test">
  <CallTarget Targets="testUnit;testIntegration" />
</Target>

A better alternative would be to declare them as dependencies:

<Target Name="test" DependsOnTargets="testUnit;testIntegration" />


Have many builds

One way these meta targets come about is when something needs to be repeated more than once. A "stable" build, for example, might produce both a 32-bit and a 64-bit version, which would be a waste of time if every developer had to repeat it for every check-in. A far easier solution is to enlist the CI server rather than the build system. A typical project might contain the following CI configurations:

  • UltraProduct Trunk
  • UltraProduct Test
  • UltraProduct 1.2 32 bit
  • UltraProduct 1.2 64 bit
  • UltraProduct 1.4 32 bit
  • UltraProduct 1.4 64 bit
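
The 32-bit and 64-bit configurations in this list don't need separate targets. Each CI job can run the same targets and pass a different property. The following is a minimal sketch, assuming a hypothetical solution file named UltraProduct.sln whose projects define x86 and x64 platforms. The Debug and x86 defaults are placeholders.

```xml
<Project DefaultTargets="compile" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">

  <PropertyGroup>
    <!-- Developers get these defaults; a 64-bit CI job passes /p:Platform=x64 -->
    <Platform Condition=" '$(Platform)' == '' ">x86</Platform>
    <Configuration Condition=" '$(Configuration)' == '' ">Debug</Configuration>
  </PropertyGroup>

  <Target Name="compile">
    <!-- Hypothetical solution name; the same target serves every CI configuration -->
    <MSBuild Projects="UltraProduct.sln"
             Properties="Configuration=$(Configuration);Platform=$(Platform)" />
  </Target>

</Project>
```

The "1.4 32 bit" and "1.4 64 bit" jobs would then differ only in the `/p:Platform=...` value they pass, and developers never pay for the extra build.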

The details will differ wildly depending on your project. Some only need a stable and a development version, while others will need to maintain several stable versions and a few development branches simultaneously. Scrum developers might decide on a build for each sprint so as not to have people checking in half-implemented features.

Remember, not all builds have to run on every check-in: a test build might only need to happen when the testers are ready for a new version, a release build can generally happen overnight, and a trunk build doesn't need to pass all integration tests. Each build configuration will have different needs and should be unaffected by the others.

These are a few simple rules I've come up with from real-world situations, developing build scripts and working with others. If you have suggestions for more, I'd like to hear them.