14 April 2008

Deployment Gone Wrong

Here’s a situation:

Your development team has begun implementing a shared enterprise infrastructure. Nobody on the team has ever done this before and they do not take the time to think about how updates will be handled for the software as more applications begin to use it.

The first version of the shared infrastructure goes into production but it is not yet complete. There are enhancements and bug fixes that need to be done before it is fully implemented. Meanwhile, there are multiple applications being written to use the shared infrastructure. All of these projects are running concurrently with the enhancements and bug fixes for the infrastructure.

Many things need to happen for this scenario to work with minimal problems and no outages for the applications using the shared infrastructure. Off the top of my head, I suggest that, to start, you need configuration management, a well-defined deployment process that is controlled and repeatable, regression testing before anything is ever put into user-acceptance testing or production, and a well-defined and controlled build process.

What happens if your environment doesn’t have all, or even any, of these controls in place? The answer is that the infrastructure will fail in a spectacular way at the least opportune moment. Let’s look at what could happen if just one of these controls, such as a well-defined deployment process, is not in place.

If you do not have repeatable or automated deployment processes, you could end up missing steps and having the deployment fail in ways that range from subtle to spectacular. If your processes are neither automated nor well documented, you could end up with just one person who knows how to deploy the product. This puts you at risk if that person is out sick or decides to leave, and it could also lead to critical deadlines being missed.
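To make "repeatable" concrete, here is a minimal sketch of a scripted deployment, assuming the deployment can be broken into ordered, named steps. The `Step` class, the `run_deployment` function, and the step names are hypothetical and exist only for illustration; real steps would call your own build, copy, and test tooling. The point is that the steps live in a script rather than in one person's head, and that the run stops at the first failure and records where it stopped.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One named deployment step (hypothetical structure)."""
    name: str
    action: Callable[[], bool]  # returns True on success


def run_deployment(steps: List[Step]) -> List[str]:
    """Run steps in order, stop at the first failure, and return a log."""
    log = []
    for number, step in enumerate(steps, start=1):
        try:
            ok = step.action()
        except Exception as exc:
            log.append(f"{number}. {step.name}: ERROR ({exc})")
            break
        if not ok:
            log.append(f"{number}. {step.name}: FAILED")
            break
        log.append(f"{number}. {step.name}: ok")
    return log


# Illustrative steps only; the lambdas stand in for real commands.
steps = [
    Step("back up current release", lambda: True),
    Step("copy new build to server", lambda: True),
    Step("run database migration", lambda: False),  # simulated failure
    Step("restart services", lambda: True),
]

deployment_log = run_deployment(steps)
for line in deployment_log:
    print(line)
```

Because the run halts at the failed migration, the services are never restarted against a half-updated system, and the log shows anyone on the team exactly which step needs attention.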

All of this will make the entire department look bad to the customer, which can potentially lead to cut funding or lost contracts. The problem for me, as a business analyst, is that I am not managing these risks on any one project; I am managing them from a departmental process standpoint.

Therefore, I must document the potential problems from this lack of process in my risk plan for every project. Consequently, I will do things such as scheduling user acceptance testing at least several days after the deployment to the testing environment. Another mitigation I might use is to have an escalation plan for deployment problems and to invoke it immediately to avoid last-minute issues. Risk management and mitigation can avoid some problems, but they will never be a substitute for good processes.

Clearly this is not a preferred solution. Rather, a good solution would be to have a well-defined and controlled deployment process to limit the risk involved. The process must be developed from within to get as much buy-in as possible. That said, it must also be enforced from above to ensure compliance by all.

In some cases, all efforts to get the department to institute good build practices will fail. Unfortunately, this means the business analyst or project manager must be prepared to deal with the risk as best they can.

Photo by Nictalopen at Flickr


  1. The level of a team's maturity can be measured in their approach to configuration management :-)

    From my personal experience, the one area most often overlooked by all parties, and also not mentioned here, is the treatment of in-flight data, particularly in workflow systems. That is, data that is currently being processed in some manner and is in an indeterminate state when the migration occurs.

    The other often overlooked area is impact to existing data. A new field is added and existing records all get the new default value, but often that default value does not make sense for all stages of the lifecycle.

    This is where the value of a good BA comes to the fore: being able to work through the implications of a change on existing data and work with the technical team to cater for the identified impacts. Sadly, it has been my experience that this task is usually left to the technical team to take care of, with mixed results.

    The topic is a huge one and it is great that you bring it up. However, it saddens me that your focus seems to be from necessity in micro-managing the IT team in doing the job they are meant to be doing themselves as a means of risk mitigation, and in doing so overlooking the BA's role in configuration management, which in my mind should primarily be analysis of the impact from the change, and the corresponding verification of the success of that change.

    I have worked in many traditional process-orientated teams, and configuration management plays a big part in the team's process. Even then, configuration management can be a difficult beast to keep tamed.

    I have also worked in a few Agile greenfield projects, where everything was new and ongoing migrations were not such a big deal, as we were only responsible for the initial drops into Production. It would be interesting to know what experience other people have had with Agile projects on mature systems and whether configuration management on such teams is problematic.

  2. "it saddens me that your focus seems to be from necessity in micro-managing the IT team in doing the job they are meant to be doing themselves as a means of risk mitigation, and in doing so overlooking the BA's role in configuration management,"

    Believe me, Andrew, it saddens me just as much as it saddens you. In my opinion, these people shouldn't need to be micro-managed at this level. In fact, if I could get management support, I would have had this process issue addressed.

    I am hoping that the CMMI level 3 assessment we are being forced into will solve the problem. Unfortunately, that is at least a year (and several product release cycles for multiple products) away.

    I think that most BAs will try to address this from a process perspective. That's why I focus on the idea that one must accept that sometimes this issue will have to be addressed from a risk mitigation perspective.


  3. The only consistent solutions are to integrate often, automate deployment, and have automated integration tests that run whenever changes are made. In my experience, these things cannot be driven from outside of the team.

    @Andrew, I have experience with an Agile project with many releases over time. Configuration management has not been a particular problem - we have the minimum lightweight process we need in order to manage it. We have improved over time, responding to problems and building on our knowledge and experiences.