In case you missed this story when it hit the headlines last month, one of the UK’s major government departments lost the use of most of its desktop PCs for nearly a week. Now we’re seeing more details on the causes of the outage:

EDS has admitted that an error by one of its computer operators during a Microsoft Windows upgrade caused 40,000 PCs at the Department for Work and Pensions (DWP) to crash last month.

EDS said several steps have already been taken to avoid this happening again, including increased checks by EDS' senior engineers and management staff when such upgrades are implemented.

(Via Rod Trent)

This should serve as a reminder for all of us that THERE IS NO SILVER BULLET. If you’ve got a really tough job to do (upgrading 40,000 PCs), then merely having a great tool is not enough — you also need good processes in place to make sure that it gets used correctly. Otherwise, your putative silver bullet just becomes another way to shoot yourself in the foot.

EDS are one of the few contracting companies big enough to take on a job like this, and they have a constantly evolving body of processes to stop just such an event from happening. If even they can get it wrong, then the rest of us should really start paying attention.

A good place to start is ITIL, a set of best practices for IT service management that is getting some real traction in the industry, especially in Europe. ITIL is designed to be vendor-neutral: it gives us all a common language while still letting vendors customize it to fit their particular products. The Microsoft-customized version of ITIL is the Microsoft Operations Framework (MOF), which I’ve mentioned here a couple of times. If you can find a MOF or ITIL course near you taught by a qualified instructor, I highly recommend making the time to attend. The specific course I took was MOF Essentials, and it was both a great learning experience and a lot of fun.