
shlock (1) - Nigels Retrospective

Nigel Watson, an Architect Advisor at Microsoft, based in Melbourne Australia.
"Windows Nightmare" or "Operations and Management Failure"?

Interesting article on /. linking to a story about a Big 5 consulting firm in Japan replacing a bunch of Windows servers with OpenBSD. The article (including the usual feral baying in the /. comment threads) neglects to mention that any IT department worth its salt wouldn't let its infrastructure get into such a sorry state that there's a "bad relationship between IT and users" (due presumably to dying Domain Controllers, network outages and email servers going up and down like yoyos).

What makes this IT department imagine that they'll be able to manage a new OpenBSD infrastructure any better than they managed the Windows platform? These days, it's almost always the case that this sort of mess is the result of poor operations and management disciplines. It's really not good enough any more to simply throw your hands in the air and blame the platform. There was an element of irony about the article (at least there was for me when I read it):

Hmmm...

Notice anything special?      :)

[Update] I received the following feedback from Gabe:

I saw your comment on the slashdot article about switching from Windows to OpenBSD, and I believe you may have missed some points.

From reading the article it looks like the big problem was that management insisted on using CheckPoint running on Windows as a firewall. It seems CheckPoint was the cause of many problems (email down?) and so it was replaced by OpenBSD's standard firewall. If it was purely a Win problem, he could have just run CheckPoint on RedHat.

Also, the bad AD server was replaced by a working AD server, not some OpenBSD substitute. The IT guy who was responsible responded in another forum about how he was quoted out of context, and it looks like what he did really made sense. He came in, saw the problems, and fixed them properly with either new Windows boxes or new OpenBSD boxes depending on what worked best.

The point that I was trying to get across was that this episode sounded more like a failure of the IT department to properly manage and operate their infrastructure than a problem rooted in the underlying platform.

AD server failing? Why is there only one of them? If it's important enough - and I suggest it is - there should have been multiple domain controllers on the network. Things fail, redundancy is good - and this of course is a platform-invariant axiom. A sensible IT department would have assessed this risk and put safeguards in place (i.e. run more than one DC in the domain).

CheckPoint on Windows? Why was it failing? There are lots of orgs out there that run CP on the Windows platform (as well as others) with no problems, DoS attacks and all. This suggests to me that either (a) the firewall in question was not properly hardened (platform-invariant problem) or (b) the platform on which the firewall was running was not patch-current (platform-invariant problem) or (c) they were trying to run the firewall on an obsolete version of Windows (i.e. NT), in which case replatforming FW-1 makes sense (even though CP still supports it as a platform).

Again, even if his comments were taken out of context, IMHO this story reveals that - at the company in question - the IT department failed to properly manage their infrastructure. I'd question whether, going forward, they have addressed this so as to minimise having to fight similar fires in future.

Posted: Wednesday, October 26, 2005 8:46 PM by shlock

Comments

alaw said:

After 20 years in this profession, I can honestly say that 90% of the production problems I have run into can be traced back to poor operational procedures. I spend most of my time resolving issues due to poor change management, at either the server or the software level. This is something all developers should be taught in undergraduate school. Learning object-oriented programming, or service-oriented design, means nothing if you don't have your change management and other operational controls in place!
# October 26, 2005 8:32 AM