MOM does provide all of the facilities we need for collections, but the experience of working with (and often around) MOM is like taunting the Marquis de Sade.


First there is the issue of scale and its relation to total cost of ownership (TCO).  With our current implementation of MOM 2000, an Agent Management server (called a DCAM) is capable of processing approximately 300 agents (1 agent per server), despite dozing along at 150 MB of used memory and 10% total CPU use.  Therefore, if the enterprise consists of 3,000 servers, an operations team would need 11 servers dedicated to MOM (10 Agent Managers and 1 SQL Server).  At $350 per processor (and more than one processor is wasted on MOM 2000), that means a minimum of $35,000 in MOM licenses, plus OS licenses, plus SQL licenses, plus the cost of the servers themselves.
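The server-count arithmetic above is simple enough to sketch as a back-of-the-envelope calculation (the 300-agents-per-Manager ceiling and the 3,000-server enterprise are the figures from the estimate above, not product limits I can vouch for):

```python
import math

agents = 3000             # one agent per server in the example enterprise
agents_per_manager = 300  # practical ceiling for a MOM 2000 Agent Manager

managers = math.ceil(agents / agents_per_manager)  # Agent Managers needed
dedicated_servers = managers + 1                   # plus 1 SQL Server

print(managers, dedicated_servers)  # 10 11
```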


The problem is a carryover from when Microsoft bought the codebase from NetIQ.  Each Agent and Agent Manager has a “cache” folder with multiple files representing the multiple collection points and paths that an Agent or Manager can have.  On the Manager side, this cache is not limited in size; however, the practical limit is somewhere around 100 MB.  Once the folder’s contents grow past 100 MB, any or all of the cache files can easily become corrupted, causing a cascading failure throughout the Manager and its owned Agents.
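A crude guard against that failure mode is to watch the cache folder and alert before it crosses the danger zone.  Here is a minimal sketch; the path and the threshold-checking helper are mine for illustration, not anything MOM ships with:

```python
import os

CACHE_DIR = r"C:\example\mom\cache"  # hypothetical cache location
LIMIT_BYTES = 100 * 1024 * 1024      # the ~100 MB practical limit noted above

def cache_size(path):
    """Total size in bytes of all files under the cache folder."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total

def cache_healthy(path, limit=LIMIT_BYTES):
    """True while the cache is safely under the corruption threshold."""
    return cache_size(path) < limit
```

Run something like this on a schedule and page someone before the cascade starts, rather than after.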


There is one huge benefit to this caching system, especially with regard to the Agents: MOM will not miss a single event.  The Agent tracks its read state to determine how far back it needs to reach, reads in and scans the data for rule matches, then caches the data for transport.  With a properly working MOM environment you will receive 100% of configured events.  That’s a big deal for an operations crew.
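The no-missed-events guarantee falls out of that persisted read position.  A toy version of the pattern looks like this (the function and variable names are mine, not MOM’s):

```python
def collect(log, cursor, rules):
    """Scan every entry since the saved cursor, cache matches, advance cursor.

    log    -- list of event strings (stands in for the event log)
    cursor -- index of the first unread entry
    rules  -- predicates; an entry matching any rule is cached for transport
    """
    cached = [e for e in log[cursor:] if any(rule(e) for rule in rules)]
    return cached, len(log)  # new cursor: everything up to here has been read

log = ["disk ok", "disk error", "service stopped"]
matches, cursor = collect(log, 0, [lambda e: "error" in e])
# matches holds the one rule hit; cursor now points past the last entry,
# so a later run resumes exactly where this one left off
```

As long as the cursor is persisted before the cache is shipped, a crash or outage means re-reading, not losing, events.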


That limitation goes right to the heart of another pain point in MOM: Discovery.  Currently there are two methods for getting Agents onto servers and collecting in MOM, Discovery Scan and Manual Input.  Manual Input is simple: just provide a list of server names and MOM will attempt to contact them and install agents.  The Scan method, the only really reasonable option for a larger enterprise, is limited to scanning and matching NetBIOS names against wildcard strings or regular expressions.


So you set the rule for a particular Manager to look for names that match the wildcard string “AA*”.  All name matches would then be assigned to that specific Manager.  This is great as long as:

            Your NetBIOS scan is clean.  WINS/DNS entries for non-existent servers will not cause a real problem, but they do make the scan even more inefficient.

            You have fewer than 300 servers that match AA*.
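In code terms, the wildcard assignment boils down to something like the following sketch, with Python’s fnmatch standing in for MOM’s NetBIOS name matching:

```python
from fnmatch import fnmatch

# Hypothetical results of a name scan
servers = ["AA01", "AA02", "AB01", "AA03"]

# Everything matching the wildcard lands on this one Manager
assigned = [s for s in servers if fnmatch(s, "AA*")]
```

If that `assigned` list tops 300 entries, you are back in cache-corruption territory with no knob to split the load.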


So what do you do if you have more than 300 servers matching a wildcard?  Well, you can try a regular expression if your naming convention provides for it.  A regular expression match would look something like “^AA.*[13579]$”, so that names beginning with AA and ending in an odd number would be assigned.
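Before trusting agent assignment to a pattern like that, it is worth sanity-checking it against some sample names (Python’s re module used here purely for illustration):

```python
import re

# Begins with AA, ends in an odd digit -- the "split by parity" trick above
pattern = re.compile(r"^AA.*[13579]$")

servers = ["AA01", "AA02", "AA03", "AB05"]
assigned = [s for s in servers if pattern.match(s)]
# AA01 and AA03 land on this Manager; AA02 (even) and AB05 (wrong prefix) do not
```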


But none of these options is good for the enterprise.  Enterprise applications shouldn’t speak names, or rely on an extraneous list being clean, in order to have an efficient process.  Enterprises speak IP.  Servers are grouped and separated by subnet.  Believe me when I tell you we are lobbying hard for a subnet-based Discovery option.
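The assignment rule we are lobbying for would be trivial to express.  Here is what it could look like, sketched with Python’s ipaddress module; the subnet and function name are invented for illustration:

```python
import ipaddress

# Hypothetical subnet owned by one Agent Manager
MANAGER_SUBNET = ipaddress.ip_network("10.20.30.0/24")

def assigned_to_manager(server_ip):
    """Assign a server to this Manager if its address falls in the subnet."""
    return ipaddress.ip_address(server_ip) in MANAGER_SUBNET
```

No name scan, no wildcard collisions, and load splits along the same lines the network team already drew.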


Will