PMM History
PMM is the third offering from the Operations Center of Excellence. The first two were SLM - Service Level Management and DCM - Desired Configuration Monitor. PMM is in essence a MOM Tuning SKU, but the focus (just like the other two offerings) is to make ITIL/MOF real for our customers.
The recurrent theme within each of the Operations SKUs is the idea of both a Process stream and a Technology stream. PMM continues that tradition. The Process stream places focus on Incident and Problem Management as well as sustained engineering (more detailed articles on this later). The Technology stream focuses on ensuring that the Management Packs were configured correctly after installation, gathering data about "noisy" alerts from the OnePoint database through custom reports, and reviewing tuning steps with the customer.
The actual tuning process occurs when each stream has completed its tasks. At this point the customer has integrated MOM into their existing Incident/Problem Management processes (or we have helped them establish those), and we have the data we need to start the tuning process from the Technology stream.
The Technology Stream
The Technology and Process streams start simultaneously. This article will focus on the Technology stream. I'll have another post next week that specifically deals with the Process stream.
The idea with the Technology stream was to develop a way to gather data about "noisy" alerts with minimal impact to a customer's environment. I needed a way to do this that was both reliable and reproducible. The most reliable way to get this information seems to be gathering it from the customer's OnePoint database. I considered using only data from the MOM Data Warehouse, but I have run into a fairly large number of customers who didn't implement it. To make the engagement reproducible, it seemed to me that the best way was to pull the data from the production database.
Now the question became, "What tool do I use to gather the data?" Again it came down to the least impact to the customer. From my days as a consultant I know how difficult it is to ask a customer to install something like SQL Reporting Services if they have standardized across the Enterprise on another reporting or data access solution. Yet I needed an easy way to retrieve and display the data so that they can help us determine which alerts to tune.
The solution I chose was to create a Virtual Server image that has SQL Reporting Services loaded and to access the reports from there. This also gave me the added benefit of being able to incorporate SharePoint web services, from which I created the MOM Rules Record of Change (again with minimal customer impact).
Since I now had a platform to work from, we began building Reports that would pull the data we needed from the OnePoint Database. We are still in the process of building those, but I expect to have them completed within the next few weeks. (More updates later).
Experience has shown us that many of the alerts customers see in the field are due to misconfiguration of the Exchange Management Pack. Even using the Wizard, some customers configure synthetic transactions between every Exchange Store in their environment. Not only does this incur high traffic costs, it also radically increases the probability of chatter alerts. So the first thing we do during the engagement is ask the customer to rerun the EXMP Wizard so we can see the original settings used. We also ask them to export the XML file at the end and provide change management for it as new servers are brought on board.
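To put a rough number on why an every-store-to-every-store configuration gets expensive, here is a minimal arithmetic sketch. It assumes one synthetic transaction per ordered (source, destination) pair of stores per test interval, which is my simplification and not a statement of how the EXMP actually schedules its tests. The function name is hypothetical.

```python
def synthetic_transaction_pairs(store_count: int) -> int:
    """Count ordered (source, destination) pairs in a full mesh of stores.

    Assumption: every store sends one test transaction to every other
    store, so n stores produce n * (n - 1) transactions per interval.
    """
    if store_count < 0:
        raise ValueError("store_count must be non-negative")
    return store_count * (store_count - 1)


if __name__ == "__main__":
    # The store counts below are illustrative only.
    for stores in (5, 20, 50):
        print(stores, "stores ->", synthetic_transaction_pairs(stores), "transactions")
```

Under that assumption the count grows with the square of the number of stores, so each transaction that can fail or time out is another possible chatter alert.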
Once we are confident that the EXMP is configured correctly, we configure the Virtual Server image and custom reports to point at the customer's OnePoint database. In a large environment, this could be multiple databases or even simply a top tier database depending upon how they are configured. We then begin to gather the data that will be used during the tuning process.
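As an illustration of what "noisy" means once the data is gathered, the sketch below ranks alerts by how often each rule fired. It works on a plain in-memory list instead of the OnePoint database. The `Alert` record, its field names, and the sample rule names are all hypothetical and do not reflect the OnePoint schema or the actual reports.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Alert(NamedTuple):
    # Hypothetical record shape, not the OnePoint schema.
    computer: str
    rule_name: str
    severity: str


def noisiest_rules(alerts: Iterable[Alert], top_n: int = 3) -> list[tuple[str, int]]:
    """Return the top_n rule names by alert count, most frequent first."""
    counts = Counter(alert.rule_name for alert in alerts)
    return counts.most_common(top_n)


# Made-up sample data for illustration only.
SAMPLE_ALERTS = [
    Alert("EXCH01", "Mail flow test failed", "Error"),
    Alert("EXCH02", "Mail flow test failed", "Error"),
    Alert("EXCH01", "Mail flow test failed", "Error"),
    Alert("EXCH03", "Disk space low", "Warning"),
    Alert("EXCH03", "Disk space low", "Warning"),
    Alert("EXCH02", "Service stopped", "Critical Error"),
]

if __name__ == "__main__":
    for rule, count in noisiest_rules(SAMPLE_ALERTS, top_n=2):
        print(f"{count:3d}  {rule}")
```

The rules at the top of a list like this are the natural candidates to review with the customer first.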
I spent last week in Redmond with our Exchange MOM servers running these reports and starting the tuning process there. As with the other SKUs we have created, we want to make sure the processes we take to the field are the same we use internally. By RTM in September, we will be fully utilizing this SKU within MSIT.
(7-Jul-2006)
More on Reports
With the 4th of July, this week was a short week. My focus this week was on getting the reports up to speed. I am very pleased with the progress. Currently I have 5 linked reports that run queries against the OnePoint database and return results. This blog site doesn't lend itself well to posting graphics, so I won't be able to provide screen shots, but I can describe a little about what the reports provide.
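I tried to be as descriptive as possible with the report names. They are:
Alerts by Computer Group with Alert Counts
Alerts by Computers in Computer Groups with Alert Counts
Alerts by Computer Group by Severity
Alerts by Computer Group by Computer
Alerts by Computer by Severity
The linking is as follows:
Alerts by Computer Group with Alert Counts
  |-- Alerts by Computers in Computer Groups with Alert Counts
  |     |-- Alerts by Computer Group by Computer
  |     \-- Alerts by Computer by Severity
  \-- Alerts by Computer Group by Severity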
The introductory report (Alerts by Computer Group with Alert Counts) lists the default Exchange Management Pack Computer Groups (Exchange 2000 Server, Exchange 2003 Server) as well as custom computer groups defined by the customer. (These have to be hand-coded into the report prior to the engagement.) Beside each of the Computer Groups are columns labeled # Warning, # Error, # Critical Error, and # Service Unavailable. Each column shows the number of alerts of that severity for the computer group. All columns in this report are hyperlinked. If you select one of the Computer Group names, you jump to a report (Alerts by Computers in Computer Groups with Alert Counts) listing all the computers in the computer group, along with the alerts of differing severity for each (described next). If you select any of the values listed under the alert severity columns, you are linked to a report (Alerts by Computer Group by Severity) that shows you all computers within the computer group that have logged the selected severity of alert.
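The shape of the introductory report and one of its drill-downs can be sketched in a few lines. This is only an illustration on made-up in-memory data: the real reports are SQL Reporting Services reports that query the OnePoint database, and the function names, group membership, and sample alerts below are all assumptions of mine.

```python
SEVERITIES = ("Warning", "Error", "Critical Error", "Service Unavailable")


def counts_by_group(alerts, group_members):
    """Build the group x severity count table of the introductory report.

    alerts: list of (computer, severity) tuples (hypothetical shape).
    group_members: dict mapping computer group name -> set of computers.
    """
    table = {group: dict.fromkeys(SEVERITIES, 0) for group in group_members}
    for computer, severity in alerts:
        for group, members in group_members.items():
            if computer in members and severity in table[group]:
                table[group][severity] += 1
    return table


def computers_with_severity(alerts, group_members, group, severity):
    """Drill-down: computers in a group that logged the selected severity."""
    members = group_members[group]
    return sorted({c for c, s in alerts if c in members and s == severity})


# Made-up sample data for illustration only.
GROUPS = {
    "Exchange 2003 Server": {"EXCH01", "EXCH02"},
    "Exchange 2000 Server": {"EXCH03"},
}
ALERTS = [
    ("EXCH01", "Warning"),
    ("EXCH01", "Error"),
    ("EXCH02", "Error"),
    ("EXCH03", "Critical Error"),
    ("EXCH03", "Warning"),
]

if __name__ == "__main__":
    for group, row in counts_by_group(ALERTS, GROUPS).items():
        print(group, row)
    print(computers_with_severity(ALERTS, GROUPS, "Exchange 2003 Server", "Error"))
```

Selecting a severity count in the first table corresponds to calling the drill-down with that group and severity.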