New SKU PMM - Proactive Monitoring with MOM

PMM History

PMM is the third offering from the Operations Center of Excellence. The first two were SLM - Service Level Management and DCM - Desired Configuration Monitor. PMM is in essence a MOM Tuning SKU, but the focus (just like the other two offerings) is to make ITIL/MOF real for our customers.

The recurrent theme within each of the Operations SKUs is the idea of both a Process stream and a Technology stream. PMM continues that tradition. The Process stream places focus on Incident and Problem Management as well as sustained engineering (more detailed articles on this later). The Technology stream focuses on ensuring that the Management Packs were configured correctly after installation, gathering data about "noisy" alerts from the OnePoint database through custom reports, and reviewing tuning steps with the customer.

The actual tuning process begins once each stream has completed its tasks. At this point the customer has integrated MOM into their existing Incident/Problem Management processes (or we have helped them establish those processes), and we have the data from the Technology stream that we need to start tuning.

The Technology Stream

The Technology and Process streams start simultaneously. This article will focus on the Technology stream. I'll have another post next week that specifically deals with the process stream.

The idea with the Technology stream was to develop a way to gather data about "noisy" alerts with minimal impact to a customer's environment. I needed a way to do this that was both reliable and reproducible. The most reliable way to get this information seems to be gathering it from the customer's OnePoint database. I considered using only data from the MOM Data warehouse, but I have run into a fairly large number of customers who didn't implement it. To make the engagement reproducible, it seemed to me that the best way was to pull the data from the production database.

Now the question became, "What tool do I use to gather the data?" Again it came down to the least impact to the customer. From my days as a consultant I know how difficult it is to ask a customer to install something like SQL Reporting Services if they have standardized across the Enterprise on another reporting or data access solution. Yet I need an easy way to retrieve and display the data so that they can help us determine which alerts to tune.

The solution I chose was to create a Virtual Server image that has SQL Reporting Services loaded and access reports from there. This also gave me the added benefit of being able to incorporate SharePoint web services, from which I created the MOM Rules Record of Change (again with minimal customer impact).

Since I now had a platform to work from, we began building Reports that would pull the data we needed from the OnePoint Database. We are still in the process of building those, but I expect to have them completed within the next few weeks. (More updates later).

Experience has shown us that many of the alerts customers see in the field are due to misconfiguration of the Exchange Management Pack. Even using the Wizard, some customers configure synthetic transactions between every Exchange Store in their environment. Not only does this incur high traffic costs, it also radically increases the probability of chatter alerts. So the first thing we do during the engagement is ask the customer to rerun the EXMP Wizard so we can see the original settings used. We also ask them to export the XML file at the end and provide change management for it as new servers are brought on board.
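To get a feel for why store-to-store synthetic transactions add up so quickly, here is a rough Python sketch. It assumes, purely for illustration, that every store sends a test message to every other store (a full mesh). The function name and the store counts are made up for the example and are not taken from the EXMP.

```python
def full_mesh_transactions(store_count: int) -> int:
    """Count directed store-to-store pairs when every store tests every other store."""
    if store_count < 0:
        raise ValueError("store_count must be non-negative")
    # Each store targets every store except itself.
    return store_count * (store_count - 1)


# Hypothetical environment sizes, chosen only to show the growth.
for stores in (5, 20, 50):
    print(stores, "stores ->", full_mesh_transactions(stores), "transactions per cycle")
```

The count grows with the square of the number of stores, so each new server added to a fully meshed configuration costs more than the last one did.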

Once we are confident that the EXMP is configured correctly, we configure the Virtual Server image and custom reports to point at the customer's OnePoint database. In a large environment, this could be multiple databases or even simply a top tier database depending upon how they are configured. We then begin to gather the data that will be used during the tuning process.

I spent last week in Redmond with our Exchange MOM servers running these reports and starting the tuning process there. As with the other SKUs we have created, we want to make sure the processes we take to the field are the same we use internally. By RTM in September, we will be fully utilizing this SKU within MSIT.

(7-Jul-2006)

More on Reports

With the 4th of July, this week was a short week. My focus this week was on getting the reports up to speed. I am very pleased with the progress. Currently I have 5 linked reports that run queries against the OnePoint database and return results. This blog site doesn't lend itself well to posting graphics, so I won't be able to provide screen shots, but I can describe a little about what the reports provide.

I tried to be as descriptive as possible with the report names. They are:

  • Alerts by Computer Group with Alert Counts
  • Alerts by Computers in Computer Groups with Alert Counts
  • Alerts by Computer Group by Computer
  • Alerts by Computer Group by Severity
  • Alerts by Computer by Severity

The linking is as follows:

                   Alerts by Computer Group with Alert Counts

                     /                                                   \

                  /                                 Alerts by Computer Group by Severity

Alerts by Computers in Computer Groups with Alert Counts

                  /                                                   \

Alerts by Computer Group by Computer               \

                                                               Alerts by Computer by Severity

The introductory report (Alerts by Computer Group with Alert Counts) lists the default Exchange Management Pack Computer Groups (Exchange 2000 Server, Exchange 2003 Server) as well as custom computer groups defined by the customer. (These have to be hand coded into the report prior to the engagement.) Beside each of the Computer Groups are columns labeled # Warning, # Error, # Critical Error, and # Service Unavailable. Each column lists the number of alerts of that severity per computer group. All columns in this report are hyperlinked. If you select one of the Computer Group names, you jump to a report (Alerts by Computers in Computer Groups with Alert Counts) listing all the computers in the computer group, with the number of alerts of each severity for each computer (described next). If you select any of the values listed under the alert severity columns, you are linked to a report (Alerts by Computer Group by Severity) that shows you all computers within the computer group that have logged the selected severity of alert.
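The roll-up behind the introductory report and its severity drill-down can be sketched in a few lines of Python. This is only an illustration of the counting logic: the record layout (`group`, `computer`, `severity`), the function names, and the sample rows are all hypothetical, and the real reports query the OnePoint database rather than an in-memory list.

```python
from collections import Counter, defaultdict

# Column order used by the introductory report.
SEVERITIES = ("Warning", "Error", "Critical Error", "Service Unavailable")


def count_alerts_by_group(alerts):
    """Return {computer_group: Counter({severity: count})} for the given alerts."""
    counts = defaultdict(Counter)
    for alert in alerts:
        counts[alert["group"]][alert["severity"]] += 1
    return counts


def computers_with_severity(alerts, group, severity):
    """Drill-down: computers in a group that logged the selected severity."""
    return sorted({a["computer"] for a in alerts
                   if a["group"] == group and a["severity"] == severity})


# Hypothetical sample rows; real data would come from the OnePoint database.
sample_alerts = [
    {"group": "Exchange 2003 Server", "computer": "EX01", "severity": "Warning"},
    {"group": "Exchange 2003 Server", "computer": "EX01", "severity": "Error"},
    {"group": "Exchange 2003 Server", "computer": "EX02", "severity": "Warning"},
    {"group": "Exchange 2000 Server", "computer": "EX10", "severity": "Critical Error"},
]

summary = count_alerts_by_group(sample_alerts)
for group in sorted(summary):
    print(group, [summary[group][sev] for sev in SEVERITIES])
print(computers_with_severity(sample_alerts, "Exchange 2003 Server", "Warning"))
```

Selecting a count in the first report corresponds to calling the drill-down function with that row's group and that column's severity.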

Posted by mackals | 0 Comments

Blog focus change

I'm changing the focus of the blog somewhat. I recently accepted a new job at Microsoft. I'm now an Architect in the Operations Center of Excellence out of Redmond. This is the same team that created the Desired Configuration Monitor SKU (DCM) and the Service Level Management SKU (SLM). I'm currently working on a new SKU called Proactive Monitoring with MOM (PMM).

While this blog will still focus on MOM, I'll be writing articles in support of our SKU efforts around the product. I hope to still put out valuable content on both MOM 2005 and System Center Operations Manager 2007 as I get it up and start playing with it. I will still post some content for SMS, but it will not get nearly as much attention as MOM.

 

mac

Posted by mackals | 0 Comments

Notification Workflow Solution Accelerator

When I was typing the title of this article, my fat fingers made a Freudian slip that is pretty appropriate for this solution accelerator. The original title was "Notification WorkSlow Solution Accelerator". Once you finally get the thing installed, it is extremely frustrating trying to set up things like separate day/evening notification schedules for operators. More on this later; for now let me start with setup:

Setting Up the SA

The setup documentation for this particular SA is poor at best. There are a number of assumptions made by the authors that just don't match up with the average consultant installing the product. Here are all the components I had to add outside of the instructions to make it work:

  • First the SA requires that you have IIS running on the database server where the SA will be installed. (How many Enterprise customers do most of you have that will allow IIS to run on their database servers?)
  • Second, you need to install SQL Notification Services. (I know... I suppose I should have known that, but hey I don't play in the SQL world often)
  • Third, you need to install SQLXML. (This is necessary because you *MUST* install the Engine components when installing the SQL Notification Services. If you select the Engine components and don't have SQLXML installed, you get a pop up telling you to install it and try again.)

The SA documentation doesn't point out that you need any of these installed. If you don't install them, however, and you run the SA installation, there is absolutely nothing in the log files that would lead you to determine why the SA doesn't install...

Using the SA

Once you finally get it installed, you use a web interface to set up users and subscriptions. The URL is http://<servername>/NotificationWorkFlow.Web/HomePage.aspx

Two things you will notice immediately.

  • When you add users, they actually go into a local table on the SQL server. There is no AD integration, meaning that you will need to enter each subscriber's information manually with no ability to use AD Groups (or groups of any kind for that matter). For a very small shop (5 to 10 Operators) this might be acceptable. For an Enterprise Solution it is extremely poor.
  • Most Enterprise Orgs that I have dealt with usually have email notification during working hours and cell/pager notifications after hours. The SA can accomplish this, but you must create a daytime subscriber ID and an evening subscriber ID for everyone who needs both types of notifications.

Once you have added the users, there is no way to cut and paste schedules between users. Since most people on a single shift will have the same hours, this could be a huge time saver but it is not possible with the SA.

The Subscriber devices tab is interesting. It does allow you to specify multiple devices, and you can specify which device (email, pager, etc.) is used for each subscription, but without the ability to set up devices per schedule, you are limited to adding multiple subscribers per user.

The one thing that is nice about the SA is that it gives you the ability to be paged for alerts on a particular Management Pack, Computer Group, or Individual Computer as well as giving you the choice of notifications by severity threshold.

Overall Score

If I rate this Solution Accelerator on a scale of 1 to 5, I would give it a 1.5 for overall functionality. It falls very short in the Enterprise space. I can't imagine that a single Enterprise customer would actually use the SA in production. It could possibly work for a very small organization.

My suggestion: Unless your customer is up for a dev engagement to make the tool usable, steer them away from this SA...

 

-mac

Posted by mackals | 3 Comments

What does "Commit Changes" when rules are changed actually do?

If you commit changes after a rule change, the management server pushes the new rules at the next heartbeat interval. (By default this is every 10 seconds.) If you don't commit changes, the new rule is pushed on the next client configuration request, which by default happens at 1 minute intervals.

The only place I could foresee this being an issue at all is where you have servers across a slow link so you set the configuration request interval up to reduce traffic…

Posted by mackals | 0 Comments

MOM Event Stream Info

One thing that was unclear to me was the significance or use of the MOM event stream. Somehow I just didn't make the connection between the event stream, alerts, and rules. Our documentation didn't seem to make this clear to me, and it wasn't immediately intuitive to me. Now that I have made the correlation however, I look at the event stream in two distinct ways:

  • Informational Data
  • Troubleshooting Data

Informational Data

Many of the MOM 2005 tasks that you launch from the MOM Operator's console do not display their results directly in the console. For instance, if you run an IP CONFIGURATION task from the task pane against a specific computer, the resultant data is not immediately echoed back to the Operator console. Instead, the action and results are displayed in the MOM event stream. The first thing you see in the event stream for an IP CONFIGURATION task is an information event with the following data:

The task 'IP Configuration' is scheduled to run against 'Computer:DOMAIN\COMPUTER.
Task Id: {6D260750-134E-48FF-806F-4C08CE2A815C}
Execution Id: {38548CAF-B78B-415F-B64C-62D46B6807E2}
Launched By: DOMAIN\ae_squ2

This is followed shortly by an information event with the results of the request as follows:

The task 'IP Configuration' has successfully executed against 'Computer:DOMAIN\COMPUTER.
Task Id: {6D260750-134E-48FF-806F-4C08CE2A815C}
Execution Id: {38548CAF-B78B-415F-B64C-62D46B6807E2}
Launched By: DOMAIN\ae_squ2

The following output has been generated:

Windows IP Configuration
Host Name . . . . . . . . . . . . : COMPUTER
Primary Dns Suffix . . . . . . . : EXAMPLE.DNSNAME.COM
Node Type . . . . . . . . . . . . : Hybrid
IP Routing Enabled. . . . . . . . : No
WINS Proxy Enabled. . . . . . . . : No
DNS Suffix Search List. . . . . . : example.dnsname.com
home.dnsname.com
dnsname.com
nomad.dnsname.com

Ethernet adapter adsm:
Connection-specific DNS Suffix . :
Description . . . . . . . . . . . : HP NC6170 Dual Gigabit Server Adapter
Physical Address. . . . . . . . . : 00-02-A5-47-33-F4
DHCP Enabled. . . . . . . . . . . : No
IP Address. . . . . . . . . . . . : 10.128.41.106
Subnet Mask . . . . . . . . . . . : 255.255.252.0
Default Gateway . . . . . . . . . :
NetBIOS over Tcpip. . . . . . . . : Disabled

Ethernet adapter Internal + Public:
Connection-specific DNS Suffix . : ex.dnsname.org
Description . . . . . . . . . . . : HP NC7781 Gigabit Server Adapter
Physical Address. . . . . . . . . : 00-11-85-BA-CD-2D
DHCP Enabled. . . . . . . . . . . : No
IP Address. . . . . . . . . . . . : 10.128.182.30
Subnet Mask . . . . . . . . . . . : 255.255.255.192
Default Gateway . . . . . . . . . : 10.128.182.1
DNS Servers . . . . . . . . . . . : 10.128.175.201
10.128.175.202
10.64.175.201
Primary WINS Server . . . . . . . : 10.128.175.215
Secondary WINS Server . . . . . . : 10.128.175.216
10.128.175.213
Ethernet adapter Internal:
Connection-specific DNS Suffix . :
Description . . . . . . . . . . . : HP NC7781 Gigabit Server Adapter #2
Physical Address. . . . . . . . . : 00-11-85-BA-DF-2C
DHCP Enabled. . . . . . . . . . . : No
IP Address. . . . . . . . . . . . : 10.128.65.16
Subnet Mask . . . . . . . . . . . : 255.255.255.128
Default Gateway . . . . . . . . . :
NetBIOS over Tcpip. . . . . . . . : Disabled
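Both events above carry the same Execution Id, which is what ties the "scheduled" event to the "executed" event. As a minimal sketch, assuming you have the event descriptions available as plain text (how you export them from MOM is not covered here, and the function name is hypothetical), the pairing could be done like this in Python:

```python
import re
from collections import defaultdict

# Matches the "Execution Id: {GUID}" line shown in the events above.
EXECUTION_ID = re.compile(r"Execution Id:\s*(\{[0-9A-Fa-f-]+\})")


def group_by_execution_id(event_texts):
    """Map each Execution Id to the event descriptions that mention it."""
    grouped = defaultdict(list)
    for text in event_texts:
        match = EXECUTION_ID.search(text)
        if match:
            grouped[match.group(1)].append(text)
    return dict(grouped)


# The two event descriptions from above; the second is abbreviated to its header.
scheduled = r"""The task 'IP Configuration' is scheduled to run against 'Computer:DOMAIN\COMPUTER.
Task Id: {6D260750-134E-48FF-806F-4C08CE2A815C}
Execution Id: {38548CAF-B78B-415F-B64C-62D46B6807E2}
Launched By: DOMAIN\ae_squ2"""

executed = r"""The task 'IP Configuration' has successfully executed against 'Computer:DOMAIN\COMPUTER.
Task Id: {6D260750-134E-48FF-806F-4C08CE2A815C}
Execution Id: {38548CAF-B78B-415F-B64C-62D46B6807E2}
Launched By: DOMAIN\ae_squ2"""

pairs = group_by_execution_id([scheduled, executed])
for execution_id, events in pairs.items():
    print(execution_id, "->", len(events), "events")
```

An Execution Id with only one event would point to a task that was scheduled but whose result never showed up in the stream.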
 
Troubleshooting Data
 
The second thing I found important about the MOM event stream is that lots of additional information is available outside of an alert. This is especially true for alerts with repeat counts. So for instance if I see an alert containing the following information:
Description:
Error during synthetic Outlook Mobile Access logon.

To determine the current state of this problem, look at the events associated with this alert and find the most recent event.

The initial event reported that:

Cannot measure OMA availability. Unexpected error.

OmaStatus failed to initialize

This event was generated by the script: "Exchange 2003 - OMA logon verification"
Name: General error during synthetic Outlook Mobile Access logon.
Severity: Warning
Resolution State: New
Domain: GOMER
Computer: LITTLEPYLE
Time of First Event: 1/21/2005 2:49:00 PM
Time of Last Event: 1/26/2005 10:34:00 AM
Alert latency: 1 sec
Problem State: Active
Repeat Count: 463
Age:
Source: Exchange MOM
Alert Id: 01bc0bca-e2dd-40af-90c2-69171008b7b3
Rule (enabled): Microsoft Exchange Server\Exchange 2003\Availability and State Monitoring\Verify Outlook Mobile Access Front-End Availability\General error during synthetic Outlook Mobile Access logon.
 
I would customize the event stream under My Views so that I could see all events associated with this computer. Further investigation would eventually reveal that when the Exchange Management Pack Wizard was run in this environment, a selection was made *NOT* to monitor front end servers, but this rule and others were never disabled. It was easy to see that multiple events were being raised that led back to these rules.
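The alert fields themselves already hint at how often the underlying event fires. A small Python sketch, using the Time of First Event, Time of Last Event, and Repeat Count values from the alert above, estimates the average gap between repeats. Treating the repeat count as the number of intervals between the first and last event is my assumption, as is the function name.

```python
from datetime import datetime

# Timestamp layout as it appears in the alert details above.
ALERT_TIME_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def average_repeat_interval_minutes(first_event, last_event, repeat_count):
    """Average minutes between repeats, assuming repeat_count intervals elapsed."""
    if repeat_count <= 0:
        raise ValueError("repeat_count must be positive")
    first = datetime.strptime(first_event, ALERT_TIME_FORMAT)
    last = datetime.strptime(last_event, ALERT_TIME_FORMAT)
    return (last - first).total_seconds() / 60 / repeat_count


interval = average_repeat_interval_minutes(
    "1/21/2005 2:49:00 PM", "1/26/2005 10:34:00 AM", 463
)
print(f"{interval:.1f} minutes between repeats")  # prints "15.0 minutes between repeats"
```

A steady 15 minute cadence over several days looks like a scheduled script failing on every run rather than an intermittent fault, which fits a rule that should have been disabled.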
 

Access Denied Errors during Computer Scan

I ran into a problem today where, when I set up discovery rules and forced a computer scan, I received a slew of Access Denied errors from computers that are part of the discovery rule. After much troubleshooting as to why the problem was occurring, I found that the Action Account had become locked out due to too many password retries. (I still don't know why there were so many retries.) This is a highly secured environment and the action account is low privileged anyway, but this account lockout resulted in the inability to discover computers. Once we unlocked the action account everything worked fine.
Posted by mackals | 0 Comments

Management Pack versions on MOM 2005?

I ran into an interesting question from one of my consultants today. He asked me how to determine the version number of the Management Pack that was currently running on a MOM 2000 SP1 server. To be honest, I don't think there is a way to do this. I normally suggest that customers export the current management packs into a BASE directory. Then as new management packs become available and are imported, you should create a subdirectory under BASE named in the form {ManagementPackName}{Date} and store the new management packs in that directory.

Exported Management Packs

I noticed today as I installed MOM 2005 for a customer that after you import the first management pack, a directory gets created in the Operations Manager Program Directory named MPBackup (C:\Program Files\Microsoft Operations Manager 2005\MPBackup). Guess what gets added into that directory when you do a management pack import? That's right! The management pack name followed by the date and time. One I imported today shows up like this:

MicrosoftExchangeServer2003_10.20.04 13.57.49.akm
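If you want to list what was imported and when, the file name can be split back into its parts. The Python sketch below assumes the stamp is month.day.year followed by hour.minute.second, which is my reading of the single example above and not a documented format. The function name is hypothetical.

```python
import re
from datetime import datetime

# <management pack name>_<MM.DD.YY HH.MM.SS>.akm (layout inferred from one example).
BACKUP_NAME = re.compile(
    r"^(?P<name>.+)_(?P<stamp>\d{2}\.\d{2}\.\d{2} \d{2}\.\d{2}\.\d{2})\.akm$"
)


def parse_backup_name(filename):
    """Split an MPBackup file name into (management pack name, import time)."""
    match = BACKUP_NAME.match(filename)
    if match is None:
        raise ValueError(f"unrecognized backup file name: {filename!r}")
    imported = datetime.strptime(match.group("stamp"), "%m.%d.%y %H.%M.%S")
    return match.group("name"), imported


name, imported = parse_backup_name("MicrosoftExchangeServer2003_10.20.04 13.57.49.akm")
print(name, imported.isoformat())
```

Sorting the parsed results by import time gives a simple import history per management pack.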

So how do I find the version I'm running today?

To find out what released version you have in production, go to the administrator's console. Right click the management pack you are interested in and select properties. The management pack version is a field on the Properties page.

 

MOM 2005 Management UI and Computer Groups

So Computer Groups have changed between MOM 2000 and MOM 2005. In MOM 2000, the server itself was responsible for server discovery and Computer Groups in the UI showed which computers had been discovered and were populated in the Computer Group. In MOM 2005, the discovered servers do not show up in the management UI. This causes some consternation for most MOM admins until they realize that the Computer Group members are displayed in the Operator's Console not the Admin Console. (I think we did this because we partitioned the MOM 2005 tables...)
 