Sample Alert and State Change Insertion


 Update: I have updated the Management Pack to work with the final RTM bits.

 First, a disclaimer. Not everything I write here works on the Beta 2 bits that are currently out. I had to fix a few bugs in order to get all these samples working, so only the most recent builds will fully support the sample management pack. I will, however, provide at the end of the post a list of the things that don't work =).

I've attached to the post a sample management pack that should import successfully on Beta 2; please let me know if it doesn't and what errors you get. This management pack is for sample purposes only. We will be shipping, either as part of the product or as a web download, a sealed SDK/MCF management pack that will help with programmatic alert and state change insertion and that will support all the things I am demonstrating here.

What I would like to do is go through this management pack and talk about how each component works, and then include some sample code at the end that shows how to drive the management pack from SDK code.

The first thing you will notice in the management pack is a ConditionDetectionModuleType named System.Connectors.GenericAlertMapper. This module type takes any data type as input and outputs the proper data type for alert insertion into the database (System.Health.AlertUpdateData). It is marked as internal, meaning it cannot be referenced outside of this management pack, and simply provides some glue to make the whole process work.

Next, we have the System.Connectors.PublishAlert WriteActionModuleType which takes the data produced by the aforementioned mapper and publishes it to the database. Regardless of where other parts of a workflow are running, this module type must run on a machine and as an account that has database access. This is controlled by targeting as described in the previous post. This module type is also internal.

Now we have our first two public WriteActionModuleTypes, System.Connectors.GenerateAlertFromSdkEvent and System.Connectors.GenerateAlertFromSdkPerformanceData. These combine the aforementioned module types into a more usable composite. They take as input System.Event.LinkedData and System.Performance.LinkedData, respectively. Note that these are the two data types produced by the SDK/MCF operational data insertion API. Both module types have the same configuration, allowing you to specify the various properties of an alert.

The last of the type definitions is a simple UnitMonitorType, System.Connectors.TwoStateMonitorType. This monitor represents two states, Red and Green, which can be driven by events. You'll notice that it defines two operational state types, RedEvent and GreenEvent, which correspond to the two expression filter definitions that match on $Config/RedEventId$ and $Config/GreenEventId$ to drive state. In essence, this monitor type defines that if a "Red" event comes in, the state of the monitor is red, and if a "Green" event comes in, the state is green. It also allows you to configure the event ID for each of these events.

Now we move to the part of the management pack where we use all these defined module types.

First let's look at System.Connectors.Test.AlertOnThreshold and System.Connectors.Test.AlertOnEvent. Both these rules use the generic performance data and event data sources mentioned in an earlier post. They produce performance data and events for any monitoring object the data was inserted against, and as such, you'll notice both rules are targeted to Microsoft.SystemCenter.RootManagementServer; only a single instance of each rule will be running. The nice thing about this is that you can generate alerts for thousands of different instances with a single workflow, assuming your criteria for the alert are the same. Which brings me to the second part of the rule, the expression filter. Each rule has its own expression filter module that matches the incoming data against a particular threshold or event number. Lastly, each rule includes the appropriate write action to actually generate the alert, using parameter replacement to populate the name and description of the alert.
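To make the single-workflow idea concrete, here is a minimal in-memory sketch in plain C#. It does not use the SDK or the management pack runtime: PerfSample, GenericThresholdRule and the {Instance}/{Value} placeholder syntax are hypothetical stand-ins for the data item, the expression filter and the parameter replacement described above, and any threshold you pass in is an assumption rather than the value used in the attached management pack.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical stand-in for one performance data item inserted through the SDK.
public sealed class PerfSample
{
    public PerfSample(string instanceName, string counterName, double value)
    {
        InstanceName = instanceName;
        CounterName = counterName;
        Value = value;
    }

    public string InstanceName { get; }
    public string CounterName { get; }
    public double Value { get; }
}

// A single rule instance that evaluates data for every monitored object.
public sealed class GenericThresholdRule
{
    private readonly string counterName;
    private readonly double threshold;
    private readonly string alertNameTemplate;

    public GenericThresholdRule(string counterName, double threshold, string alertNameTemplate)
    {
        this.counterName = counterName;
        this.threshold = threshold;
        this.alertNameTemplate = alertNameTemplate;
    }

    // Returns the alert names produced for the samples that pass the filter.
    public List<string> Evaluate(IEnumerable<PerfSample> samples)
    {
        var alerts = new List<string>();
        foreach (PerfSample sample in samples)
        {
            // "Expression filter": match the counter and compare to the threshold.
            if (sample.CounterName != counterName || sample.Value <= threshold)
            {
                continue;
            }

            // "Parameter replacement": fill the alert name from the data item.
            alerts.Add(alertNameTemplate
                .Replace("{Instance}", sample.InstanceName)
                .Replace("{Value}", sample.Value.ToString(CultureInfo.InvariantCulture)));
        }
        return alerts;
    }
}
```

One GenericThresholdRule object handles samples from any number of instances, which mirrors why a rule targeted at the root management server scales better than one workflow per discovered object.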

The other two rules, System.Connectors.Test.AlertOnThresholdForComputer and System.Connectors.Test.AlertOnEventForComputer, are similar, only they use the targeted SDK data source modules and as such are targeted at System.Computer. It is important to note that targeting computers will only work on computers that have database access, with the workflow running under an account that has database access. I used this as an example because it didn't require me to discover any new objects; plus, I had a single machine install where the only System.Computer was the root management server. The key difference between these two rules and the previous rules is that there will be a new instance of the rule running for every System.Computer object. So you can imagine that if you created a rule like this and targeted it to a custom type for which you discovered hundreds or thousands of instances, you would run into performance issues. From a pure modeling perspective, this is the "correct" way to do it, since logically you would like to target your workflows to your type; practically, however, it's better to use the previous types of rules to ensure better performance.

The last object in the sample is System.Connectors.Test.Monitor. This monitor is an instance of the monitor type we defined earlier. It maps the GreenEvent state of the monitor type to the Success health state and the RedEvent state to the Error health state. It defines via configuration that events with ID 1 will make the monitor go red and events with ID 2 will make it go back to green. It also defines that an alert should be generated when the state goes to Error and that the alert should be auto-resolved when the state goes back to Success. Lastly, you'll notice the alert definition here actually uses the AlertMessage paradigm for alert name and description. This allows for fully localized alert names and descriptions.
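The behavior described above can be pictured as a small state machine. The following is only an in-memory sketch in plain C#, not how the health service implements monitors: TwoStateEventMonitor, HealthState and AlertOpen are hypothetical names, and starting in the Success state is an assumption made to keep the example short.

```csharp
using System;

public enum HealthState { Success, Error }

// Hypothetical in-memory model of a two-state, event-driven monitor.
public sealed class TwoStateEventMonitor
{
    private readonly int redEventId;
    private readonly int greenEventId;

    public TwoStateEventMonitor(int redEventId, int greenEventId)
    {
        this.redEventId = redEventId;
        this.greenEventId = greenEventId;
    }

    // Assumption: the monitor starts out healthy.
    public HealthState State { get; private set; } = HealthState.Success;

    // True while an alert raised by this monitor is still open.
    public bool AlertOpen { get; private set; }

    public void OnEvent(int eventId)
    {
        if (eventId == redEventId)
        {
            // RedEvent maps to Error; an alert is open while in this state.
            State = HealthState.Error;
            AlertOpen = true;
        }
        else if (eventId == greenEventId)
        {
            // GreenEvent maps to Success; the open alert is auto-resolved.
            State = HealthState.Success;
            AlertOpen = false;
        }
        // Any other event ID leaves the state unchanged.
    }
}
```

Constructing the monitor with IDs 1 and 2 matches the configuration in the sample: an event with ID 1 moves it to Error and opens the alert, and an event with ID 2 moves it back to Success and closes the alert.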

This monitor uses the targeted data source and thus will create an instance of this monitor per discovered object. We are working on a solution for monitors similar to the generic alert processing rules, and it will be available in RTM; it's just not available yet.

Now, what doesn't work? Well, everything that uses events should work fine. For performance data, the targeted versions of workflows won't work, but the generic non-targeted ones will. Also, any string fields in the performance data item are truncated by 4 bytes, yay marshalling. Like I said earlier, these issues have been resolved in the latest builds.  

Here is some sample code to drive the example management pack:

using System;
using System.Collections.ObjectModel;
using Microsoft.EnterpriseManagement;
using Microsoft.EnterpriseManagement.Configuration;
using Microsoft.EnterpriseManagement.Monitoring;

namespace Jakub_WorkSamples
{
    partial class Program
    {
        static void DriveSystemConnectorLibraryTestManagementPack()
        {
            // Connect to the sdk service on the local machine
            ManagementGroup localManagementGroup = new ManagementGroup("localhost");

            // Get the MonitoringClass representing a Computer
            MonitoringClass computerClass =
                localManagementGroup.GetMonitoringClass(SystemMonitoringClass.Computer);

            // Use the class to retrieve partial monitoring objects
            ReadOnlyCollection<PartialMonitoringObject> computerObjects =
                localManagementGroup.GetPartialMonitoringObjects(computerClass);

            // Loop through each computer
            foreach (PartialMonitoringObject computer in computerObjects)
            {
                // Create the perf item (this will generate alerts from
                // System.Connectors.Test.AlertOnThreshold and
                // System.Connectors.Test.AlertOnThresholdForComputer )
                CustomMonitoringPerformanceData perfData =
                    new CustomMonitoringPerformanceData("MyObject", "MyCounter", 40);
                // Allows you to set the instance name of the item.
                perfData.InstanceName = computer.DisplayName;
                // Allows you to specify a time that data was sampled.
                perfData.TimeSampled = DateTime.UtcNow.AddDays(-1);
                computer.InsertCustomMonitoringPerformanceData(perfData);

                // Create a red event (this will generate alerts from
                // System.Connectors.Test.AlertOnEvent,
                // System.Connectors.Test.AlertOnEventForComputer and
                // System.Connectors.Test.Monitor
                // and make the state of the computer for this monitor go red)
                CustomMonitoringEvent redEvent =
                    new CustomMonitoringEvent("My publisher", 1);
                redEvent.EventData = "<Data>Some data</Data>";
                computer.InsertCustomMonitoringEvent(redEvent);

                // Wait for the event to be processed
                System.Threading.Thread.Sleep(30000);

                // Create a green event (this will resolve the alert
                // from System.Connectors.Test.Monitor and make the state
                // go green)
                CustomMonitoringEvent greenEvent =
                    new CustomMonitoringEvent("My publisher", 2);
                greenEvent.EventData = "<Data>Some data</Data>";
                computer.InsertCustomMonitoringEvent(greenEvent);
            }
        }
    }
}

 

Attachment: System.Connectors.Library.Test.xml
  • Could not manage to import the demo MP provided in this blog. I'm running SCOM Beta2 . This is the error message i'm getting ...

    Invalid Management Pack

    Invalid Management Pack : D:\Shared\MP\Sample MPs\Microsoft Demo MPs\System.Connectors.Library.Test.xml .: XSD verification failed for management pack. [Line: 439, Position: 24]

    The 'AlertMessage' attribute is not declared.

  • Yeah, localized alert descriptions were not supported in Beta 2; instead, the alert name and description were directly part of the configuration. You can try removing these references or, preferably, move to a more recent RC0 build.

  • Alright, I will wait for the RC then. I guess it should be available to the public by the end of this month from the Connect website, right?

  • Yes, we are working on an RC1 right now. Should be available relatively soon, although I am not 100% sure of the date.

  • I wanted to go through and outline some of the changes we made for MCF since our last release. The things

  • I was testing out this example and the MP with the RC1. I just got a few questions to ask.

    1. In the MP, under the ModuleTypes tag, there is a line saying,

    <ClassID>2325018e-eef4-41a3-8c17-db831b85c93b</ClassID>

    I'm just curious what is this ClassId, Is it the classId of the computerClass? Can't we use some variables, instead of coding the classId directly?

    2. Same as my previous question, also for

    <ChannelId>5BD75C47-95C4-4c33-99B4-BFF75A1C0764</ChannelId> under the WriteActionModuleType tag.

    3. Is it possible to do thresholding only for a particular counter from the sdk inserted data?

  • I also noticed that you create a new event for every performance data insert. Won't that create a lot of events? Isn't that bad for the system's performance?

  • 1. and 2. - These are hardcoded values that normally would not be "public" but need to be to allow for the added functionality I talked about.

    3. I am not entirely sure I understand your question, but if I do, then yes, you just need to match on the counter name in a condition expression filter.

    In terms of creating one CustomMonitoringPerformanceData for every insert, this is the only way to do it and as always, performance should be a consideration, but regarding this, probably not a concern. What kind of scale are you looking for?

  • For the CustomMonitoringPerformanceData, I'm looking at a possibility that (approximately) every 15 mins, 3000 devices will report performance. So that should be about 4,320,000 inserts just per day. And I've not included the PerformanceCounters in this calculation. So is it OK to create events for each of them?

    BTW, is there any way to clear the previously inserted PerformanceData? I figured out that removing the MP which defined the class causes the PerformanceData to disappear also. But i was wondering is there any possibility from the SDK or CommandShell. And when i remove the MP what happens to all of those PerformanceData? are they completely deleted from the database or archived somewhere?

  • That should be fine in terms of scale. The performance data is deleted based on your grooming settings and if you want to archive it, you need to move it to reporting.

  • If possible, can you explain in detail where to set the grooming settings and how to move the data to reporting?

  • How about clearing the alerts, is it the same way also ?

  • Reporting is not my area so I don't know much about that; I would suggest reading through our docs and if that does not suffice, posting to the beta newsgroups.

    Regarding grooming, yes, alerts are the same. The settings for this can be changed via the UI in Administration -> Settings -> Database Grooming.

  • For the alerts it does not work. Even after i 've removed the management pack which defines the particular class the alerts still exist. Previously the alertname was like "Alert for instance MyInstance" , but after i remove MP with the class definition, it becomes "Alert for instance {xxxx-xxx-xxx}" with the GUID. I was testing by inserting Performance data, loading from a CSV file. That got almost 10000 lines of data. Somehow my thresholding condition went wrong, and i got 10000 alerts in the screen now. And i don't know how to clear them out from the screen. Even i close the alarms it still stays in the screen.

    The worse part is, when i click on the alert view, it takes lots of time to download all the alarms to the screen (Almost 5 to 10 mins). And i also noticed, it creates some files in my temp directory, which are as big as almost 1.5 GB, seems to me that the UI is caching these alarms there.

    Sometimes when i click on the alert view, after 5 mins i receive a message saying , the UI was disconnected from the server (Seems to me like a timeout setting somewhere).

    Pls.. Is there any way? i just want to clear  all those alarms and get rid of them from my system :((.

  • Operational data does not get removed when classes get removed.

    You can do this from the database:

    DELETE FROM dbo.Alert

    This will delete ALL alerts. Do this at your own risk, I take no responsibility for damage caused by editing the database directly.
