September, 2006

  • Jakub@Work

    Sample Alert and State Change Insertion


     Update: I have updated the Management Pack to work with the final RTM bits

    First, a disclaimer. Not everything I write here works on the Beta 2 bits that are currently out. I had to fix a few bugs in order to get all these samples working, so only the most recent builds will fully support the sample management pack. I will, however, provide at the end of the post a list of the things that don't work =).

    I've attached to the post a sample management pack that should import successfully on Beta 2; please let me know if it doesn't and what errors you get. This management pack is for sample purposes only. We will be shipping, either as part of the product or as a web download, a sealed SDK/MCF management pack that will facilitate programmatic alert and state change insertion and that will support all the things I am demonstrating here.

    What I would like to do is go through this management pack and talk about how each component works, and then include some sample code at the end that shows how to drive the management pack from SDK code.

    The first thing you will notice in the management pack is a ConditionDetectionModuleType named System.Connectors.GenericAlertMapper. This module type takes as input any data type and outputs the proper data type for alert insertion into the database (System.Health.AlertUpdateData). It is marked as internal, meaning it cannot be referenced outside of this management pack, and simply provides some glue to make the whole process work.

    Next, we have the System.Connectors.PublishAlert WriteActionModuleType, which takes the data produced by the aforementioned mapper and publishes it to the database. Regardless of where other parts of a workflow are running, this module type must run on a machine, and under an account, that has database access. This is controlled by targeting, as described in the previous post. This module type is also internal.

    Now we have our first two public WriteActionModuleType's, System.Connectors.GenerateAlertFromSdkEvent and System.Connectors.GenerateAlertFromSdkPerformanceData. These combine the aforementioned module types into a more usable composite. They take as input System.Event.LinkedData and System.Performance.LinkedData, respectively. Note that these are the two data types produced by the SDK/MCF operational data insertion API. Both module types have the same configuration, allowing you to specify the various properties of an alert.

    The last of the type definitions is a simple UnitMonitorType, System.Connectors.TwoStateMonitorType. This monitor type represents two states, Red and Green, which can be driven by events. You'll notice that it defines two operational state types, RedEvent and GreenEvent, which correspond to the two expression filter definitions that match on $Config/RedEventId$ and $Config/GreenEventId$ to drive state. What this monitor type essentially defines is that if a "Red" event comes in, the state of the monitor is red, and vice-versa for a "Green" event. It also allows you to configure the event id for each of these events.

    Now we move to the part of the management pack where we use all these defined module types.

    First, let's look at System.Connectors.Test.AlertOnThreshold and System.Connectors.Test.AlertOnEvent. Both of these rules use the generic performance data and event data sources mentioned in an earlier post. Those data sources produce performance data and events for any monitoring object the data was inserted against, and as such, you'll notice both rules are targeted to Microsoft.SystemCenter.RootManagementServer, so only a single instance of each rule will be running. The nice thing about this is that you can generate alerts for thousands of different instances with a single workflow, assuming your criteria for the alert are the same. Which brings me to the second part of the rule, the expression filter. Each rule has its own expression filter module that matches the data coming in against a particular threshold or event number. Lastly, each rule includes the appropriate write action to actually generate the alert, using parameter replacement to populate the name and description of the alert.

    The other two rules, System.Connectors.Test.AlertOnThresholdForComputer and System.Connectors.Test.AlertOnEventForComputer, are similar, only they use the targeted SDK data source modules and as such are targeted at System.Computer. It is important to note that targeting the computer class will only work for computers that have database access and whose agents run under an account with database access. I used this as an example because it didn't require me to discover any new objects; plus, I had a single machine install where the only System.Computer was the root management server. The key difference between these two rules and the previous rules is that there will be a new instance of each rule running for every System.Computer object. So you can imagine, if you created a rule like this and targeted it to a custom type you had defined, for which you discovered hundreds or thousands of instances, you would run into performance issues. From a pure modeling perspective, this is the "correct" way to do it, since logically you would like to target your workflows to your type; practically, however, it's better to use the previous kind of rule to ensure better performance.

    The last object in the sample is System.Connectors.Test.Monitor. This monitor is an instance of the monitor type we defined earlier. It maps the GreenEvent state of the monitor type to the Success health state and the RedEvent to the Error health state. It defines via configuration that events with id 1 will make the monitor go red and events with id 2 will make it go back to green. It also defines that an alert should be generated when the state goes to Error and that the alert should be auto-resolved when the state goes back to Success. Lastly, you'll notice the alert definition here actually uses the AlertMessage paradigm for the alert name and description, which allows for fully localized alert names and descriptions.

    This monitor uses the targeted data source, and thus an instance of the monitor will be created per discovered object. We are working on a solution for monitors similar to the generic alert-processing rules; it will be available in RTM, it's just not available yet.

    Now, what doesn't work? Well, everything that uses events should work fine. For performance data, the targeted versions of the workflows won't work, but the generic non-targeted ones will. Also, any string fields in the performance data item are truncated by 4 bytes (yay, marshalling). Like I said earlier, these issues have been resolved in the latest builds.

    Here is some sample code to drive the example management pack:

    using System;
    using System.Collections.ObjectModel;
    using Microsoft.EnterpriseManagement;
    using Microsoft.EnterpriseManagement.Configuration;
    using Microsoft.EnterpriseManagement.Monitoring;

    namespace Jakub_WorkSamples
    {
        partial class Program
        {
            static void DriveSystemConnectorLibraryTestManagementPack()
            {
                // Connect to the sdk service on the local machine
                ManagementGroup localManagementGroup = new ManagementGroup("localhost");

                // Get the MonitoringClass representing a Computer
                MonitoringClass computerClass =
                    localManagementGroup.GetMonitoringClass(SystemMonitoringClass.Computer);

                // Use the class to retrieve partial monitoring objects
                ReadOnlyCollection<PartialMonitoringObject> computerObjects =
                    localManagementGroup.GetPartialMonitoringObjects(computerClass);

                // Loop through each computer
                foreach (PartialMonitoringObject computer in computerObjects)
                {
                    // Create the perf item (this will generate alerts from
                    // System.Connectors.Test.AlertOnThreshold and
                    // System.Connectors.Test.AlertOnThresholdForComputer)
                    CustomMonitoringPerformanceData perfData =
                        new CustomMonitoringPerformanceData("MyObject", "MyCounter", 40);
                    // Allows you to set the instance name of the item.
                    perfData.InstanceName = computer.DisplayName;
                    // Allows you to specify a time that the data was sampled.
                    perfData.TimeSampled = DateTime.UtcNow.AddDays(-1);
                    computer.InsertCustomMonitoringPerformanceData(perfData);

                    // Create a red event (this will generate alerts from
                    // System.Connectors.Test.AlertOnEvent,
                    // System.Connectors.Test.AlertOnEventForComputer and
                    // System.Connectors.Test.Monitor,
                    // and make the state of the computer for this monitor go red)
                    CustomMonitoringEvent redEvent =
                        new CustomMonitoringEvent("My publisher", 1);
                    redEvent.EventData = "<Data>Some data</Data>";
                    computer.InsertCustomMonitoringEvent(redEvent);

                    // Wait for the event to be processed
                    System.Threading.Thread.Sleep(30000);

                    // Create a green event (this will resolve the alert
                    // from System.Connectors.Test.Monitor and make the state
                    // go green)
                    CustomMonitoringEvent greenEvent =
                        new CustomMonitoringEvent("My publisher", 2);
                    greenEvent.EventData = "<Data>Some data</Data>";
                    computer.InsertCustomMonitoringEvent(greenEvent);
                }
            }
        }
    }

     

  • Jakub@Work

    Workflow Targeting and Classes


    I set out this week trying to put together a post about designing and deploying rules and monitors that utilize the SDK data sources I talked about in the last post. Unfortunately, it's not ready yet. Among other things this week, I have been trying to write and deploy a sample management pack that would demonstrate the various techniques we recommend for inserting operational data into SCOM via the SDK, but in the process I have run into a few issues that need resolving before presenting the information. I am definitely working on it and I'll get something up as soon as we work through the issues I have run into. If you need something working immediately, please contact me directly with any questions you have so I can better address your specific scenario.

    In the meantime, I wanted to discuss classes in SCOM and how they relate to workflow (take this to mean a rule, monitor or task in SCOM 2007) targeting. I think this topic is a good stepping stone for understanding the techniques I'll talk about when the aforementioned post is ready.

    First, what do I mean by targeting? In MOM 2005, rules were deployed based on the rule groups they were in and their association to computer groups. Rules would be deployed irrespective of whether they were needed on a particular computer. The targeting mechanism in 2007 is much different and is based entirely around the class system that describes the object space. Each workflow is assigned a specific target class, and agents receive a workflow when they have objects of that particular class being managed on that machine.

    Ok, so what does that all mean? Let's start with a sample class hierarchy. First, we have a base class of all classes, System.Entity (this is the actual base class for all classes in SCOM 2007). This class is abstract (meaning that there cannot be an instance of just System.Entity). Next, suppose we have a class called Microsoft.SqlServer (note this is not the actual class hierarchy we will ship; this is only for illustrative purposes). This class is not abstract and defines all the key properties that identify a SQL Server. Key properties are the properties that uniquely identify an instance in an enterprise. For a SQL Server this would be a combination of the server name as well as the name of the computer the server is on. Next, there is a class Microsoft.SqlServer.2005, which derives from Microsoft.SqlServer, adding properties specific to SQL Server 2005, but it adds no key properties (and in fact cannot add any). This means that a SQL Server 2005 in your enterprise would be both a Microsoft.SqlServer AND a Microsoft.SqlServer.2005, and the object that represented it, from an identity perspective, would be indistinguishable (i.e. it's the same SQL Server). Lastly, SQL Servers can't exist by themselves, so we add a System.Computer class to the mix that derives from System.Entity. We now have all the classes defined that we need to talk about our first workflow, discovery.

    Let's assume we already have a computer discovered in our enterprise, Computer1. In order to discover a SQL Server, we need two things:

    1. We need to define a discovery rule that can discover a SQL Server.
    2. We need to deploy and run the rule

    In order to make our rule deploy and run, we'll need to target it to a type that gets discovered before SQL Server, in our case System.Computer. If we targeted the discovery rule to the type it's discovering, we'd have a classic chicken-and-egg problem on our hands. When we target our discovery rule to System.Computer, the configuration service knows that there is a Computer1 in the enterprise that is running and being managed by an agent, and it will deploy any workflow targeted at System.Computer, including our discovery rule, to that machine and in turn execute the rule. Once the rule executes, it will submit new discovery data to our system and the SQL Server will appear; let's call the server Sql1. Our SQL Server 2005 discovery rule can be targeted to System.Computer, or we could actually target it to Microsoft.SqlServer, since that will already be discovered by the aforementioned rule. This illustrates a waterfall approach to discovery, and the recommended way discovery is done in the system. There needs to be a "seed" discovered object that is leveraged for further discovery and that can generate subsequent "seeds" for related objects. In SCOM 2007 we jump-start the system by pre-discovering the primary management server (also a computer) and allow manual computer discovery and agent deployment that jump-starts discovery on those machines.

    This example also illustrates the workflow targeting and deployment mechanism in SCOM 2007. When objects are discovered in an enterprise, they are all discovered and identified as instances of particular classes. In the previous example, Computer1 is both a System.Entity and a System.Computer. Sql1 is a System.Entity, a Microsoft.SqlServer and a Microsoft.SqlServer.2005. We maintain this state in the configuration service and deploy workflows to agents based on the types of instances they are managing. This ensures that workflows get deployed and executed on the agents that need them, with no need to manage targeting.
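
    To make this concrete, here is a minimal sketch of how you could see this from the SDK, assuming the illustrative Microsoft.SqlServer class above actually existed in your management group (the class name is hypothetical, and the criteria-based class lookup is just one way to retrieve it). Because Sql1 is an instance of both Microsoft.SqlServer and Microsoft.SqlServer.2005, querying for instances of the base class returns it as well:

    using System;
    using System.Collections.ObjectModel;
    using Microsoft.EnterpriseManagement;
    using Microsoft.EnterpriseManagement.Configuration;
    using Microsoft.EnterpriseManagement.Monitoring;

    namespace Jakub_WorkSamples
    {
        partial class Program
        {
            static void ListSqlServerInstances()
            {
                // Connect to the sdk service on the local machine
                ManagementGroup localManagementGroup = new ManagementGroup("localhost");

                // Look up the (hypothetical) Microsoft.SqlServer class by name
                MonitoringClass sqlServerClass =
                    localManagementGroup.GetMonitoringClasses(
                        new MonitoringClassCriteria("Name = 'Microsoft.SqlServer'"))[0];

                // Instances of derived classes (e.g. Microsoft.SqlServer.2005)
                // come back here too, since each of them is also a Microsoft.SqlServer
                ReadOnlyCollection<PartialMonitoringObject> sqlServers =
                    localManagementGroup.GetPartialMonitoringObjects(sqlServerClass);

                foreach (PartialMonitoringObject sqlServer in sqlServers)
                {
                    Console.WriteLine(sqlServer.DisplayName);
                }
            }
        }
    }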

    Another example of targeting would be with a task. Let's say we have a task that restarts SQL Server. This task can actually run against any Microsoft.SqlServer. Maybe we have another task that disables a specific SQL 2005 feature, which only makes sense to run against objects that are in fact SQL Server 2005 instances. The first task would be targeted against Microsoft.SqlServer while the second against Microsoft.SqlServer.2005. If you select a SQL Server object in the SCOM 2007 UI that is not a SQL Server 2005, but instead is a SQL Server 2000, the 2005-specific task will not be available. If you try to run it anyway via the SDK or the command shell, it will fail because the instance you are trying to run against isn't the right class and doesn't understand the task. The first task however, restarting the service, will run against both the SQL 2000 and 2005 instances and be available for both in the UI.
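
    And here is a minimal sketch of what running such a task from the SDK might look like, assuming an ExecuteMonitoringTask overload that takes a single monitoring object and a task; the task and class names are hypothetical and only for illustration:

    using System;
    using System.Collections.ObjectModel;
    using Microsoft.EnterpriseManagement;
    using Microsoft.EnterpriseManagement.Configuration;
    using Microsoft.EnterpriseManagement.Monitoring;

    namespace Jakub_WorkSamples
    {
        partial class Program
        {
            static void RunRestartTaskAgainstSqlServers()
            {
                // Connect to the sdk service on the local machine
                ManagementGroup localManagementGroup = new ManagementGroup("localhost");

                // Hypothetical task and class names, for illustration only
                MonitoringTask restartTask =
                    localManagementGroup.GetMonitoringTasks(
                        new MonitoringTaskCriteria("Name = 'Microsoft.SqlServer.RestartTask'"))[0];

                MonitoringClass sqlServerClass =
                    localManagementGroup.GetMonitoringClasses(
                        new MonitoringClassCriteria("Name = 'Microsoft.SqlServer'"))[0];

                // The task is targeted at Microsoft.SqlServer, so it can be run
                // against any instance of that class (2000 or 2005); running a
                // 2005-specific task against a 2000 instance would fail
                foreach (PartialMonitoringObject sqlServer in
                    localManagementGroup.GetPartialMonitoringObjects(sqlServerClass))
                {
                    ReadOnlyCollection<MonitoringTaskResult> results =
                        localManagementGroup.ExecuteMonitoringTask(sqlServer, restartTask);
                }
            }
        }
    }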

    I hope this helps make a bit more sense out of the new class system and leveraging it for targeting. Like with anything else, there are edge cases, more complicated hierarchies and other considerations when designing and targeting workflows, but from a "pure" modeling perspective, this should give you an idea as to how things work.

  • Jakub@Work

    Inserting Operational Data


    I received a few requests to talk about operational data insertion in SCOM 2007. I was a bit hesitant to take this on now, mostly because I haven't really covered a lot of the core concepts required to truly understand the data insertion process, but I decided that people can ask clarifying questions if they have any, and that getting this out there was an important step in helping customers and partners move from 2005 to 2007.

    There are two objects that are used for inserting operational data: CustomMonitoringEvent and CustomMonitoringPerformanceData. Inserting these objects is supported both via the SDK directly as well as via the MCF web service.

    This data can be inserted against any discovered object (unlike MOM 2005 where only programmatically inserted computers were supported for event/performance data insertion in the SDK) via the InsertCustomMonitoringEvent(s) and InsertCustomMonitoringPerformanceData methods on the PartialMonitoringObject class as well as the InsertMonitoringEvents and InsertMonitoringPerformanceData methods on the MCF endpoint.

    The below sample illustrates finding all computers in the management group and inserting a single performance data value against each:

    using System;
    using System.Collections.ObjectModel;
    using Microsoft.EnterpriseManagement;
    using Microsoft.EnterpriseManagement.Configuration;
    using Microsoft.EnterpriseManagement.Monitoring;

    namespace Jakub_WorkSamples
    {
        partial class Program
        {
            static void InsertCustomMonitoringPerformanceData()
            {
                // Connect to the sdk service on the local machine
                ManagementGroup localManagementGroup = new ManagementGroup("localhost");

                // Get the MonitoringClass representing a Computer
                MonitoringClass computerClass =
                    localManagementGroup.GetMonitoringClass(SystemMonitoringClass.Computer);

                // Use the class to retrieve partial monitoring objects
                ReadOnlyCollection<PartialMonitoringObject> computerObjects =
                    localManagementGroup.GetPartialMonitoringObjects(computerClass);

                // Loop through each computer
                foreach (PartialMonitoringObject computer in computerObjects)
                {
                    // Create a CustomMonitoringPerformanceData object
                    CustomMonitoringPerformanceData perfData =
                        new CustomMonitoringPerformanceData("CPU", "CPU Threshold", 21.3);
                    perfData.InstanceName = computer.Name;

                    // Insert the data
                    computer.InsertCustomMonitoringPerformanceData(perfData);
                }
            }
        }
    }

    The pattern for events is very similar; there are just more properties available on CustomMonitoringEvent.
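
    For completeness, here is the event counterpart of the sample above; it mirrors the performance data sample and inserts a single CustomMonitoringEvent (with an arbitrary publisher name and event id) against each computer:

    using System;
    using System.Collections.ObjectModel;
    using Microsoft.EnterpriseManagement;
    using Microsoft.EnterpriseManagement.Configuration;
    using Microsoft.EnterpriseManagement.Monitoring;

    namespace Jakub_WorkSamples
    {
        partial class Program
        {
            static void InsertCustomMonitoringEvent()
            {
                // Connect to the sdk service on the local machine
                ManagementGroup localManagementGroup = new ManagementGroup("localhost");

                // Get the MonitoringClass representing a Computer
                MonitoringClass computerClass =
                    localManagementGroup.GetMonitoringClass(SystemMonitoringClass.Computer);

                // Use the class to retrieve partial monitoring objects
                ReadOnlyCollection<PartialMonitoringObject> computerObjects =
                    localManagementGroup.GetPartialMonitoringObjects(computerClass);

                // Loop through each computer
                foreach (PartialMonitoringObject computer in computerObjects)
                {
                    // Create a CustomMonitoringEvent; the publisher name and
                    // event id are arbitrary values for this sample
                    CustomMonitoringEvent sampleEvent =
                        new CustomMonitoringEvent("My publisher", 100);

                    // Event data is free-form xml
                    sampleEvent.EventData = "<Data>Some data</Data>";

                    // Insert the event
                    computer.InsertCustomMonitoringEvent(sampleEvent);
                }
            }
        }
    }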

    In order to actually use these objects and methods, it's important to understand what happens when any of the insert calls complete. First, these objects get converted to their runtime counterparts as XML; for events this is System.Event.LinkedData* as defined in the System.Library* management pack, and for performance data this is System.Performance.LinkedData* as defined in the System.Performance.Library* management pack. These items are then inserted into the PendingSdkDataSource table in the database. Once this happens, the call succeeds and returns, even though the data has not yet been processed.

    In order to actually pick up and utilize the inserted data, I also wrote several DataSourceModuleTypes (a management pack level concept that describes the data sources, called providers in MOM 2005, that are available for use in rules and monitors) that read from the PendingSdkDataSource table in the database and process the newly inserted objects. "Normal" performance data and event rules use the system-defined DataSourceModuleTypes that read the system performance counters and the event log, respectively; data inserted via the SDK will not be processed if you use those data sources. All the data sources that facilitate SDK insertion are on a fixed polling interval of 30 seconds, and wake up that often to process any new data in the database. There are two DataSourceModuleTypes each for events and performance data, all defined in the Microsoft.SystemCenter.Library*:

    Microsoft.SystemCenter.SdkEventProvider - This data source will output a System.Event.LinkedData object for every CustomMonitoringEvent inserted via the SDK, regardless of the object it was inserted against.

    Microsoft.SystemCenter.TargetEntitySdkEventProvider - This data source will only output a System.Event.LinkedData object for CustomMonitoringEvents inserted via the SDK that were inserted against the target of the workflow that is using the DataSourceModuleType. For instance, if you create a new rule and target it to the System.Computer type and use this DataSourceModuleType as the data source of the rule, the only events that will come out of the data source will be events that were inserted against objects of the System.Computer class.

    Microsoft.SystemCenter.SdkPerformanceDataProvider - The same as the SdkEventProvider, only for System.Performance.LinkedData and CustomMonitoringPerformanceData.

    Microsoft.SystemCenter.TargetEntitySdkPerformanceDataProvider - The same as the TargetEntitySdkEventProvider, only for System.Performance.LinkedData and CustomMonitoringPerformanceData.

    So, in order to actually drive the state of discovered objects, or perform other actions based on the data inserted via the SDK, you will need to write rules or monitors that use the aforementioned DataSourceModuleTypes. We do ship and install by default, in the Microsoft.SystemCenter.Internal* management pack, two rules that automatically collect all the data inserted via the SDK: Microsoft.SystemCenter.CollectSdkEventData* and Microsoft.SystemCenter.CollectSdkPerformanceData*. These rules are both targeted at the RootManagementServer* class and, as such, will only be instantiated on the Principal Management Server.

    One very important thing to note: if you write rules or monitors that use the SDK data sources, they must be executed on a server that has database access AND the account the rule or monitor is running under must have the required database permissions. In general, a rule or monitor is executed on the machine that discovered the object of a particular class; i.e. if you discover an instance of SQL Server on computer A and computer A has an agent on it, all rules and monitors targeted to SQL Server will be run on that particular SQL Server object's behalf on the agent on computer A. That's a bit of a mouthful, but I hope it makes sense.

    * These names are subject to change prior to RTM.
