October, 2006

  • Jakub@Work

    How Stuff Works

    • 15 Comments

    In reply to a comment I received, I wanted to put together a post about how things work in general, mostly with respect to the SDM as implemented in SCOM 2007.

    For those of you familiar with MOM 2005, you'll know that things were very computer centric, which might make the 2007 concepts a bit foreign to you. In trying to bring SCOM closer to a service-oriented management product, the idea that the computer is central has somewhat been removed. I say somewhat, because much of the rest of the world of management has not moved entirely off the computer being of central importance, so we still have to honor that paradigm to make integration with other products easier. One example of this compromise is that on an Alert object you will see the properties NetbiosComputerName, NetbiosDomainName and PrincipalName; these are present for easier integration with ticketing systems that are still largely computer centric.

    With this shift in thinking, we are able to model an enterprise to a much finer level of detail on any individual computer, as well as move above a single computer and aggregate services across machines into a single service-based representation. So what does this mean practically? For one, it means a potentially huge proliferation of discovered objects. Instead of just having a computer with a few roles discovered, we can go so far as to model each processor, each hard drive, every process, etc. on every box as its own object. Now, instead of seeing a computer, you can actually see a more accurate representation of the things that are important to you in your enterprise. More importantly, this allows you to define each object's health separately from the health of the objects that depend on it, and to define how its health affects the health of those things. For instance, just because one of the hard drives on a particular machine is full doesn't necessarily mean the health of the machine is bad. Or maybe it is, and that can be modeled as well.

    As was the case with MOM 2005, at its core SCOM 2007 is driven by management packs. Management packs define the class hierarchy that enables this form of deep discovery, and they define all the tools necessary to manage these objects.

    Let's begin by discussing classes and the class hierarchy. We define a base class that all classes must derive from, called System.Entity. Every class ever shipped in any management pack will derive, at its core, from this class. This class is abstract, meaning that there can never be an object discovered that is just a System.Entity and nothing else. We ship with an extensive abstract class hierarchy that we are still tweaking, but it should allow users to plug their classes in wherever in the hierarchy makes sense for them. You will be able to define your own abstract hierarchies as well as your own concrete classes. Concrete (i.e. non-abstract) classes are discoverable. Once you define a class as concrete, its key properties (those that define the identity of an instance of that class) cannot be changed. For instance, if you want to specialize System.Computer, you can't define a new key property on it that would change its identity, although you are free to add as many non-key properties as you like. In our system, the values of the key properties of a discovered instance are what uniquely identify that instance. In fact, the unique identifier (Guid) that is used internally to identify these instances is actually built from the key property values. Extending this computer example, if you extend Computer yourself, and someone else does as well, it is possible for a computer to be both of your classes at the same time; however, it will be identified as the same instance. You could imagine that Dell ships a management pack and some of your computers are discovered as both Dell Computers and Windows Computers, both of which would derive from the non-abstract Computer class that defines the key properties. Thus an instance that is discovered as both would always be both, in every context. The reason the class of a discovered instance is important is that targeting of all workflows relies on it, but I'll talk more about that a bit later.
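
    To make the identity idea concrete, here is a minimal sketch of how an instance id could be derived from key property values. This is illustrative only; the class name, helper name and hashing choice are mine, and this is not the actual algorithm SCOM uses internally.

        // Illustrative only: conceptually, an instance id is a deterministic function of the
        // concrete class and its key property values, so the same key values always resolve
        // to the same instance no matter which management pack discovered it.
        using System;
        using System.Security.Cryptography;
        using System.Text;

        static class InstanceIdentitySketch
        {
            public static Guid ComputeInstanceId(string className, params string[] keyValues)
            {
                string canonical = className + "|" + string.Join("|", keyValues);
                using (MD5 md5 = MD5.Create())
                {
                    // MD5 produces exactly 16 bytes, which is one Guid's worth of data.
                    byte[] hash = md5.ComputeHash(Encoding.UTF8.GetBytes(canonical));
                    return new Guid(hash);
                }
            }
        }

        // A "Dell Computer" and a "Windows Computer" that share the same key values
        // (say, the principal name) therefore resolve to the same instance id.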

    In addition to discovering individual instances, the system also allows you to define relationships between instances. Relationships have a class hierarchy of their own that works very similarly to the class hierarchy for individual instances. The base class here is called System.Reference, and all relationship types (classes) derive from it. There are two abstract relationship types of particular importance that I wanted to discuss here. First, there is System.Containment, which derives directly from System.Reference. While a reference defines a loose coupling of instances, containment implies that the source object of the relationship in some way contains the target object. This is very important to us internally, as containment is used to allow things to flow across a hierarchy. For instance, in the UI, alert views that look at a particular set of instances (say Computers) will also automatically include alerts for anything contained on those computers; a computer's alert view would show alerts for hard drives on that computer as well. This behavior is an option when making direct SDK calls, but in the UI it is the method that is chosen. Even more important is the fact that security scopes flow across containment relationships. If a user is given access to a particular group, they are given access to everything contained in that group via the System.Containment relationship type (or any type that derives from it). An even more specialized version of containment that is also very important is System.Hosting. This indicates a hosting relationship exists between the source and target, where the target's identity is dependent upon its host. For instance, a SQL Server is hosted by a computer, since it would not exist outside the context of that computer. Going back to what I said in the previous paragraph about using the key properties of an instance to calculate its unique id, we actually also use the key properties of all its hosts to identify it. Taking SQL Server as an example, I can have hundreds of Instance1 SQL Servers running in my enterprise. Are they all the same? Of course not; they are different based on the computer they are on. That's how we differentiate them. Even in the SDK, when you get MonitoringObjects back, the property values that are populated include not only the immediate properties of the instance, but also the key properties of its host(s).
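
    Reusing the hypothetical ComputeInstanceId helper from the sketch above, the effect of hosting on identity might look like this (the class name and key values are made up for illustration):

        // The host's key properties participate in the hosted instance's identity, so the
        // same SQL instance name on two different computers yields two different objects.
        Guid onServerA = InstanceIdentitySketch.ComputeInstanceId(
            "SQL Server", "ServerA.contoso.com", "Instance1");
        Guid onServerB = InstanceIdentitySketch.ComputeInstanceId(
            "SQL Server", "ServerB.contoso.com", "Instance1");
        // onServerA != onServerB, even though both are named Instance1.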

    All the examples I've mentioned thus far talk about drilling down on an individual computer, but we can also build up. I can define a service as being dependent on many components that span physical boundaries. I can use the class hierarchy to create these new service types and extend the relationship type hierarchy to define the relationships between my service and its components.

    Before we talk about how all these instances get discovered, let's talk about why being an instance of a particular class is important. SCOM actually uses the class information about a discovered instance to determine what should run on its behalf. Management pack objects, such as rules, tasks and monitors, are all authored with a specific class as their target. What this actually means is that the management pack wants the specified workflow to run for every discovered instance of that class. If I have a rule that monitors the transaction log for SQL, I want that rule deployed and executed on every machine that has a SQL server discovered. What our configuration service does is determine what rules need to be deployed where, based on the discovered instance space and on where those particular instances are managed (and really, whether they are managed at all, although that is a discussion for another post). So another important attribute of an instance is where it is being managed; essentially, every discovered instance is managed by some agent in your enterprise, usually the agent on the machine where the instance was discovered. When the agent receives configuration from the configuration service, it instantiates all the workflows necessary for all the instances it manages. This is when all the discovery rules, rules and monitors start running. Tasks, Diagnostics and Recoveries are a bit different in that they run on demand, but when they are triggered, they flow to the agent that manages the instance the workflow was launched against. Class targeting is important here as well, as Tasks, Diagnostics and Recoveries can only execute against instances of the class they are targeted to. It wouldn't make sense, for instance, to launch a "Restart Service" task against a hard drive.
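
    Here is a small conceptual sketch of that targeting calculation. The types and names are hypothetical; this is not the configuration service's actual code, just the shape of the idea: every workflow targeted at a class yields one running copy per discovered instance of that class, on the agent that manages the instance.

        using System;
        using System.Collections.Generic;

        class Workflow { public string Name; public string TargetClass; }
        class DiscoveredInstance { public string Id; public List<string> Classes; public string ManagingAgent; }

        static class TargetingSketch
        {
            // For each agent, work out which (workflow, instance) pairs it must instantiate.
            public static void CalculateConfiguration(
                IEnumerable<Workflow> workflows, IEnumerable<DiscoveredInstance> instances)
            {
                foreach (Workflow workflow in workflows)
                {
                    foreach (DiscoveredInstance instance in instances)
                    {
                        // "Classes" stands in for the instance's class plus all its base classes,
                        // so a workflow targeted at an abstract base also runs for derived instances.
                        if (instance.Classes.Contains(workflow.TargetClass))
                        {
                            Console.WriteLine(
                                "{0}: run '{1}' for instance {2}",
                                instance.ManagingAgent, workflow.Name, instance.Id);
                        }
                    }
                }
            }
        }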

    Discovering instances and relationships is interesting. SCOM uses a "waterfall" approach to discovery, which I will illustrate with SQL. We'll begin by assuming we have a computer discovered. We create a discovery that discovers SQL servers and we target it at Computer. The system will then run this discovery rule on every Computer it knows about. When it runs on a computer that in fact has SQL installed, it will publish discovery data to our server and a new instance of SQL Server will be instantiated. Next, we have a rule targeted at SQL Server that discovers the individual databases on the server. Once the configuration service is notified of the new SQL instance, it recalculates configuration and publishes new configuration, including this discovery rule, to the machine with the SQL server. This rule will then run and publish discovery information for the databases on the server. This allows deep discovery to occur without any user intervention, except for actually starting the waterfall. The first computer needs to be discovered, either by the discovery wizard, manual agent installation or programmatically via the SDK. For the latter, we support programmatic discovery of instances using the MCF portion of the SDK. Each connector is considered a discovery source and is able to submit discovery data on its own behalf. When the connector goes away, all instances that were discovered solely by that connector also go away.

    The last thing I wanted to talk about is Monitors. Monitors define the state of an instance. Monitors also come in a hierarchy, to help better model the state of an instance. The base of the hierarchy is called System.Health.EntityState and it represents THE state of an instance. Whenever you see state in the UI, it is the state of this particular monitor for that instance, unless stated otherwise. This particular monitor is an AggregateMonitor that rolls up state from its child monitors. The roll-up semantics for aggregates are BestOf, WorstOf and Percentage. At the end of a monitor hierarchy chain must exist either a UnitMonitor or a DependencyMonitor. A UnitMonitor defines some single state aspect of a particular instance; for example, it may be monitoring the value of a particular performance counter. The importance of this particular monitor to the overall state of the instance is expressed by the monitor hierarchy it is a part of. Dependency monitors allow the state of one instance to depend on the state of another. Essentially, a dependency monitor allows you to define the relationship that is important to the state of this instance, and the particular monitor of the target instance of that relationship that should be considered. One cool thing about monitors is that their definition is inherited based on the class hierarchy. System.Health.EntityState is actually targeted at System.Entity, so all instances automatically inherit this monitor and can roll up state to it. What this means practically is that if you want to specialize a class into a class of your own, you don't need to redefine the entire health model; you can simply augment it by deriving your class from the class you wish to extend and adding your own monitors to your class. You can even simply add monitors to the existing health model by targeting them anywhere in the class hierarchy that makes sense for your particular monitor.
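
    As a rough illustration of the roll-up semantics, here is a minimal sketch of BestOf and WorstOf over a set of child states. The enum and method names are mine, not SCOM's, and Percentage is omitted because its policy takes additional parameters that are beyond this sketch.

        using System.Collections.Generic;
        using System.Linq;

        // Ordered so that "worse" states compare greater than "better" ones.
        enum SketchHealthState { Success = 1, Warning = 2, Error = 3 }

        static class RollupSketch
        {
            // BestOf: the aggregate is as healthy as its healthiest child.
            public static SketchHealthState BestOf(IEnumerable<SketchHealthState> children)
                => children.Min();

            // WorstOf: the aggregate is as unhealthy as its unhealthiest child.
            public static SketchHealthState WorstOf(IEnumerable<SketchHealthState> children)
                => children.Max();
        }

        // Example: one full disk among several healthy ones.
        // WorstOf -> Error   (the full disk makes the parent unhealthy)
        // BestOf  -> Success (the full disk does not affect the parent's state)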

    As always, let me know if there are any questions, anything you would like me to elaborate on or any ideas for future posts.

  • Jakub@Work

    More with Alert and State Change Insertion

    • 25 Comments

    Update: I have updated the management pack to work with the final RTM bits 

    Update #2: You cannot use the aggregate method described below to set the state of any instance not hosted on the Root Management Server.

    The last thing I wanted to demonstrate about alert and state change insertion has finally been resolved. This will not work in any of the public bits right now, but RC1 should be available soon and it will work there. Attached is the most recent version of the management pack to reference for this post. You'll have to clean up the references to match the proper public keys, but it should work otherwise.

    What I wanted to demonstrate was being able to define a monitor for a given class without having that monitor actually be instantiated for every instance of that class. Normally, if you define a monitor, you will get an instance of it (think a new workflow on your server) for every discovered instance of the class the monitor is targeted to. If you have thousands of instances, this can lead to significant performance issues. Now, we support the ability to define an aggregate monitor that will not be instantiated, as long as there are no monitors that roll up to it. In the sample MP attached you will find System.Connectors.SpecialAggregate, which is an example of this kind of monitor. It works like any other aggregate monitor in the system; it just doesn't actually have any other monitors from which to roll up state. So how does its state get set? That's where the additional changes come in.

    The first new addition is System.Connectors.Health.SetStateAction. This is the most basic form of the module that will be used to set the state of the aforementioned monitor. In this form, it accepts as configuration the ManagementGroupId, MonitorId (the monitor you want to set the state of), ManagedEntityId (the id of the instance you want to set the monitor's state for) and the HealthState (the actual state you want to set the monitor to). There is a wrapper called System.Connectors.Health.TargetSetStateAction that abstracts away the need to set the ManagedEntityId and ManagementGroupId properties and that works with the SDK data types. There are also three further wrappers that explicitly define the state to set the monitor to, leaving only the MonitorId as configuration.
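
    Just to spell out what the base module expects, here are the four configuration values expressed as a C# dictionary. This is purely illustrative; the real configuration lives in the management pack XML, and the Guid variables below are placeholders:

        using System;
        using System.Collections.Generic;

        Guid managementGroupId = Guid.Empty;   // the management group you are connected to
        Guid monitorId         = Guid.Empty;   // the aggregate monitor whose state you want to drive
        Guid managedEntityId   = Guid.Empty;   // the instance whose monitor state is being set

        var setStateConfiguration = new Dictionary<string, string>
        {
            ["ManagementGroupId"] = managementGroupId.ToString(),
            ["MonitorId"]         = monitorId.ToString(),
            ["ManagedEntityId"]   = managedEntityId.ToString(),
            ["HealthState"]       = "Error"    // the state to set the monitor to
        };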

    I have included a sample rule (System.Connectors.SpecialAggregate.Error) that will drive the state of the aggregate monitor using the same sample code I posted earlier. Note that the rule is targeted to RootManagementServer since it will process data for a variety of instances, while the aggregate monitor should be targeted at the proper class that represents the instance you want to model and drive state for.

  • Jakub@Work

    Caching on the Brain

    • 16 Comments

    This week I have been working a lot on caching in the SDK, trying to optimize some code paths and improve performance as much as I can, so I decided to share a bit about how the cache works and some insights into its implementation while it's fresh in my mind.

    SCOM 2007 relies heavily on configuration data to function. Class and relationship type definitions become especially important when dealing with discovered objects. We found that it was very common to want to move up and down these class hierarchies, which would prove very costly from a performance standpoint if each operation required a roundtrip to the server and database. We also recognized that not all applications require this type of functionality, and incurring the additional memory hit was not desirable (this was especially true for modules that need the SDK). Given this, the SDK has been designed with three different cache modes: Configuration, ManagementPacks and None. The cache mode you want to use can be specified using the ManagementGroupConnectionSettings object.
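
    For example, choosing a cache mode when connecting might look roughly like this. I'm writing the member names from memory (this post only names ManagementGroupConnectionSettings and the three modes), so treat the exact property, enum and namespace names as assumptions:

        using Microsoft.EnterpriseManagement;   // SDK namespace (assumed)

        // Connect to the management group using the ManagementPacks cache mode.
        var settings = new ManagementGroupConnectionSettings("MyRootManagementServer");
        settings.CacheMode = CacheMode.ManagementPacks;   // or Configuration, or None

        ManagementGroup managementGroup = ManagementGroup.Connect(settings);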

    First, let's go over what objects each cache mode will actually cache:

    Configuration: ManagementPack, MonitoringClass, MonitoringRelationshipClass, MonitoringViewType, UnitMonitorType, ModuleType derived classes, MonitoringDataType, MonitoringPage and MonitoringOverrideableParameter

    ManagementPacks: ManagementPack

    None: None =)

    For the first two modes there are also events on ManagementGroup that will notify users of changes. OnTypeCacheRefresh is only fired in Configuration cache mode and indicates that something in the cache other than ManagementPack objects has changed; in other words, the data in the cache is actually different. Many things can trigger a ManagementPack change, but not all of them change anything other than the ManagementPack object's LastModified property (for instance, creating a new view, or renaming one). OnManagementPackCacheRefresh is triggered when any ManagementPack object changes for whatever reason, even if nothing else in the cache changed. This event is available in both Configuration and ManagementPacks modes.
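
    Subscribing to these notifications might look something like the sketch below. The event names come straight from this post; the handler signature (a plain EventHandler) is my assumption, and managementGroup is assumed to be an already connected ManagementGroup:

        managementGroup.OnTypeCacheRefresh += delegate(object sender, EventArgs e)
        {
            Console.WriteLine("Type cache changed: classes, relationship types, etc. were refreshed.");
        };

        managementGroup.OnManagementPackCacheRefresh += delegate(object sender, EventArgs e)
        {
            Console.WriteLine("One or more ManagementPack objects changed.");
        };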

    So, when do you want to use each mode? Configuration is great if you are doing lots of operations in the configuration space, especially moving up and down the various type hierarchies. It is also useful when working extensively with MonitoringObject (as opposed to PartialMonitoringObject) and needing to access the property values of many instances of different class types. Our UI runs in this mode. ManagementPacks is useful when configuration-related operations are used, but not extensively. This is actually a good mode for MP authoring, which requires frequent trips to get management packs, but not necessarily other objects. One thing that is important to note here is that every single object that exists in a management pack (rule, class, task, etc.) requires a ManagementPack object that is not returned in the initial call. If you call ManagementGroup.GetMonitoringRules() in cache mode None, every rule that comes back will make another call to the server to get its ManagementPack object. If you are doing this, run in at least ManagementPacks cache mode; that's what it's for. None is a great mode for operational-data-related operations. If you are mostly working with alerts or performance data, or even simply submitting a single task, this mode is for you. (None was not available until recently, and is not in the bits that are currently available for download.)
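
    To illustrate that last point, a loop like the following is the worst case for cache mode None. GetMonitoringRules() is named in this post; the per-rule GetManagementPack() call is my assumption about where the extra roundtrip happens:

        // In cache mode None, each iteration may cost an extra server call to fetch the rule's
        // management pack; in ManagementPacks or Configuration mode the pack comes from the cache.
        foreach (MonitoringRule rule in managementGroup.GetMonitoringRules())
        {
            ManagementPack mp = rule.GetManagementPack();
            Console.WriteLine(mp.Name + ": " + rule.Name);
        }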

    One more thing I want to mention. ManagementPacks, when cached, will always be represented by the exact same ManagementPack object instance in memory, even if properties change. Other objects are actually purged and recreated. This is extremely useful for authoring, as you can guarantee that a management pack retrieved in different ways is always the same instance in memory. A practical example: you get a management pack by calling ManagementGroup.GetManagementPack(Guid) and then you get a rule by calling ManagementGroup.GetMonitoringRule(Guid). The rule is conceptually in the same management pack the GetManagementPack call returned, but who is to say it is the same instance? When you edit the rule, you will want to call ManagementPack.AcceptChanges(), which (if the instances were not the same) would not commit your rule change, since the rule's internal ManagementPack may have been a different instance, and that's what maintains any changed state. In Configuration and ManagementPacks cache mode this problem does not arise: the instance that represents a certain management pack will always be the exact same instance and maintain the same editing state across the board. Now, that makes multi-threading and working with the same management pack across threads trickier, but there are public locking mechanisms exposed for the ManagementPack object to help with that.
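
    A sketch of that authoring pattern, using the calls named in this paragraph (the specific property being edited is just an illustration):

        // With Configuration or ManagementPacks cache mode, mp and the rule's internal
        // management pack are guaranteed to be the same object, so the staged edit below
        // is committed by the AcceptChanges call on mp.
        ManagementPack mp = managementGroup.GetManagementPack(managementPackId);
        MonitoringRule rule = managementGroup.GetMonitoringRule(ruleId);

        rule.DisplayName = "My renamed rule";   // staged change on the rule (illustrative edit)
        mp.AcceptChanges();                     // commits pending changes for that management pack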

    Lastly, a quick note about how the cache works. The definitive copy of the cache is actually maintained in memory in the SDK service. The service registers a couple of query notifications with SQL Server so it is notified when things of interest change in the database, and that triggers a cache update on the server. When this update completes, the service loops through and notifies all clients that, when they connected, requested to be notified of cache changes. Here we see another benefit of None cache mode: less server load, since fewer clients need to be notified of changes.

  • Jakub@Work

    RC0 Sample Management Pack for Alert and State Change Insertion

    • 5 Comments

    It was brought to my attention that the management pack in the previous post did not import in RC0. I have attached an updated management pack that should import.
