• musc@> $daniele.work.ToString()

    Microsoft Monitoring Agent, System Center Operations Manager and Visual Studio Application Insights

    • 4 Comments

    Since the release of System Center 2012 R2 Preview (and even more so after GA was announced) a lot of people have asked me why we renamed the Operations Manager agent to "Microsoft Monitoring Agent". Some information that went out together with the GA of SC2012R2 can be found at the following link: http://technet.microsoft.com/en-us/library/dn465154.aspx
    And here is a post from Marnix, System Center MVP http://thoughtsonopsmgr.blogspot.com/2013/09/scom-2012-r2-hello-mma-microsoft.html    
    Essentially, Microsoft Monitoring Agent is not *only* the SCOM agent  anymore - the agent is now licensed with System Center OR with Visual Studio. When it was first released, it could already be used when reporting to SCOM (for monitoring), and it could also be used for standalone IntelliTrace collection (diagnostics scenario, more geared towards Dev/App owners). Read more in these other blog posts by Larry: Introducing Microsoft Monitoring Agent and Performance Details in IntelliTrace.

    Enter ‘Application Insights’

    With Microsoft Monitoring Agent 2013 Update Rollup 1 (at the time of this writing available as a preview), Microsoft Monitoring Agent can now also be used to report APM data to the brand new Application Insights Preview feature in Visual Studio Online that was announced a couple of weeks ago. Application Insights is an Azure-backed SaaS solution allowing teams to “[…] get insights from monitoring and going back and make the app better. Some people call it DevOps [...] but it's a sort of holistic view of the application: 360 degrees telemetry data like usage, performance, exception data, crash data, all that you need to have in live site to know how well your application is doing.[…]” (see the complete interview with Soma here).

    You can also read more at:
    http://blogs.msdn.com/b/somasegar/archive/2013/11/13/visual-studio-2013-launch-announcing-visual-studio-online.aspx
    http://blogs.msdn.com/b/visualstudioalm/archive/2013/11/13/announcing-application-insights-preview.aspx    
    Application Insights 360 Dashboard

    So what powers some (but not all) of the data that you have at your fingertips in Application Insights – as you might have imagined – is the APM agent within MMA: the same APM agent you can use with OpsMgr. And in Application Insights you’ll see the same familiar data you see in OpsMgr, such as exceptions and performance events (which can be exported to IntelliTrace format) and performance counters, but this time in a multi-tenant SaaS solution specifically designed for DevOps teams.

    APM Events in Application Insights

    MMA 2013 UR1 Preview is available as a standalone download from Microsoft Download Center (as well as from the Application Insights Preview itself, within the Visual Studio Online portal) and it is the first version of the agent that can connect both to on-premises System Center OpsMgr systems and to the SaaS service.
    http://www.microsoft.com/en-us/download/details.aspx?id=41143
    http://msdn.microsoft.com/en-us/library/dn481094.aspx
    Microsoft Monitoring Agent - Select Connect to Application Insights

    NOTE: Keep in mind that at the time of this writing, this is a CTP (“Preview”) release of the agent. It is not supported by CSS for non-Visual Studio Online-related scenarios. Even though we are not currently aware of any major compatibility issues between this CTP and SCOM (or when multi-homing between Application Insights and SCOM), only very limited testing was done for this agent working together with SCOM at this stage. We encourage SCOM customers NOT to use it in their production environments and wait for the final Microsoft Monitoring Agent 2013 Update 1 release.

    In the future, anyhow, dual homing could be used to let your agent differentiate what data to send to which solution: for example, send only the alerting and performance information needed for monitoring and triaging production issues to the on-prem System Center Operations Manager system, while the detailed and much more verbose code-level information goes to developers in our multi-tenant SaaS APM offering within Team Foundation Service online (so you don’t have to worry about managing extra storage for APM data in the SCOM database), or to SCOM (maybe only data from some applications, i.e. ‘PROD’ applications), or to both systems in various combinations – based on environment, project, operational model, processes and teams/ownership. It is, for example, very practical to use Application Insights to conduct functional and load tests in development and test environments – without the need to stand up another OpsMgr infrastructure, or to affect the scale and performance of the one designed to handle ‘prod’ data – and feed rich, actionable diagnostic information into the development lifecycle, improving those applications even before they go into production.

    Maarten, one of the System Center MVPs, has also started a series of posts on Application Insights, sharing his perspective on the powerful hybrid monitoring scenarios enabled by using Microsoft Monitoring Agent with Application Insights and with System Center 2012 R2.

    APM for Azure PaaS

    Added benefit - MMA, when used with Visual Studio Online, can also be installed on Azure Cloud Services instances (PaaS) - which was not a supported scenario in System Center (see this post where I mentioned this before). This is the first time we are able to offer true APM monitoring for Azure PaaS.

    In OpsMgr, agents are uniquely identified by their FQDN (Fully Qualified Domain Name), and everything in SCOM - from connector to DB to DAL to SDK - relies on agent names. Machine names in most corporate networks are well-defined pieces of information, follow a logical naming convention, and rarely change. SCOM Management Servers also rely on Kerberos/AD on premises and/or certificates (again using the FQDN) to authenticate the agents, and expect to only be talking to ‘well known’, pre-authorized machines. But with Azure Cloud Services (PaaS) and IaaS (in certain configurations) you can have any number of cloned, identical, elastic instances of 'roles' (worker and web) deployed and running at any given time, which appear and disappear as you scale them up and down. Machine names don't last and don’t matter much in Azure PaaS like they do on-prem... and it is much more natural to have Azure send APM data to Azure, not to on-prem - which would otherwise require opening inbound ports in your perimeter - remember that the agent initiates the connection to the infrastructure it reports to, be it a SCOM Management Group or Application Insights in the cloud.

    The agent includes a brand new connector that can talk web-friendly protocols to report to the SaaS offering, which is a very different backend than an OpsMgr Management Server/Group. Application Insights uses a newly built backend running in Azure, written with cloud-first principles. The way to authenticate to the service is thru an ‘application key’ (which represents the Application Insights tenant); the cloud service does not use the machine’s identity. You place the application key in the application’s configuration file, and Visual Studio allows you to package/bundle a script to silently install and configure the agent automatically, so that every time your PaaS roles are re-deployed, the agent gets installed on them. Machines come and go, applications stay, and they need to be monitored – those applications and their lifecycle are what Application Insights and Visual Studio Online are all about.

    For infrastructure-level info you don’t need an agent, instead: from System Center Operations Manager, you can of course keep using the Azure Management Pack, which polls the Azure Management API and does a better job of creating/disposing those ‘elastic’ objects that come and go (thru discoveries); if you are only in the cloud (= no on-prem infrastructure) you can find that type of OS-level info (CPU/Memory/Disk) in the Azure Portal.
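    To make the "package/bundle a script" idea concrete, here is a minimal sketch of what such a startup task could look like in PowerShell. This is purely illustrative: the installer file name, the silent-install switches and the property name below are placeholders/assumptions, not the documented MMA command line – check the official MMA / Application Insights documentation for the real options.

    # Hypothetical startup task for an Azure PaaS role: install MMA unattended and
    # point it at an Application Insights tenant. Switch and property names below
    # are placeholders (assumptions), NOT the documented installer options.
    $installer = Join-Path $PSScriptRoot 'MMASetup-AMD64.exe'        # agent package shipped with the role
    $appKey    = '00000000-0000-0000-0000-000000000000'              # your Application Insights application key

    # run the installer silently; replace the arguments with the real silent-install
    # switches from the MMA documentation
    Start-Process -FilePath $installer -ArgumentList "/quiet APPINSIGHTS_KEY=$appKey" -Wait

    # because this runs as a startup task, every re-deployed role instance
    # gets the agent installed and configured automatically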

    Availability Monitoring

    Availability information (and other metrics such as external response time) that is tracked in Application Insights comes from synthetic tests providing an ‘Outside In’ perspective: single URL probes or Visual Studio webtests. If you are one of my OpsMgr readers, you have probably understood that this is backed by Global Service Monitor – the same service, offering ‘watcher nodes in the cloud’, that you can attach to OpsMgr.

    More than just APM (as we knew it in System Center)

    More explicit instrumentation can be added to apps in various ways, when reporting to Application Insights. These include:

    • Client-side usage monitoring: client-side monitoring instrumentation in Application Insights is a completely different solution than the one in OpsMgr. First, the focus is on usage, visitors, and their experience – more in the analytics sense than with the alerting angle of the one in OpsMgr. Second, enablement is different: Application Insights provides you with a JavaScript snippet that can be added to any website, even if it is not .NET – unlike in OpsMgr, where .NET server-side monitoring is used to hook up automatic injection of the JavaScript – but the change must be done by a developer. In the end, the manual method proves more compatible with many applications and browsers.
    • Server SDKs, by which you can instrument logging of custom metrics in your code and have it reported to the service directly
    • Client SDK for Windows Phone 8 apps, by which you can instrument logging of custom metrics in your code and have it reported to the service
    • Deployment information can be collected – see the post from Charles – this is extremely useful to understand whether changes in performance or reliability are related to deployments of new versions of the app/service
    • Beautiful Customizable Dashboards and a fresh, modern UI on top of all this data

    How can I try it out?

    Application Insights is currently in preview and you need an invitation to try it out. You might want to go to www.visualstudio.com and register for a VSO subscription and add yourself to the waiting list by clicking the blue “Try Application Insights” button.

    Some more links

    Series of videos on how to use it http://channel9.msdn.com/Series/Application-Insights-for-Visual-Studio-Online     
    Forum on MSDN http://social.msdn.microsoft.com/Forums/vstudio/en-US/home?forum=ApplicationInsights
    Documentation on MSDN http://msdn.microsoft.com/en-us/library/dn481095.aspx

  • musc@> $daniele.work.ToString()

    Programmatically create APM objects and configuration (w/ APM Explorer sample app)

    • 0 Comments

    I have been speaking to multiple customers, and a lot of them had the same feedback: “the APM template/wizard is great, BUT what if I want to automate enablement of monitoring when I provision new applications, without using the UI ?”. The request seems fair, but our extensibility/programmability story for APM currently doesn’t easily allow that.

     

    The APM template, like all templates, generates a management pack (or adds “stuff” to an existing management pack). Many other templates actually create classes/discoveries/rules/monitors… but APM provides a lot of settings which are really peculiar to its functionality, and don’t easily fit into the “standard” management packs/discoveries/rules/monitors pattern. What the APM template does is really to capture INTENT, and use that information to generate the right configuration on the agent.

    Sure, it still creates an MP, and it does create an object (<EntityType>) for the “application group” you are defining, within that MP. If you are wondering what an “application group” is, you might want to first refer to this previous post of mine, which explains at a high level what objects are created by APM: http://blogs.technet.com/b/momteam/archive/2012/01/14/apm-object-model.aspx then come back here.

    It also creates a discovery for the application group. What is really special about the APM discovery is its data source configuration, which features a “special sauce” you can see below (from an exported MP in my demo environment):

    AppChunk APM Data Source Config

    Is that XML I see within the <AppChunk1> tag? It surely is: encoded XML, nested within the “outer” Management Pack’s XML… but XML nonetheless. For the non-programmers here, those &gt; and &lt; are encoded versions of open and closed tags: “>” and “<”. It gets encoded this way in order to have XML within other XML… that is because technically, the whole AppChunk1 is just a string – that is right: the MP Schema has NOT been extended to support special APM “stuff” – it all is just a string that gets used by a data source module as configuration. This configuration happens to be quite complex, but the MP will validate and import even if you write something incorrect in this string. BUT then the discovery module will choke on this input and fail (and raise an alert on the management server). Since “normally” the MP only gets written by our official UI/Template, this is not an issue, because we guarantee we will write it correctly (if not, we likely have a bug somewhere).
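    If you want to convince yourself that it really is just escaped XML-in-a-string, a couple of lines of PowerShell (a minimal sketch, with a made-up inner fragment – not the real APM configuration) show the round trip:

    # made-up inner fragment, standing in for the APM configuration string
    $innerXml = '<AppSettings><Setting Name="Threshold" Value="5000" /></AppSettings>'

    # escaping turns < > " into &lt; &gt; &quot; so the fragment can live inside the outer MP XML
    $escaped = [System.Security.SecurityElement]::Escape($innerXml)
    $escaped
    # -> &lt;AppSettings&gt;&lt;Setting Name=&quot;Threshold&quot; Value=&quot;5000&quot; /&gt;&lt;/AppSettings&gt;

    # decoding gives back a string you can parse as XML again
    $decoded = [System.Net.WebUtility]::HtmlDecode($escaped)
    ([xml]$decoded).AppSettings.Setting.Value      # 5000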

    But this is the reason why editing it outside of the template isn’t supported: it is a non-trivial exercise, and it is easy to get it wrong without a public schema to validate against. Also, since the AppChunk1 module configuration is not in the MP Schema, a future version of the module and template might change the way this piece of XML looks.

    So, with the warning that everything I am going to write from now on is TOTALLY UNSUPPORTED, I will show you how to look at what the template builds, and try to replicate it. I won’t use any “insider knowledge”, nor code officially released by Microsoft or part of any product: I will just guide you thru looking at the XML output and trying to make sense of it. When I did this myself, I came up with some SAMPLE CODE – which I will provide – that can generate the same XML.

    Sounds easy enough, so let’s take a look at this <AppChunk1> once you add some carriage returns to make it more readable:

    AppChunk1

    I highlighted a few different blocks in it:

    • some global settings about the application group (name, environment tag, a unique GUID)
    • server-side monitoring settings (global)
    • client-side monitoring settings (global)
    • application-component-specific configuration (which application components to enable monitoring for – this is essentially the list of “application components” within the “application group”; again, refer to this blog post for the object model and terminology)

    Most of the settings are self-explanatory, when you look at them… you will recognize they are all the same things that you configured in the template: namespaces, thresholds, etc…

    So with this knowledge, what does it take to create the same XML?

    It takes some code, of course. Some sample code is what I am going to provide in this post, linked below: I built a small sample application to demonstrate this. It is built with Visual Studio 2010 and compiled to run against .NET Framework 4.x. The sample application will let you connect to a management server (I only tested it against SC 2012 SP1) and will list all the applications you have (that have been discovered), showing whether they are already enabled for monitoring, in which MP, and some of the settings applied to them. Please note that this is just an EXAMPLE, so it has not gone thru full testing and there is no guarantee that it will keep working with future updates. It also doesn’t understand nor show things like group scoping of the template, nor whether the same application has been configured more than once in multiple templates, etc – in fact it might even be broken in some of those scenarios, as I have not done extensive testing!
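    If you are curious what the enumeration part of such a tool boils down to, here is a rough sketch using the OpsMgr 2012 PowerShell module instead of the raw SDK. The server name and the class-name filter are assumptions – inspect your own management group for the actual APM class names.

    # connect to a management server (hypothetical name) and list discovered
    # instances of APM-related classes - roughly what the sample tool enumerates
    Import-Module OperationsManager
    New-SCOMManagementGroupConnection -ComputerName 'scom-ms01.contoso.local'

    # the name pattern below is an assumption; check Get-SCOMClass output in your environment
    $apmClasses = Get-SCOMClass | Where-Object { $_.Name -like 'Microsoft.SystemCenter.Apm*' }

    foreach ($class in $apmClasses) {
        Get-SCOMClassInstance -Class $class |
            Select-Object DisplayName, @{ n = 'Class'; e = { $class.DisplayName } }
    }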

    APM Explorer GUI

     

    To be totally fair and give due credit, this first part of the application got started by my friend Eric, when he was trying to figure out which applications he had configured and which ones he was still missing. So the first part of the code, which enumerates your endpoints, is actually coming from him. I eventually ended up writing a SQL query and SSRS report for THAT scenario (see here http://blogs.technet.com/b/momteam/archive/2012/08/22/apm-configured-endpoint-report.aspx ) but then re-used the GUI for this “configuration” experiment, instead.

    So back to configuration, the GUI is only meant to be a quick way to multi-select various “application components” from your inventory, and quickly create an MP to monitor them with APM:

    Right click Configure APM

     

    This brings up a form with the same basic settings as the template (note that not ALL the settings available in the template have been implemented in my sample!)

    Application Group Settings

     

    When you click OK, a management pack will be written to disk (in the same folder where you launched the EXE from).

    This is a totally “importable” MP that should work and create all the right things that the template would have created, at least for the server-side settings.

     

    The tool isn’t extremely useful as-is (because you still have to go thru a UI, after all! – and if you have to use a UI, you might just as well use the official one that ships with the product!)… but that is not the point! The main goal is to show that it is possible to create the right XML thru custom code, to automate provisioning of APM monitoring for your apps. I built a UI on it to let you validate how it works. But eventually, if you want to automate enablement of monitoring for your own apps, you will only really care about the “APMMPGenerator” class, which is the one that does the “dirty” work of creating the MP!

    As I wrote earlier, I didn’t use any insider knowledge, and this code is absolutely NOT the same code as what is used within the product itself – it just happens to produce the same XML fragment as output. To further prove that this could be done by anyone, I purposely kept the code NOT elegant: by this I mean that I didn’t treat XML as such, nor used classes in the framework to deal with XML like a true programmer would have done – no schema validation, nothing of that sort! Instead, I hacked together the required XML by using quick and dirty STRING manipulation and token replacement. While most real programmers will probably be thinking this code should be posted on TheDailyWTF, I stand behind my choice, and I believe many IT Pros and Operations Manager administrators who don’t write code every day will actually appreciate it, find it more readable, and probably find it easier to port to PowerShell, Perl, or their favorite scripting language. APMMPGenerator is the main class in this sample code that is relevant to learn how to write the required pieces of the MP:

    APMMPGenerator class

     

    This class and its methods are heavily commented ‘step by step’, and they will show how you can generate the XML for a management pack to be used with SCOM to enable APM.

    Writing XML for management packs thru code (and concatenating strings) is in my opinion a very powerful technique that I have also used in the past to build other MPs that were “repetitive” (i.e. needed to contain many “similar” groups, rules, monitors, etc), and should allow people to more easily port it to other languages (i.e. PowerShell, for automation, sounds like a good choice…).
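    To make the “concatenating strings” idea concrete, here is a minimal sketch of the token-replacement approach in PowerShell. This is not the actual APMMPGenerator code, and the MP skeleton is heavily abbreviated (a real MP also needs references, display strings, the APM discovery with its AppChunk1 configuration, and so on):

    # minimal, abbreviated MP skeleton with ##TOKENS## to be replaced
    $template = '<ManagementPack ContentReadable="true">' +
                '<Manifest><Identity><ID>##MPID##</ID><Version>1.0.0.0</Version></Identity>' +
                '<Name>##MPNAME##</Name></Manifest>' +
                '</ManagementPack>'

    # plain string replacement - no XML classes, no schema validation, on purpose
    $mpXml = $template -replace '##MPID##', 'Custom.Apm.MyApplicationGroup'
    $mpXml = $mpXml    -replace '##MPNAME##', 'My Application Group APM MP'

    # write the unsealed MP to disk, like the sample tool does in the EXE's folder
    $mpXml | Out-File -FilePath '.\Custom.Apm.MyApplicationGroup.xml' -Encoding UTF8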

     

     

    Gotcha’s / Disclaimer

    • Client-side monitoring settings cannot be created by just writing XML “offline” and “outside” of an OpsMgr Management Group, because the real APM template creates a binary RunAs Account that is used as an encryption key to secure the browser-to-server communication (so that random attackers cannot just feed bad data to the CSM Collector endpoint, but need to be valid/real browsers doing that). This is something that has to call the SDK on the management server, to see if such an account already exists or not, etc… it gets a lot trickier, and with the current design it is simply not possible to create that part of the MP “offline” just by crafting XML (a minimal sketch of the kind of SDK check involved follows after this list). This said, once the MP is imported, you can go and EDIT it again in the template, add client-side monitoring, apply/save, and the right things should happen there and then.
    • The tool’s code does NOT create a FOLDER and VIEWs for the monitored applications. I left that as an exercise for the reader. If you look at the views that the template creates, there really isn’t anything too special about them – they are just standard views, like those in any other MP. Hence I didn’t spend time there… there are examples on how to add views here http://msdn.microsoft.com/en-us/library/bb437626.aspx and here http://msdn.microsoft.com/en-us/library/bb960509.aspx (among other places…). Like the above, editing the template after the fact should add the views at that point, when saving.
    • Other than the above “AppChunk1”, there are a few more things that the class creates but that I didn’t describe: things like references to other MPs, display strings, and the info required to make the “template instance” appear in the “Authoring” pane of the console, so it can be further edited later on. I am not describing those since they are all “standard” Management Pack elements… see the documentation on MSDN, as for the views above.
    • All of this (tool, sample code, post) is TOTALLY NOT SUPPORTED. I repeat it: NOT SUPPORTED. I am not encouraging anybody to use this! The only supported way to do this stuff is to use the Template, which is what has been written by professional developers and tested. What I did here is to put myself in the customers’ shoes, look at what the template builds, and try to replicate it. I didn’t use any “insider knowledge” nor code owned by Microsoft to do this – I did what any one of you could have done: observe the MP, and try to build one that looks the same. Call it reverse engineering, if you wish. Anyway, since some people have expressed the need to automate enablement of monitoring… this is the only way I can think of enabling that with the current product. I know. There are plenty of smart people in the OpsMgr community who don’t get scared of creating custom solutions on top of the platform. This is a post for them.
    • All of the above is not supported. No, really. Just in case you missed it.  If you really want to use it, please evaluate in your test environment first! As expected, this solution is provided AS-IS, with no warranties and confers no rights. Future versions of this tool may be created based on time and requests.
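    For the client-side monitoring point above, this is roughly the kind of SDK check the template performs – sketched here from within the OpsMgr Command Shell. The display-name filter is an assumption; look at the RunAs accounts your own template created.

    # check whether a CSM-related RunAs account already exists (the template creates a
    # binary key account the first time); the name filter below is just an assumption
    $csmAccount = Get-SCOMRunAsAccount | Where-Object { $_.Name -like '*CSM*' }

    if ($csmAccount) {
        "Found existing account '$($csmAccount.Name)' - the template would reuse it"
    } else {
        "No matching account found - the template would create one via the SDK"
    }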

     

  • musc@> $daniele.work.ToString()

    Lonely blog for almost a year, and see you at MMS 2013 next week

    • 0 Comments

    Wow, I haven’t written here in a while. My last post on this blog is from over a year ago, and referred to the BETA of System Center 2012 Service Pack 1!

    Since then, the final version of that Service Pack 1 has shipped and Global Service Monitor has been made generally available too - http://blogs.technet.com/b/momteam/archive/2013/01/15/system-center-2012-sp1-operations-manager-is-generally-available.aspx

    I have not really been completely silent, tho – just on this blog. With regards to SP1, I have recorded a short presentation for Microsoft Virtual Academy about what is new in SP1 http://technet.microsoft.com/en-US/video/JJ873818 – if you are coming to MMS 2013 next week, you will hear a lot more about these enhancements.

    I also blogged a few technical posts on the momteam blog – here are a few of them in case you missed them:

    APM Configured Endpoint report
    http://blogs.technet.com/b/momteam/archive/2012/08/22/apm-configured-endpoint-report.aspx

    Event-to-Alert ratio, reviewing problems and understanding trends for APM data in OpsMgr 2012
    http://blogs.technet.com/b/momteam/archive/2012/06/18/event-to-alert-ratio-reviewing-problems-and-understanding-trends-for-apm-data-in-opsmgr-2012.aspx

    APM Agent Throttling settings and other APM Overrides in SC2012 Operations Manager
    http://blogs.technet.com/b/momteam/archive/2012/12/19/apm-throttling-settings-and-other-apm-overrides-in-sc2012-operations-manager.aspx

    I also kept updating and fixing bugs in MPViewer and OverrideExplorer – for which I always keep updating the same post here http://blogs.msdn.com/b/dmuscett/archive/2012/02/19/boris-s-tools-updated.aspx

    I have also been busy with a couple of personal projects, such as restoring a mis-treated guitar I got in a thrift store ( http://www.muscetta.com/2013/01/21/restoring-an-electric-guitar/ ) and building one (almost) from scratch ( http://www.flickr.com/photos/dani3l3/sets/72157632658946681/ ).

     

    I will be at MMS 2013 next week, and you can catch me at a couple of different sessions:

    IM-B202 System Center 2012 SP1 Operations Manager Overview  - Tuesday, April 9 8:30 AM - 9:45 AM South Seas B

    IM-B318 Panel Discussion: System Center Operations Manager - Tuesday, April 9 10:15 AM - 11:30 AM Mandalay Bay Ballroom L

    AM-B302 Developers and Operations Engineers: System Center and Visual Studio - Wednesday, April 10 12:00 PM - 1:15 PM South Seas F

    AM-B306 DevOps: Azure Monitoring & Authoring Updates for Operations Manager 2012 SP1 - Thursday, April 11 2:45 PM - 4:00 PM Jasmine E

  • musc@> $daniele.work.ToString()

    Operations Manager 2012 SP1 BETA is out, and some cool things you might not (yet) know about it

    • 4 Comments

    It has been a couple of months since we released CTP2 (I had blogged about that here http://www.muscetta.com/2012/06/16/operations-manager-2012-sp1-ctp2-is-out-and-my-teched-na-talk-mgt302/ ) and we have now reached the Beta milestone!

    Although you might have already seen a number of posts about this last week (e.g. http://blogs.technet.com/b/server-cloud/archive/2012/09/10/system-center-2012-sp1-beta-available-evaluate-with-windows-server-2012.aspx or http://blogs.technet.com/b/momteam/archive/2012/09/11/system-center-2012-service-pack-1-beta-now-available-for-download.aspx), the information on the blogs so far didn’t quite explain all the various new features that went into it, so I want to give a better summary specifically about the component I work on: Operations Manager.

    Keep in mind the below is just my personal summary – the official one is here http://technet.microsoft.com/en-us/library/jj656650.aspx – and it actually does explain these things… but since the OpsMgr community reads a lot of blogs, I wanted to highlight some points of this release.

    Platform Support

    • Support for installing the product on Windows Server 2012 for all components: agent, server, databases, etc.
    • Support for using SQL Server 2012 to host the databases

    Cloud Services

    • Global Service Monitor - This is actually something that the Beta version enables, but the required MPs don’t currently ship with the Beta download directly - you will be able to sign up for the Beta of GSM here. Once you have registered and imported the new MPs, you will be able to use our cloud-based capability to monitor the health of your web applications from a geo-distributed perspective that Microsoft manages and runs on Windows Azure, just like you would from your own agents/watcher nodes. Think of it as an extension of your network, or “watcher nodes in the cloud”

    APM-Related improvements

    This is my area and what the team I am in specifically works on – so I personally had the privilege of driving some of this work (not all of it - some other PMs drove some of it too!)

    • Support for IIS8 with APM (.NET application performance monitoring) – this enables APM to monitor applications running on Windows Server 2012, not just 2008 anymore. The new Windows Server 2012 and IIS8 Management packs are required for this to work. Please note that, if you have imported the previous, “Beta” Windows 8 Management packs, they will need to be removed prior to installing the official Windows Server 2012 Management Packs. About Windows Server 2012 support and MPs, read more here http://blogs.technet.com/b/momteam/archive/2012/09/05/windows-server-2012-system-center-operations-manager-support.aspx
    • Monitoring of WCF, ASP.NET MVC and .NET NT services – we made changes to the agent so that we better understand and present data related to calls to WCF Services, we support monitoring of ASP.NET MVC applications, and we enabled monitoring of Windows Services that are built on the .NET framework – the APM documentation here has been updated with regard to these changes and refers to both 2012 RTM and SP1 (pointing out the differences, when needed) http://technet.microsoft.com/en-us/library/hh457578.aspx
    • Introduction of Azure SDK support – this means you can monitor applications that make use of Azure Storage with APM, and the agent is now aware of Azure tables, blobs and queues, as well as SQL Azure calls. It essentially means that APM events will tell you things like “your app was slow when copying that azure blob” or “you got an access denied when writing to that table”
    • 360 .NET Application Monitoring Dashboards – this brings together different perspectives of application health in one place: it displays information from Global Service Monitor, .NET Application Performance Monitoring, and Web Application Availability Monitoring to provide a summary of health and key metrics for 3-tier applications in a single view. Documentation here http://technet.microsoft.com/en-us/library/jj614613.aspx
    • Monitoring of SharePoint 2010 with APM (.NET application performance monitoring) - this was a very common ask from customers and the field, and some folks were trying to come up with manual configurations to enable it (i.e. http://blogs.technet.com/b/shawngibbs/archive/2012/03/01/system-center-2012-operation-manager-apm.aspx ) but now this comes out of the box and it is, in fact, better than what you could configure: we had to change some of the agent code, not just configuration, to deal with some intricacies of SharePoint…
    • Integration with Team Foundation Server 2010 and Team Foundation Server 2012 - functionality has also been enhanced in comparison to the previous TFS Synchronization management pack (which was shipped out of band, now it is part of Operations Manager). It allows Operations teams to forward APM alerts ( http://blogs.technet.com/b/momteam/archive/2012/01/23/custom-apm-rules-for-granular-alerting.aspx ) to Developers in the form of TFS Work Items, for things that operations teams might not be able to address (i.e. exceptions or performance events that could require fixes/code changes)
    • Conversion of Application Performance Monitoring events to IntelliTrace format – this enables developers to get information about exceptions from their applications in a format that can be natively used in Visual Studio. Documentation for this feature is not yet available, and it will likely appear as we approach the final release of the Service Pack 1. This is another great integration point between Operations and Development teams and tools, contributing to our DevOps story (my personal take on which was the subject of an earlier post of mine: http://www.muscetta.com/2012/02/05/apm-in-opsmgr-2012-for-dev-and-for-ops/)

    Unix/Linux Improvements

    Audit Collection Services

    • Support for Dynamic Access Control in Windows Server 2012 - When was the last time that an update to ACS was made? Seems like a long time ago to me…. Windows Server 2012 enhances the existing Windows ACL model to support Dynamic Access Control. System Center 2012 Service Pack 1 (SP1) contributes to fulfilling these scenarios by providing enterprise-wide visibility into the use of Dynamic Access Control.

    Network Monitoring

    • Additional network devices models supported – new models have been tested and added to the supported list
    • Visibility into virtual network switches in vicinity dashboard – this requires integration with Virtual Machine Manager to discover the network switches exposed by the hypervisor

     

     

    Reminders:

    • Production use is NOT supported for customers who are not part of the TAP program
    • Upgrade from CTP2 to Beta is NOT Supported
    • Upgrade from 2012 RTM to SP1 Beta will ONLY be supported for customers participating in the TAP Program
    • Procedures not covered in the documentation might not work

     

     

     

    Download http://www.microsoft.com/en-us/download/details.aspx?id=34607

  • musc@> $daniele.work.ToString()

    Operations Manager 2012 SP1 CTP2 is out, and my TechED NA talk (MGT302)

    • 0 Comments

    As you might have already heard, this has been an amazing week at TechEd North America: System Center 2012 has been voted as the Best Microsoft Product at TechEd, and we have released the Community Technology Preview (CTP2) of all System Center 2012 SP1 components.

    I wrote a (quick) list of the changes in Operations Manager CTP2 in this other blog post and many of those are related to APM (formerly AVIcode technology). I have also demoed some of these changes in my session on Thursday – you can watch the recording here. I think one of the most-awaited changes is support for monitoring Windows Services written in .NET – but there is more than that!

    In the talk I also covered a bit of Java monitoring (which is the same as in 2012, no changes in SP1) and my colleague Åke Pettersson talked about Synthetic Transactions, and how to bring it all together (synthetics and APM) in a single new dashboard (also shipping in SP1 CTP2) that gives you a 360-degree view of your applications. The CTP2 documentation covers both the changes to APM as well as how to light up this new dashboard.

    When it comes to synthetics – I know you have been using them from your own agents/watcher nodes – but to have a complete picture from the outside in (or last mile), we have now also announced the Beta of Global Service Monitor (it was even featured in the Keynote!) – where essentially we extend your OpsMgr infrastructure to the cloud: you upload your tests to our Azure-based service, we run those tests against your Internet-facing applications from watcher nodes in various datacenters around the globe, and we feed the data back to your OpsMgr infrastructure, so that you can see whether your application is available and how it is responding from those locations. You can sign up for the consumer preview of GSM from the Connect site.

    Enjoy your beta testing! (Isn’t that what weekends are for, geeks?)

  • musc@> $daniele.work.ToString()

    Boris’s OpsMgr Tools – Updated

    • 60 Comments

    Over the years, Boris has released a set of phenomenal tools that saved OpsMgr administrators quite some time in performing common tasks in OpsMgr 2007 and 2007 R2.

    The sad news is that Boris has moved to another team within Microsoft. He has made a tremendous contribution over the years to the OpsMgr product, and I am sure he will rock on into his new role and team. At the same time he will be missed.

    In order not to let those tools go to waste – since I know many people use them – I asked him to give me the code of his tools and to allow me to update and maintain them going forward. And so I did: I updated a couple of his tools to work with OpsMgr 2012:

     

    • MPViewer 2.3.3 – The previous version 1.7 (which works with OpsMgr 2007 and 2007 R2) was released here. Version 2.3.3 has been updated to work with OpsMgr 2012, and now includes support for MPB files (MP Bundles), shows embedded resources in bundles (such as images or scripts), loads MPs asynchronously, and has the ability to Unseal and Unpack MP Bundles.
    • OverrideExplorer 3.7 – The previous version 3.3 (which works with OpsMgr 2007 and 2007 R2) was released here. Version 3.7 has been updated to work with OpsMgr 2012 and includes some minor fixes, as well as the capability to export all overrides to an Excel spreadsheet. It also now shows both Windows and Unix computers in the computers view.
    • Proxy Settings 1.2 – The previous version 1.1 (which works with OpsMgr 2007 and 2007 R2) was released here. Version 1.2 is functionally identical to the previous version but has been recompiled to work with the OpsMgr 2012 SDK.
    • OverrideCreator 1.5 – The previous version (which works with OpsMgr 2007 and 2007 R2) was released here. Version 1.5 is functionally identical to the previous version but has been recompiled to work with the OpsMgr 2012 SDK.

    All the above tools require the Operations Manager console to be installed on the machine where you run them, as well as the .NET Framework 4.0.

    According to my information, the above four tools were the most used/useful. Feel free to comment if you need any other one updated and/or have bug reports or feature requests – although I don’t promise I will be able to fix or update everything.

    Disclaimer

    Just like for their predecessors, it is necessary to make clear that this posting is provided "AS IS" with no warranties, and confers no rights.
    Use of the included utilities is subject to the terms specified at http://www.microsoft.com/info/cpyright.htm

     

    Changelog / Updates

    [Updated on March 8th 2012 with MPViewer 1.9.1 that contains a fix for the Excel export of some MPs]

    [Updated on March 15th 2012 with MPViewer 2.0 that now allows you to Unseal/Unpack MPs and MPBundles]

    [Updated on March 21st 2012 with OverrideExplorer 3.5 which now allows to export Overrides to Excel]

    [Updated on July 19th 2012 with MPViewer 2.1 that now shows the PublicKeyToken for references/dependencies]

    [Updated on August 29th 2012 with MPViewer 2.1.2 that contains fixes to show Perf Objects, Counters and Frequency for some more modules]

    [Updated on September 29th 2012 with MPViewer 2.2 that contains cosmetic as well as reliability/responsiveness fixes]

    [Updated on October 3rd 2012 with MPViewer 2.2.1 that contains a fix for a crash when opening Unsealed MPs]

    [Updated on November 20th 2012 with OverrideExplorer 3.6 that contains a fix for the “change target” operation that was creating broken overrides when changing target from a group to another group]

    [Updated on April 26th 2013 with MPViewer 2.2.2 that contains a fix for some rules in the IIS MP that were incorrectly being reported as not generating alerts, and another fix for the "unseal/unbundle" menu item that sometimes was not being enabled]

    [Updated on May 9th 2013 with MPViewer 2.3 that now can also handle MP Bundles that contain multiple ManagementPacks in a single bundle]

    [Updated on May 14th 2013 with OverrideCreator 1.5 – first working version for OpsMgr 2012]

    [Updated on November 23rd 2013 with OverrideExplorer 3.7 - now includes Unix computers in the computers view]

    [Updated on February 17th 2014 with MPViewer 2.3.2 - now shows (most) event ID's and Event Sources for Event Rules]

    [Updated on March 21st 2014 with MPViewer 2.3.3 - now allows both HTML and XLS export in bulk thru command line - more info in the comment thread below]

  • musc@> $daniele.work.ToString()

    A couple of OpsMgr / APM Posts

    • 0 Comments

    Just a shameless personal plug here, pointing out that I recently wrote two technical posts on the momteam blog about the APM feature in Operations Manager 2012 – maybe you want to check them out:

    1. APM object model – describes the object model that gets created by the APM Template/Wizard when you configure .NET application monitoring
    2. Custom APM Rules for Granular Alerting – explains how you can leverage management pack authoring techniques to create alerting rules with super-granular criteria (going beyond what the GUI would let you do)

    Hope you find them useful – if you are one of my “OpsMgr readers”.

  • musc@> $daniele.work.ToString()

    Operations Manager 2012 Release Candidate is out of the bag!

    • 0 Comments

    Go read the announcement at http://blogs.technet.com/b/server-cloud/archive/2011/11/10/system-center-operations-manager-2012-release-candidate-from-the-datacenter-to-the-cloud.aspx

    This is the first public release since I joined the team (I started in this role the day after the team shipped Beta) and this is the first release that contains some direct output of my work. It feels so good!

    Documentation has also been refreshed – it starts here http://technet.microsoft.com/en-us/library/hh205987.aspx

    The part specifically about the APM feature is here http://technet.microsoft.com/en-us/library/hh457578.aspx

    Enjoy!

  • musc@> $daniele.work.ToString()

    Repost: Useful SetSPN tips

    • 0 Comments

    I just saw that my former colleague (PFE) Tristan has posted an interesting note about the use of SetSPN "-A" vs SetSPN "-S". I normally don’t repost other people’s content, but I thought this would be useful, as there are a few SPNs used in OpsMgr and it is not always easy to get them all right… and you can find a few tricks I was not aware of by reading his post.
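    As a quick example of the difference (server and account names below are placeholders): -A adds the SPN blindly, while -S first checks the forest for duplicates and refuses to add one if it already exists – which is usually what you want for the OpsMgr SDK/Data Access service SPNs. Run these from an elevated prompt or PowerShell console:

    # list what is currently registered for the Data Access / SDK service account
    setspn -L CONTOSO\svc-omdas

    # register the SDK SPNs, letting setspn check for duplicates first (-S instead of -A)
    setspn -S MSOMSdkSvc/OMMS01 CONTOSO\svc-omdas
    setspn -S MSOMSdkSvc/OMMS01.contoso.local CONTOSO\svc-omdas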

    Check out the original post at http://blogs.technet.com/b/tristank/archive/2011/10/10/psa-you-really-need-to-update-your-kerberos-setup-documentation.aspx

  • musc@> $daniele.work.ToString()

    A month in a new life

    • 0 Comments

    Hey, I have just realized that I have been in my new PM role for a month already – time flies!

    If you are one of my OpsMgr readers, in case you haven’t noticed, I have been silent here but I have published a post on the momteam blog – check it out: http://blogs.technet.com/b/momteam/archive/2011/08/12/application-performance-monitoring-in-opsmgr-2012-beta.aspx

    If you are one of those few readers interested in following what I do, instead – I can tell you that I am loving the new job. Lots to do, of course, and that also applies to the private sphere – did you know that relocating to another continent takes some energy and effort? – but we are settling in nicely and things are going very smoothly overall.

  • musc@> $daniele.work.ToString()

    I have been chosen; Farewell my friends...

    • 1 Comments

    I have been in Premier Field Engineering for nearly 7 years (it was not even called PFE when I joined - it was just "another type of support"...) and I have to admit that it has been a fun, fun ride: I worked with awesome people and managed to make a difference with our products and services for many customers - directly working with some of those customers, as well as indirectly thru the OpsMgr Health Check program - the service I led for the last 3+ years, which nowadays gets delivered hundreds of times a year around the globe by my other fellow PFEs.

    But it is time to move on: I have decided to go thru a big life change for me and my family, and I won't be working as a Premier Field Engineer anymore as of next week.

    But don't panic - I am staying at Microsoft!

    I have actually never been closer to Microsoft than now: we are packing and moving to Seattle the coming weekend, and on July 18th I will start working as a Program Manager in the Operations Manager product team, in Redmond. I am hoping this will enable me to make a difference with even more customers.

    Exciting times ahead - wish me luck!

    Farewell my friends, I go on to a better place

     

    That said – PFE is hiring! If you are interested in working for Microsoft – we have open positions (including my vacant position in Italy) for almost all the Microsoft technologies. Simply visit http://careers.microsoft.com and search on “PFE”.

    As for the OpsMgr Health Check, don't you worry: it will continue being improved - I left it in the hands of some capable colleagues: Bruno Gabrielli, Stefan Stranger and Tim McFadden - and they have a plan and commitment to update it to OpsMgr 2012.

  • musc@> $daniele.work.ToString()

    Improved ACS Partitions Query

    • 0 Comments

    This has been sitting on my hard drive for a long time. Long story short, the report I posted at Audit Collection Services Database Partitions Size Report had a couple of bugs:

    1. it did not consider the size of the dtString_XXX tables but only the size of dtEvent_XXX tables – this would still give you an idea of the trends, but it could lead to quite different SIZE calculations
    2. the query was failing on some instances that have been installed with the wrong (unsupported) Collation settings.

    I fixed both bugs, but I don’t have a machine with SQL 2005 and Visual Studio 2005 anymore… so I can’t rebuild my report – but I don’t want to distribute one that only works on SQL 2008, because I know that SQL 2005 is still out there. This is partly the reason that held this post back.

    Without waiting any longer, therefore, I decided I’ll just give you the fixed query. Enjoy!

     

    --Query to get the Partition Table
    --for each partition we launch the sp_spaceused stored procedure to determine the size and other info
    
    --partition list
    select PartitionId,Status,PartitionStartTime,PartitionCloseTime 
    into #t1
    from dbo.dtPartition with (nolock)
    order by PartitionStartTime Desc 
    
    
    --sp_spaceused holder table for dtEvent
    create table #t2 (
        PartitionId nvarchar(MAX) Collate SQL_Latin1_General_CP1_CI_AS,
        rows nvarchar(MAX) Collate SQL_Latin1_General_CP1_CI_AS,
        reserved nvarchar(MAX) Collate SQL_Latin1_General_CP1_CI_AS,
        data nvarchar(MAX) Collate SQL_Latin1_General_CP1_CI_AS,
        index_size nvarchar(MAX) Collate SQL_Latin1_General_CP1_CI_AS,
        unused nvarchar(MAX) Collate SQL_Latin1_General_CP1_CI_AS    
    )
    
    --sp_spaceused holder table for dtString
    create table #t3 (
        PartitionId nvarchar(MAX) Collate SQL_Latin1_General_CP1_CI_AS,
        rows nvarchar(MAX) Collate SQL_Latin1_General_CP1_CI_AS,
        reserved nvarchar(MAX) Collate SQL_Latin1_General_CP1_CI_AS,
        data nvarchar(MAX) Collate SQL_Latin1_General_CP1_CI_AS,
        index_size nvarchar(MAX) Collate SQL_Latin1_General_CP1_CI_AS,
        unused nvarchar(MAX) Collate SQL_Latin1_General_CP1_CI_AS    
    )
    
    
    set nocount on
    
    --vars used for building Partition GUID and main table name
    declare @partGUID nvarchar(MAX)
    declare @tblName nvarchar(MAX)
    declare @tblNameComplete nvarchar(MAX)
    declare @schema nvarchar(MAX)
    DECLARE @vQuery NVARCHAR(MAX)
    
    --cursor
    declare c cursor for 
        select PartitionID from #t1
    open c
    fetch next from c into @partGUID
    
    --start cursor usage
    while @@FETCH_STATUS = 0
    begin
    
    --tblName - first usage for dtEvent
    set @tblName = 'dtEvent_' + @partGUID
    
    --retrieve the schema name
    SET @vQuery = 'SELECT @dbschema = TABLE_SCHEMA from INFORMATION_SCHEMA.tables where TABLE_NAME = ''' + @tblName + ''''
    EXEC sp_executesql @vQuery,N'@dbschema nvarchar(max) out, @dbtblName nvarchar(max)',@schema out, @tblname
    
    --tblNameComplete
    set @tblNameComplete = @schema + '.' + @tblName
    
    INSERT #t2 
        EXEC sp_spaceused @tblNameComplete
    
    
        
        
        
    --tblName - second usage for dtString
    set @tblName = 'dtString_' + @partGUID
    
    --retrieve the schema name
    SET @vQuery = 'SELECT @dbschema = TABLE_SCHEMA from INFORMATION_SCHEMA.tables where TABLE_NAME = ''' + @tblName + ''''
    EXEC sp_executesql @vQuery,N'@dbschema nvarchar(max) out, @dbtblName nvarchar(max)',@schema out, @tblname
    
    --tblNameComplete
    set @tblNameComplete = @schema + '.' + @tblName
    
    INSERT #t3 
        EXEC sp_spaceused @tblNameComplete
    
        
        
        
    fetch next from c into @partGUID
    end
    close c
    deallocate c
    
    
    --select * from #t2
    --select * from #t3
    
    
    --results
    select #t1.PartitionId, 
        #t1.Status, 
        #t1.PartitionStartTime, 
        #t1.PartitionCloseTime, 
        #t2.rows,
        (CAST(LEFT(#t2.reserved,LEN(#t2.reserved)-3) AS NUMERIC(18,0)) + CAST(LEFT(#t3.reserved,LEN(#t3.reserved)-3) AS NUMERIC(18,0))) as 'reservedKB', 
        (CAST(LEFT(#t2.data,LEN(#t2.data)-3) AS NUMERIC(18,0)) + CAST(LEFT(#t3.data,LEN(#t3.data)-3) AS NUMERIC(18,0)))as 'dataKB', 
        (CAST(LEFT(#t2.index_size,LEN(#t2.index_size)-3) AS NUMERIC(18,0)) + CAST(LEFT(#t3.index_size,LEN(#t3.index_size)-3) AS NUMERIC(18,0))) as 'indexKB', 
        (CAST(LEFT(#t2.unused,LEN(#t2.unused)-3) AS NUMERIC(18,0)) + CAST(LEFT(#t3.unused,LEN(#t3.unused)-3) AS NUMERIC(18,0))) as 'unusedKB'
    from #t1
    join #t2
    on #t2.PartitionId = ('dtEvent_' + #t1.PartitionId)
    join #t3
    on #t3.PartitionId = ('dtString_' + #t1.PartitionId)
    order by PartitionStartTime desc
    
    
    
    --cleanup
    drop table #t1
    drop table #t2
    drop table #t3
  • musc@> $daniele.work.ToString()

    OpsMgr Agents and Gateways Failover Queries

    • 0 Comments

    The following article by Jimmy Harper explains very well how to set up agents and gateways’ failover paths thru Powershell http://blogs.technet.com/b/jimmyharper/archive/2010/07/23/powershell-commands-to-configure-gateway-server-agent-failover.aspx . This is the approach I also recommend, and that article is great – I encourage you to check it out if you haven’t done it yet!

    Anyhow, when checking for the actual failover paths that have been configured, the use of Powershell suggested by Jimmy is rather slow – especially if your agent count is high. In the Operations Manager Health Check tool I was also using that technique at the beginning, but eventually moved to the use of SQL queries just for performance reasons. Since then, we have been using these SQL queries quite successfully for about 3 years now.
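    For comparison, this is roughly what the "slow" PowerShell approach looks like in the OpsMgr 2012 shell – walking every agent and asking the SDK for its primary and failover servers, one object at a time (a minimal sketch using the SDK methods as I recall them; adjust property names to your environment):

    # enumerate every agent and query its primary/failover management servers via the SDK;
    # fine for a handful of agents, slow when the agent count is in the thousands
    Get-SCOMAgent | ForEach-Object {
        $failover = ($_.GetFailoverManagementServers() | Select-Object -ExpandProperty Name) -join ', '
        '{0} -> primary: {1}; failover: {2}' -f $_.DisplayName, $_.GetPrimaryManagementServer().Name, $failover
    }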

    But this is the season of giving... and I guess SQL queries can be a gift, right? Therefore I am now donating them as a Christmas gift to the OpsMgr community.

    Enjoy – and Merry Christmas!

     

    --GetAgentForWhichServerIsPrimary
    SELECT SourceBME.DisplayName as Agent,TargetBME.DisplayName as Server
    FROM Relationship R WITH (NOLOCK) 
    JOIN BaseManagedEntity SourceBME 
    ON R.SourceEntityID = SourceBME.BaseManagedEntityID 
    JOIN BaseManagedEntity TargetBME 
    ON R.TargetEntityID = TargetBME.BaseManagedEntityID 
    WHERE R.RelationshipTypeId = dbo.fn_ManagedTypeId_MicrosoftSystemCenterHealthServiceCommunication() 
    AND SourceBME.DisplayName not in (select DisplayName 
    from dbo.ManagedEntityGenericView WITH (NOLOCK) 
    where MonitoringClassId in (select ManagedTypeId 
    from dbo.ManagedType WITH (NOLOCK) 
    where TypeName = 'Microsoft.SystemCenter.GatewayManagementServer') 
    and IsDeleted ='0') 
    AND SourceBME.DisplayName not in (select DisplayName from dbo.ManagedEntityGenericView WITH (NOLOCK) 
    where MonitoringClassId in (select ManagedTypeId from dbo.ManagedType WITH (NOLOCK) 
    where TypeName = 'Microsoft.SystemCenter.ManagementServer') 
    and IsDeleted ='0') 
    AND R.IsDeleted = '0'
    
    
    --GetAgentForWhichServerIsFailover
    SELECT SourceBME.DisplayName as Agent,TargetBME.DisplayName as Server
    FROM Relationship R WITH (NOLOCK) 
    JOIN BaseManagedEntity SourceBME 
    ON R.SourceEntityID = SourceBME.BaseManagedEntityID 
    JOIN BaseManagedEntity TargetBME 
    ON R.TargetEntityID = TargetBME.BaseManagedEntityID 
    WHERE R.RelationshipTypeId = dbo.fn_ManagedTypeId_MicrosoftSystemCenterHealthServiceSecondaryCommunication() 
    AND SourceBME.DisplayName not in (select DisplayName 
    from dbo.ManagedEntityGenericView WITH (NOLOCK) 
    where MonitoringClassId in (select ManagedTypeId 
    from dbo.ManagedType WITH (NOLOCK) 
    where TypeName = 'Microsoft.SystemCenter.GatewayManagementServer') 
    and IsDeleted ='0') 
    AND SourceBME.DisplayName not in (select DisplayName 
    from dbo.ManagedEntityGenericView WITH (NOLOCK) 
    where MonitoringClassId in (select ManagedTypeId 
    from dbo.ManagedType WITH (NOLOCK) 
    where TypeName = 'Microsoft.SystemCenter.ManagementServer') 
    and IsDeleted ='0') 
    AND R.IsDeleted = '0'
    
    
    --GetGatewayForWhichServerIsPrimary
    SELECT SourceBME.DisplayName as Gateway, TargetBME.DisplayName as Server
    FROM Relationship R WITH (NOLOCK) 
    JOIN BaseManagedEntity SourceBME 
    ON R.SourceEntityID = SourceBME.BaseManagedEntityID 
    JOIN BaseManagedEntity TargetBME 
    ON R.TargetEntityID = TargetBME.BaseManagedEntityID 
    WHERE R.RelationshipTypeId = dbo.fn_ManagedTypeId_MicrosoftSystemCenterHealthServiceCommunication() 
    AND SourceBME.DisplayName in (select DisplayName 
    from dbo.ManagedEntityGenericView WITH (NOLOCK) 
    where MonitoringClassId in (select ManagedTypeId 
    from dbo.ManagedType WITH (NOLOCK) 
    where TypeName = 'Microsoft.SystemCenter.GatewayManagementServer') 
    and IsDeleted ='0') 
    AND R.IsDeleted = '0'
        
    
    --GetGatewayForWhichServerIsFailover
    SELECT SourceBME.DisplayName As Gateway, TargetBME.DisplayName as Server
    FROM Relationship R WITH (NOLOCK) 
    JOIN BaseManagedEntity SourceBME 
    ON R.SourceEntityID = SourceBME.BaseManagedEntityID 
    JOIN BaseManagedEntity TargetBME 
    ON R.TargetEntityID = TargetBME.BaseManagedEntityID 
    WHERE R.RelationshipTypeId = dbo.fn_ManagedTypeId_MicrosoftSystemCenterHealthServiceSecondaryCommunication() 
    AND SourceBME.DisplayName in (select DisplayName 
    from dbo.ManagedEntityGenericView WITH (NOLOCK) 
    where MonitoringClassId in (select ManagedTypeId 
    from dbo.ManagedType WITH (NOLOCK) 
    where TypeName = 'Microsoft.SystemCenter.GatewayManagementServer') 
    and IsDeleted ='0') 
    AND R.IsDeleted = '0'
    
    
    --xplat agents
    select bme2.DisplayName as XPlatAgent, bme.DisplayName as Server
    from dbo.Relationship r with (nolock) 
    join dbo.RelationshipType rt with (nolock) 
    on r.RelationshipTypeId = rt.RelationshipTypeId 
    join dbo.BasemanagedEntity bme with (nolock) 
    on bme.basemanagedentityid = r.SourceEntityId 
    join dbo.BasemanagedEntity bme2 with (nolock) 
    on r.TargetEntityId = bme2.BaseManagedEntityId 
    where rt.RelationshipTypeName = 'Microsoft.SystemCenter.HealthServiceManagesEntity' 
    and bme.IsDeleted = 0 
    and r.IsDeleted = 0 
    and bme2.basemanagedtypeid in (SELECT DerivedTypeId 
    FROM DerivedManagedTypes with (nolock) 
    WHERE BaseTypeId = (select managedtypeid 
    from managedtype where typename = 'Microsoft.Unix.Computer') 
    and DerivedIsAbstract = 0)
  • musc@> $daniele.work.ToString()

    Got Orphaned OpsMgr Objects?

    • 0 Comments

    Have you ever wondered what would happen if, in Operations Manager, you deleted a Management Server or Gateway that manages objects (such as network devices) or has agents pointing uniquely to it as their primary server?

    The answer is simple, but not very pleasant: you get ORPHANED objects, which will linger in the database but which you won’t be able to “see” or re-assign from the GUI anymore.

    So the first thing I want to share is a query to determine IF you have any of those orphaned agents. And even if you know you do, since you are not able to "see" them from the console, you might have to dig their names out of the database. Here's a query I got from a colleague in our reactive support team:


    -- Check for orphaned health services (e.g. agent).
    declare @DiscoverySourceId uniqueidentifier;
    SET @DiscoverySourceId = dbo.fn_DiscoverySourceId_User();
    SELECT TME.[TypedManagedEntityid], HS.PrincipalName
    FROM MTV_HealthService HS
    INNER JOIN dbo.[BaseManagedEntity] BHS WITH(nolock)
        ON BHS.[BaseManagedEntityId] = HS.[BaseManagedEntityId]
    -- get host managed computer instances
    INNER JOIN dbo.[TypedManagedEntity] TME WITH(nolock)
        ON TME.[BaseManagedEntityId] = BHS.[TopLevelHostEntityId]
        AND TME.[IsDeleted] = 0
    INNER JOIN dbo.[DerivedManagedTypes] DMT WITH(nolock)
        ON DMT.[DerivedTypeId] = TME.[ManagedTypeId]
    INNER JOIN dbo.[ManagedType] BT WITH(nolock)
        ON DMT.[BaseTypeId] = BT.[ManagedTypeId]
        AND BT.[TypeName] = N'Microsoft.Windows.Computer'
    -- only with missing primary
    LEFT OUTER JOIN dbo.Relationship HSC WITH(nolock)
        ON HSC.[SourceEntityId] = HS.[BaseManagedEntityId]
        AND HSC.[RelationshipTypeId] = dbo.fn_RelationshipTypeId_HealthServiceCommunication()
        AND HSC.[IsDeleted] = 0
    INNER JOIN DiscoverySourceToTypedManagedEntity DSTME WITH(nolock)
        ON DSTME.[TypedManagedEntityId] = TME.[TypedManagedEntityId]
        AND DSTME.[DiscoverySourceId] = @DiscoverySourceId
    WHERE HS.[IsAgent] = 1
    AND HSC.[RelationshipId] IS NULL;

    Once you have identified the agent you need to re-assign to a new management server, this is doable from the SDK. Below is a PowerShell script I wrote which will re-assign it to the RMS. It has to run from within the OpsMgr Command Shell.
    You still need to change the logic that chooses which agent to re-assign - this is meant as a starting point... you could easily expand it into accepting parameters and/or consuming an input text file, or using a different Management Server than the RMS... you get the point (a sketch of such a variant follows the script).

    $mg = (get-managementgroupconnection).managementgroup
    $mrc = Get-RelationshipClass | where {$_.name -like "*Microsoft.SystemCenter.HealthServiceCommunication*"}
    $cmro = new-object Microsoft.EnterpriseManagement.Monitoring.CustomMonitoringRelationshipObject($mrc)
    $rms = (get-rootmanagementserver).HostedHealthService

    $deviceclass = $mg.getmonitoringclass("HealthService")
    $mc = Get-connector | where {$_.Name -like "*MOM Internal Connector*"}

    Foreach ($obj in $mg.GetMonitoringObjects($deviceclass))
    {
        #the next line should be changed to pick the right agent to re-assign
        if ($obj.DisplayName -match 'dsxlab')
        {
            Write-host $obj.displayname
            $imdd = new-object Microsoft.EnterpriseManagement.ConnectorFramework.IncrementalMonitoringDiscoveryData
            $cmro.SetSource($obj)
            $cmro.SetTarget($rms)
            $imdd.Add($cmro)
            $imdd.Commit($mc)
        }
    }
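
    For instance, here is a minimal sketch of how the same logic could be driven by a list of agent names read from a text file instead of the hard-coded -match above (the file path and parameter name are hypothetical examples; save it as a .ps1 and run it from the OpsMgr Command Shell):

    # sketch only - same SDK calls as the script above, but reading the agents to re-assign from a text file
    param([string]$AgentListFile = "C:\temp\AgentsToReassign.txt")   # hypothetical input file, one agent FQDN per line

    $agentNames = Get-Content $AgentListFile
    $mg  = (get-managementgroupconnection).managementgroup
    $mrc = Get-RelationshipClass | where {$_.name -like "*Microsoft.SystemCenter.HealthServiceCommunication*"}
    $rms = (get-rootmanagementserver).HostedHealthService
    $deviceclass = $mg.getmonitoringclass("HealthService")
    $mc  = Get-connector | where {$_.Name -like "*MOM Internal Connector*"}

    Foreach ($obj in $mg.GetMonitoringObjects($deviceclass))
    {
        if ($agentNames -contains $obj.DisplayName)
        {
            Write-Host "Re-assigning $($obj.DisplayName) to the RMS"
            $cmro = new-object Microsoft.EnterpriseManagement.Monitoring.CustomMonitoringRelationshipObject($mrc)
            $imdd = new-object Microsoft.EnterpriseManagement.ConnectorFramework.IncrementalMonitoringDiscoveryData
            $cmro.SetSource($obj)
            $cmro.SetTarget($rms)
            $imdd.Add($cmro)
            $imdd.Commit($mc)
        }
    }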

     

    Similarly, you might get orphaned network devices. The script below is used to re-assign all Network Devices to the RMS. This script is actually something I have had even longer than the other one (yes, it has been sitting in my "digital drawer" for a couple of years or more...) and uses the same concept - only you might notice that the relationship's source and target are "reversed", since the relationships are different:

    • the Management Server (source) "manages" the Network Device (target)
    • the Agent (source) "talks" to the Management Server (target)

    With a bit of added logic it should be easy to have it work for specific devices.

    $mg = (get-managementgroupconnection).managementgroup

    $mrc = Get-RelationshipClass | where {$_.name -like "*Microsoft.SystemCenter.HealthServiceShouldManageEntity*"}

    $cmro = new-object Microsoft.EnterpriseManagement.Monitoring.CustomMonitoringRelationshipObject($mrc)
    $rms = (get-rootmanagementserver).HostedHealthService

    $deviceclass = $mg.getmonitoringclass("NetworkDevice")

    Foreach ($obj in $mg.GetMonitoringObjects($deviceclass))
    {
        Write-host $obj.displayname
        $imdd = new-object Microsoft.EnterpriseManagement.ConnectorFramework.IncrementalMonitoringDiscoveryData
        $cmro.SetSource($rms)
        $cmro.SetTarget($obj)
        $imdd.Add($cmro)

        $mc = Get-connector | where {$_.Name -like "*MOM Internal Connector*"}

        $imdd.Commit($mc)
    }

     

    Disclaimer

    The information in this weblog is provided "AS IS" with no warranties, and confers no rights. This weblog does not represent the thoughts, intentions, plans or strategies of my employer. It is solely my own personal opinion. All code samples are provided "AS IS" without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability and/or fitness for a particular purpose.

  • musc@> $daniele.work.ToString()

    Microsoft.Linux.RHEL.5.LogicalDisk.DiskBytesPerSecond Type Mismatch

    • 0 Comments

    I have had the following in my notes for a while… and I have not blogged in a while (been too busy), so I decided to blog it today, before the topic gets too old and starts stinking :-)

     

    It all started when a customer showed me an Alert he was seeing in his environment from some XPlat workflow. The alert looks like the following:

    Generic Performance Mapper Module Failed Execution
    Alert Description Source: RLWSCOM02.domain.dom
    Module was unable to convert parameter to a double value
    Original parameter: '$Data///*[local-name()="BytesPerSecond"]$'
    Parameter after $Data replacement: ''
    Error: 0x80020005
    Details: Type mismatch.
    One or more workflows were affected by this.
    Workflow name: Microsoft.Linux.RHEL.5.LogicalDisk.DiskBytesPerSecond.Collection
    Instance name: /
    Instance ID: {4F6FA8F5-C56F-4C9B-ED36-12DAFF4073D1}
    Management group: DataCenter
    Path: RLWSCOM02.domain.dom\RLWSCOM02.domain.dom Alert Rule: Generic Performance Mapper Module Runtime Failure Created: 6/28/2010 11:30:28 PM

     

    First I stumbled upon this forum post which mentions the same symptom http://social.technet.microsoft.com/Forums/en-US/crossplatformgeneral/thread/62e0bf3e-be6f-4218-a37b-f1e66f02aa49 - but when looking at the resolution, the locale on the customer's machine was fine (i.e. set to US settings), so I concluded that it was not the same root cause.

     

    Then I looked at what that rule was supposed to do, and queried the same CIM class both remotely thru WS-Man and locally on the box, and concluded that my issue was that certain values were returned as NULL while we were expecting to see a number on the Management Server – hence the Type Mismatch!

    I have explained previously how to run CIM queries against the XPlat agent; in this case it was the following one:

    winrm enumerate http://schemas.microsoft.com/wbem/wscim/1/cim-schema/2/SCX_FileSystemStatisticalInformation?__cimnamespace=root/scx -username:scomuser -password:password -r:https://rllspago01.domain.dom:1270/wsman -auth:basic -skipCACheck -skipCNCheck

     

    SCX_FileSystemStatisticalInformation

    AverageDiskQueueLength = null

    AverageTransferTime = null

    BytesPerSecond = null

    Caption = File system information

    Description = Performance statistics related to a logical unit of secondary storage

    ElementName = null

    FreeMegabytes = 4007

    IsAggregate = false

    IsOnline = true

    Name = /

    PercentBusyTime = null

    PercentFreeSpace = 55

    PercentIdleTime = null

    PercentUsedSpace = 45

    ReadBytesPerSecond = null

    ReadsPerSecond = null

    TransfersPerSecond = null

    UsedMegabytes = 3278

    WriteBytesPerSecond = null

    WritesPerSecond = null

     

    See the NULLs? Those are our issue.

    Now, before you continue reading, I will tell you that I have investigated this internally as well, and apparently we have just (in Cumulative Update 3) changed this behaviour in our XPlat modules, so that when NULL is returned we consider it to be ZERO. Good or bad as that may be, it will at least take care of the error. But if you don’t get any data from the Unix system… well, you are not getting any data – so that might cause a surprise later on, when you go and look at those charts expecting to see your disk “performance counters” and all you have is a bunch of ZEROs (how very interesting!). So, basically, the fix in CU3 suppresses the symptom, but does not address the cause.

    So, let’s see what is actually causing this, as you probably do want those statistics – otherwise you would not be monitoring that server!

    I looked at the Cimd.log (set to verbose), which only says the following (basically not much: it is getting info for 3 partitions… and the provider code is working):

    2010-09-01T08:38:32,796Z Trace      [scx.core.providers.diskprovider:5964:3086830480] BaseProvider::EnumInstances()

    2010-09-01T08:38:33,359Z Trace      [scx.core.providers.diskprovider:5964:3086830480] Object Path = //rllspago01.domain.dom/root/scx:SCX_FileSystemStatisticalInformation

    2010-09-01T08:38:33,359Z Trace      [scx.core.providers.diskprovider:5964:3086830480] BaseProvider::EnumInstances() - Calling DoEnumInstances()

    2010-09-01T08:38:33,359Z Trace      [scx.core.providers.diskprovider:5964:3086830480] DiskProvider DoEnumInstances

    2010-09-01T08:38:33,359Z Trace      [scx.core.providers.diskprovider:5964:3086830480] DiskProvider GetDiskEnumeration - type 3

    2010-09-01T08:38:33,360Z Trace      [scx.core.providers.diskprovider:5964:3086830480] BaseProvider::EnumInstances() - DoEnumInstances() returned - 3

    2010-09-01T08:38:33,360Z Trace      [scx.core.providers.diskprovider:5964:3086830480] BaseProvider::EnumInstances() - Call ReturnDone

    2010-09-01T08:38:33,360Z Trace      [scx.core.providers.diskprovider:5964:3086830480] BaseProvider::EnumInstances() - return OK

    2010-09-01T08:38:33,360Z Trace      [scx.core.provsup.cmpibase.singleprovider.DiskProvider:5964:3086830480] SingleProvider::EnumInstances() - Returning - 0

     

    but it still did not give me an idea as to why we would not get data for those “counters”. At this point I stopped using complex troubleshooting techniques, simply turned intuition on, and tried with some help from a search engine: http://www.bing.com/search?q=How+do+I+find+out+Linux+Disk+utilization 

    The results I got all mentioned that on Linux you would use the “iostat” command.

    So I tried to use it and… lo and behold: the iostat command was NOT INSTALLED on that machine!

    Guess what? We installed it (it is included in the “sysstat” package for RedHat Linux, so a simple “yum install sysstat” took care of this) and the counters started working!

    Hope that is useful to some.

  • musc@> $daniele.work.ToString()

    Microsoft Way

    • 0 Comments

    Microsoft Way 

    In the last couple of weeks we have been driving thru America from the east coast (New York) to the west coast (Seattle).

    I figured I needed to show my family the Microsoft campus too. Of course they know I work at Microsoft... but having only seen the office of a subsidiary - the one in Rome, with about 250 people at its peak - might not have given them (especially the kids) an idea of the actual size of the company.

  • musc@> $daniele.work.ToString()

    OpsMgr Event IDs Spreadsheet

    • 4 Comments

    I work in support (mostly with System Center Operations Manager, as you know), and I work with event logs every day. The following are typical situations:

    1. I get a colleague or a customer telling me “I am having a problem and the SCOM agent is showing 21037 events and 20002 events.  What’s wrong with it?”   
    2. I want to tune an OpsMgr environment and reduce load on the database by turning off a few event collections, as my friend Kevin Holman suggests here http://blogs.technet.com/kevinholman/archive/2009/11/25/tuning-tip-turning-off-some-over-collection-of-events.aspx .
    3. I am analyzing, sorting and grouping events with PowerShell, as I have written about on my blog lately http://www.muscetta.com/2009/12/16/opsmgr-eventlog-analysis-with-powershell/ , but I can’t read those long descriptions properly.
    4. I exported an EVT from a customer environment and load it on a machine that does not have the OpsMgr message DLLs installed – all I see are EventIDs and type (Warning, Error) but no real description – and I still want to figure out what those events are trying to tell me (a quick way to pull the IDs out of such an export is sketched right after this list).
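
    For situation 4, by the way, a minimal sketch like the following gives you the list of IDs to then look up in the spreadsheet (the file path is a hypothetical example, and it assumes PowerShell v2 on Vista/2008 or later, where Get-WinEvent can read saved logs):

    # counts events by ID in a saved event log, so you can look the IDs up in the spreadsheet
    Get-WinEvent -Path "C:\temp\OperationsManager.evt" -Oldest |
        Group-Object Id |
        Sort-Object Count -Descending |
        Select-Object Count, Name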

    Getting to the point: I – like everyone else – don’t have every OpsMgr event memorized.

    This is why I thought of building this spreadsheet, and I hope it might come in handy to more people.

    The spreadsheet contains an “AllEvents” list – and then the same events are broken down by event source as well:

    clip_image002

    When you want to search for an event (in one of the situations described above) just open up the spreadsheet, go to the “AllEvents” tab, hit CTRL+F (“Find”) and type in the Event ID you are searching for:

    clip_image004

    And this will take you to the row containing the event, so you can look up its description:

    clip_image006

    The description shows the event standard text (which is in the message DLL, therefore is the part you will not see if opening an EVT on another machine that does not have OpsMgr installed), and where the event parameters are (%1, %2, etc – which will be the strings you see in the EVT anyway).

    That way you can get an understanding of what the original message would have looked like on the original machine.

    This is just one possible usage pattern of this reference. It can also be useful to just read/study the events, learning about new ones you have never encountered, or refreshing your memory on those you HAVE seen in the past but did not quite remember. And of course you can also find other creative ways to use it.

    You can get it from here.

     

    A few last words to give due credit: this spreadsheet was compiled by using Eventlog Explorer (http://blogs.technet.com/momteam/archive/2008/04/02/eventlog-explorer.aspx ) to extract the event information out of the message DLLs on an OpsMgr 2007 R2 installation. That info was then copied and pasted into Excel in order to have an “offline” reference. I would also like to thank Kevin Holman for pointing me to Eventlog Explorer in the first place, and then for insisting that I should not keep this spreadsheet in my drawer, as it could be useful to more people!

  • musc@> $daniele.work.ToString()

    How to convert (and fixup) the RedHat RPM to run on Debian/Ubuntu

    • 0 Comments

    In an earlier post I showed how I got the Xplat agent running on Ubuntu. I have perfected the technique over time, and what follows is a step-by-step process on how to convert and change the RedHat package to run on Debian/Ubuntu. Of course this is still a hack… but some people asked me to detail it a bit more. At the same time, the cross platform team is working to update the source code on CodePlex with extra bits that will make it more straightforward to grab it, modify it and re-compile it than it is today. Until then, here is how I got it to work.

    I assume you have already copied the right .RPM package from the OpsMgr server’s /AgentManagement directory to the Linux box. The examples below refer to the 32-bit package, but of course the same identical technique works for the 64-bit version.

    We start by converting the RPM package to DEB format:

    root# alien -k scx-1.0.4-258.rhel.5.x86.rpm --scripts

    scx_1.0.4-258_i386.deb generated

     

    Then we need to create a folder where we will extract the content of the package, modify stuff, and repackage it:

    root# mkdir scx_1.0.4-258_i386

    root# cd scx_1.0.4-258_i386

    root# ar -x ../scx_1.0.4-258_i386.deb

    root# mkdir debian

    root# cd debian

    root# mkdir DEBIAN

    root# cd DEBIAN

    root# cd ../..

    root# rm debian-binary

    root# mv control.tar.gz debian/DEBIAN/

    root# mv data.tar.gz debian/

    root# cd debian

    root# tar -xvzf data.tar.gz

    root# rm data.tar.gz

    root# cd DEBIAN/

    root# tar -xvzf control.tar.gz

    root# rm control.tar.gz

    Now we have the “skeleton” of the package laid out on the filesystem, and we are ready to modify it, adding and changing what we need.

     

    First, we need to add some stuff which is expected to be found on a RedHat distro but is not present in Debian. In particular:

    1. You should copy the file “functions” (which you can get from a RedHat/CentOS box under /etc/init.d) into the debian/etc/init.d folder in our package folder. This file is required/included by our startup scripts, so it needs to be deployed too.

    Then we need to change some of the package behavior by editing the files under debian/DEBIAN:

    2. edit the “control” file (a file describing what the package is, and does):

    'control' file

    3. edit the “preinst” file (pre-installation instructions): we need to add instructions to copy the “issue” file onto “redhat-release” (the SCX_OperatingSystem class will look into that file; since this is hard-coded in the binary, we need to let it find it):

    'preinst' file

    these are the actual command lines to add for both packages (Debian or Ubuntu):

    # symbolic links for libraries called differently on Ubuntu and Debian vs. RedHat

    ln -s /usr/lib/libcrypto.so.0.9.8 /usr/lib/libcrypto.so.6

    ln -s /usr/lib/libssl.so.0.9.8 /usr/lib/libssl.so.6

    the following bit would be Ubuntu-specific:

    #we need this file because the OS provider relies on it, so we convert what we have in /etc/issue

    #this is ok for Ubuntu (“Ubuntu 9.0.4 \n \l” becomes “Ubuntu 9.0.4”)

    cat /etc/issue | awk '/\\n/ {print $1, $2}' > /etc/redhat-release

    while the following bit is Debian-specific:

    #this is ok for Debian (“Debian GNU/Linux 5.0 \n \l” becomes “Debian GNU/Linux 5.0”)

    cat /etc/issue | awk '/\\n/ {print $1, $2, $3}' > /etc/redhat-release

     

    4. Then we edit/modify the “postinst” file (post-installation instructions) as follows:

    a. remove the 2nd and 3rd lines which look like the following

    RPM_INSTALL_PREFIX=

    export RPM_INSTALL_PREFIX

    as they are only useful for the RPM system, not DEB/APT, so we don’t need them.

    b. change the following 2 functions which contain RedHat-specific commands:

    configure_pegasus_service() {

               /usr/lib/lsb/install_initd /etc/init.d/scx-cimd

    }

    start_pegasus_service() {

               service scx-cimd start

    }

    c. We need to change them to the Debian equivalents for registering a service in INIT and starting it:

    configure_pegasus_service() {

                   update-rc.d scx-cimd defaults

    }

    start_pegasus_service() {

                  /etc/init.d/scx-cimd start

    }

    5. Modify the “prerm” file (pre-removal instructions):

    a. Just like “postinst”, remove the lines

    RPM_INSTALL_PREFIX=

    export RPM_INSTALL_PREFIX

    b. Locate the two functions stopping and un-installing the service

    stop_pegasus_service() {

             service scx-cimd stop

    }

    unregister_pegasus_service() {

              /usr/lib/lsb/remove_initd /etc/init.d/scx-cimd

    }

    c. Change those two functions with the Debian-equivalent command lines

    stop_pegasus_service() {

               /etc/init.d/scx-cimd stop

    }

    unregister_pegasus_service() {

               update-rc.d -f scx-cimd remove

    }

    At this point the changes we needed have been put in place, and we can re-build the DEB package.

    Move to the main folder of the application (the scx_1.0.4-258_i386 folder):

    root# cd ../..

    Create the package starting from the folders

    root# dpkg-deb --build debian

    dpkg-deb: building package `scx' in `debian.deb'.

    Rename the package (for Ubuntu)

    root# mv debian.deb scx_1.0.4-258_Ubuntu_9_i386.deb

    Rename the package (for Debian)

    root# mv debian.deb scx_1.0.4-258_Debian_5_i386.deb

    Install it

    root# dpkg -i scx_1.0.4-258_Platform_Version_i386.deb

    All done! It should install and work!

     

    Next step would be creating a Management Pack to monitor Debian and Ubuntu. It is pretty similar to what Robert Hearn has described step by step for CentOS, but with some different replacements of strings, as you can imagine. I have done this but have not written down the procedure yet, so I will post another article on how to do this as soon as I manage to get it standardized and reliable. There is a bit more work involved for Ubuntu/Debian… as some of the daemons/services have different names, and certain files too… but nothing terribly difficult to change so you might want to try it already and have a go at it!

    In the meantime, as a teaser, here’s my server’s (http://www.muscetta.com) performance, being monitored with this “hack”:

    OpsMgr monitoring Debian

     

    Disclaimer

    The information in this weblog is provided "AS IS" with no warranties, and confers no rights. This weblog does not represent the thoughts, intentions, plans or strategies of my employer. It is solely my own personal opinion. All code samples are provided "AS IS" without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability and/or fitness for a particular purpose.
    THIS WORK IS NOT ENDORSED AND NOT EVEN CHECKED, AUTHORIZED, SCRUTINIZED NOR APPROVED BY MY EMPLOYER, AND IT ONLY REPRESENT SOMETHING WHICH I'VE DONE IN MY FREE TIME. NO GUARANTEE WHATSOEVER IS GIVEN ON THIS. THE AUTHOR SHALL NOT BE MADE RESPONSIBLE FOR ANY DAMAGE YOU MIGHT INCUR WHEN USING THIS INFORMATION. The solution presented here IS NOT SUPPORTED by Microsoft.

  • musc@> $daniele.work.ToString()

    Audit Collection Services Database Partitions Size Report

    • 0 Comments

    A number of people I have talked to liked my previous post on ACS sizing. One thing that was not extremely easy or clear to them in that post was *how* exactly I did one thing I wrote:

    […] use the dtEvent_GUID table to get the number of events for that day, and use the stored procedure “sp_spaceused”  against that same table to get an overall idea of how much space that day is taking in the database […]

    To be completely honest, I do not expect people to do this manually a hundred times if they have a hundred partitions. In fact, I have been doing this for a while with a script which does the looping for me and runs that sp_spaceused a number of times. I cannot share that script, but I do realize that this automation is very useful, therefore I wrote a “stand-alone” SQL query which, using a couple of temporary tables, produces a similar type of output. I also went a step further and packaged it into a SQL Server Reporting Services report for everyone’s consumption. The report should look like the following screenshot, featuring a chart and a table with the numerical information about each and every partition in the database:

    ACS Partitions Report

    You can download the report from here.

    You need to upload it to your report server and change the data source to the shared Data Source that the built-in ACS reports also use, and it should work.

    Enjoy!

  • musc@> $daniele.work.ToString()

    A few thoughts on sizing Audit Collection System

    • 0 Comments

    People were already collecting logs with MOM, so why not the security log? Some people were doing that, but it did not scale well enough; for this reason, a few years ago Eric Fitzgerald announced that he was working on Microsoft Audit Collection System. At the time, the tool had no interface… and the rest is history: it has been integrated into System Center Operations Manager. Even so, ACS remains a lesser-known component of OpsMgr.

    There are a number of resources on the web that are worth mentioning and linking to:

    and, of course, many more, I cannot link them all.

    As for myself, I have been playing with ACS since those early beta days (before I joined Microsoft and before going back to MOM, when I was working in Security), but I never really blogged about this piece.

    Since I have been doing quite a lot of work around ACS again lately, I thought it might be worth consolidating some thoughts about it – hence this post.

    Anatomy of an “Online” Sizing Calculation

    What I would like to explain here is the strategy and process I go thru when analyzing the data stored in an ACS database, in order to determine a filtering strategy: what to keep and what not to keep, by applying a filter on the ACS Collector.

    So, the first thing I usually start with is using one of the many “ACS sizer” Excel spreadsheets around… which usually tell you that you need more space than is really necessary… basically giving you a “worst case” scenario. I don’t know how some people can actually do this from a purely theoretical point of view; I usually prefer a bottom-up approach: I look at the actual data that ACS is collecting without filters, and start from there for a better/more accurate sizing.

    In the case of a new install this is easy – you just turn ACS on, set the retention to a few days (one or two weeks maximum), give the DB plenty of space to make sure it will make it, add all your forwarders… sit back and wait.

    Then you come back 2 weeks later and start looking at the data that has been collected.

    What/How much data are we collecting?

    First of all, if we have not changed the default settings, the grooming and partitioning algorithm will create new partitioned tables every day. So my first step is to see how big each “partition” is.

    But… what is a partition, anyway? A partition is a set of 4 tables joined together:

    1. dtEvent_GUID
    2. dtEventData_GUID
    3. dtPrincipal_GUID
    4. dtStrings_GUID

    where GUID is a new GUID every day, and of course the 4 tables that make up a daily partition will have the same GUID.

    The dtPartition table contains a list of all partitions and their GUIDs, together with their start and closing time.

    Just to get a rough estimate we can ignore the space used by the last three tables – which are usually very small – and only use the dtEvent_GUID table to get the number of events for that day, and use the stored procedure “sp_spaceused”  against that same table to get an overall idea of how much space that day is taking in the database.
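
    If you would rather not run sp_spaceused by hand for every partition, here is a rough sketch of how the loop can be scripted (the server and database names are placeholders, and the dtPartition column names are inferred from the table that follows – adjust them to your installation):

    # sketch only: loops through dtPartition and runs sp_spaceused against each dtEvent_<GUID> table
    # server/database names are placeholders; column names are inferred and may need adjusting
    $conn = New-Object System.Data.SqlClient.SqlConnection("Server=ACSSQLSERVER;Database=OperationsManagerAC;Integrated Security=SSPI")
    $conn.Open()
    $cmd = $conn.CreateCommand()
    $cmd.CommandText = "select PartitionId, PartitionStartTime, PartitionCloseTime from dtPartition order by PartitionStartTime"
    $partitions = New-Object System.Data.DataTable
    (New-Object System.Data.SqlClient.SqlDataAdapter($cmd)).Fill($partitions) | Out-Null

    foreach ($p in $partitions.Rows)
    {
        # the partition GUID is used (with underscores) as the suffix of the daily table names
        $suffix = $p.PartitionId.ToString().Replace("-","_")
        $space = $conn.CreateCommand()
        $space.CommandText = "exec sp_spaceused 'dtEvent_$suffix'"
        $usage = New-Object System.Data.DataTable
        (New-Object System.Data.SqlClient.SqlDataAdapter($space)).Fill($usage) | Out-Null
        "{0}  rows: {1}  reserved: {2}" -f $p.PartitionStartTime, $usage.Rows[0].rows, $usage.Rows[0].reserved
    }
    $conn.Close()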

    By following this process, I come up with something like the following:

    Partition ID Status Partition Start Time Partition Close Time Rows Reserved KB Total MB
    9b45a567_c848_4a32_9c35_39b402ea0ee2 0 2/1/2010 2:00 2/1/2010 2:00 29,749,366 7,663,488 7,484
    8d8c8ee1_4c5c_4dea_b6df_82233c52e346 2 1/31/2010 2:00 2/1/2010 2:00 28,067,438 9,076,904 8,864
    34ce995b_689b_46ae_b9d3_c644cfb66e01 2 1/30/2010 2:00 1/31/2010 2:00 30,485,110 9,857,896 9,627
    bb7ea5d3_f751_473a_a835_1d1d42683039 2 1/29/2010 2:00 1/30/2010 2:00 48,464,952 15,670,792 15,304
    ee262692_beae_4d81_8079_470a54567946 2 1/28/2010 2:00 1/29/2010 2:00 48,980,178 15,836,416 15,465
    7984b5b8_ddea_4e9c_9e51_0ee7a413b4c9 2 1/27/2010 2:00 1/28/2010 2:00 51,295,777 16,585,408 16,197
    d93b9f0e_2ec3_4f61_b5e0_b600bbe173d2 2 1/26/2010 2:00 1/27/2010 2:00 53,385,239 17,262,232 16,858
    8ce1b69a_7839_4a05_8785_29fd6bfeda5f 2 1/25/2010 2:00 1/26/2010 2:00 55,997,546 18,105,840 17,681
    19aeb336_252d_4099_9a55_81895bfe5860 2 1/24/2010 2:00 1/24/2010 2:00 28,525,304 7,345,120 7,173
    1cf70e01_3465_44dc_9d5c_4f3700dc408a 2 1/23/2010 2:00 1/23/2010 2:00 26,046,092 6,673,472 6,517
    f5ec207f_158c_47a8_b15f_8aab177a6305 2 1/22/2010 2:00 1/22/2010 2:00 47,818,322 12,302,208 12,014
    b48dabe6_a483_4c60_bb4d_93b7d3549b3e 2 1/21/2010 2:00 1/21/2010 2:00 55,060,150 14,155,392 13,824
    efe66c10_0cf2_4327_adbf_bebb97551c93 2 1/20/2010 2:00 1/20/2010 2:00 58,322,217 15,029,216 14,677
    0231463e_8d50_4a42_a834_baf55e6b4dcd 2 1/19/2010 2:00 1/19/2010 2:00 61,257,393 15,741,248 15,372
    510acc08_dc59_482e_a353_bfae1f85e648 2 1/18/2010 2:00 1/18/2010 2:00 64,579,122 16,612,512 16,223

    If you have just installed ACS and let it run without filters with your agents for a couple of weeks, you should get some numbers like those above for your “couple of weeks” of analysis. If you graph your numbers in Excel (both size and number of rows/events per day) you should get some similar lines that show a pattern or trend:

    Trend: Space user by day

    Trend: Number of events by day

    So, in my example above, we can clearly observe a “weekly” pattern (Monday-to-Friday being busier than the weekend) and we can see that – for that environment – the biggest partition is roughly 17GB. If we round this up to 20GB – also considering that the weekends are much quieter – we can forecast 20*7 = 140GB per week. This includes an excess “buffer” which will let the system survive event storms, should they happen. We also always recommend having some free space to allow for re-indexing operations.

    In fact, especially when collecting everything without filters, the daily size is a lot less predictable: imagine worms “trying out” administrator account’s passwords, and so on… those things can easily create event storms.

    Anyway, in the example above, the customer would have liked to keep 6 MONTHS (180 days) of data online, which would become 20*180 = 3600GB = THREE AND A HALF TERABYTES! Therefore we need a filtering strategy – and badly – to reduce this size.

    [edited on May 7th 2010 - if you want to automate the above analysis and produce a table and graphs like those just shown, you should look at my following post.]

    Filtering Strategies

    Ok, then we need to look at WHAT actually comprises that amount of events we are collecting without filters. As I wrote above, I usually run queries to get this type of information.

    I will not get into HOW TO write a filter here – a collector’s filter is a WMI notification query and it is already described pretty well elsewhere how to configure it.

    Here, instead, I want to walk thru the process and the queries I use to understand where the noise comes from and what could be filtered – and get an estimate of how much space we could save if we filtered one way or another.

    Number of Events per User

    --event count by User (with Percentages)
    declare @total float
    select @total = count(HeaderUser) from AdtServer.dvHeader
    select count(HeaderUser),HeaderUser, cast(convert(float,(count(HeaderUser)) / (convert(float,@total)) * 100) as decimal(10,2))
    from AdtServer.dvHeader
    group by HeaderUser
    order by count(HeaderUser) desc

    In our example above, over the 14 days we were observing, we obtained percentages like the following ones:

    #evt HeaderUser Account Percent
    204,904,332 SYSTEM 40.79 %
    18,811,139 LOCAL SERVICE 3.74 %
    14,883,946 ANONYMOUS LOGON 2.96 %
    10,536,317 appintrauser 2.09 %
    5,590,434 mossfarmusr

    Just by looking at this, it is pretty clear that by filtering out events tracked by the accounts “SYSTEM”, “LOCAL SERVICE” and “ANONYMOUS LOGON” we would save over 45% of the disk space!

    Number of Events by EventID

    Similarly, we can look at how different Event IDs have different weights on the total amount of events tracked in the database:

    --event count by ID (with Percentages)
    declare @total float
    select @total = count(EventId) from AdtServer.dvHeader
    select count(EventId),EventId, cast(convert(float,(count(EventId)) / (convert(float,@total)) * 100) as decimal(10,2))
    from AdtServer.dvHeader
    group by EventId
    order by count(EventId) desc

    We would get some similar information here:

    Event ID Meaning Sum of events Percent
    538 A user logged off 99,494,648 27.63
    540 Successful Network Logon 97,819,640 27.16
    672 Authentication Ticket Request 52,281,129 14.52
    680 Account Used for Logon by (Windows 2000) 35,141,235 9.76
    576 Specified privileges were added to a user's access token. 26,154,761 7.26
    8086 Custom Application ID 18,789,599 5.21
    673 Service Ticket Request 10,641,090 2.95
    675 Pre-Authentication Failed 7,890,823 2.19
    552 Logon attempt using explicit credentials 4,143,741 1.15
    539 Logon Failure - Account locked out 2,383,809 0.66
    528 Successful Logon 1,764,697 0.49

    Also, do not forget that ACS provides some reports to do this type of analysis out of the box, even if, in my experience, they are generally slower – on large datasets – than the queries provided here. Also, a number of the reports have been buggy over time, so I just prefer to run queries and be on the safe side.

    Below is an example of such a report (run against a different environment – just in case you were wondering why the numbers are not the same ones :-)):

    Event Counts ACS Default Report

    The numbers and percentages we got from the two queries above should already point us in the right direction about what we might want to adjust in our auditing policy directly on Windows, and/or whether there is something we want to filter out at the collector level (here you should ask yourself the question: “if they aren’t worth collecting, are they worth generating?” – but I digress).

    Also, a permutation of the above two queries lets you see which user is generating the most “noise” for some events and not others… for example:

    --event distribution for a specific user (change the @user) - with percentages for the user and compared with the total #events in the DB
    declare @user varchar(255)
    set @user = 'SYSTEM'
    declare @total float
    select @total = count(Id) from AdtServer.dvHeader
    declare @totalforuser float
    select @totalforuser = count(Id) from AdtServer.dvHeader where HeaderUser = @user
    select count(Id), EventID, cast(convert(float,(count(Id)) / convert(float,@totalforuser) * 100) as decimal(10,2)) as PercentageForUser, cast(convert(float,(count(Id)) / (convert(float,@total)) * 100) as decimal(10,2)) as PercentageTotal
    from AdtServer.dvHeader
    where HeaderUser = @user
    group by EventID
    order by count(Id) desc

    The above is particularly important, as we might want to filter out a number of events for the SYSTEM account (i.e. logons that occur when starting and stopping services), but we might want to keep other events that are tracked by the SYSTEM account too, such as an administrator having wiped the Security Log clean:

    Event ID 517 Audit Log was cleared

    Of course the number of 517 events compared to the total tracked by the SYSTEM account will not be large, and we can still filter the other ones out.

    Number of Events by EventID and by User

    We could also combine the two approaches above – by EventID and by User:

    select count(Id),HeaderUser, EventId
    from AdtServer.dvHeader
    group by HeaderUser, EventId
    order by count(Id) desc

    This will produce a table like the following one

    SQL Query: Events by EventID and by User

    which can be easily copied/pasted into Excel in order to produce a pivot Table:

    Pivot Table

    Cluster EventLog Replication

    One more aspect that is less widely known, but which I think is worth showing, is the way that clusters behave with ACS. I don’t mean all clusters… but if you keep the “eventlog replication” feature of clusters enabled (you should disable it from a monitoring perspective anyway, but I digress), each cluster node’s security eventlog will contain events not just for itself, but for all the other nodes as well.

    I have not found a reliable way to filter these out – other than disabling eventlog replication altogether.

    Anyway, just to get an idea of how much this type of “duplicate” event weighs on the total, I use the following query, which tells you how many events for each machine are tracked by another machine:

    --to spot machines that are cluster nodes with eventlog replication and write duplicate events (slow)
    select Count(Id) as Total,replace(right(AgentMachine, (len(AgentMachine) - patindex('%\%',AgentMachine))),'$','') as ForwarderMachine, EventMachine
    from AdtServer.dvHeader
    --where ForwarderMachine <> EventMachine
    group by EventMachine,replace(right(AgentMachine, (len(AgentMachine) - patindex('%\%',AgentMachine))),'$','')
    order by ForwarderMachine,EventMachine

    Cluster Events

    Those presented above are just some of the approaches I usually look into at first. Of course there are a number more. Here I am including the same queries already shown in action, plus a few more that can be useful in this process.

    I have even considered building a page with all these queries – a bit like those that Kevin is collecting for OpsMgr (we actually wrote some of them together when building the OpsMgr Health Check)… shall I move the queries below to such a page? For now I thought I’d list them here and give some background on how I normally use them.

    Some more Useful Queries

    --top event ids
    select count(EventId), EventId
    from AdtServer.dvHeader
    group by EventId
    order by count(EventId) desc

    --event count by ID (with Percentages)
    declare @total float
    select @total = count(EventId) from AdtServer.dvHeader
    select count(EventId),EventId, cast(convert(float,(count(EventId)) / (convert(float,@total)) * 100) as decimal(10,2))
    from AdtServer.dvHeader
    group by EventId
    order by count(EventId) desc

    --which machines have ever written event 538
    select distinct EventMachine, count(EventId) as total
    from AdtServer.dvHeader
    where EventID = 538
    group by EventMachine

    --machines
    select * from dtMachine

    --machines (more readable)
    select replace(right(Description, (len(Description) - patindex('%\%',Description))),'$','')
    from dtMachine

    --events by machine
    select count(EventMachine), EventMachine
    from AdtServer.dvHeader
    group by EventMachine

    --rows where EventMachine field not available (typically events written by ACS itself for checkpointing)
    select *
    from AdtServer.dvHeader
    where EventMachine = 'n/a'

    --event count by day
    select convert(varchar(20), CreationTime, 102) as Date, count(EventMachine) as total
    from AdtServer.dvHeader
    group by convert(varchar(20), CreationTime, 102)
    order by convert(varchar(20), CreationTime, 102)

    --event count by day and by machine
    select convert(varchar(20), CreationTime, 102) as Date, EventMachine, count(EventMachine) as total
    from AdtServer.dvHeader
    group by EventMachine, convert(varchar(20), CreationTime, 102)
    order by convert(varchar(20), CreationTime, 102)

    --event count by machine and by date (distinguishes between AgentMachine and EventMachine)
    select convert(varchar(10),CreationTime,102),Count(Id),EventMachine,AgentMachine
    from AdtServer.dvHeader
    group by convert(varchar(10),CreationTime,102),EventMachine,AgentMachine
    order by convert(varchar(10),CreationTime,102) desc ,EventMachine

    --event count by User
    select count(Id),HeaderUser
    from AdtServer.dvHeader
    group by HeaderUser
    order by count(Id) desc

    --event count by User (with Percentages)
    declare @total float
    select @total = count(HeaderUser) from AdtServer.dvHeader
    select count(HeaderUser),HeaderUser, cast(convert(float,(count(HeaderUser)) / (convert(float,@total)) * 100) as decimal(10,2))
    from AdtServer.dvHeader
    group by HeaderUser
    order by count(HeaderUser) desc

    --event distribution for a specific user (change the @user) - with percentages for the user and compared with the total #events in the DB
    declare @user varchar(255)
    set @user = 'SYSTEM'
    declare @total float
    select @total = count(Id) from AdtServer.dvHeader
    declare @totalforuser float
    select @totalforuser = count(Id) from AdtServer.dvHeader where HeaderUser = @user
    select count(Id), EventID, cast(convert(float,(count(Id)) / convert(float,@totalforuser) * 100) as decimal(10,2)) as PercentageForUser, cast(convert(float,(count(Id)) / (convert(float,@total)) * 100) as decimal(10,2)) as PercentageTotal
    from AdtServer.dvHeader
    where HeaderUser = @user
    group by EventID
    order by count(Id) desc

    --to spot machines that write duplicate events (such as cluster nodes with eventlog replication enabled)
    select Count(Id),EventMachine,AgentMachine
    from AdtServer.dvHeader
    group by EventMachine,AgentMachine
    order by EventMachine

    --to spot machines that are cluster nodes with eventlog replication and write duplicate events (better but slower)
    select Count(Id) as Total,replace(right(AgentMachine, (len(AgentMachine) - patindex('%\%',AgentMachine))),'$','') as ForwarderMachine, EventMachine
    from AdtServer.dvHeader
    --where ForwarderMachine <> EventMachine
    group by EventMachine,replace(right(AgentMachine, (len(AgentMachine) - patindex('%\%',AgentMachine))),'$','')
    order by ForwarderMachine,EventMachine

    --which user and from which machine is target of elevation (network service doing "runas" is a 552 event)
    select count(Id),EventMachine, TargetUser
    from AdtServer.dvHeader
    where HeaderUser = 'NETWORK SERVICE'
    and EventID = 552
    group by EventMachine, TargetUser
    order by count(Id) desc

    --by hour, minute and user
    --(change the timestamp)... this query is useful to search which users are active in a given time period...
    --helpful to spot "peaks" of activities such as password brute force attacks, or other activities limited in time.
    select datepart(hour,CreationTime) as Hours, datepart(minute,CreationTime) as Minutes, HeaderUser, count(Id) as total
    from AdtServer.dvHeader
    where CreationTime < '2010-02-22T16:00:00.000'
    and CreationTime > '2010-02-22T15:00:00.000'
    group by datepart(hour,CreationTime), datepart(minute,CreationTime),HeaderUser
    order by datepart(hour,CreationTime), datepart(minute,CreationTime),HeaderUser

  • musc@> $daniele.work.ToString()

    OpsMgr Eventlog analysis with Powershell

    • 0 Comments

    The following technique should already be familiar to any powersheller. Here we focus on Operations Manager log entries, even if the data mining technique shown is entirely possible – and encouraged :-) – with any other event log.

    Let’s start by getting our eventlog into a variable called $evt:

    PS  >> $evt = Get-Eventlog "Operations Manager"

    The above only works locally in POSH v1.

    In POSH v2 you can go remotely by using the “-computername” parameter:

    PS  >> $evt = Get-Eventlog "Operations Manager" -computername RMS.domain.com

    Anyhow, you can get to this remotely also in POSHv1 with this other more “dotNET-tish” syntax:

    PS >> $evt = (New-Object System.Diagnostics.Eventlog -ArgumentList "Operations Manager").get_Entries()

    you could even export this (or any of the above) to a CLIXML file:

    PS >> (New-Object System.Diagnostics.Eventlog -ArgumentList "Operations Manager").get_Entries() | export-clixml -path c:\evt\Evt-OpsMgr-RMS.MYDOMAIN.COM.xml

    and then you could reload your eventlog to another machine:

    PS  >> $evt = import-clixml c:\evt\Evt-OpsMgr-RMS.MYDOMAIN.COM.xml

    Whatever way you used to populate your $evt variable, be it from a “live” eventlog or by re-importing it from XML, you can then start analyzing it:

    PS  >> $evt | where {$_.Entrytype -match "Error"} | select EventId,Source,Message | group eventid

    Count Name                      Group
    ----- ----                      -----
    1510 4509                      {@{EventID=4509; Source=HealthService; Message=The constructor for the managed module type "Microsoft.EnterpriseManagement.Mom.DatabaseQueryModules.GroupCalculatio.
       15 20022                     {@{EventID=20022; Source=OpsMgr Connector; Message=The health service {7B0E947B-2055...
        3 26319                     {@{EventID=26319; Source=OpsMgr SDK Service; Message=An exception was thrown while p...
        1 4512                      {@{EventID=4512; Source=HealthService; Message=Converting data batch to XML failed w...

    the above is functionally identical to the following:

    PS  >> $evt | where {$_.Entrytype -eq 1} | select EventID,Source,Message | group eventid

    Count Name                      Group
    ----- ----                      -----
    1510 4509                      {@{EventID=4509; Source=HealthService; Message=The constructor for the managed modul...
       15 20022                     {@{EventID=20022; Source=OpsMgr Connector; Message=The health service {7B0E947B-2055...
        3 26319                     {@{EventID=26319; Source=OpsMgr SDK Service; Message=An exception was thrown while p...
        1 4512                      {@{EventID=4512; Source=HealthService; Message=Converting data batch to XML failed w...

    Note that Eventlog Entries’ type is an ENUM that has values of 0,1,2 – similarly to OpsMgr health states – but beware that their order is not the same, as shown in the following table:

    Code OpsMgr States Events EntryType
    0 Not Monitored Information
    1 Success Error
    2 Warning Warning
    3 Critical --

    Let’s now look at Information Events (Entrytype -eq 0)

    PS  >> $evt | where {$_.Entrytype -eq 0} | select EventID,Source,Message | group eventid

    Count Name                      Group
    ----- ----                      -----
    4135 2110                      {@{EventID=2110; Source=HealthService; Message=Health Service successfully transferr...
    1548 21025                     {@{EventID=21025; Source=OpsMgr Connector; Message=OpsMgr has received new configura...
    4644 7026                      {@{EventID=7026; Source=HealthService; Message=The Health Service successfully logge...
    1548 7023                      {@{EventID=7023; Source=HealthService; Message=The Health Service has downloaded sec...
    1548 7025                      {@{EventID=7025; Source=HealthService; Message=The Health Service has authorized all...
    1548 7024                      {@{EventID=7024; Source=HealthService; Message=The Health Service successfully logge...
    1548 7028                      {@{EventID=7028; Source=HealthService; Message=All RunAs accounts for management gro...
       16 20021                     {@{EventID=20021; Source=OpsMgr Connector; Message=The health service {7B0E947B-2055...
       13 7019                      {@{EventID=7019; Source=HealthService; Message=The Health Service has validated all ...
        4 4002                      {@{EventID=4002; Source=Health Service Script; Message=Microsoft.Windows.Server.Logi...

     

    And “Warning” events (Entrytype -eq 2):

    PS  >> $evt | where {$_.Entrytype -eq 2} | select EventID,Source,Message | group eventid

    Count Name                      Group
    ----- ----                      -----
    1511 1103                      {@{EventID=1103; Source=HealthService; Message=Summary: 1 rule(s)/monitor(s) failed ...
      501 20058                     {@{EventID=20058; Source=OpsMgr Connector; Message=The Root Connector has received b...
        5 29202                     {@{EventID=29202; Source=OpsMgr Config Service; Message=OpsMgr Config Service could ...
      421 31501                     {@{EventID=31501; Source=Health Service Modules; Message=No primary recipients were ...
       18 10103                     {@{EventID=10103; Source=Health Service Modules; Message=In PerfDataSource, could no...
        1 29105                     {@{EventID=29105; Source=OpsMgr Config Service; Message=The request for management p...

     

     

    Ok, now let’s see those 20022 events, for example… so we get an idea of which healthservices they are referring to (20022 indicates “heartbeat failure”, btw):

    PS  >> $evt | where {$_.eventid -eq 20022} | select message

    Message
    -------
    The health service {7B0E947B-2055-C12A-B6DB-DD6B311ADF39} running on host webapp3.domain1.mydomain.com and s...
    The health service {E3B3CCAA-E797-4F08-860F-47558B3DA477} running on host SERVER1.domain2.mydomain.com and serving...
    The health service {E3B3CCAA-E797-4F08-860F-47558B3DA477} running on host SERVER1.domain2.mydomain.com and serving...
    The health service {E3B3CCAA-E797-4F08-860F-47558B3DA477} running on host SERVER1.domain2.mydomain.com and serving...
    The health service {52E16F9C-EB1A-9FAF-5B9C-1AA9C8BC28E3} running on host DC4WK3.domain1.mydomain.com and se...
    The health service {F96CC9E6-2EC4-7E63-EE5A-FF9286031C50} running on host VWEBDL2.domain1.mydomain.com and s...
    The health service {71987EE0-909A-8465-C32D-05F315C301CC} running on host VDEVWEBPROBE2.domain2.mydomain.com....
    The health service {BAF6716E-54A7-DF68-ABCB-B1101EDB2506} running on host XP2SMS002.domain2.mydomain.com and serving mana...
    The health service {30C81387-D5E0-32D6-C3A3-C649F1CF66F1} running on host stgweb3.domain3.mydomain.com and...
    The health service {3DCDD330-BBBB-B8E8-4FED-EF163B27DE0A} running on host VWEBDL1.domain1.mydomain.com and s...
    The health service {13A47552-2693-E774-4F87-87DF68B2F0C0} running on host DC2.domain4.mydomain.com and ...
    The health service {920BF9A8-C315-3064-A5AA-A92AA270529C} running on host FSCLU2 and serving management group Pr...
    The health service {FAA3C2B5-C162-C742-786F-F3F8DC8CAC2F} running on host WEBAPP4.domain1.mydomain.com and s...
    The health service {3DCDD330-BBBB-B8E8-4FED-EF163B27DE0A} running on host WEBDL1.domain1.mydomain.com and s...
    The health service {3DCDD330-BBBB-B8E8-4FED-EF163B27DE0A} running on host WEBDL1.domain1.mydomain.com and s...

     

    or let’s look at some warning for the Config Service:

    PS  >> $evt | where {$_.Eventid -eq 29202}

       Index Time          EntryType   Source                 InstanceID Message
       ----- ----          ---------   ------                 ---------- -------
    5535065 Dec 07 21:18  Warning     OpsMgr Config Ser...   2147512850 OpsMgr Config Service could not retrieve a cons...
    5543960 Dec 09 16:39  Warning     OpsMgr Config Ser...   2147512850 OpsMgr Config Service could not retrieve a cons...
    5545536 Dec 10 01:06  Warning     OpsMgr Config Ser...   2147512850 OpsMgr Config Service could not retrieve a cons...
    5553119 Dec 11 08:24  Warning     OpsMgr Config Ser...   2147512850 OpsMgr Config Service could not retrieve a cons...
    5555677 Dec 11 10:34  Warning     OpsMgr Config Ser...   2147512850 OpsMgr Config Service could not retrieve a cons...

    Having seen those, can you remember any particular load you had on those days that would justify the instance space changing so quickly that the Config Service couldn’t keep up?

     

    Or let’s group those events with ID 21025 by day, so we know how many Config recalculations we’ve had (which, if many, might indicate Config Churn):

    PS  >> $evt | where {$_.Eventid -eq 21025} | select TimeGenerated | % {$_.TimeGenerated.ToShortDateString()} | group

    Count Name                      Group
    ----- ----                      -----
       39 12/7/2009                 {12/7/2009, 12/7/2009, 12/7/2009, 12/7/2009...}
      203 12/8/2009                 {12/8/2009, 12/8/2009, 12/8/2009, 12/8/2009...}
      217 12/9/2009                 {12/9/2009, 12/9/2009, 12/9/2009, 12/9/2009...}
      278 12/10/2009                {12/10/2009, 12/10/2009, 12/10/2009, 12/10/2009...}
      259 12/11/2009                {12/11/2009, 12/11/2009, 12/11/2009, 12/11/2009...}
      224 12/12/2009                {12/12/2009, 12/12/2009, 12/12/2009, 12/12/2009...}
      237 12/13/2009                {12/13/2009, 12/13/2009, 12/13/2009, 12/13/2009...}
       91 12/14/2009                {12/14/2009, 12/14/2009, 12/14/2009, 12/14/2009...}

     

    Event ID 21025 shows that there is a new configuration for the Management Group.

    Event ID 29103 has similar wording, but shows that there is a new configuration for a given healthservice. There should normally be many more of these events, unless your only Health Service is the RMS, which is unlikely…

    If we look at the event description (“message”) in search of the name (or even the GUID, as both are present) of our RMS, as follows, then the counts should match those of the 21025 events above:

    PS  >> $evt | where {$_.Eventid -eq 29103} | where {$_.message -match "myrms.domain.com"} | select TimeGenerated | % {$_.TimeGenerated.ToShortDateString()} | group

    Count Name                      Group
    ----- ----                      -----
       39 12/7/2009                 {12/7/2009, 12/7/2009, 12/7/2009, 12/7/2009...}
      203 12/8/2009                 {12/8/2009, 12/8/2009, 12/8/2009, 12/8/2009...}
      217 12/9/2009                 {12/9/2009, 12/9/2009, 12/9/2009, 12/9/2009...}
      278 12/10/2009                {12/10/2009, 12/10/2009, 12/10/2009, 12/10/2009...}
      259 12/11/2009                {12/11/2009, 12/11/2009, 12/11/2009, 12/11/2009...}
      224 12/12/2009                {12/12/2009, 12/12/2009, 12/12/2009, 12/12/2009...}
      237 12/13/2009                {12/13/2009, 12/13/2009, 12/13/2009, 12/13/2009...}
       91 12/14/2009                {12/14/2009, 12/14/2009, 12/14/2009, 12/14/2009...}

     

    Going back to the initial counts of events by their IDs, the error counts above had spotted the presence of a lonely 4512 event, which might have gone undetected if just browsing the eventlog with the GUI, since it only occurred once.

    Let’s take a look at it:

    PS  >> $evt | where {$_.eventid -eq 4512}

       Index Time          EntryType   Source                 InstanceID Message
       ----- ----          ---------   ------                 ---------- -------
    5560756 Dec 12 11:18  Error       HealthService          3221229984 Converting data batch to XML failed with error ...

    Now, when it comes to counts, PowerShell is great.  But sometimes PowerShell makes it difficult to actually READ the (long) event messages (descriptions) in the console. For example, our event ID 4512 is difficult to read in its entirety and gets truncated with trailing dots…

    We can of course increase the window size and/or select only THAT one field to read it better:

    PS  >> $evt | where {$_.eventid -eq 4512} | select message

    Message
    -------
    Converting data batch to XML failed with error "Not enough storage is available to complete this operation." (0x8007000E) in rule "Microsoft.SystemCenter.ConfigurationService.CollectionRule.Event.ConfigurationChanged" running for instance "RMS.MYDOMAIN.COM" with id:"{04F4ADED-2C7F-92EF-D620-9AF9685F736F}" in management group "SCOMPROD"

    Or, worst case, if it still does not fit, we can still go and search for it in the actual, usual eventlog application… but at least we will have spotted it!
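
    Another small variant of the same one-liner is to pipe to Format-List, which wraps the message over multiple lines instead of truncating it:

    # wraps long event descriptions over multiple lines instead of truncating them
    PS  >> $evt | where {$_.eventid -eq 4512} | format-list TimeGenerated, Source, Message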

     

    The above is meant to give you an idea of what is easily accomplished with some simple one-liners, and how this can be a useful aid in analyzing/digging into eventlogs.

    All of the above is ALSO possible with Logparser, which would actually be lighter on memory usage and quicker, to be honest!

    I just like Powershell syntax a lot more, and its ubiquity, which makes it a better option for me. Your mileage may vary, of course.

  • musc@> $daniele.work.ToString()

    Invoking Methods on the Xplat agent with WINRM

    • 0 Comments

    So I was testing other stuff tonight, to be honest, but I got pinged on Instant Messenger by my geek friend and colleague Stefan Stranger who pointed me at his request for help here http://friendfeed.com/sstranger/4571f39b/help-needed-on-winrs-or-winrm-and-openwsman-to

    He wanted to use WINRM or any other command line utility to interact with the Xplat agent, and call methods on the Unix machine from windows. This could be very useful to – for example – restart a service (in fact it is what the RECOVERY actions in the Xplat Management Packs do, btw).

    At first I told him I had only tested enumerations – such as in this other post http://www.muscetta.com/2009/06/01/using-the-scx-agent-with-wsman-from-powershell-v2/ … but the question intrigued me, so I checked out the help for winrm’s INVOKE verb:

    clip_image002

    This told me that you can pass in the parameters for the method to be called/invoked either as a hashtable @{KEY=”value”;KEY2=”value”}, or as an input XML file. I first tried the XML file but I could not get its format right.

    After a few more minutes of trying, I figured out the right syntax.

    This one works, for example:

    winrm invoke ExecuteCommand http://schemas.microsoft.com/wbem/wscim/1/cim-schema/2/SCX_OperatingSystem?__cimnamespace=root/scx @{command="ps";timeout="60"} -username:root -password:password -auth:basic -r:https://virtubuntu.huis.dom:1270/wsman -skipCACheck -encoding:UTF-8

    clip_image004
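
    For the record, the same invocation can presumably also be done from PowerShell v2 with the WSMan cmdlets instead of winrm.exe. This is just an untested sketch along the lines of my earlier post on querying the SCX agent from PowerShell v2 (host, port and credentials are simply the ones from the example above):

    # sketch only - PowerShell v2 WSMan cmdlets equivalent of the winrm.exe invoke above
    $opt  = New-WSManSessionOption -SkipCACheck -SkipCNCheck
    $cred = Get-Credential root
    Invoke-WSManAction -Action ExecuteCommand `
        -ResourceURI http://schemas.microsoft.com/wbem/wscim/1/cim-schema/2/SCX_OperatingSystem `
        -SelectorSet @{__cimnamespace="root/scx"} `
        -ValueSet @{command="ps";timeout="60"} `
        -ConnectionURI https://virtubuntu.huis.dom:1270/wsman `
        -Credential $cred -Authentication basic -SessionOption $opt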

    Happy remote management of your unix systems from Windows :-)

  • musc@> $daniele.work.ToString()

    PS> Get-Milk

    • 0 Comments

    PS> Get-Milk

    I printed a tshirt for Sara with a baby-friendly Powershell cmdlet ("Get-Milk").
    She already seems to be wondering what script she can write with it.

    [Photo: the "PS> Get-Milk" t-shirt]

  • musc@> $daniele.work.ToString()

    The mystery of the lost registry values

    • 0 Comments

    During the OpsMgr Health Check engagement we use custom code to assess the customer’s Management group, as I wrote here already. Given that the customer tells us which machine is the RMS, one of the very first things that we do in our tool is to connect to the RMS’s registry, and check the values under HKLM\SOFTWARE\Microsoft\Microsoft Operations Manager\3.0\Setup to see which machine holds the database. It is a rather critical piece of information for us, as we run a number of queries afterward… so we need to know where the db is, obviously :-)

    I learned from here http://mybsinfo.blogspot.com/2007/01/powershell-remote-registry-and-you-part.html how to access the registry remotely through Powershell, by using .Net classes. This is also one of the methods illustrated in this other article on the Technet Script Center http://www.microsoft.com/technet/scriptcenter/resources/qanda/jan09/hey0105.mspx

    Therefore the “core” instructions of the function I was using to access the registry looked like the following

    Function GetValueFromRegistry ([string]$computername, $regkey, $value)
    {
         $reg = [Microsoft.Win32.RegistryKey]::OpenRemoteBaseKey('LocalMachine', $computername)
         $regKey = $reg.OpenSubKey("$regKey")
         $result = $regkey.GetValue("$value")
         return $result
    }

     

    [Note: the actual function is bigger, and contains error handling, and logging, and a number of other things that are unnecessary here]

    Therefore, the function was called as follows:
    GetValueFromRegistry $RMS "SOFTWARE\\Microsoft\\Microsoft Operations Manager\\3.0\\Setup" "DatabaseServerName"
    Now so far so good.

    In theory.

    Now, for some reason that I could not immediately explain, we had noticed that this piece of code performing the registry access, while working most of the time, on SOME occasions was giving errors about not being able to open the registry value…

    [Screenshot: error returned when the registry value cannot be opened]

    When you are onsite with a customer conducting an assessment, the PFE engineer does not always have the time to troubleshoot the error… as time is critical, we usually just resorted to running the assessment from ANOTHER machine, and this “solved” the issue… but it always left me wondering WHY it was giving an error. I suspected a permissions issue at first, but it could not be that, as the permissions were obviously right: performing the assessment from another machine with the same user was working!

    A few days ago my colleague and buddy Stefan Stranger figured out that this was related to the platform architecture:

    • x64 client to x64 RMS was working
    • x64 client to x86 RMS was working
    • x86 client to x86 RMS was working
    • x86 client to x64 RMS was NOT working

    You don’t need to use our custom code to reproduce this, REGEDIT shows the behavior as well.

    If, from a 64-bit server, you open a remote registry connection to 64-bit RMS server, you can see all OpsMgr registry keys:

    [Screenshot: remote registry from a 64-bit client to a 64-bit RMS – all OpsMgr keys visible]

    If, however, from a 32-bit server, you open a remote registry connection to a 64-bit RMS server, you don’t see ALL – but only SOME – OpsMgr registry keys:
    [Screenshot: remote registry from a 32-bit client to a 64-bit RMS – only some OpsMgr keys visible]

    So here’s the reason! This is what was happening! How could I not think of this before? It was nothing related to permissions, but to registry redirection! The issue was happening because the 32-bit machine was using the 32-bit registry editor, and when accessing a 64-bit machine it defaults to the Wow6432Node location in the registry. But not all of the OpsMgr data is present under the WOW64 location on a 64-bit machine – only some of it.

    So, just like regedit, the 32-bit Powershell and the 32-bit .Net framework were being redirected to the 32-bit-compatibility registry keys… not finding the stuff we needed, whereas a 64-bit application could find it. Any 32-bit application, by default, gets redirected to the 32-bit view of the registry.
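    If you want to see the redirection in action, a quick local test (hypothetical, run directly on a 64-bit RMS) is to read the same key from a 64-bit and from a 32-bit Powershell prompt – the 32-bit one silently lands in Wow6432Node and does not find the OpsMgr Setup key:

    # 64-bit Powershell on a 64-bit box: %windir%\System32\WindowsPowerShell\v1.0\powershell.exe
    # 32-bit Powershell on the same box: %windir%\SysWOW64\WindowsPowerShell\v1.0\powershell.exe
    Get-ItemProperty "HKLM:\SOFTWARE\Microsoft\Microsoft Operations Manager\3.0\Setup" | Select-Object DatabaseServerName

    The 64-bit prompt returns the value; the 32-bit prompt complains that the path does not exist, because it is really looking under Wow6432Node.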

    So, after finally UNDERSTANDING what the issue was, I started wondering: ok... but how can I access the REAL “HKLM\SOFTWARE\Microsoft” key on a 64-bit machine when running this FROM a 32-bit machine – WITHOUT being redirected to “HKLM\SOFTWARE\Wow6432Node\Microsoft”? What if my application CAN deal just fine with those values and actually NEEDS to access them?

    The answer wasn’t as easy as the question. I did a bit of digging on this, and I have NOT yet found a way to do it with the .Net classes. It seems that in a lot of situations, Powershell and the .Net classes are nice and sweet wrappers around the underlying Windows APIs… but for all their sweetness and ease of use, they are very often not complete wrappers – letting you do just about enough for most situations, but not quite everything you could do with the API underneath. But I digress, here...

    The good news is that I did manage to get this working, but I had to resort to using the dear old WMI StdRegProv provider… There are a number of places on the Internet mentioning the issue of accessing the 32-bit registry from 64-bit machines or vice versa, but all the examples I found were using VBScript. I needed it in Powershell, therefore I started with the VBScript example code that is present here, and ported it to Powershell.

    Handling the WMI COM object from Powershell was slightly less intuitive than in VBScript, and it took me a couple of hours to figure out how to change some stuff, especially this bit that sets the parameters collection:

    Set Inparams = objStdRegProv.Methods_("GetStringValue").Inparameters

    Inparams.Hdefkey = HKLM

    Inparams.Ssubkeyname = RegKey

    Inparams.Svaluename = RegValue

    Set Outparams = objStdRegProv.ExecMethod_("GetStringValue", Inparams,,objCtx)

    INTO this:

    $Inparams = ($objStdRegProv.Methods_ | where {$_.name -eq "GetStringValue"}).InParameters.SpawnInstance_()

    ($Inparams.Properties_ | where {$_.name -eq "Hdefkey"}).Value = $HKLM

    ($Inparams.Properties_ | where {$_.name -eq "Ssubkeyname"}).Value = $regkey

    ($Inparams.Properties_ | where {$_.name -eq "Svaluename"}).Value = $value

    $Outparams = $objStdRegProv.ExecMethod_("GetStringValue", $Inparams, "", $objNamedValueSet)

    I have only done limited testing at this point and, even though the actual work now takes nearly 15 lines of code vs. the previous 3 lines of the .Net implementation, it at least seems to work just fine.

    What follows is the complete code of my replacement function, in all its ugly glory:

    Function GetValueFromRegistryThruWMI([string]$computername, $regkey, $value)
    {
        #constant for HKEY_LOCAL_MACHINE (HKLM)
        $HKLM = "&h80000002"

        #creates an SwbemNamedValueSet object
        $objNamedValueSet = New-Object -COM "WbemScripting.SWbemNamedValueSet"

        #adds the value that requests the target to provide 64bit-registry info
        $objNamedValueSet.Add("__ProviderArchitecture", 64) | Out-Null

        #back to all the other usual COM objects for WMI that you have used a zillion times in VBScript
        $objLocator = New-Object -COM "Wbemscripting.SWbemLocator"
        $objServices = $objLocator.ConnectServer($computername,"root\default","","","","","",$objNamedValueSet)
        $objStdRegProv = $objServices.Get("StdRegProv")

        # Obtain an InParameters object specific to the method
        $Inparams = ($objStdRegProv.Methods_ | where {$_.name -eq "GetStringValue"}).InParameters.SpawnInstance_()

        # Add the input parameters
        ($Inparams.Properties_ | where {$_.name -eq "Hdefkey"}).Value = $HKLM
        ($Inparams.Properties_ | where {$_.name -eq "Ssubkeyname"}).Value = $regkey
        ($Inparams.Properties_ | where {$_.name -eq "Svaluename"}).Value = $value

        #Execute the method
        $Outparams = $objStdRegProv.ExecMethod_("GetStringValue", $Inparams, "", $objNamedValueSet)

        #shows the return value (via write-host, so it does not pollute the function's output)
        write-host ("Return value: " + ($Outparams.Properties_ | where {$_.name -eq "ReturnValue"}).Value)

        if (($Outparams.Properties_ | where {$_.name -eq "ReturnValue"}).Value -eq 0)
        {
           write-host "it worked"
           $result = ($Outparams.Properties_ | where {$_.name -eq "sValue"}).Value
           write-host "Result: $result"
           return $result
        }
        else
        {
            write-host "nope"
        }
    }

     

    which can be called similarly to the previous one:
    GetValueFromRegistryThruWMI $RMS "SOFTWARE\Microsoft\Microsoft Operations Manager\3.0\Setup" "DatabaseServerName"

    [Note: you don’t need the doubled/escaped backslashes here, compared to the .Net implementation]

    Enjoy your cross-architecture registry access: from 32bit to 64bit - and back!

  • musc@> $daniele.work.ToString()

    SCX Evolutions

    • 0 Comments

    During the beta of the Cross-Platform extensions and of System Center Operations Manager 2007 R2, the product team had promised to eventually release the SCX providers' source code.

    Now that this promise has been kept, and the SCX providers have been released on Codeplex at http://xplatproviders.codeplex.com/, it should finally be possible to entirely build your own unsupported agent package, starting from source code, without having to modify the original package as I have shown earlier on this blog.
    Of course this will still be unsupported by Microsoft Product Support, but it will work just fine!
    This is an extraordinary event in my opinion, as it is not common for Microsoft to release code as open source, especially when it is part of one of the products it sells. I suspect we will see more of this going forward.

    Also, at R2 release time, some official documentation about building Cross-Platform Management Packs was published on Technet.

    Anyway, I have in the past published a number of posts on this blog under this tag http://www.muscetta.com/tag/xplat/ (I will continue to use that tag going forward) which show/describe how I hacked/modified both the existing MPs AND the SCX agent package to let them run on unsupported distributions (and I think they are still useful, as they show a number of techniques for testing, understanding and troubleshooting the Xplat agent). In fact, I first learned how to understand and modify the RedHat MPs to monitor CentOS, and eventually even modified the RPM package to run on Ubuntu (which also works on Debian 5/Lenny) - as you can see, I am now using it to monitor, from home, across the Internet, the machine running this blog:

    [Screenshot: www.muscetta.com performance data in OpsMgr]

    Or even, with or without OpsMgr 2007 R2, you could write your own scripts to interact with those providers, using your favourite scripting language.
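    For example, from Powershell v2 a simple enumeration against the SCX provider's WS-Man endpoint could look like the following. This is just a sketch along the lines of my earlier post linked above: the host name and root credential mirror the winrm examples used there, and your agent must be listening on HTTPS port 1270 with Basic authentication enabled:

    $cred = Get-Credential root
    $opt = New-WSManSessionOption -SkipCACheck -SkipCNCheck
    Get-WSManInstance -Enumerate -ResourceURI "http://schemas.microsoft.com/wbem/wscim/1/cim-schema/2/SCX_OperatingSystem?__cimnamespace=root/scx" -ConnectionURI https://virtubuntu.huis.dom:1270/wsman -Authentication Basic -Credential $cred -SessionOption $opt

    The same pattern works for the other SCX_* classes (processes, file systems, and so on) by changing the resource URI.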

    After all, those experimentations with Xplat earned me a reputation for being a "Unix expert at Microsoft" (this expression still makes me laugh), as I was tweeting here:
    [Screenshot: tweet about being a "Unix expert at Microsoft"]

    But really, I have never hidden my interest in interoperability, nor the fact that I have used Linux quite a bit in the past, and still do.

    Also, one more related piece of news is that the fine people at Xandros have released their Bridgeways Management Packs and, at the same time, also started their own blog at http://blog.xplatxperts.com/ where they discuss some troubleshooting techniques for the Xplat agent, both similar to what I have been writing about here and also - of course - specific to their own providers, which live in their XSM namespace.

    Disclaimer

    The information in this weblog is provided "AS IS" with no warranties, and confers no rights. This weblog does not represent the thoughts, intentions, plans or strategies of my employer. It is solely my own personal opinion. All code samples are provided "AS IS" without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability and/or fitness for a particular purpose.
    THIS WORK IS NOT ENDORSED AND NOT EVEN CHECKED, AUTHORIZED, SCRUTINIZED NOR APPROVED BY MY EMPLOYER, AND IT ONLY REPRESENT SOMETHING WHICH I'VE DONE IN MY FREE TIME. NO GUARANTEE WHATSOEVER IS GIVEN ON THIS. THE AUTHOR SHALL NOT BE MADE RESPONSIBLE FOR ANY DAMAGE YOU MIGHT INCUR WHEN USING THIS INFORMATION. The solution presented here IS NOT SUPPORTED by Microsoft.
