October, 2010

  • musc@> $daniele.work.ToString()

    Microsoft.Linux.RHEL.5.LogicalDisk.DiskBytesPerSecond Type Mismatch


    I have had the following in my notes for a while… and I have not blogged in a while (been too busy), so I decided to blog it today, before the topic gets too old and starts stinking :-)

     

    It all started when a customer showed me an Alert he was seeing in his environment from some XPlat workflow. The alert looks like the following:

    Generic Performance Mapper Module Failed Execution
    Alert Description
    Source: RLWSCOM02.domain.dom
    Module was unable to convert parameter to a double value
    Original parameter: '$Data///*[local-name()="BytesPerSecond"]$'
    Parameter after $Data replacement: ''
    Error: 0x80020005
    Details: Type mismatch.
    One or more workflows were affected by this.
    Workflow name: Microsoft.Linux.RHEL.5.LogicalDisk.DiskBytesPerSecond.Collection
    Instance name: /
    Instance ID: {4F6FA8F5-C56F-4C9B-ED36-12DAFF4073D1}
    Management group: DataCenter
    Path: RLWSCOM02.domain.dom\RLWSCOM02.domain.dom
    Alert Rule: Generic Performance Mapper Module Runtime Failure
    Created: 6/28/2010 11:30:28 PM

     

    First I stumbled upon this forum post, which mentions the same symptom: http://social.technet.microsoft.com/Forums/en-US/crossplatformgeneral/thread/62e0bf3e-be6f-4218-a37b-f1e66f02aa49 - but when I looked at the resolution, the locale on the customer machine was fine (set to US settings), so I concluded that it was not the same root cause.

     

    Then I looked at what that rule was supposed to do, and queried the same CIM class both remotely through WS-Man and locally via CIM. It turned out that certain values were coming back as NULL while the Management Server was expecting a number, hence the Type Mismatch!

    I have explained previously how to run CIM queries against the XPlat agent; in this case it was the following one:

    winrm enumerate http://schemas.microsoft.com/wbem/wscim/1/cim-schema/2/SCX_FileSystemStatisticalInformation?__cimnamespace=root/scx -username:scomuser -password:password -r:https://rllspago01.domain.dom:1270/wsman -auth:basic -skipCACheck -skipCNCheck

     

    SCX_FileSystemStatisticalInformation

    AverageDiskQueueLength = null

    AverageTransferTime = null

    BytesPerSecond = null

    Caption = File system information

    Description = Performance statistics related to a logical unit of secondary storage

    ElementName = null

    FreeMegabytes = 4007

    IsAggregate = false

    IsOnline = true

    Name = /

    PercentBusyTime = null

    PercentFreeSpace = 55

    PercentIdleTime = null

    PercentUsedSpace = 45

    ReadBytesPerSecond = null

    ReadsPerSecond = null

    TransfersPerSecond = null

    UsedMegabytes = 3278

    WriteBytesPerSecond = null

    WritesPerSecond = null

     

    See the NULLs? Those are our issue.
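
    If you save the winrm output to a text file, spotting the empty properties can be automated. The following is only a minimal sketch, assuming the plain "Name = value" text layout shown above; the function names and the shortened sample are made up for illustration and are not part of any Microsoft tooling.

```python
def parse_winrm_instance(text):
    """Parse 'Name = value' lines from winrm text output into a dict.

    Values printed as the literal word 'null' are stored as None.
    """
    props = {}
    for line in text.splitlines():
        name, sep, value = line.partition(" = ")
        if not sep:
            continue  # class-name header or blank line: nothing to parse
        value = value.strip()
        props[name.strip()] = None if value == "null" else value
    return props


def null_properties(props):
    """Return the sorted names of all properties that came back as null."""
    return sorted(name for name, value in props.items() if value is None)


# Shortened sample modelled on the output above (not the full instance).
sample = """\
SCX_FileSystemStatisticalInformation
    BytesPerSecond = null
    FreeMegabytes = 4007
    Name = /
    ReadBytesPerSecond = null
"""

props = parse_winrm_instance(sample)
print(null_properties(props))
```

    Any property this reports is one that the performance mapper on the Management Server would receive as an empty string.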

    Now, before you continue reading, I will tell you that I have also investigated this internally, and apparently we have just (in Cumulative Update 3) changed this behaviour in our XPlat modules, so that when NULL is returned, we consider it to be ZERO. Whether that is good or bad, it will at least take care of the error. But if you don’t get any data from the Unix system… well, you are not getting any data, so it might come as a surprise later on when you go and look at those charts expecting to see your disk “performance counters” and all you have is a bunch of ZEROs (how very interesting!). So, basically, the fix in CU3 suppresses the symptom, but does not address the cause.
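
    To make the difference concrete, here is a small sketch of the two conversion behaviours described above. It is purely illustrative: the function names are hypothetical and this is not the actual code of the XPlat modules, just the general idea of "fail on empty" versus "treat empty as zero".

```python
def to_double_strict(raw):
    """Fail on an empty value, like the Type Mismatch in the alert."""
    if raw is None or raw == "":
        raise ValueError("Type mismatch: cannot convert empty value to double")
    return float(raw)


def to_double_null_as_zero(raw):
    """Treat an empty value as 0.0, as the CU3 change is described to do."""
    if raw is None or raw == "":
        return 0.0
    return float(raw)


# A missing sample and a genuinely idle disk become indistinguishable:
missing = to_double_null_as_zero("")
idle = to_double_null_as_zero("0")
print(missing == idle)
```

    Once the value has been coerced, nothing downstream can tell "no data" apart from "zero bytes per second", which is why the charts can end up flat without any alert.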

    So, let’s see what is actually causing this, as you might well want to get those statistics, or probably you would not be monitoring that server!

    I looked at Cimd.log (set to verbose), which only says the following (basically not much: it is getting info for 3 partitions… and the provider code is working)

    2010-09-01T08:38:32,796Z Trace      [scx.core.providers.diskprovider:5964:3086830480] BaseProvider::EnumInstances()

    2010-09-01T08:38:33,359Z Trace      [scx.core.providers.diskprovider:5964:3086830480] Object Path = //rllspago01.domain.dom/root/scx:SCX_FileSystemStatisticalInformation

    2010-09-01T08:38:33,359Z Trace      [scx.core.providers.diskprovider:5964:3086830480] BaseProvider::EnumInstances() - Calling DoEnumInstances()

    2010-09-01T08:38:33,359Z Trace      [scx.core.providers.diskprovider:5964:3086830480] DiskProvider DoEnumInstances

    2010-09-01T08:38:33,359Z Trace      [scx.core.providers.diskprovider:5964:3086830480] DiskProvider GetDiskEnumeration - type 3

    2010-09-01T08:38:33,360Z Trace      [scx.core.providers.diskprovider:5964:3086830480] BaseProvider::EnumInstances() - DoEnumInstances() returned - 3

    2010-09-01T08:38:33,360Z Trace      [scx.core.providers.diskprovider:5964:3086830480] BaseProvider::EnumInstances() - Call ReturnDone

    2010-09-01T08:38:33,360Z Trace      [scx.core.providers.diskprovider:5964:3086830480] BaseProvider::EnumInstances() - return OK

    2010-09-01T08:38:33,360Z Trace      [scx.core.provsup.cmpibase.singleprovider.DiskProvider:5964:3086830480] SingleProvider::EnumInstances() - Returning - 0

     

    but it still did not give me an idea as to why we would not get data for those “counters”. At this point I stopped using complex troubleshooting techniques and simply turned intuition on, with some help from a search engine: http://www.bing.com/search?q=How+do+I+find+out+Linux+Disk+utilization

    the results I got all mentioned that on Linux you would use the “iostat” command.

    So I tried to use it and… lol and behold: the iostat command was NOT INSTALLED on that machine!

    Guess what? We installed it (it is included in the “sysstat” package for Red Hat Linux, so a simple “yum install sysstat” took care of this) and the counters started working!
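
    If you want to check a number of monitored machines for the same gap, a quick test for the command on the PATH is enough. This is a minimal sketch, assuming Python 3 is available on the Linux box; the helper names are made up for illustration, and the package hint is simply the one from this post.

```python
import shutil


def find_tool(name):
    """Return the full path of an executable on PATH, or None if missing."""
    return shutil.which(name)


def report(name, package_hint):
    """Build a one-line status message for the given command."""
    path = find_tool(name)
    if path is None:
        return f"{name} not found on PATH; try installing the '{package_hint}' package"
    return f"{name} found at {path}"


print(report("iostat", "sysstat"))
```

    The message printed depends on the machine it runs on, so run it on the monitored server itself, not on the Management Server.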

    Hope that is useful to some.
