Help! My Server is Shutting Down for No Apparent Reason


Hello - Rob here with the GES team, and I have this nugget to pass on to you. I recently worked an issue where a Windows server rebooted intermittently for no apparent reason. The Windows System Event log yielded no clues other than this Event ID 6008:

 

Log Name:      System.evt

Source:        EventLog

Date:          25-8-2008 19:06:58

Event ID:      6008

Task Category: None

Level:         Error

Keywords:      Classic

User:          N/A

Computer:      A2A000001

Description: The previous system shutdown at 6:54:04 PM on 8/25/2008 was unexpected.

 

There were no other symptoms or patterns to which the unexpected shutdown could be related. The shutdown could occur at any time of day. Eventually we attached a debugger to see if we could catch anything, but this wasn’t successful. Next we looked at the manufacturer’s error-logging mechanism and found this piece of information:

 

An Unrecoverable System Error has occurred (Error code 0x0000002D, 0x00000000)

 

Note - each vendor has its own way of handling error codes. We noticed a one-to-one relationship between the vendor error above and the Event ID 6008 messages in the Windows System Event log, so we engaged the hardware vendor, who determined that this code indicated an error on the PCI bus. They also informed us that this kind of error asserts an NMI (non-maskable interrupt) on the bus.
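
If you want to check for the same kind of one-to-one relationship on your own system, the comparison can be sketched roughly as below. This is only an illustration: the function names, the five-minute tolerance, and the vendor timestamps are hypothetical, and it assumes the Event ID 6008 description uses the US-style time and date format shown above. Vendor logs differ, so extracting their timestamps is left out.

```python
import re
from datetime import datetime, timedelta

# Matches the time and date inside an Event ID 6008 description
# (assumes the "h:mm:ss AM/PM on m/d/yyyy" form shown in this post).
_SHUTDOWN_RE = re.compile(
    r"previous system shutdown at (\d{1,2}:\d{2}:\d{2} [AP]M) on (\d{1,2}/\d{1,2}/\d{4})"
)


def parse_6008(description):
    """Return the unexpected-shutdown time stated in a 6008 description."""
    m = _SHUTDOWN_RE.search(description)
    if m is None:
        raise ValueError("not an Event ID 6008 description")
    return datetime.strptime(f"{m.group(1)} {m.group(2)}", "%I:%M:%S %p %m/%d/%Y")


def correlate(shutdowns, vendor_errors, tolerance=timedelta(minutes=5)):
    """Pair each shutdown with the closest vendor error within the tolerance.

    Shutdowns without a nearby vendor error are paired with None.
    """
    pairs = []
    for s in shutdowns:
        nearest = min(vendor_errors, key=lambda v: abs(v - s), default=None)
        if nearest is not None and abs(nearest - s) <= tolerance:
            pairs.append((s, nearest))
        else:
            pairs.append((s, None))
    return pairs
```

A shutdown that pairs with None would suggest a cause other than the vendor-logged error.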

 

To narrow down which component was causing the error, we set the NMICrashDump DWORD value under the following key in the registry:

 

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CrashControl

 

This is described in detail in the article, “927069 How to generate a complete crash dump file or a kernel crash dump file by using an NMI on a Windows-based system”

http://support.microsoft.com/default.aspx?scid=kb;EN-US;927069

 

This registry value causes the machine to bugcheck with a STOP 0x80 (NMI_HARDWARE_FAILURE) when Windows detects an NMI, producing a dump file. If a debugger is attached, the machine breaks into the debugger instead.
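
For anyone who prefers to script the change rather than use Regedit, here is a minimal sketch using Python's standard winreg module. It assumes the value data of 1 given in KB 927069; the function name is my own. It runs only on Windows and needs administrator rights. Check the article for the remaining steps and for which Windows versions it applies to.

```python
KEY_PATH = r"SYSTEM\CurrentControlSet\Control\CrashControl"
VALUE_NAME = "NMICrashDump"


def enable_nmi_crash_dump():
    """Set the NMICrashDump DWORD to 1 under the CrashControl key.

    Assumption: a value of 1 enables the behavior, as described in KB 927069.
    """
    import winreg  # standard library, available only on Windows

    with winreg.OpenKey(
        winreg.HKEY_LOCAL_MACHINE, KEY_PATH, 0, winreg.KEY_SET_VALUE
    ) as key:
        winreg.SetValueEx(key, VALUE_NAME, 0, winreg.REG_DWORD, 1)
```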

 

After setting this registry value we hooked up the debugger again and waited... After a while we got lucky: the debugger intercepted a STOP 0x80!

 

At that point, I ran “!pci 0x102 ff” to get an overview of the various PCI devices and their respective states. The command showed the following (VendorID and DeviceID have been removed):

 

PCI Configuration Space (Segment:0000 Bus:00 Device:1e Function:00)

Common Header:

    00: VendorID       <vendor>

    02: DeviceID       <device>

    04: Command        0147 IOSpaceEn MemSpaceEn BusInitiate PERREn SERREn

    06: Status         4010 CapList SERR

    08: RevisionID     d9

    09: ProgIF         01 Subtractive

    0a: SubClass       04 PCI-PCI Bridge

    0b: BaseClass      06 Bridge Device

    0c: CacheLineSize  0000

    0d: LatencyTimer   00

    0e: HeaderType     01

    0f: BIST           00

    10: BAR0           00000000

    14: BAR1           00000000

    18: PriBusNum      00

    19: SecBusNum      01

    1a: SubBusNum      01

    1b: SecLatencyTmr  20

    1c: IOBase         20

    1d: IOLimit        30

    1e: SecStatus      6280 FB2BCapable InitiatorAbort SERR DEVSELTiming:1

    20: MemBase        f7e0

    22: MemLimit       f7f0

    24: PrefMemBase    d801

    26: PrefMemLimit   dff1

    28: PrefBaseHi     00000000

    2c: PrefLimitHi    00000000

    30: IOBaseHi       0000

    32: IOLimitHi      0000

    34: CapPtr         50

    38: ROMBAR         00000000

    3c: IntLine        ff

    3d: IntPin         00

    3e: BridgeCtrl     000b PERRREnable SERREnable VGAEnable

 

We couldn't have gone much further without the vendor's assistance. They informed us that the SERR flag in the Status register indicates that a PCI System Error has occurred in this PCI-PCI Bridge. At this point I had enough conclusive data to pass my findings to the hardware vendor for full collaboration on the problem, and they continued investigating the issue.
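
To see where flags such as SERR come from, the raw Status (4010) and SecStatus (6280) values can be decoded by hand. The sketch below is not what the debugger does internally. The bit names follow the conventional PCI status register layout as I understand it, and the function and table names are my own. The debugger prints shorter labels for the same bits, for example InitiatorAbort for Received Master Abort.

```python
# Bit positions in the primary Status register (offset 0x06).
# DEVSEL timing (bits 9-10) is a two-bit field and is decoded separately.
STATUS_BITS = {
    3: "Interrupt Status",
    4: "Capabilities List",
    5: "66 MHz Capable",
    7: "Fast Back-to-Back Capable",
    8: "Master Data Parity Error",
    11: "Signaled Target Abort",
    12: "Received Target Abort",
    13: "Received Master Abort",
    14: "Signaled System Error (SERR)",
    15: "Detected Parity Error",
}

# A bridge's Secondary Status register (offset 0x1e) shares most bits,
# but bits 3-4 are not used and bit 14 means the error was *received*.
SEC_STATUS_BITS = {b: n for b, n in STATUS_BITS.items() if b not in (3, 4)}
SEC_STATUS_BITS[14] = "Received System Error (SERR)"


def decode_status(value, secondary=False):
    """Return (list of set flag names, DEVSEL timing) for a 16-bit value."""
    table = SEC_STATUS_BITS if secondary else STATUS_BITS
    flags = [name for bit, name in sorted(table.items()) if value & (1 << bit)]
    devsel_timing = (value >> 9) & 0x3
    return flags, devsel_timing


print(decode_status(0x4010))                  # Status from the dump above
print(decode_status(0x6280, secondary=True))  # SecStatus from the dump above
```

For the values in this dump, bit 14 is set in both registers, which matches the SERR flag the debugger printed on both lines.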

 

It should be noted that a hardware problem is not the only reason for an Event ID 6008. A quick search of the Microsoft Knowledge Base turns up other things that can cause this Event ID to appear in the Windows System log.
