Hello all, East here. I wanted to give you a hint on how to use a RHS dump to find what thread was part of the Windows Server 2008 R2 cluster RHS recovery deadlock.
First let me start off with letting you know that Windows Server 2008 R2 will create two types of user-mode dumps:
1 - A heap dump (.hdmp) - which is an extended mini-dump that contains additional data such as the process memory.
2 - A mini dump (.mdmp) - which will only contain mainly stack information and not enough info to help you get further.
When a Windows Server 2008 R2 cluster has a resource that does not respond to one of the main cluster queries in a timely fashion, RHS will try to recover that resource by restarting it in an isolated process as well as creating a WER report and a user-mode dump of the deadlock.
NOTE: More info about RHS and WER reports can be seen in the following blog - http://blogs.msdn.com/b/clustering/archive/2009/06/27/9806160.aspx
When a deadlock is detected the cluster log will show:
"00000dd4.00000aac::2011/05/13-16:12:39.559 ERR [RHS] WER report is submitted. Result : WerReportQueued."
You should find the WER report file and dumps located at the default location:
Here is what a cluster created Windows Error Report looks like:
EventType=WSFC Resource Deadlock
File.Path=WERC1F2.tmp.hdmp <----------------------------- Name of the dump file
FriendlyEventName=WSFC Resource Deadlock
ConsentKey=WSFC Resource Host Monitor
AppName=Failover Cluster Resource Host Subsystem <--------- The application that had an issue - RHS
ReportDescription=WSFC Resource Deadlock
Now you have the area to look for in the cluster log, and an idea of which resource caused RHS to deadlock, and what it was doing at the time:
The Resource Name = New flexdisk
The Resource Type = flexdisk
The type of call the resource was engaged in = ISALIVE
You have all this information in the cluster log as well:
00000dd4.00000aac::2011/05/13-16:12:39.136 ERR [RHS] RhsCall::DeadlockMonitor: Call ISALIVE timed out for resource 'New flexdisk'. RHS deadlock during an Isalive check for the New flexdisk resource.
00000dd4.00000aac::2011/05/13-16:12:39.136 INFO [RHS] Enabling RHS termination watchdog with timeout 1200000 and recovery action 3.
00000dd4.00000aac::2011/05/13-16:12:39.136 ERR [RHS] Resource New flexdisk handling deadlock. Cleaning current operation and terminating RHS process.
00000dd4.00000aac::2011/05/13-16:12:39.136 ERR [RHS] About to send WER report.
000004dc.00000fb4::2011/05/13-16:12:39.136 WARN [RCM] HandleMonitorReply: FAILURENOTIFICATION for 'New flexdisk', gen(3) result 4.
000004dc.00000fb4::2011/05/13-16:12:39.136 INFO [RCM] rcm::RcmResource::HandleMonitorReply: Resource 'New flexdisk' consecutive failure count 1.
00000dd4.00000aac::2011/05/13-16:12:39.559 ERR [RHS] WER report is submitted. Result : WerReportQueued. WER report and user-mode dump created
000004dc.00000fb4::2011/05/13-16:12:39.559 ERR [RCM] rcm::RcmMonitor::RecoverProcess: Recovering monitor process 3540 / 0xdd4
000004dc.00000fb4::2011/05/13-16:12:39.575 INFO [RCM] Created monitor process 3280 / 0xcd0
00000cd0.00000200::2011/05/13-16:12:39.575 INFO [RHS] Initializing.
000004dc.00000fb4::2011/05/13-16:12:39.591 INFO [RCM] rcm::RcmResource::ReattachToMonitorProcess: (New flexdisk, OfflineCallIssued)
000004dc.00000fb4::2011/05/13-16:12:39.591 WARN [RCM] Canceling pending control GET_RO_PRIVATE_PROPERTIES for resource 'New flexdisk' due to monitor crash.
Now if you want to get adventurous and review the dump file to see the call stacks of RHS and what the resource was doing, you will look at the WER dump with the extension “.hdmp”. Since we are looking at a process with multiple threads, we need to find the one that is showing the WER reporting calls and the one showing the RHS deadlock.
Run the following command after opening the dump under your debugger “~*k”. In the below example, thread 3 is the one we are interested in. You can tell this because the process and thread ID matches the cluster log above. If you don’t have a cluster log, this is the thread with wer.dll on the stack.
. 0 Id: dd4.9d4 Suspend: 0 Teb: 000007ff`fffdd000 Unfrozen
Child-SP RetAddr Call Site
00000000`001cf128 000007fe`fd611420 ntdll!ZwWaitForMultipleObjects+0xa
00000000`001cf130 00000000`772c1220 KERNELBASE!WaitForMultipleObjectsEx+0xe8
00000000`001cf230 00000000`ff763d53 kernel32!WaitForMultipleObjects+0xb0
00000000`001cf2c0 00000000`ff74802c rhs!cxl::WaitHandle::WaitAny+0x3f
00000000`001cf3b0 00000000`ff7458a1 rhs!Rhs::WaitForTermination+0x74
00000000`001cf450 00000000`ff769099 rhs!wmain+0x4b1
00000000`001cfbd0 00000000`772c652d rhs!__wmainCRTStartup+0x13d
00000000`001cfc10 00000000`774fc521 kernel32!BaseThreadInitThunk+0xd
00000000`001cfc40 00000000`00000000 ntdll!RtlUserThreadStart+0x1d
1 Id: dd4.d0c Suspend: 0 Teb: 000007ff`fffdb000 Unfrozen
00000000`00c3fa98 000007fe`fd611420 ntdll!ZwWaitForMultipleObjects+0xa
00000000`00c3faa0 00000000`772c1220 KERNELBASE!WaitForMultipleObjectsEx+0xe8
00000000`00c3fba0 00000000`ff763d53 kernel32!WaitForMultipleObjects+0xb0
00000000`00c3fc30 00000000`ff75785c rhs!cxl::WaitHandle::WaitAny+0x3f
00000000`00c3fd20 00000000`ff766988 rhs!PingAliveCallScheduler::Run+0x184
00000000`00c3fda0 00000000`772c652d rhs!cxl::Thread::ThreadStartRoutine+0x18
00000000`00c3fdf0 00000000`774fc521 kernel32!BaseThreadInitThunk+0xd
00000000`00c3fe20 00000000`00000000 ntdll!RtlUserThreadStart+0x1d
2 Id: dd4.e78 Suspend: 0 Teb: 000007ff`fffd9000 Unfrozen
00000000`00baf998 00000000`774eb007 ntdll!ZwWaitForMultipleObjects+0xa
00000000`00baf9a0 00000000`772c652d ntdll!TppWaiterpThread+0x14d
00000000`00bafc40 00000000`774fc521 kernel32!BaseThreadInitThunk+0xd
00000000`00bafc70 00000000`00000000 ntdll!RtlUserThreadStart+0x1d
3 Id: dd4.aac Suspend: 0 Teb: 000007ff`fffd7000 Unfrozen
00000000`00cbe5e8 000007fe`fd611420 ntdll!ZwWaitForMultipleObjects+0xa
00000000`00cbe5f0 00000000`772d2d53 KERNELBASE!WaitForMultipleObjectsEx+0xe8
00000000`00cbe6f0 00000000`773e8f7d kernel32!WaitForMultipleObjectsExImplementation+0xb3
00000000`00cbe780 00000000`773e62b2 user32!RealMsgWaitForMultipleObjectsEx+0x12a
00000000`00cbe820 00000000`773e62e0 user32!MsgWaitForMultipleObjectsEx+0x46
00000000`00cbe860 000007fe`f6f3c35c user32!MsgWaitForMultipleObjects+0x20
00000000`00cbe8a0 000007fe`f6f6dbd0 wer!UtilMsgWaitForMultipleObjects+0x54
00000000`00cbe930 000007fe`f6f54158 wer!CReportManager::ReportProblemOutOfProcess+0x5c4
00000000`00cbecc0 000007fe`f6f3b99b wer!CReportManager::ReportProblem+0x1eb
00000000`00cbede0 000007fe`f6f61cbe wer!CReportHandleInstance::SubmitReport+0x386
00000000`00cbee50 00000000`ff750dcf wer!WerReportSubmit+0x16a
00000000`00cbeef0 00000000`ff75117b rhs!RhsResource::WERReportDeadlock+0x43f
00000000`00cbfaf0 00000000`ff74b270 rhs!RhsResource::HandleDeadlock+0x187
00000000`00cbfb60 00000000`772c652d rhs!RhsCall::DeadlockMonitor+0x324
00000000`00cbfc60 00000000`774fc521 kernel32!BaseThreadInitThunk+0xd
00000000`00cbfc90 00000000`00000000 ntdll!RtlUserThreadStart+0x1d
4 Id: dd4.e48 Suspend: 0 Teb: 000007ff`fffae000 Unfrozen
00000000`00e2f0f8 000007fe`fd611203 ntdll!ZwDelayExecution+0xa
00000000`00e2f100 000007fe`f78f1ad9 KERNELBASE!SleepEx+0xab
00000000`00e2f1a0 00000000`00326aa0 FlexRes+0x11ad9
00000000`00e2f1a8 00000000`00000000 0x326aa0
<snipped to save space>
We will switch context to thread 3 by using the “ ~3s ” command, and follow that with “ kn “ to show the stack with frame numbers. Debug Tip: Since we already know that its thread id is “aac” we can also switch directly to thread 3 by using its thread id in the following command syntax: “ ~~[aac]s ”
# Child-SP RetAddr Call Site
00 00000000`00cbe5e8 000007fe`fd611420 ntdll!NtWaitForMultipleObjects+0xa
01 00000000`00cbe5f0 00000000`772d2d53 KERNELBASE!WaitForMultipleObjectsEx+0xe8
02 00000000`00cbe6f0 00000000`773e8f7d kernel32!WaitForMultipleObjectsExImplementation+0xb3
03 00000000`00cbe780 00000000`773e62b2 user32!RealMsgWaitForMultipleObjectsEx+0x12a
04 00000000`00cbe820 00000000`773e62e0 user32!MsgWaitForMultipleObjectsEx+0x46
05 00000000`00cbe860 000007fe`f6f3c35c user32!MsgWaitForMultipleObjects+0x20
06 00000000`00cbe8a0 000007fe`f6f6dbd0 wer!UtilMsgWaitForMultipleObjects+0x54
07 00000000`00cbe930 000007fe`f6f54158 wer!CReportManager::ReportProblemOutOfProcess+0x5c4
08 00000000`00cbecc0 000007fe`f6f3b99b wer!CReportManager::ReportProblem+0x1eb
09 00000000`00cbede0 000007fe`f6f61cbe wer!CReportHandleInstance::SubmitReport+0x386
0a 00000000`00cbee50 00000000`ff750dcf wer!WerReportSubmit+0x16a
0b 00000000`00cbeef0 00000000`ff75117b rhs!RhsResource::WERReportDeadlock+0x43f
0c 00000000`00cbfaf0 00000000`ff74b270 rhs!RhsResource::HandleDeadlock+0x187
0d 00000000`00cbfb60 00000000`772c652d rhs!RhsCall::DeadlockMonitor+0x324
0e 00000000`00cbfc60 00000000`774fc521 kernel32!BaseThreadInitThunk+0xd
0f 00000000`00cbfc90 00000000`00000000 ntdll!RtlUserThreadStart+0x1d
The frame of our concern will be frame d. You can either go the context ( /c ) or just display the registers ( /r ) for that frame context:
0:003> .frame /r d
rax=000007fffffd7000 rbx=0000000000346920 rcx=00000000007316c0
rdx=0000000000000000 rsi=000000000035f8b0 rdi=000000000035f8b0
rip=00000000ff74b270 rsp=0000000000cbfb60 rbp=0000000000000000
r8=00000000ffffffff r9=00000000000004ff r10=0000000000000000
r11=0000000000cbe810 r12=0000000000000000 r13=0000000000000000
iopl=0 nv up ei pl zr na po nc
cs=0033 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000246
00000000`ff74b270 448b442430 mov r8d,dword ptr [rsp+30h] ss:00000000`00cbfb90=ffffffff
Here is where we get into tricky territory about where to look. I know from experience that the RDI register from above is going to end up containing my information for this resource. Another way to verify you are in the right area, dumping RDI will show the CallType.
Dump that out to find the thread containing the stack with:
0:003> dqs 000000000035f8b0
00000000`0035f8b0 00000000`ff704eb8 rhs!IsAliveCall::`vftable' == CallType
00000000`0035f8c8 00000000`00000e48 <-- Thread id of concerning resource
If we scan our previous “~*k” output, we should find the thread id above to match our thread with the problem resource, FlexRes:
In this case I was using an in house test app that just went to sleep instead of responding to the LooksAlive() check, which caused IsAlive() to fail.
Also if you check the cluster log 5 minutes prior to the RhsCall::DeadlockMonitor error you will see the resource and thread id as well, showing what was going on prior to the Deadlock:
00000dd4.00000e48::2011/05/13-16:07:37.270 INFO [RES] flexdisk <New flexdisk>: Set new entry point = ISALIVEERROR:1722,ISALIVEFAILTYPE:1,ISALIVEHANG:1 Error = 0.
000004dc.00000fd4::2011/05/13-16:07:37.286 WARN [RCM] ResourceControl(SET_PRIVATE_PROPERTIES) to New flexdisk returned 5024.
00000dd4.00000e48::2011/05/13-16:07:38.023 INFO [RES] flexdisk <New flexdisk>: Set new entry point = ISALIVEERROR:1722,ISALIVEFAILTYPE:1,ISALIVEHANG:0 Error = 0.
00000dd4.00000e48::2011/05/13-16:07:38.023 INFO [RES] flexdisk <New flexdisk>: Going to Hang in IsAlive.
In most cases the cluster log can provide enough information to help you narrow down possible cluster resource issues. If not, the user-mode dump may be able to help verify additional data, especially if you are debugging a third party resource dll that is deadlocking RHS.
Greetings fellow debuggers, today I will be blogging about Event ID 129 messages. These warning events are logged to the system event log with the storage adapter (HBA) driver’s name as the source. Windows’ STORPORT.SYS driver logs this message when it detects that a request has timed out, the HBA driver’s name is used in the error because it is the miniport associated with storport.
Below is an example 129 event:
Event Type: Warning
Event Source: <HBA_Name>
Event Category: None
Event ID: 129
Time: 1:15:45 AM
Reset to device, \Device\RaidPort1, was issued.
So what does this mean? Let’s discuss the Windows I/O stack architecture to answer this.
Windows I/O uses a layered architecture where device drivers are on a “device stack.” In a basic model, the top of the stack is the file system. Next is the volume manager, followed by the disk driver. At the bottom of the device stack are the port and miniport drivers. When an I/O request reaches the file system, it takes the block number of the file and translates it to an offset in the volume. The volume manager then translates the volume offset to a block number on the disk and passes the request to the disk driver. When the request reaches the disk driver it will create a Command Descriptor Block (CDB) that will be sent to the SCSI device. The disk driver imbeds the CDB into a structure called the SCSI_REQUEST_BLOCK (SRB). This SRB is sent to the port driver as part of the I/O request packet (IRP).
The port driver does most of the request processing. There are different port drivers depending on the architecture. For example, ATAPORT.SYS is the port driver for ATA devices, and STORPORT.SYS is the port driver for SCSI devices. Some of the responsibilities for a port driver are:
The port driver interfaces with a driver called the “miniport”. The miniport driver is designed by the hardware manufacturer to work with a specific adapter and is responsible for taking requests from the port driver and sending them to the target LUN. The port driver calls the HwStorStartIo() function to send requests to the miniport, and the miniport will send the requests to the HBA so they can be sent over the physical medium (fibre, ethernet, etc) to the LUN. When the request is complete, the miniport will call StorPortNotification() with the NotificationType parameter value set to RequestComplete, along with a pointer to the completed SRB.
When a request is sent to the miniport, STORPORT will put the request in a pending queue. When the request is completed, it is removed from this queue. While requests are in the pending queue they are timed.
The timing mechanism is simple. There is one timer per logical unit and it is initialized to -1. When the first request is sent to the miniport the timer is set to the timeout value in the SRB. The disk timeout value is a tunable parameter in the registry at: HKLM\System\CurrentControlSet\Services\Disk\TimeOutValue. Some vendors will tune this value to best match their hardware, we do not advise changing this value without guidance from your storage vendor.
The timer is decremented once per second. When a request completes, the timer is refreshed with the timeout value of the head request in the pending queue. So, as long as requests complete the timer will never go to zero. If the timer does go to zero, it means the device has stopped responding. That is when the STORPORT driver logs the Event ID 129 error. STORPORT then has to take corrective action by trying to reset the unit. When the unit is reset, all outstanding requests are completed with an error and they are retried. When the pending queue empties, the timer is set to -1 which is its initial value.
Each SRB has a timer value set. As requests are completed the queue timer is refreshed with the timeout value of the SRB at the head of the list.
The most common causes of the Event ID 129 errors are unresponsive LUNs or a dropped request. Dropped requests can be caused by faulty routers or other hardware problems on the SAN.
I have never seen software cause an Event ID 129 error. If you are seeing Event ID 129 errors in your event logs, then you should start investigating the storage and fibre network.