This is Ron Stock from the Global Escalation Services team and I recently had the task of live debugging a customer’s remote server. In debug circles we use what is known as a crash cart to live debug production servers. The phrase conjures up visions of a wheeled cabinet containing an emergency defibrillator, a heart monitor and latex gloves. Luckily for our purposes, the term merely denotes a machine setup with the Debugging Tools for Windows. This life saving machine is attached to the ailing production server for debugging and no medical degree is required.
The ailing production server is referred to as the Target Computer and the Debugging Tools for Windows are installed on the Host computer. The machines are attached with either a null-modem cable, 1394 cable, a special USB cable, or an ethernet cable (network debugging was added in Windows 8). Below I outline serial debugging because this is the most common technique. In future articles I plan to discuss configuring the other methods.
Serial Connection Setup
A null-modem cable is a serial cable used to send data between two serial ports and it can be cheaply purchased at most electronics stores. Be aware these are different from standard serial cables because the transmit and receive lines are cross linked.
Plug the null-modem cable into a serial port on each of the computers. The serial port on the target computer must be built into the system, add on components such as PCI cards will not work for serial debugging on the target computer.
Target Computer setup
1. To enable debugging enter the following command from an elevated command prompt.
bcdedit /debug on
2. In most systems the default debug settings are sufficient. The current settings can be verified with the below command.
3. Use the below command if you need to change the debug settings, where x is the number of the COM port connected to the null modem cable on the target machine and rate is the baud rate desired for debugging. The rate is usually 115200.
bcdedit /dbgsettings serial debugport:x baudrate:115200
5. Reboot the target computer.
Host Computer setup
1. First install the Windows Debugging Tools on the host computer. Navigate to the Windows Software Development Kit (SDK) currently located at this link http://msdn.microsoft.com/en-US/windows/hardware/hh852363 and choose the download option.
a. If you are not able to install the SDK on the host computer, the debugging tools for windows can be installed to a different system and the debugger directory can be copied to the host computer.
2. Click Next until you see the Select the features you want to install screen.
3. Select only the option named Debugging Tools for Windows and click the Install button. I typically install the tools to a directory named C:\debugger
4. After the Windows Debugging Tools are installed I set my symbol path by setting the environment variable _NT_SYMBOL_PATH. I recommend setting it to the public symbol server SRV*c:\localsymbols*http://msdl.microsoft.com/download/symbols. If you prefer, you can specify any path in place of ‘c:\localsymbols’.
5. Open the debugger by running windbg.exe from the c:\debugger folder.
6. On the File menu, choose Kernel Debug.
7. In the Kernel Debugging dialog box, open the COM tab.
8. In the Baud rate box, enter the same rate you selected for the Target Machine in the steps above. This is usually 115200.
9. In the Port box, enter COMx where x is the COM port connected to the null modem cable on this computer. In my example I plugged my null modem cable to com port 1 so I typed com1 in the field.
a. It is not necessary to use the same port number on both the target and the host. For example, it is possible to use com1 on the target and com2 on the host.
10. Click OK and you’ll receive a message indicating the Host computer is waiting to connect.
11. From the Debug menu select Break will cause the debugger to break into the target machine and give you the opportunity to debug your ailing production server. Good Luck!!
Hello Debuggers! This is Ron Stock from the Global Escalation Services team and I recently worked an interesting case dispatched to our team because Services.exe was terminating. Nothing good ever happens when Services.exe exits. In this particular case, client RDP sessions were forcibly disconnected from the server and the server machine was shutting down unexpectedly. This is the message encountered at the console on the server.
The customer was able to trigger the crash by changing anything to do with a particular non-Microsoft service in the Services MMC (e.g. changing the Startup Type, changing the dependencies, stopping the service etc...) To protect our vendor friend, I refer to this service as FriendlyService throughout this article. We could have stopped the investigation right here and implicated the FriendlyService vendor, however as you will see, this service was merely a victim.
When investigating a process termination, I routinely gather a process dump to start my investigation. The frustrating thing about this instance was how none of our debugging tools were generating a dump file when Services.exe terminated. I tried the usual toolset including AdPlus, ProcDump and DebugDiag to no avail. Despite the lack of data from a memory dump, I was still able to piece together a stack and attack the problem through a creative approach. Debugging is an art.
First I reviewed the application log which was loaded with entries like this one.
11/29/2012 04:29:05 PM Information HE2NTSP208 1004 Application Error N/A Faulting application services.exe, version 5.2.3790.4455, faulting module msvcr80.dll, version 8.0.50727.6195, fault address 0x000000000001df67.
From the event log entry above I was able to determine the faulting module, msvcr80.dll, and the relative offset, 0x000000000001df67. This address is basically the return address of the faulting call. As you can see from the ln output below, the function name was msvcr80!wcscpy_s. As I noted above, I wasn’t able to gather a dump during process termination so I had the customer use ProcDump to snap a dump of Services.exe during normal process operation (not a crash dump).
0:000> ln 00000000`78130000 + 0x000000000001df67 (00000000`7814ded0) msvcr80!wcscpy_s+0x97 | (00000000`7814df80) msvcr80!wcsncpy_s
Next I needed to determine which DLLs in the Services.exe process were calling msvcr80!wcscpy_s by reviewing the import tables of the binaries loading in the process. I used the !peb command to dump out the Process Environment Block (PEB). This gave me the list of base addresses for each loaded DLL. I focused mainly on non-Microsoft binaries. To protect our vendor friends, I renamed the DLL to ThirdPartyServiceMonitor.Dll in this article.
This is the output from the !peb command with the base addresses in the left column.
PEB at 000007fffffde000
Ldr.InInitializationOrderModuleList: 00000000000d2df0 . 00000000001a3600
Ldr.InLoadOrderModuleList: 00000000000d2d20 . 000000000019cbc0
Ldr.InMemoryOrderModuleList: 00000000000d2d30 . 000000000019cbd0
Base TimeStamp Module
100000000 49882047 Feb 03 04:45:27 2009 C:\WINDOWS\system32\services.exe
77ec0000 4ecbcd57 Nov 22 10:27:03 2011 C:\WINDOWS\system32\ntdll.dll
77d40000 49c51cdd Mar 21 11:59:09 2009 C:\WINDOWS\system32\kernel32.dll
7ff7fc00000 45d6ccae Feb 17 03:36:46 2007 C:\WINDOWS\system32\msvcrt.dll
7ff7fee0000 4a61f064 Jul 18 10:55:16 2009 C:\WINDOWS\system32\ADVAPI32.dll
7ff7fd30000 4c6ba77a Aug 18 04:27:22 2010 C:\WINDOWS\system32\RPCRT4.dll
7ff7e9c0000 4a37438e Jun 16 02:02:38 2009 C:\WINDOWS\system32\Secur32.dll
77c20000 45e7c5c2 Mar 02 00:35:46 2007 C:\WINDOWS\system32\USER32.dll
7ff7fc90000 490062ac Oct 23 06:40:28 2008 C:\WINDOWS\system32\GDI32.dll
7ff7c680000 45d6ccab Feb 17 03:36:43 2007 C:\WINDOWS\system32\USERENV.dll
7ff7c450000 45d6cc90 Feb 17 03:36:16 2007 C:\WINDOWS\system32\SCESRV.dll
7ff7e490000 45d6cc04 Feb 17 03:33:56 2007 C:\WINDOWS\system32\AUTHZ.dll
7ff77370000 4fedd464 Jun 29 11:14:28 2012 C:\WINDOWS\system32\NETAPI32.dll
7ff7c410000 45d6cca5 Feb 17 03:36:37 2007 C:\WINDOWS\system32\umpnpmgr.dll
7ff7d4d0000 45d6ccb8 Feb 17 03:36:56 2007 C:\WINDOWS\system32\WINSTA.dll
7ff65470000 45d6cc5c Feb 17 03:35:24 2007 E:\Program Files\ThirdPartyDirectory\ThirdParty2.dll
400000 424360e9 Mar 24 19:52:57 2005 C:\WINDOWS\system32\msvcp60.dll
7ff7d500000 45d6cc3b Feb 17 03:34:51 2007 C:\WINDOWS\system32\IMM32.DLL
e50000 4df9462e Jun 15 18:54:22 2011 E:\Program Files\ThirdPartyDirectory\ThirdPartyServiceMonitor.dll
Using the base address of ThirdPartyServiceMonitor, I dumped the header to find the Import Address Table Directory.
0:000> !dh 00000000`00e50000
1E000 [ 758] address [size] of Import Address Table Directory
Using the dps command I dumped all of the functions in the import table of ThirdPartyServiceMonitor.dll. I found msvcr80!wcscpy_s in the function list. This indicates ThirdPartyServiceMonitor.dll makes calls to the msvcr80!wcscpy_s.
0:000> dps 00000000`00e50000+ 1E000 l758/@$ptrsize
00000000`00e6e458 00000000`7814d890 msvcr80!_wcsnicmp
00000000`00e6e460 00000000`7814db20 msvcr80!_wcsicmp
00000000`00e6e498 00000000`7814ded0 msvcr80!wcscpy_s
Since this was the only third party DLL with msvcr80!wcscpy_s in its import table, I was able to continue piecing together my stack. ThirdPartyServiceMonitor.dll was calling msvcr80!wcscpy_s and causing Services.exe to crash. At this point in the investigation, the stack looks like this-
In my quest to continue building my “conceptual” stack without a crash dump file, I reviewed the status code from the “System Shutdown” dialog displayed when Services.exe terminated. Notice the status code -1073741811 in the error. What the heck does that mean?
Well I easily resolved the cryptic status code by passing 0n-1073741811 to the !error command in the debugger. The “0n” prefix indicates the value should be interpreted as decimal rather than hex by the debugger. I also included the negative symbol “-“ because this also appears in status code.
0:030> !error 0n-1073741811
Error code: (NTSTATUS) 0xc000000d (3221225485) - An invalid parameter was passed to a service or function.
Armed with the status code information, I reviewed the msvcr80!wcscpy_s assembly code to determine if this status code was returned at any point. In the assembly I found a call to a function named msvcr80!_invalid_parameter. In context of the message, “an invalid parameter was passed to a service or function”, this certainly sounds like the code path taken.
0:000> uf msvcr80!wcscpy_s
18 00000000`7814deea e88184feff call msvcr80!_errno(00000000`78136370)
18 00000000`7814deef 4533c9 xor r9d,r9d
18 00000000`7814def2 4533c0 xor r8d,r8d
18 00000000`7814def5 33d2 xor edx,edx
18 00000000`7814def7 33c9 xor ecx,ecx
18 00000000`7814def9 48c744242000000000 mov qword ptr [rsp+20h],0
18 00000000`7814df02 c70016000000 mov dword ptr[rax],16h
18 00000000`7814df08 e873d1feff call msvcr80!_invalid_parameter (00000000`7813b080)
18 00000000`7814df0d b816000000 mov eax,16h
34 00000000`7814df12 4883c438 add rsp,38h
34 00000000`7814df16 c3 ret
It was reasonable to add this call to my conceptual stack because the call tree makes sense.
Because I’m a curious kind of guy, I unassembled msvcr80!_invalid_parameter to peel back another layer of the onion. To my surprise I found a call to msvcr80!_imp_TerminateProcess. BOOM! This explains why the debugger wasn’t catching the process crash. The process was terminating ‘organically’ through a TerminateProcess call rather than crashing due to an exception, however it was unexpectedly terminating. In other words, all of the services running on the machine were not expecting Services.exe to terminate.
88 00000000`7813b155 ff156d200900 call qword ptr [msvcr80!_imp_GetCurrentProcess (00000000`781cd1c8)]
88 00000000`7813b15b ba0d0000c0 mov edx,0C000000Dh
88 00000000`7813b160 488bc8 mov rcx,rax
88 00000000`7813b163 ff1557200900 call qword ptr [msvcr80!_imp_TerminateProcess(00000000`781cd1c0)]
Now I was able to cobble together a fairly accurate stack without a dump file. At this point I could tell ThirdPartyServiceMonitor.dll was passing bad parameters to msvcr80!wcscpy_s. However, this didn’t explain how FriendlyService (mentioned in the blog introduction) was triggering the issue. I needed to go deeper with a live debug by leveraging the stack information I devised.
On the customer’s server I attached Windbg to the Services.exe process and set a breakpoint on msvcr80!_invalid_parameter.
0:001> bp msvcr80!_invalid_parameter
Then I had the customer reproduce the issue by changing the startup type on FriendlyService in the Services MMC. As I mentioned above, this was one way to trigger the issue. BOOM! My breakpoint hit and looked exactly like the conceptual stack I pieced together. Now I was able to determine what ThirdPartyServiceMonitor was passing to msvcr80!wcscpy_s while broken in with the debugger.
Child-SP RetAddr Call Site
00000000`105deb00 00000000`7814df67 msvcr80!_invalid_parameter+0xe3
00000000`105df0c0 00000000`00e53045 msvcr80!wcscpy_s+0x97
00000000`105df100 00000000`00e5947e ThirdPartyServiceMonitor+0x3045
00000000`105df130 00000000`00e58405 ThirdPartyServiceMonitor+0x947e
00000000`105df180 000007ff`7fd69c75 ThirdPartyServiceMonitor+0x8405
00000000`105df1b0 000007ff`7fe9ccc9 rpcrt4!Invoke+0x65
00000000`105df200 000007ff`7fe9d58d rpcrt4!NdrStubCall2+0x54d
I reviewed the assembly of ThirdPartyServiceMonitor at the point in which it calls msvcr80!wcscpy_s. I discovered that the vendor hardcoded the size of the destination string buffer with 200 hex (512 decimal) while the size of the source string buffer was greater than 512 decimal.
0:035> ub 00000000`00e53045
00000000`00e53020 898324040000 mov dword ptr [rbx+424h],eax
00000000`00e53026 488b4738 mov rax,qword ptr [rdi+38h]
00000000`00e5302a 4c8b4010 mov r8,qword ptr [rax+10h] <<<< Source Buffer
00000000`00e5302e 4d85c0 test r8,r8
00000000`00e53031 7412 je ThirdPartyServiceMonitor+0x3045 (00000000`00e53045)
00000000`00e53033 488d8b28040000 lea rcx,[rbx+428h] <<<< Destination Buffer
00000000`00e5303a ba00020000 mov edx,200h <<<<< Buffer Size hardcoded to 200 hex
00000000`00e5303f ff1553b40100 call qword ptr [ThirdPartyServiceMonitor!PlugControl+0xff38 (00000000`00e6e498)] <<<< call to msvcr80!wcscpy_s
00000000`00e53045 eb16 jmp ThirdPartyServiceMonitor +0x305d (00000000`00e5305d) <<<< return from msvcr80!wcscpy_s
0:035> .formats 200
Here is the definition of wcscpy_s from MSDN
wchar_t *strDestination, // Location of destination string buffer
size_t numberOfElements, // Size of the destination string buffer.
const wchar_t *strSource // Null-terminated source string buffer.
Here are the parameters passed to msvcr80!wcscpy_s. As a reminder, the x64 compiler uses rcx to pass the first parameter, rdx for the second, and r8 for the third. In this case the buffer size was the second parameter in rdx and the source buffer was in r8, the third parameter.
0:035> dq 00000000`00e6e498 l1
0:035> ln 00000000`7814ded0
(00000000`7814ded0) msvcr80!wcscpy_s | (00000000`7814df80) msvcr80!wcsncpy_s
Dumping the source string in r8 showed the string was clearly longer than 512 characters.
0:035> dc 00000000`0103b438
00000000`0103b438 003a0000 0049005c 0074006e 00670065 ..:.\.F.r.e.i.n.
00000000`0103b448 00610072 00690074 006e006f 00670041 d.l.y.S.e.r.v.i.
00000000`0103b458 006e0065 005c0074 0069006c 005c0062 c.e.\.l.i.b.\..
00000000`0103b738 00650070 002e0072 00700061 002e0070 p.e.r...a.p.p...
00000000`0103b748 00610070 00610072 0065006d 00650074 p.a.r.a.m.e.t.e.
00000000`0103b758 002e0072 003d0031 00770020 00610072 r...1.=. .w.r.a.
00000000`0103b768 00700070 00720065 0061002e 00700070 p.p.e.r...a.p.p.
00000000`0103b778 0070002e 00720061 006d0061 00740065 ..p.a.r.a.m.e.t.
00000000`0103b788 00720065 0032002e 0020003d 00720077 e.r...2.=. .w.r.
00000000`0103b798 00700061 00650070 002e0072 00700061 a.p.p.e.r...a.p.
00000000`0103b7a8 002e0070 00610070 00610072 0065006d p...p.a.r.a.m.e.
00000000`0103b7b8 00650074 002e0072 003d0033 00770020 t.e.r...3.=. .w.
00000000`0103b7c8 00610072 00700070 00720065 0061002e r.a.p.p.e.r...a.
00000000`0103b7d8 00700070 0070002e 00720061 006d0061 p.p...p.a.r.a.m.
00000000`0103b7e8 00740065 00720065 0034002e 0020003d e.t.e.r...4.=. .
00000000`0103b7f8 00720077 00700061 00650070 002e0072 w.r.a.p.p.e.r...
00000000`0103b808 00700061 002e0070 00610070 00610072 a.p.p...p.a.r.a.
00000000`0103b818 0065006d 00650074 002e0072 003d0035 m.e.t.e.r...5.=.
00000000`0103b828 00770020 00610072 00700070 00720065 .w.r.a.p.p.e.r.
It was not hard to figure out this string was the imagefile path located at HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\FriendlyService. You can see from the string that service takes parameters which increased the size of the image path beyond 512 characters.
FriendlyService\lib\friendly.3\sbin\friendlywindows-x86-32.exe-s E:\ FriendlyService\conf\wrapper.conf set.DATA_APP=apia "set.DATA_APP_LONG= FriendlyServiceVendor" set.DATA_EXE=E:\ FriendlyService\bin\..\lib\friendly.3\bin set.DATA_HOME=E:\ FriendlyService\bin\..\lib\friendly.3 set.INSTALL_DIR=E:\ FriendlyService\bin\.. wrapper.working.dir=E:\ FriendlyService\bin\.. wrapper.app.parameter.1= wrapper.app.parameter.2= wrapper.app.parameter.3= wrapper.app.parameter.4= wrapper.app.parameter.5= wrapper.app.parameter.6= wrapper.app.parameter.7= wrapper.app.parameter.8= wrapper.app.parameter.9=
The ThirdPartyServiceMonitor hooks any changes made to a service (e.g. changing the Startup Type, changing the dependencies, stopping the service etc...). After hooking the change, ThirdPartyServiceMonitor.dll performs a string copy of the service’s image file path. In most cases this works like a champ, however in this instance, the FriendlyService image path from vendor 1 is really long and the ThirdPartyServiceMonitor from vendor 2 doesn’t account for service image paths exceeding 512 characters. This is the perfect storm! ThirdPartyServiceMonitor needs to remove the hardcoded buffer size.
This example occurred on Windows Server 2003. Starting in Windows 7 and Windows Server 2008 R2, Windows has added functionality for catching processes that exit ‘organically’ through a TerminateProcess call. You can find information on this added functionality on MSDN.