• Ntdebugging Blog

    How to Setup a Debug Crash Cart to Prevent Your Server from Flat Lining

    • 2 Comments

    This is Ron Stock from the Global Escalation Services team and I recently had the task of live debugging a customer’s remote server.  In debug circles we use what is known as a crash cart to live debug production servers. The phrase conjures up visions of a wheeled cabinet containing an emergency defibrillator, a heart monitor and latex gloves. Luckily for our purposes, the term merely denotes a machine setup with the Debugging Tools for Windows. This life saving machine is attached to the ailing production server for debugging and no medical degree is required.

     

    The ailing production server is referred to as the Target Computer and the Debugging Tools for Windows are installed on the Host computer. The machines are attached with either a null-modem cable, 1394 cable, a special USB cable, or an ethernet cable (network debugging was added in Windows 8). Below I outline serial debugging because this is the most common technique.  In future articles I plan to discuss configuring the other methods.

     

    image001

     

    Serial Connection Setup

     

    A null-modem cable is a serial cable used to send data between two serial ports and it can be cheaply purchased at most electronics stores. Be aware these are different from standard serial cables because the transmit and receive lines are cross linked.

     

    Plug the null-modem cable into a serial port on each of the computers.  The serial port on the target computer must be built into the system, add on components such as PCI cards will not work for serial debugging on the target computer.

     

    Target Computer setup

     

    1. To enable debugging enter the following command from an elevated command prompt.

    bcdedit /debug on

     

    2. In most systems the default debug settings are sufficient. The current settings can be verified with the below command.

    bcdedit /dbgsettings

     

    3. Use the below command if you need to change the debug settings, where x is the number of the COM port connected to the null modem cable on the target machine and rate is the baud rate desired for debugging. The rate is usually 115200.

    bcdedit /dbgsettings serial debugport:x baudrate:115200

     

    5. Reboot the target computer.

     

    Host Computer setup

    1. First install the Windows Debugging Tools on the host computer. Navigate to the Windows Software Development Kit (SDK) currently located at this link http://msdn.microsoft.com/en-US/windows/hardware/hh852363 and choose the download option.

          a. If you are not able to install the SDK on the host computer, the debugging tools for windows can be installed to a different system and the debugger directory can be copied to the host computer.

     

    2. Click Next until you see the Select the features you want to install screen.

     

    3. Select only the option named Debugging Tools for Windows and click the Install button. I typically install the tools to a directory named C:\debugger

     

    image002

     

    4. After the Windows Debugging Tools are installed I set my symbol path by setting the environment variable _NT_SYMBOL_PATH. I recommend setting it to the public symbol server SRV*c:\localsymbols*http://msdl.microsoft.com/download/symbols. If you prefer, you can specify any path in place of ‘c:\localsymbols’.

     

    5. Open the debugger by running windbg.exe from the c:\debugger folder.

     

    6. On the File menu, choose Kernel Debug.

     

    7. In the Kernel Debugging dialog box, open the COM tab.

     

    8. In the Baud rate box, enter the same rate you selected for the Target Machine in the steps above. This is usually 115200.

     

    9. In the Port box, enter COMx where x is the COM port connected to the null modem cable on this computer. In my example I plugged my null modem cable to com port 1 so I typed com1 in the field.

          a. It is not necessary to use the same port number on both the target and the host.  For example, it is possible to use com1 on the target and com2 on the host.

     

    image003

     

    10. Click OK and you’ll receive a message indicating the Host computer is waiting to connect.


    image004

     

    11. From the Debug menu select Break will cause the debugger to break into the target machine and give you the opportunity to debug your ailing production server. Good Luck!!

  • Ntdebugging Blog

    Case of the Unexplained Services exe Termination

    • 8 Comments

    Hello Debuggers! This is Ron Stock from the Global Escalation Services team and I recently worked an interesting case dispatched to our team because Services.exe was terminating. Nothing good ever happens when Services.exe exits. In this particular case, client RDP sessions were forcibly disconnected from the server and the server machine was shutting down unexpectedly. This is the message encountered at the console on the server.

     

     

     

    The customer was able to trigger the crash by changing anything to do with a particular non-Microsoft service in the Services MMC (e.g. changing the Startup Type, changing the dependencies, stopping the service etc...) To protect our vendor friend, I refer to this service as FriendlyService throughout this article. We could have stopped the investigation right here and implicated the FriendlyService vendor, however as you will see, this service was merely a victim.

     

    Investigation

    When investigating a process termination, I routinely gather a process dump to start my investigation. The frustrating thing about this instance was how none of our debugging tools were generating a dump file when Services.exe terminated. I tried the usual toolset including AdPlus, ProcDump and DebugDiag to no avail. Despite the lack of data from a memory dump, I was still able to piece together a stack and attack the problem through a creative approach. Debugging is an art.

     

    First I reviewed the application log which was loaded with entries like this one.

     

    11/29/2012 04:29:05 PM  Information   HE2NTSP208       1004    Application Error  N/A Faulting application services.exe, version 5.2.3790.4455, faulting module msvcr80.dll, version 8.0.50727.6195, fault address 0x000000000001df67

     

    From the event log entry above I was able to determine the faulting module, msvcr80.dll, and the relative offset, 0x000000000001df67. This address is basically the return address of the faulting call. As you can see from the ln output below, the function name was msvcr80!wcscpy_s. As I noted above, I wasn’t able to gather a dump during process termination so I had the customer use ProcDump to snap a dump of Services.exe during normal process operation (not a crash dump).

     

    0:000> ln 00000000`78130000 + 0x000000000001df67
    (00000000`7814ded0)   msvcr80!wcscpy_s+0x97   |  (00000000`7814df80)   msvcr80!wcsncpy_s

     

    Next I needed to determine which DLLs in the Services.exe process were calling msvcr80!wcscpy_s by reviewing the import tables of the binaries loading in the process. I used the !peb command to dump out the Process Environment Block (PEB). This gave me the list of base addresses for each loaded DLL. I focused mainly on non-Microsoft binaries. To protect our vendor friends, I renamed the DLL to ThirdPartyServiceMonitor.Dll in this article.

     

    This is the output from the !peb command with the base addresses in the left column.

     

    0:000> !peb

    PEB at 000007fffffde000

        InheritedAddressSpace:    No

        ReadImageFileExecOptions: No

        BeingDebugged:            Yes

        ImageBaseAddress:         0000000100000000

        Ldr                       0000000077fa9f20

        Ldr.Initialized:          Yes

        Ldr.InInitializationOrderModuleList: 00000000000d2df0 . 00000000001a3600

        Ldr.InLoadOrderModuleList:           00000000000d2d20 . 000000000019cbc0

        Ldr.InMemoryOrderModuleList:         00000000000d2d30 . 000000000019cbd0

                Base TimeStamp                     Module

           100000000 49882047 Feb 03 04:45:27 2009 C:\WINDOWS\system32\services.exe

            77ec0000 4ecbcd57 Nov 22 10:27:03 2011 C:\WINDOWS\system32\ntdll.dll

            77d40000 49c51cdd Mar 21 11:59:09 2009 C:\WINDOWS\system32\kernel32.dll

         7ff7fc00000 45d6ccae Feb 17 03:36:46 2007 C:\WINDOWS\system32\msvcrt.dll

         7ff7fee0000 4a61f064 Jul 18 10:55:16 2009 C:\WINDOWS\system32\ADVAPI32.dll

         7ff7fd30000 4c6ba77a Aug 18 04:27:22 2010 C:\WINDOWS\system32\RPCRT4.dll

         7ff7e9c0000 4a37438e Jun 16 02:02:38 2009 C:\WINDOWS\system32\Secur32.dll

            77c20000 45e7c5c2 Mar 02 00:35:46 2007 C:\WINDOWS\system32\USER32.dll

         7ff7fc90000 490062ac Oct 23 06:40:28 2008 C:\WINDOWS\system32\GDI32.dll

         7ff7c680000 45d6ccab Feb 17 03:36:43 2007 C:\WINDOWS\system32\USERENV.dll

         7ff7c450000 45d6cc90 Feb 17 03:36:16 2007 C:\WINDOWS\system32\SCESRV.dll

         7ff7e490000 45d6cc04 Feb 17 03:33:56 2007 C:\WINDOWS\system32\AUTHZ.dll

         7ff77370000 4fedd464 Jun 29 11:14:28 2012 C:\WINDOWS\system32\NETAPI32.dll

         7ff7c410000 45d6cca5 Feb 17 03:36:37 2007 C:\WINDOWS\system32\umpnpmgr.dll

         7ff7d4d0000 45d6ccb8 Feb 17 03:36:56 2007 C:\WINDOWS\system32\WINSTA.dll

         7ff65470000 45d6cc5c Feb 17 03:35:24 2007 E:\Program Files\ThirdPartyDirectory\ThirdParty2.dll

              400000 424360e9 Mar 24 19:52:57 2005 C:\WINDOWS\system32\msvcp60.dll

         7ff7d500000 45d6cc3b Feb 17 03:34:51 2007 C:\WINDOWS\system32\IMM32.DLL

              e50000 4df9462e Jun 15 18:54:22 2011 E:\Program Files\ThirdPartyDirectory\ThirdPartyServiceMonitor.dll

     

    Using the base address of ThirdPartyServiceMonitor, I dumped the header to find the Import Address Table Directory.

     

    0:000> !dh 00000000`00e50000

     

    1E000 [     758] address [size] of Import Address Table Directory

     

    Using the dps command I dumped all of the functions in the import table of ThirdPartyServiceMonitor.dll. I found msvcr80!wcscpy_s in the function list. This indicates ThirdPartyServiceMonitor.dll makes calls to the msvcr80!wcscpy_s.

     

    0:000> dps 00000000`00e50000+ 1E000 l758/@$ptrsize

    <snippet>

    .

    .

    00000000`00e6e458  00000000`7814d890 msvcr80!_wcsnicmp

    00000000`00e6e460  00000000`7814db20 msvcr80!_wcsicmp

    00000000`00e6e498  00000000`7814ded0 msvcr80!wcscpy_s

    .

    .

     

    Since this was the only third party DLL with msvcr80!wcscpy_s in its import table, I was able to continue piecing together my stack. ThirdPartyServiceMonitor.dll was calling msvcr80!wcscpy_s and causing Services.exe to crash. At this point in the investigation, the stack looks like this-

     

    msvcr80!wcscpy_s

    ThirdPartyServiceMonitor+<offset>

     

    In my quest to continue building my “conceptual” stack without a crash dump file, I reviewed the status code from the “System Shutdown” dialog displayed when Services.exe terminated. Notice the status code -1073741811 in the error. What the heck does that mean?

     

     

    Well I easily resolved the cryptic status code by passing 0n-1073741811 to the !error command in the debugger. The “0n” prefix indicates the value should be interpreted as decimal rather than hex by the debugger. I also included the negative symbol “-“ because this also appears in status code.

     

    0:030> !error 0n-1073741811

    Error code: (NTSTATUS) 0xc000000d (3221225485) - An invalid parameter was passed to a service or function.

     

    Armed with the status code information, I reviewed the msvcr80!wcscpy_s assembly code to determine if this status code was returned at any point. In the assembly I found a call to a function named msvcr80!_invalid_parameter. In context of the message, “an invalid parameter was passed to a service or function”, this certainly sounds like the code path taken.

     

    0:000> uf msvcr80!wcscpy_s

    <snippet>

    msvcr80!wcscpy_s+0x1a

       18 00000000`7814deea e88184feff      call    msvcr80!_errno(00000000`78136370)

       18 00000000`7814deef 4533c9          xor     r9d,r9d

       18 00000000`7814def2 4533c0          xor     r8d,r8d

       18 00000000`7814def5 33d2            xor     edx,edx

       18 00000000`7814def7 33c9            xor     ecx,ecx

       18 00000000`7814def9 48c744242000000000 mov   qword ptr [rsp+20h],0

       18 00000000`7814df02 c70016000000    mov     dword ptr[rax],16h

       18 00000000`7814df08 e873d1feff      call    msvcr80!_invalid_parameter (00000000`7813b080)

       18 00000000`7814df0d b816000000      mov     eax,16h

       34 00000000`7814df12 4883c438        add     rsp,38h

       34 00000000`7814df16 c3              ret

     

    It was reasonable to add this call to my conceptual stack because the call tree makes sense.

     

    msvcr80!_invalid_parameter

    msvcr80!wcscpy_s

    ThirdPartyServiceMonitor+<offset>

     

    Because I’m a curious kind of guy, I unassembled msvcr80!_invalid_parameter to peel back another layer of the onion. To my surprise I found a call to msvcr80!_imp_TerminateProcess. BOOM! This explains why the debugger wasn’t catching the process crash. The process was terminating ‘organically’ through a TerminateProcess call rather than crashing due to an exception, however it was unexpectedly terminating. In other words, all of the services running on the machine were not expecting Services.exe to terminate.

     

    msvcr80!_invalid_parameter+0xd5

    <snippet>

       88 00000000`7813b155 ff156d200900    call    qword ptr [msvcr80!_imp_GetCurrentProcess (00000000`781cd1c8)]

       88 00000000`7813b15b ba0d0000c0      mov     edx,0C000000Dh

       88 00000000`7813b160 488bc8          mov     rcx,rax

       88 00000000`7813b163 ff1557200900    call    qword ptr [msvcr80!_imp_TerminateProcess(00000000`781cd1c0)]

     

    Now I was able to cobble together a fairly accurate stack without a dump file. At this point I could tell ThirdPartyServiceMonitor.dll was passing bad parameters to msvcr80!wcscpy_s. However, this didn’t explain how FriendlyService (mentioned in the blog introduction) was triggering the issue. I needed to go deeper with a live debug by leveraging the stack information I devised.

     

    msvcr80!_imp_TerminateProcess

    msvcr80!_invalid_parameter

    msvcr80!wcscpy_s

    ThirdPartyServiceMonitor+<offset>

     

    On the customer’s server I attached Windbg to the Services.exe process and set a breakpoint on msvcr80!_invalid_parameter.

     

    0:001> bp msvcr80!_invalid_parameter

     

    Then I had the customer reproduce the issue by changing the startup type on FriendlyService in the Services MMC. As I mentioned above, this was one way to trigger the issue. BOOM! My breakpoint hit and looked exactly like the conceptual stack I pieced together. Now I was able to determine what ThirdPartyServiceMonitor was passing to msvcr80!wcscpy_s while broken in with the debugger.

     

    0:035> k

    Child-SP          RetAddr           Call Site

    00000000`105deb00 00000000`7814df67 msvcr80!_invalid_parameter+0xe3

    00000000`105df0c0 00000000`00e53045 msvcr80!wcscpy_s+0x97

    00000000`105df100 00000000`00e5947e ThirdPartyServiceMonitor+0x3045

    00000000`105df130 00000000`00e58405 ThirdPartyServiceMonitor+0x947e

    00000000`105df180 000007ff`7fd69c75 ThirdPartyServiceMonitor+0x8405

    00000000`105df1b0 000007ff`7fe9ccc9 rpcrt4!Invoke+0x65

    00000000`105df200 000007ff`7fe9d58d rpcrt4!NdrStubCall2+0x54d

     

    I reviewed the assembly of ThirdPartyServiceMonitor at the point in which it calls msvcr80!wcscpy_s. I discovered that the vendor hardcoded the size of the destination string buffer with 200 hex (512 decimal) while the size of the source string buffer was greater than 512 decimal.

     

    0:035> ub 00000000`00e53045

    ThirdPartyServiceMonitor+0x3020:

    00000000`00e53020 898324040000    mov     dword ptr [rbx+424h],eax

    00000000`00e53026 488b4738        mov     rax,qword ptr [rdi+38h]

    00000000`00e5302a 4c8b4010        mov     r8,qword ptr [rax+10h] <<<< Source Buffer

    00000000`00e5302e 4d85c0          test    r8,r8

    00000000`00e53031 7412            je      ThirdPartyServiceMonitor+0x3045 (00000000`00e53045)

    00000000`00e53033 488d8b28040000  lea     rcx,[rbx+428h] <<<< Destination Buffer

    00000000`00e5303a ba00020000      mov     edx,200h <<<<< Buffer Size hardcoded to 200 hex

    00000000`00e5303f ff1553b40100    call    qword ptr [ThirdPartyServiceMonitor!PlugControl+0xff38 (00000000`00e6e498)] <<<< call to msvcr80!wcscpy_s

    00000000`00e53045 eb16            jmp     ThirdPartyServiceMonitor +0x305d (00000000`00e5305d) <<<< return from msvcr80!wcscpy_s

     

    0:035> .formats 200

    Evaluate expression:

      Hex:     00000000`00000200

      Decimal: 512

     

    Here is the definition of wcscpy_s from MSDN

    errno_t wcscpy_s(

       wchar_t *strDestination,         // Location of destination string buffer

       size_t numberOfElements,         // Size of the destination string buffer.

       const wchar_t *strSource         // Null-terminated source string buffer.

    );

     

    Here are the parameters passed to msvcr80!wcscpy_s. As a reminder, the x64 compiler uses rcx to pass the first parameter, rdx for the second, and r8 for the third. In this case the buffer size was the second parameter in rdx and the source buffer was in r8, the third parameter.

     

    0:035> dq 00000000`00e6e498 l1

    00000000`00e6e498  00000000`7814ded0

     

    0:035> ln 00000000`7814ded0

    (00000000`7814ded0)   msvcr80!wcscpy_s   |  (00000000`7814df80)   msvcr80!wcsncpy_s

    Exact matches:

        msvcr80!wcscpy_s

     

    Dumping the source string in r8 showed the string was clearly longer than 512 characters.

     

    0:035> dc 00000000`0103b438

    00000000`0103b438  003a0000 0049005c 0074006e 00670065  ..:.\.F.r.e.i.n.

    00000000`0103b448  00610072 00690074 006e006f 00670041  d.l.y.S.e.r.v.i.

    00000000`0103b458  006e0065 005c0074 0069006c 005c0062  c.e.\.l.i.b.\..

    .

    .

    0:035> dc

    00000000`0103b738  00650070 002e0072 00700061 002e0070  p.e.r...a.p.p...

    00000000`0103b748  00610070 00610072 0065006d 00650074  p.a.r.a.m.e.t.e.

    00000000`0103b758  002e0072 003d0031 00770020 00610072  r...1.=. .w.r.a.

    00000000`0103b768  00700070 00720065 0061002e 00700070  p.p.e.r...a.p.p.

    00000000`0103b778  0070002e 00720061 006d0061 00740065  ..p.a.r.a.m.e.t.

    00000000`0103b788  00720065 0032002e 0020003d 00720077  e.r...2.=. .w.r.

    00000000`0103b798  00700061 00650070 002e0072 00700061  a.p.p.e.r...a.p.

    00000000`0103b7a8  002e0070 00610070 00610072 0065006d  p...p.a.r.a.m.e.

    0:035> dc

    00000000`0103b7b8  00650074 002e0072 003d0033 00770020  t.e.r...3.=. .w.

    00000000`0103b7c8  00610072 00700070 00720065 0061002e  r.a.p.p.e.r...a.

    00000000`0103b7d8  00700070 0070002e 00720061 006d0061  p.p...p.a.r.a.m.

    00000000`0103b7e8  00740065 00720065 0034002e 0020003d  e.t.e.r...4.=. .

    00000000`0103b7f8  00720077 00700061 00650070 002e0072  w.r.a.p.p.e.r...

    00000000`0103b808  00700061 002e0070 00610070 00610072  a.p.p...p.a.r.a.

    00000000`0103b818  0065006d 00650074 002e0072 003d0035  m.e.t.e.r...5.=.

    00000000`0103b828  00770020 00610072 00700070 00720065  .w.r.a.p.p.e.r.

     

    It was not hard to figure out this string was the imagefile path located at HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\FriendlyService. You can see from the string that service takes parameters which increased the size of the image path beyond 512 characters.

     

    FriendlyService\lib\friendly.3\sbin\friendlywindows-x86-32.exe-s E:\ FriendlyService\conf\wrapper.conf set.DATA_APP=apia "set.DATA_APP_LONG= FriendlyServiceVendor" set.DATA_EXE=E:\ FriendlyService\bin\..\lib\friendly.3\bin set.DATA_HOME=E:\ FriendlyService\bin\..\lib\friendly.3 set.INSTALL_DIR=E:\ FriendlyService\bin\.. wrapper.working.dir=E:\ FriendlyService\bin\.. wrapper.app.parameter.1= wrapper.app.parameter.2= wrapper.app.parameter.3= wrapper.app.parameter.4= wrapper.app.parameter.5= wrapper.app.parameter.6= wrapper.app.parameter.7= wrapper.app.parameter.8= wrapper.app.parameter.9=

     

    The ThirdPartyServiceMonitor hooks any changes made to a service (e.g. changing the Startup Type, changing the dependencies, stopping the service etc...). After hooking the change, ThirdPartyServiceMonitor.dll performs a string copy of the service’s image file path. In most cases this works like a champ, however in this instance, the FriendlyService image path from vendor 1 is really long and the ThirdPartyServiceMonitor from vendor 2 doesn’t account for service image paths exceeding 512 characters. This is the perfect storm! ThirdPartyServiceMonitor needs to remove the hardcoded buffer size.

     

    This example occurred on Windows Server 2003.  Starting in Windows 7 and Windows Server 2008 R2, Windows has added functionality for catching processes that exit ‘organically’ through a TerminateProcess call.  You can find information on this added functionality on MSDN.

Page 1 of 1 (2 items)