This guest post is by Tarik Soulami, a Principal Development Lead at Microsoft and author of Inside Windows Debugging: A Practical Guide to Debugging and Tracing Strategies in Windows, which is getting very positive reviews on Amazon. Enjoy!
Tarik here. When you think of scripts, you don’t usually think of threads, synchronization or deadlocks. However, that was precisely the curious bug I had to investigate a few weeks ago. The script I was trying to write had a simple goal: parse the standard and error outputs from another existing command-line tool (a console application), and write the processed (modified) text to the command prompt.
// // ConsoleApp.exe // int __cdecl wmain() { for (int n = 0; n < 500; n++) { fprintf(stdout, "%s\n", "This is an output sent to stdout."); fprintf(stderr, "%s\n", "This is an output sent to stderr."); } }
To help isolate the problem further, I went ahead and removed all the parsing logic that my Visual Basic script contained. Even with this simplified configuration where the output was redirected as-is, I continued to get the hang behavior, as shown in the following output:
C:\ScriptHang>cscript test.vbs consoleapp.exe TRACE: --- Output begin --- This is an output sent to stdout. This is an output sent to stdout. This is an output sent to stdout. ... This is an outpu^C
Here is the simplified script that I was using for this test experiment. It sure looked innocent enough!
' ' Test.vbs: captures and redirects output of another command-line application ' Option Explicit Dim result If WScript.Arguments.Count <> 1 Then WScript.StdOut.WriteLine("USAGE: cscript.exe test.vbs <ConsoleAppPath>") WScript.Quit(-1) End If WScript.StdOut.WriteLine("TRACE: --- Output begin ---") result = Run(WScript.Arguments.Item(0)) WScript.StdOut.WriteLine("TRACE: ---- Output end ----") WScript.StdOut.WriteLine("TRACE: Exit code: " & result) WScript.Quit(result) ' Execute the console program and capture its stdout/stderr output Function Run (cmdLine) Dim shellObj, execObj, status Set shellObj = CreateObject("WScript.Shell") Set execObj = shellObj.Exec(cmdLine) If execObj.ProcessID = 0 And execObj.Status = 1 Then ' Error launching the executable. Run = -1 Exit Function End If Do ' Process stdout. Do Until execObj.StdOut.AtEndOfStream WScript.StdOut.Write(execObj.StdOut.Read(1)) Loop ' Process stderr. Do Until execObj.StdErr.AtEndOfStream WScript.StdErr.Write(execObj.StdErr.Read(1)) Loop Loop Until (execObj.Status <> 0 And execObj.StdOut.AtEndOfStream And execObj.StdErr.AtEndOfStream) ' Return the exit code Run = execObj.ExitCode End Function
I must admit that I was both surprised and stumped at first. Why would a fairly simple script like this one hang? My immediate reaction was to attach the WinDbg user-mode debugger to the script host process (cscript.exe) and see what it was waiting on. Unfortunately, that wasn’t terribly useful. All I could see was a call to ReadFile, presumably related to the script’s call in progress to read from the console application’s stdout stream (as evidenced by the truncated message coming out of the script when it stopped printing to the console). However, there was no telling what was preventing this call from completing:
windbg.exe -pn cscript.exe 0:004> .symfix 0:004> .reload 0:004> ~*k ChildEBP RetAddr 0029e4f4 775262c4 ntdll!KiFastSystemCallRet 0029e4f8 757de43e ntdll!ZwReadFile+0xc 0029e55c 76ee9bba KERNELBASE!ReadFile+0x118 0029e5a4 73643804 kernel32!ReadFileImplementation+0xf0 0029e5d0 73643890 scrrun!CFileStream::Read+0x3d 0029e678 7364394a scrrun!CTextStream::ReadBlock+0x86 0029e684 73643b85 scrrun!CTextStream::GetCharCore+0x15 0029e698 73643b36 scrrun!CTextStream::PeekChar0+0x2f 0029e6b0 76f93e75 scrrun!CTextStream::get_AtEndOfStream+0x60 0029e6cc 76f93cef OLEAUT32!DispCallFunc+0x165 ...
Attaching WinDbg to the consoleapp.exe test tool showed a similar call stack, except the main thread in the process was stuck inside a call to the WriteFile Win32 API, which was in turn invoked by one of the fprintf calls in the C++ test program:
windbg.exe -pn consoleapp.exe 0:004> .symfix 0:004> .reload 0:004> ~*k ChildEBP RetAddr 0015efbc 775a6a74 ntdll!KiFastSystemCallRet 0015efc0 757775d4 ntdll!ZwWriteFile+0xc 0015f024 770a543c KERNELBASE!WriteFile+0x113 0015f040 76064236 kernel32!WriteFileImplementation+0x76 0015f5fc 760640eb msvcrt!_write_nolock+0x3fb 0015f640 760645f9 msvcrt!_write+0x9f 0015f660 7605c835 msvcrt!_flsbuf+0xc0 0015f66c 7605d0ef msvcrt!_get_printf_count_output+0x41 0015f67c 7605d0ba msvcrt!_get_printf_count_output+0xa8 0015f90c 76063ef9 msvcrt!_output_l+0xb57 0015f954 000c121c msvcrt!fprintf+0x123 0015f97c 000c1390 consoleapp!wmain+0x37 [c:\scripthang\main.cpp @ 12] ...
At this point, I knew I had to use a kernel debugger so that I could peek at the kernel side of these two threads and see precisely what file objects they were waiting on. Because I didn’t have a second machine to use as a host kernel mode debugger, I went for the next best thing: local (live) kernel debugging.
After enabling kernel debugging on the machine (and rebooting), I re-ran the scenario once again and used local kernel debugging to examine the previous two call stacks. I started with the call in the script host process:
kd.exe -kl lkd> .symfix lkd> .reload lkd> !process 0 0 cscript.exe PROCESS 89c21030 SessionId: 1 Cid: 0510 Peb: 7ffd7000 ParentCid: 0b34 Image: cscript.exe lkd> .process /r /p 89c21030 lkd> !process 89c21030 7 PROCESS 89c21030 SessionId: 1 Cid: 0510 Peb: 7ffd7000 ParentCid: 0b34 Image: cscript.exe THREAD 8a9aad48 Cid 0510.08ac Teb: 7ffdf000 Win32Thread: fe8a2dd8 WAIT: (Executive) 89bdbfdc NotificationEvent ChildEBP RetAddr Args to Child b48cab70 82c8a65d 84fa1888 807c9308 807c6120 nt!KiSwapContext+0x26 b48caba8 82c894a7 84fa1948 84fa1888 866676bc nt!KiSwapThread+0x266 b48cabd0 82c830cf 84fa1888 84fa1948 00000000 nt!KiCommitThreadWait+0x1df b48cac48 82e36b81 866676bc 00000000 b48cad01 nt!KeWaitForSingleObject+0x393 b48cac7c 82e6f4e8 00000103 86667660 00000001 nt!IopSynchronousServiceTail+0x270 b48cad08 82c4a27a 86731e30 a4419508 00000000 nt!NtReadFile+0x644 b48cad08 775a7094 86731e30 a4419508 00000000 nt!KiFastCallEntry+0x12a 0016e834 775a62c4 7577e43e 0000016c 00000000 ntdll!KiFastSystemCallRet 0016e838 7577e43e 0000016c 00000000 00000000 ntdll!ZwReadFile+0xc 0016e89c 77099bba 0000016c 0016e934 00000080 KERNELBASE!ReadFile+0x118 ...
In the previous listing, the handle to the file object being read from is 0x16c (the first argument to the ReadFile API call), and it is a named pipe:
lkd> !handle 0x16c PROCESS 885cc780 SessionId: 1 Cid: 1dd4 Peb: 7ffd5000 ParentCid: 1c88 Image: cscript.exe 016c: Object: 86667660 GrantedAccess: 00120189 Entry: 8ed9f2d8 Object: 86667660 Type: (84eab9b8) File ObjectHeader: 86667648 (new version) HandleCount: 1 PointerCount: 3 lkd> !fileobj 86667660 Device Object: 0x89abded8 \FileSystem\Npfs Flags: 0x40082 Synchronous IO Named Pipe Handle Created File Object is currently busy and has 0 waiters. ...
It turns out the thread is waiting on the internal event of a named pipe file object to get signaled after the I/O read request was posted. This can also be verified by comparing the event object stored in the named pipe object structure with the notification event encountered earlier in the thread’s call stack:
lkd> dt nt!_FILE_OBJECT ... +0x05c Event : _KEVENT... lkd> dd 86667660+0x5c Evaluate expression: -1984053284 = 89bdbfdc
Because the named pipe is currently empty, the thread blocks and waits for new bytes to be written to the named pipe stream.
Note The blocking behavior in the named pipes case (NPFS file system) isn’t really surprising. However, you should also remember that even in the case of NTFS file handles, synchronous I/O read or write requests are handled similarly and involve a two-phase process. First, the request is queued up and the thread blocks, and then when the I/O is actually completed (hopefully very shortly thereafter in the case of NTFS), the internal event handle of the file object is signaled to unblock the requesting thread, which then completes the call and returns to user mode.
The question now is why consoleapp.exe isn’t writing any new data to its stdout stream (keeping the script waiting for new input). For that, let’s look at the state of its main thread (which we had already seen was blocked in a WriteFile call). It’s easy again to see that we have a very similar situation, where WriteFile is “pended” but cannot complete because the named pipe became full.
lkd> !process 0 0 consoleapp.exe PROCESS 885cc780 SessionId: 1 Cid: 1e8c Peb: 7ffd9000 ParentCid: 2260 Image: consoleapp.exe lkd> .process /r /p 885cc780 lkd> !process 885cc780 7 ... ChildEBP RetAddr Args to Child 98758b60 82c8a65d 858edd08 807c9308 807c6120 nt!KiSwapContext+0x26 98758b98 82c894a7 858eddc8 858edd08 85908184 nt!KiSwapThread+0x266 98758bc0 82c830cf 858edd08 858eddc8 00000000 nt!KiCommitThreadWait+0x1df 98758c38 82e36b81 85908184 00000000 885cc701 nt!KeWaitForSingleObject+0x393 98758c6c 82e7c840 00000103 85908128 00000001 nt!IopSynchronousServiceTail+0x270 98758d08 82c4a27a 86731e30 00000000 00000000 nt!NtWriteFile+0x6e8 98758d08 775a7094 86731e30 00000000 00000000 nt!KiFastCallEntry+0x12a 0021ef5c 775a6a74 757775d4 00000168 00000000 ntdll!KiFastSystemCallRet 0021ef60 757775d4 00000168 00000000 00000000 ntdll!ZwWriteFile+0xc ... lkd> !handle 00000168 ... 0168: Object: 85908128 GrantedAccess: 00120196 (Inherit) Entry: a5d392d0 Object: 85908128 Type: (84eab9b8) File ObjectHeader: 85908110 (new version) HandleCount: 1 PointerCount: 3 lkd> !fileobj 85908128 Device Object: 0x86731e30 \FileSystem\Npfs Flags: 0x40082 Synchronous IO Named Pipe Handle Created File Object is currently busy and has 0 waiters. ...
Everything starts to make sense now. It appears that the stderr pipe of the console process is full (this happens when the internal pipe buffer fills up with unprocessed data), and so the test tool isn’t able to write any more characters into the pipe until a client has pumped data off of it. However, cscript.exe hasn’t gotten around to reading from stderr just yet (the second loop in test.vbs). At the same time, cscript.exe has read all the characters from the stdout stream of the tool and is waiting on the test tool to emit more data into the empty pipe. This results in a deadlock!
Clearly, the console application tool isn’t at fault here. It is simply writing messages to its stdout and stderr streams. The problem lies within the script stream object implementation. The AtEndOfStream property was unfortunately designed as a blocking call when it could arguably have been designed to first peek the pipe (using for instance PeekNamedPipe) to see if there was any data in the pipe’s buffer before reading those available bytes. This issue has in fact been documented in a Microsoft KB article; see http://support.microsoft.com/kb/960246.
If you were writing your own client program (C++ or C#, for example) to parse the output of the console program, you could easily do just that (peek the pipe first before reading). But when using VBScript or JScript, you are unfortunately stuck with the built-in implementation of the scripting engine stream object (unless you want to implement and deploy your own COM scripting object to all the client machines).
A simple approach that works around this problem in the scripting engine’s stream object is to redirect the data from the stdout and stderr streams of the console application to temporary files on the file system. Once the files are written to disk, the test script could then parse and modify the data from those files as needed. This solution essentially buffers the streams manually instead of relying on the scripting object to redirect the standard and error output streams from the child process on the fly.
C:\ScriptHang>cscript test.vbs "cmd.exe /c consoleapp.exe 1>stdout.txt 2>stderr.txt" TRACE: --- Output begin --- TRACE: ---- Output end ---- TRACE: Exit code: 0 echo After executing the command, the script can now safely process stdout.txt and stderr.txt ...
This case study demonstrated several important steps in a typical debugging investigation:
1. Isolate a good repro case: When I started noticing the deadlock behavior in the script from this case study, it wasn’t actually occurring every time I ran the script, but it was depending on the specific command output from the console tool that I was passing to the script. Simplifying the console application to continuously write interleaving stderr and stdout messages highlighted the conditions under which the hang occurred, making for a 100% repro rate as I increased the number of iterations in the console application.
2. Pick the debugging environment that you’ll use to examine the scenario: In this case, the script was hung in native code, outside the Visual Basic script code that I had written in Visual Studio. In fact, the hung script was waiting on an event in kernel mode! WinDbg is a great debugger to use when examining native code, and because I also wanted to dig deeper into what was happening on the kernel side of the fence, I ended up using live (local) kernel debugging. Even when debugging high-level code such as a script program, low-level debugging techniques can still sometimes prove very useful.
3. Understand the code paths that you’re dealing with: A debugger leads you to the door step of the failure, but it’s still up to you to unveil the mystery by understanding how the code you’re debugging is supposed to work. In this case study, it was important to understand several system architecture fundamentals such as how named pipes work internally, how I/O requests are processed by the Windows OS, and many more aspects of synchronization. It was also useful to search the Internet for similar issues, which in this case led me right to a known KB article on the MSDN website.
4. Determine possible bug fixes: The final step once you’ve understood the nature of the bug that you’re dealing with is to think about potential solutions. The more you understand about the bug, of course, the easier it is going to be to come up with sound approaches, so this step usually runs in parallel with the previous one.
Great post!