This button doesn’t do anything!

This button doesn’t do anything!

  • Comments 8

 

Hello - Matthew here again.  Today I'll be discussing in detail hang scenario #1 that Tate first mentioned a few blogs posts ago.  From a debugging perspective, in an ideal world an application would always provide some kind of feedback when a failure occurs.  The reality is that sometimes an application just doesn't do anything in response to a user-initiated action (such as a mouse-click) that results in a failure of some kind.  When this happens, how can you determine what is going wrong?

 

In general, Process Monitor from Sysinternals is a great tool for gathering clues about this kind of failure.  If the failure is due to a file system or registry problem, Process Explorer will catch the problematic event.  Or if you suspect that the problem is due to something failing over the network, a network trace could help.

 

Let's assume that the techniques described above do not move us any closer to understanding the failure.  What next?  If you don't know what the app does in the background, and you don't have source code for the app, how do you proceed?  Let's answer those questions by debugging a sample application...

 

The Nothing button

 

Download the sample application here.

 

 

Here's what we know:

1. Clicking the “Button 1” button is supposed to display a useful dialog box

2. For certain users, clicking “Button 1” results in no observable response

3. You don't have access to the source code or debug symbols for the app

4. No one seems to know what the “Button 1” button does behind the scenes

5. The developer of the application hasn't been heard from in 3 years

 

We need to understand what exactly happens when the user clicks the “Button 1” button.  Every window has a WindowProc function that receives messages upon user input.  Since a button is considered a "control", the WindowProc for the main application window will get a WM_COMMAND message indicating that the button was clicked.  To differentiate which specific control sent the message, each WM_COMMAND message has a control identifier associated with it.

 

So we need to:

 

a)     Find the control identifier for “Button 1”

b)    Find the WindowProc for the main application window

c)     Determine where WM_COMMAND messages for the “Button 1” button are handled

d)    Figure out what is failing in that code

 

Let’s get started...

 

 

 

Find the control identifier for “Button 1”      

 

Spy++ (available as part of Visual Studio) is the tool for this job.  Click Spy -> Log Messages.  Use the Finder Tool to select the main window of ntdbghang1.exe.  On the messages tab, click “Clear All” then select “WM_COMMAND”.  Click “OK” and SPY++ will start logging.  Click “Button 1” in ntdbghang1.exe and you should see one line of text appear in the output window.  For good measure, click “Button 2” also, and you should see a second line of text.  At this point, your SPY++ window should look something like this....

 

Spy Windows Msg

 

We can tell from this output, that the control identifier for “Button 1” is 257 (0x101), and the control identifier for “Button 2” is 258 (0x102).  We’ll need this information later.

 

 

 

Find the WindowProc for the main application window

 

We can find the WindowProc address by using SPY++ again.  Click Spy -> Find Window and use the Finder Tool to select the main window of ntdbghang1.exe.  Make sure “Show properties” is selected, and then click “OK”.  You will get a dialog that looks like this....

 

Spy Window Properties

 

Note the value of Window Proc.  This is the address of the window procedure for the main window (the exact value may differ on your system).   We know that this function is a WindowProc, which means it is of the form...

 

LRESULT CALLBACK WindowProc(         

    HWND hwnd,

    UINT uMsg,

    WPARAM wParam,

    LPARAM lParam

);

 

The meaning of wParam and lParam differ depending on what uMsg is being processed.  When a WM_COMMAND message is passed, the low-word of wParam is the control identifier, which we determined to be 0x101.  The high-word of wParam is the control notification code, which in the case of a button being clicked is BN_CLICKED (literally 0).

 

So, we specifically are interested in the case where:

           

            uMsg = WM_COMMAND (literally 0x111)

            wParam = 0x101

 

While looking at assembly language for the WindowProc, the stack frame will look like this...

 

ebp    =  “old ebp”

ebp+4  =  “return address”

ebp+8  =  hwnd

ebp+c  =  uMsg

ebp+10 =  wParam

ebp+14 =  lParam

 

 

 

Determine where WM_COMMAND messages for the “Button 1” button are handled

 

Once you have the address of the WindowProc, we can examine the assembly code for that function using Windbg.  Launch windbg.exe and select “File -> Attach to a Process”.  Select “ntdbghang1.exe” from the list and click “OK”.  When the debugger breaks in, you can unassemble the beginning of the WindowProc by typing “u <address>”.  Based on the address on my system, that command will be “u 01002830”.  To unassemble more just type “u” again.  I’m now going to unassemble the relevant code and explain what it means along the way...

 

 

0:001> u 01002830

ntdbghang1+0x2830:

 

The first 3 instructions are a function prologue, just getting everything set up.

01002830 8bff            mov     edi,edi

01002832 55              push    ebp

01002833 8bec            mov     ebp,esp

 

Here is the code that moves the value of uMsg into ecx

01002835 8b4d0c          mov     ecx,dword ptr [ebp+0Ch]

 

Now the app enters the assembly equivalent of a switch statement.

We are interested in the check for uMsg = WM_COMMAND

 

if uMsg = WM_CREATE (1) goto 01002893

01002838 49              dec     ecx

01002839 7458            je      ntdbghang1+0x2893 (01002893)

if uMsg = WM_ DESTROY (2) goto 01002889

0100283b 49              dec     ecx

0100283c 744b            je      ntdbghang1+0x2889 (01002889)

if uMsg = WM_CLOSE (0x10) goto 01002889

0100283e 83e90e          sub     ecx,0Eh

01002841 743b            je      ntdbghang1+0x287e (0100287e)

if uMsg = WM_COMMAND (0x111) goto 01002853

01002843 b801010000      mov     eax,101h

01002848 2bc8            sub     ecx,eax

0100284a 7407            je      ntdbghang1+0x2853 (01002853)

 

From the above switch statement we see that when uMsg = WM_COMMAND, execution moves to 01002853.  So let’s go there...

 

0:001> u 01002853

ntdbghang1+0x2853:

 

Move the value of wParam into edx

01002853 8b5510          mov     edx,dword ptr [ebp+10h]

If LOWORD(wParam) == 0x101 (control ID for button 1) goto 0100286f

01002856 0fb7ca          movzx   ecx,dx

01002859 2bc8            sub     ecx,eax

0100285b 7412            je      ntdbghang1+0x286f (0100286f)

 

From the above assembly we can see that right away we are checking for the control ID for button 1.  This is the code path we want to follow....

 

0:001> u 0100286f

ntdbghang1+0x286f:

 

If HIWORD(wparam) != BN_CLICKED (0) then goto 0100289b

Otherwise, call the function at 010027f6

0100286f c1ea10          shr     edx,10h

01002872 6685d2          test    dx,dx

01002875 7524            jne     ntdbghang1+0x289b (0100289b)

01002877 e87affffff      call    ntdbghang1+0x27f6 (010027f6)

 

Now it is clear that when the BN_CLICKED occurs for control id 0x101, we are going to execute the function at address 010027f6. 

 

 

 

Figure out what is failing

 

So now we know what code is going to run every time the button is clicked (the function at 010027f6).  Let’s examine that code and figure out what may be failing.  We’ll use “uf” to unassemble the entire function, as it is rather short.  During this first pass we’ll make up names for our local variables and functions that are called, and then we’ll revisit and figure out what they are.

 

 

0:001> uf 010027f6

ntdbghang1+0x27f6:

 

Function Prologue

010027f6 8bff            mov     edi,edi

010027f8 55              push    ebp

010027f9 8bec            mov     ebp,esp

 

010027fb 51              push    ecx

010027fc 56              push    esi

 

call func1(0x20) [func1 address at 01002dca]

set localvar1=0x10

010027fd 6a20            push    20h

010027ff c745fc10000000  mov     dword ptr [ebp-4],10h

01002806 e8bf050000      call    ntdbghang1+0x2dca (01002dca)

 

Store the result of func1 in esi (call it localvar2)

0100280b 8bf0            mov     esi,eax

 

Call func2(localvar2, &localvar1) [func2 address at 0100b484]

0100280d 59              pop     ecx

0100280e 8d45fc          lea     eax,[ebp-4]

01002811 50              push    eax

01002812 56              push    esi

01002813 e86c8c0000      call    ntdbghang1+0xb484 (0100b484)

 

If func2 returns 0, then goto 01002821

01002818 85c0            test    eax,eax

0100281a 7405            je      ntdbghang1+0x2821 (01002821)

 

Call func3() [func3 address at 0100278d]

ntdbghang1+0x281c:

0100281c e86cffffff      call    ntdbghang1+0x278d (0100278d)

 

Call func4(localvar2) [func4 address at 01002ce3]

ntdbghang1+0x2821:

01002821 56              push    esi

01002822 e8bc040000      call    ntdbghang1+0x2ce3 (01002ce3)

 

Clean up and exit the function

01002827 59              pop     ecx

01002828 5e              pop     esi

01002829 c9              leave

0100282a c3              ret

 

 

So let’s translate this to some pseudo-code...

 

localvar1 = 0x10;

localvar2 = func1(0x20);

 

if(func2(localvar2, &localvar1)

{

       func3();

 

func4(localvar2);

 

So the behavior of the function will differ, depending on the results of func2.  Let’s see if we can figure out what func2 does...

 

0:001> u 0100b484

ntdbghang1+0xb484:

0100b484 ff2568110001    jmp     dword ptr [ntdbghang1+0x1168 (01001168)]

0100b48a cc              int     3

0100b48b cc              int     3

0100b48c cc              int     3

 

0:001> dps 01001168 L1

01001168  70b88cb1 WINSPOOL!GetDefaultPrinterW

 

 

GetDefaultPrinterW is a public API function.  Here’s the function prototype from MSDN...

 

BOOL GetDefaultPrinter(

  LPTSTR pszBuffer,   // printer name buffer

  LPDWORD pcchBuffer  // size of name buffer

);

 

This fits our pseudo-code, since the function takes two parameters and returns a BOOL.  Let’s update our pseudo code to match what we know, replacing func2 with GetDefaultPrinter, etc...

 

DWORD cchBuffer = 0x10;

LPWSTR pszBuffer = func1(0x20);

 

if(GetDefaultPrinterW(pszBuffer,  &cchBuffer))

{

       func3();

}

 

func4(pszBuffer);

 

Based on the above, we can make a couple of assumptions.  Func1 is likely an allocator function of some kind (such as malloc) and func4 is likely a memory free function (such as free). Since the app is using the Unicode version of GetDefaultPrinter (it ends with a W) it make sense that func1 is an allocator asking for 0x20 bytes, and the size of the buffer, in characters, passed to GetDefaultPrinterW is 0x10.  So that accounts for all of the functions, except for func3.  We could unassemble func3 to see what it does, but that may not be needed.  From the pseudo-code we’ve already created, we can see that a failure in GetDefaultPrinterW will prevent the button click function from doing anything except allocating and freeing some memory.  So GetDefaultPrinterW is a likely potential point of failure.

 

Let’s test this theory out.  We’ll set a breakpoint on GetDefaultPrinterW and see if it is failing upon the button click.

 

0:001> bp WINSPOOL!GetDefaultPrinterW

 

0:001> g

 

<Now click Button 1>

 

Breakpoint 0 hit

eax=0006fb84 ebx=00000000 ecx=00000020 edx=00dc0e98 esi=00dc0e70 edi=0006fc0c

eip=70b88cb1 esp=0006fb74 ebp=0006fb88 iopl=0         nv up ei pl nz na po nc

cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000202

WINSPOOL!GetDefaultPrinterW:

70b88cb1 8bff            mov     edi,edi

 

// Dump out the call stack...

0:000> kb

ChildEBP RetAddr  Args to Child             

0006fb70 01002818 00dc0e70 0006fb84 00000111 WINSPOOL!GetDefaultPrinterW

WARNING: Stack unwind information not available. Following frames may be wrong.

0006fb88 0100287c 0006fbbc 75d41a10 00320f78 ntdbghang1+0x2818

0006fb90 75d41a10 00320f78 00000111 00000101 ntdbghang1+0x287c

0006fbbc 75d41ae8 01002830 00320f78 00000111 USER32!InternalCallWinProc+0x23

0006fc34 75d4286a 00000000 01002830 00320f78 USER32!UserCallWinProcCheckWow+0x14b

0006fc74 75d42bba 00a90b80 0095ee08 00000101 USER32!SendMessageWorker+0x4b7

 

// Go to the return address.  Note that eax=0, meaning GetDefaultPrinterW returned FALSE.

0:000> gu

eax=00000000 ebx=00000000 ecx=76f22033 edx=00e10178 esi=00dc0e70 edi=0006fc0c

eip=01002818 esp=0006fb80 ebp=0006fb88 iopl=0         nv up ei pl zr na pe nc

cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000246

ntdbghang1+0x2818:

01002818 85c0            test    eax,eax

 

// Check the last error...

0:000> !gle

LastErrorValue: (Win32) 0x7a (122) - The data area passed to a system call is too small.

 

       // Resume execution

0:000> g

 

The error returned was 0x7a = ERROR_INSUFFICIENT_BUFFER.  It looks like the buffer of 0x20 bytes being passed to GetDefaultPrinterW is too small to hold the name of the default printer for that user.  This explains why the app works for some users and fails for others.  A quick change of the default printer name to something less than 0x10 characters (16 decimal) will work around this issue.

 

To wrap things up, here is the C source for the Button1_OnClick function (at address 010027f6 in the assembly) so you can see how this was actually written...

 

VOID Button1_OnClick()

{

      DWORD cch = 16;

      LPTSTR pPrinterName;

 

      pPrinterName = (LPTSTR) malloc(16 * sizeof(TCHAR));

 

      if(GetDefaultPrinter(pPrinterName, &cch))

      {

            DisplayGoButtonMessage();

      }

 

      free(pPrinterName);

 

      return;

}

 

Look for an upcoming blog post that shows another way this problem could have been approached – using the “wt” command in the debugger.  I hope you found this useful and can apply to real-world debugging scenarios.  Please post any questions / comments!

Leave a Comment
  • Please add 8 and 7 and type the answer here:
  • Post
  • Minor nit pick, for WM_COMMAND, ctlid is in the LOWORD of wp

  • ac - Good catch!  I've corrected it.

  • Very interesting. I have used spy in the past to figure out a lot of things and have experienced moments of epiphany. With your post you take it down to another level. My question is that of late (last 4 years), I have only been dealing with managed code and I missed the native world. My assembly is pretty rusty. Any pointers to where I can refresh and rebuild my knowledge of assembly code specific to VC++ debugging. I don't want to learn assembly language but just enough to recognize the patterns in it to skim thru and understand.

    Thanks.

  • A lot of our upcoming posts will include assembly annotation as part of figuring out what is going on at the lowest level.  This is a great way to learn,  watch the assembly tag in blog, if we do assembly work we will at this tag to the post.  Matt’s latest blog breaks assembly down as does the last two “Hung Window?, No source?, No problem!!!”.  I recommend download and walking through the labs in the debugger.  There will be more of these types of post coming so stay tuned.  

    Thanks Jeff

  • I didn't know you could get the window proc address from Spy++, thanks!

  • &quot;이 문서는 http://blogs.msdn.com/ntdebugging blog 의 번역이며 원래의 자료가 통보 없이 변경될 수 있습니다. 이 자료는 법률적 보증이 없으며

  • It is a interesting read and I learn something new about assembly.  But I have a question about test eax,eax and je

    01002818 85c0    test    eax,eax

    0100281a 7405    je      ntdbghang1+0x2821 (01002821)

    Does je jump when the eax is not zero?  Should it be jz?   I am still newbie in assembly, please correct me if I am wrong

  • &quot;이 문서는 http://blogs.msdn.com/ntdebugging blog 의 번역이며 원래의 자료가 통보 없이 변경될 수 있습니다. 이 자료는 법률적 보증이 없으며

Page 1 of 1 (8 items)