• Ntdebugging Blog

    Windows Internals Beta Exam 71-660

    • 13 Comments

    Correction, Windows Internals Beta Exam 71-660

    This was previously listed incorrectly as 70-660, the actual exam ID is now 71-660

     

    Update 7-18-2008: The link to the exam preparation information has been updated below.

     

    Update 7-21-2008: When registering, there is no need to use a credit card.  Be sure to instead use the following promotional code listed below when registering for the exam: WINT.  If you have already been charged for this exam, please contact Prometric and request a refund, and reference ticket number 3711938.

     

    Hello Windows Debugging Community,  We are super excited to announce a new certification.   With the help of industry partners, including some of our very talented community members,  we have developed a new Windows Internals Certification.   This Certification targets developers and IT professionals that require knowledge of Windows Internals as part of their profession.   Chance are if you are reading this blog post you are part of the target audience.  I encourage you to register and please take the Beta exam.  Please help us to improve the exam by providing feedback during the beta exam.

     

     

    Register for Beta Exam 71-660: TS: Windows® Internals

     

    You are invited to take beta exam 70-660: TS: Windows® Internals. If you pass the beta exam, the exam credit will be added to your transcript and you will not need to take the exam in its released form.

    By participating in beta exams, you have the opportunity to provide the Microsoft Certification program with feedback about exam content, which is integral to development of exams in their released version. We depend on the contributions of experienced IT professionals and developers as we continually improve exam content and maintain the value of Microsoft certifications.

    Please remember that participation in the beta process is completely voluntary and Microsoft makes no promises or guarantees regarding the beta exam process. You can expect to receive your score on the beta exam within 12 weeks of taking the exam, although in some instances, beta exams may take longer to score and your results may be delayed.

    70-660: TS: Windows® Internals counts as credit towards the following certification.

    ·          Microsoft Certified Technology Specialist (MCTS): Windows® Internals  


    Availability

    Registration begins: July 17, 2008

    Beta exam period runs: July 21, 2008August 5, 2008

     

    Receiving this invitation does not guarantee you a seat in the beta; we recommend that you register immediately. Beta exams have limited availability and are operated under a first-come-first-served basis. Once all beta slots are filled, no additional seats will be offered.


    Testing is held at Prometric testing centers worldwide, although this exam may not be available in all countries (see Regional Restrictions).  All testing centers will have the capability to offer this exam in its live version.

     

    Regional Restrictions: India, Pakistan, China


    Registration Information

    You must register at least 24 hours prior to taking the exam.
    Please use the following promotional code when registering for the exam: WINT
    Receiving this invitation does not guarantee you a seat in the beta; we recommend that you register immediately.


    To register in North America, please call:

    Prometric: (800) 755-EXAM (800-755-3926)

     

     

    Outside the U.S./Canada, please contact:

    Prometric: http://www.register.prometric.com/ClientInformation.asp

     

     


    Test Information and Support

    You are invited to take this beta exam at no charge.
    You will be given four hours to complete the beta exam. Please plan accordingly.
    Find exam preparation information: http://blogs.msdn.com/ntdebugging/pages/exam-preparation-information-for-exam-71-660.aspx

     


    Frequently Asked Questions

    For Microsoft Certified Professional (MCP) help and information, you may log in to the MCP Web site at http://www.microsoft.com/learning/mcp/or contact your Regional Service Center: http://www.microsoft.com/learning/support/worldsites.asp.

    What is a beta exam?

    Where can I learn more about the registration process?

    Where can I learn more about the beta exam invitation process?

    Where can I learn more about the new structure of Microsoft Certification?

    Who do I contact for help with this beta exam or other MCP questions?

     

    Thank you.


    Jeff Dailey, Microsoft Global Escalation Services.

  • Ntdebugging Blog

    If Cluster is Running Fine, Don’t Dump Cluadmin on a Hang

    • 3 Comments

    Hello, my name is East, and I’m an Escalation Engineer on the Microsoft Platform Global Escalation Services team.  Here are some tips for those wanting to review dumps from Cluster Administrator (Cluadmin.exe), the GUI version of cluster.exe that allows management of your cluster environment. Historically there were more Cluadmin hangs in Windows 2000 Cluster server than Windows 2003 Cluster server so the volume of these dumps submitted to my group has dropped significantly. People have submitted complete memory dumps thinking the entire Cluster server was hung when it turned out to be a Cluadmin hang. It’s important to determine what is really hung when troubleshooting these types of problems.

     

    So what causes Cluadmin to hang? In most cases a resource is working to come online, or a remote connection is misbehaving which results in the hang.  Cluadmin dumps are not as helpful in revealing why a resource is misbehaving in your Cluster environment, when compared to dumping the cluster service and resrcmon. In most cases the cluster log can help you narrow down a problematic resource that’s holding Cluadmin up. Later I will show how to find the possible resource that caused a Cluster Administrator hang by using the debugger.

     

    To help narrow down the hang you should start by testing that the Cluster environment is healthy.

     

    Let’s say at the time of the hang you were using Cluadmin to view properties of certain resources. At this point it would be useful to examine the status of the cluster by using the Cluster.exe tool. Here are some examples.

     

    Using the cluster.exe command you can check the Groups status with the following command:

     

    C:\>cluster group

    Listing status for all available resource groups:

     

    Group                    Node            Status

    -------------------- --------------- --------------

    Test1                   NODE1            Online

    Cluster Group           JNODE1           Online

     

    You can check the Node status using this command:

     

    C:\>cluster node

    Listing status for all available nodes:

     

    Node           Node ID          Status

    -------------- ------- ---------------------

    NODE1            1                 Up

    NODE2            2                 Down

     

    And you can check the Resource status with the resource command:

     

    C:\>cluster resource

    Listing status for all available resources:

     

    Resource                    Group              Node       Status

    -------------------- -------------------- --------------- ------

    Disk Q:                 Cluster Group         NODE1       Online

    tst1                    Test1                 NODE1       Online

    Cluster IP Address      Cluster Group         NODE1       Online

    Cluster Name            Cluster Group         NODE1       Online

    MSDTC                   Cluster Group         NODE1       Online

     

     

    You can get more granular by using the /priv switch to list out the private properties like this:

     

    C:\>cluster resource /priv

    Listing private properties for all resources:

                    <ouput not shown to save space>

     

    Since hangs can occur during startup of Cluadmin I recommend the following article which discusses different ways to start Cluadmin.

     

    280125  Cluster Administrator Switches for Connecting to a Cluster

    http://support.microsoft.com/default.aspx?scid=kb;EN-US;280125

     

    Now let’s take a look at a Cluadmin dump to illustrate what you would see during a hang. Most dumps we gather from customers are snapped when Cluadmin is starting, resulting in limited information contained in the dump.  Here’s an example of the resource that Cluster was working with when the process hung.  The thread below is making an RPC call to the local cluster service.  This can be the culprit causing Cluadmin to be in the hung state:

     

    0:000> kb

    ChildEBP RetAddr  Args to Child             

    0006eddc 7c59a0a2 00000154 00000001 0006edfc NTDLL!ZwWaitForSingleObject+0xb

    0006ee04 77d7f41d 00000154 000007d0 00000001 KERNEL32!WaitForSingleObjectEx+0x71

    0006ee28 77d5fadf 00000000 0006ee58 0006ee50 rpcrt4!DG_ReceivePacket+0xd3

    0006ee68 77d5f998 0006f0c8 0006f07c 0006ee88 rpcrt4!DG_CCALL::ReceiveSinglePacket+0x5e

    0006ee7c 77d3b96c 0006f07c 0006f274 77d6a686 rpcrt4!DG_CCALL::SendReceive+0xd6

    0006ee88 77d6a686 0006f07c 00000009 7393270e rpcrt4!I_RpcSendReceive+0x2c

    0006ee9c 77d93b64 0006f0c8 000a425c 0006eeac rpcrt4!NdrSendReceive+0x31

    0006f274 73939b6d 73931778 739326c0 0006f28c rpcrt4!NdrClientCall2+0x512

    0006f284 73934bd5 00091ed8 01000059 00000000 CLUSAPI!ApiResourceControl+0x14

    0006f2c4 0102705f 000921e8 00000000 01000059 CLUSAPI!ClusterResourceControl+0xd0

    0006f2f8 01027b00 000921e8 01000059 00000000 CLUADMIN!CClusPropList::ScGetResourceProperties+0x3e

    0006f374 010129ae 00c988a0 00000080 00239770 CLUADMIN!CResource::ReadItem+0xb9

    0006f3b4 0101141d 002397c8 00239770 00000001 CLUADMIN!CClusterDoc::InitResources+0x9a

    0006f3ec 01010f45 00000001 00239770 00000000 CLUADMIN!CClusterDoc::CollectClusterItems+0xd5

    0006f410 01010e11 002352d0 00234960 00239770 CLUADMIN!CClusterDoc::OnOpenDocumentWorker+0x80

    0006f444 76fbaea3 002352d0 00234960 01044c28 CLUADMIN!CClusterDoc::OnOpenDocument+0xa2

    0006f46c 0101039f 002352d0 00000001 01044c28 mfc42u!CMultiDocTemplate::OpenDocumentFile+0xb5

    0006fc88 0100fff6 002352d0 002373a0 00000000 CLUADMIN!CClusterAdminApp::OpenDocumentFile+0xbf

    0006fcc0 01021c9e 00000001 00000000 76fb2089 CLUADMIN!CClusterAdminApp::OnRestoreDesktop+0xc4

    0006fccc 76fb2089 00000001 00000000 002373a0 CLUADMIN!CMainFrame::OnRestoreDesktop+0x17

     

    The first parameter passed to CLUSAPI!ClusterResourceControl in this Windows 2000 dump reveals the resource in question.

     

    Here I’m dumping out the raw data of the parameter which yields the name of the resource, Joseph.

     

    0:000> dc 000921e8+28

    00092210  006F004A 00650073 00680070 00000000  J.o.s.e.p.h.....

    00092220  00030007 000c0100 00091d40 00092780  ........@....'..

    00092230  000923e8 000923e8 00092238 00092238  .#...#..8"..8"..

    00092240  00000000 000805e8 00000000 00000000  ................

    00092250  00020019 00000000 00070005 00080100  ................

    00092260  00000000 00092378 00092b48 000925f0  ....x#..H+...%..

    00092270  00000000 00000000 d3cc9553 00000000  ........S.......

    00092280  00050005 00080100 000922b0 000922d8  ........."..."..

     

    0:000> du 000921e8+28

    00092210  "Joseph"

     

     

    On a Windows 2003 stack I again examined the first parameter to CLUSAPI!ClusterResourceControl...

     

    lpBytesReturned = 0x0006f2a8

     

    ChildEBP RetAddr  Args to Child             

    0006f278 0102bb53 000a03c8 00000000 01000059 CLUSAPI!ClusterResourceControl

    0006f2ac 0102ec09 000a03c8 01000059 00000000 cluadmin!CClusPropList::ScGetResourceProperties+0x78

    0006f328 010142c7 00000000 0026d6b8 01078468 cluadmin!CResource::ReadItem+0xcb

    0006f36c 010161dc 00268508 01078468 00000001 cluadmin!CClusterDoc::InitResources+0xa2

    0006f3a8 01016647 01078468 00000000 01078a48 cluadmin!CClusterDoc::CollectClusterItems+0xe9

    0006f3c8 010167a3 002667b8 00268508 01078468 cluadmin!CClusterDoc::OnOpenDocumentWorker+0x90

    0006f400 7f062793 002667b8 00268508 002667b8 cluadmin!CClusterDoc::OnOpenDocument+0x9c

    0006f428 01010d97 002667b8 00000001 00264da0 MFC42u!CMultiDocTemplate::OpenDocumentFile+0x103

    0006fc50 01010a9b 002667b8 00264da0 00268680 cluadmin!CClusterAdminApp::OpenDocumentFile+0xdc

    0006fc64 7f03babe 00099c48 00264da0 00264da0 cluadmin!CClusterAdminApp::OnClusterConnectionOpened+0x27

    0006fce8 7f038edc 0000040e 00099c48 01006d70 MFC42u!CWnd::OnWndMsg+0x62e

    0006fd10 7f03af27 0000040e 00099c48 00264da0 MFC42u!CWnd::WindowProc+0x2c

    0006fd70 7f03b06e 00268680 00000000 0000040e MFC42u!AfxCallWndProc+0xa7

    0006fd94 7f0e6d8d 000501c6 0000040e 00099c48 MFC42u!AfxWndProc+0x3e

    0006fdc4 7739b6e3 000501c6 0000040e 00099c48 MFC42u!AfxWndProcBase+0x4d

    0006fdf0 7739b874 7f0e6d40 000501c6 0000040e USER32!InternalCallWinProc+0x28

    0006fe68 7739ba92 00000000 7f0e6d40 000501c6 USER32!UserCallWinProcCheckWow+0x151

    0006fed0 7739bad0 01049b24 00000000 0006ff08 USER32!DispatchMessageWorker+0x327

    0006fee0 7f073000 01049b24 01049b24 01049af0 USER32!DispatchMessageW+0xf

    0006fef0 7f072dda 01049af0 01049af0 ffffffff MFC42u!CWinThread::PumpMessage+0x40

     

    0:000> dc 000a03c8+b8

    000a0480  00650052 006f0073 00720075 00650063  R.e.s.o.u.r.c.e.

    000a0490  005c0073 00330031 00350034 00310030  s.\.1.3.4.5.0.1.

    000a04a0  00370030 0031002d 00340065 002d0034  0.7.-.1.e.4.4.-.

    000a04b0  00660034 00310035 0038002d 00370061  4.f.5.1.-.8.a.7.

    000a04c0  002d0030 00610035 00300034 00630034  0.-.5.a.4.0.4.c.

    000a04d0  00350032 00620031 00650036 abab0000  2.5.1.b.6.e.....

    000a04e0  abababab feeeabab 00000000 00000000  ................

    000a04f0  000f0007 001c0731 000a0530 fedcba98  ....1...0.......

     

    0:000> du 000a03c8+b8

    000a0480  "Resources\13450107-1e44-4f51-8a7"

    000a04c0  "0-5a404c251b6e"

     

     

    You will also notice that Windows 2003 shows the resource guid for the resource.

     

    NOTE: You can speculate that the problem resource is the one in the bottom view of the Cluadmin GUI, however using that method can lead to an improper conclusion because the next resource may actually be causing the problem. The resource may have hung the GUI before getting displayed in Cluadmin.

     

    Ok now I know the resource because I dumped it out so what can I do with it now? You can take the resource offline to see if Cluster exhibits an issue bringing that resource off or online.  Additionally, the cluster log may show that this resource is having an issue. Your best bet from here would be to review the cluster log for any additional information about the specific resource or get a Resource Monitor (Resrcmon.exe) dump to investigate any further issues. I also did not show all the threads of the Cluadmin.exe dump since the rest are normally RPC calls and irrelevant to this discussion. If the connection was to another machine we would do a network trace to reveal additional problems.

     

  • Ntdebugging Blog

    NTFS Misreports Free Space?

    • 11 Comments

    I have recently seen a number of issues where customers called in to report a significant difference between the “Size on disk” for the root of a volume, and the reported amount of “Used space” in the volume properties.  While considering this, I noticed that my own C: drive had a mysterious 20GB difference.

     

    Below is the story of how I found the answer.

     

    Before we begin, there are two methods used for calculating disk usage…

     

    Method 1 – Volume Bitmap Analysis

    The % used and %free indication shown below with the pie chart is based on volume bitmap analysis.  The hidden $Bitmap:$Data:”” stream is read from the volume via the FSCTL_GET_VOLUME_BITMAP IO control code, and the ones and zeroes are counted.  A binary “1” represents an allocated cluster, while a “0” indicates a free cluster.  The total number of set bits indicates the total number of allocated clusters on the volume.  This is the fastest and most reliable method of calculating allocated and free space.

     

    Method 2 – File Scanning with FindFirstFileEx  / FindNextFile

    Navigating to the “C:\” folder, we select all files/folders, then right click, and then select properties.  A dialog will appear as the shell enumerates all files that it can find in the root and subfolders.  The “Size” and “Size on disk” fields will eventually display the total size of all primary data streams for all files that are found by Explorer.  In my case, this method only finds 80.25 GB of 102 GB (leaving a 21.75 GB difference).

     

    Why such a big difference?

     

    The short answer is - “If you can’t see it with Explorer, it doesn’t get added to the total size.”

     

    How to find the hidden disk usage…

     

    Step 1:

     

    First, check to see if your files are all actually visible with the current user account and Explorer settings.  Uncheck the “Hide protected operating system files” and select “Show hidden files and folders”.  When this is completed, look at the size on disk again. 

    Does it match the pie chart within a few percent?  If not, continue to Step 2.

     

    Step 2:

    Run CHKDSK and save the output.  The summary information at the end is what we want to see.

    Based on CHKDSK output, we can calculate the total metadata usage by adding up the following…

     

    KB

    GB

    Description

    57036

    .05 GB

    Space used by 22677 indexes.

    0

    0 GB

    Space used by $Badclus file.

    1352680

    1.29 GB

    Space used by $MFT.

    65536

    .06 GB

    Space used by $Loffile.

    1475252

    1.4 GB

     Metadata Total

     

    In this example, metadata usage accounts for only 1.4 GB.  If the metadata number were a high percentage of the usage, then we need to take closer a look at how the volume is being used instead of simply looking for hidden files.  High metadata usage will be the subject for part 2 of this blog.

     

    We can see from the CHKDSK output that the total space occupied by our user files is 106,095,080 KB (101.18 GB).  This is a large percentage of the total usage, so we should be looking at the user files to see why we can’t see these files with Explorer.

     

    Step 3:

    Do you have permissions to see all files/folders the volume? 

     

    a.     If this is a system drive, look through the “\Documents and Settings”, or “\users” folder and see if you can browse all of the profile folders.  If not, you may need contact the owner(s) to check their folder size and see what they have saved in their user folder.  If they are like me, they will have 5 years worth of memory dumps, MP3’s, VHD’s, PDF’s, etc.

    b.    Try “CACLS.EXE x:\ /T”, or just browse through the folders on the drive looking for an “Access Denied” error.  Once this happens, give yourself permissions to that part of the subtree (if permitted by your administrative privileges) or have someone who has access to that folder enumerate the contents for you.  Check the number of files and the size on disk for all files in that folder and add it to the total.

    c.     Continue this process until you have a total of all files in all folders.

    d.    Does the total make sense?  If not, then continue to the next step.

     

    In the case of my mysterious 20 GB difference, I found an old user profile from a previous installation.  Sure enough, I received an access denied error when browsing to it.  To access the folder, I acquired ownership.  This user was in the habit of collecting memory dumps and virtual machines in a subfolder on his desktop.  I recovered quite a bit of free space by using the delete key.  I rescanned the volume, but to my disappointment, there was still a significant difference.

     

    Step 4.

    Are there alternate streams?  Try using STREAMS.EXE from (http://technet.microsoft.com/en-us/sysinternals/default.aspx).  Streams will recurse subdirectories and report space usage by alternate named streams.  Pipe this to a text file and analyze it with a text editor or spreadsheet program.

     

    Step 5.

    Hard links can be problematic when calculating the total usage via the file scan method.  Hard links are not extensively used, but this is worth mentioning.  A hard link is an extra index entry that points to an existing file.  These are created via the CreateHardLink function.  Although the file actually exists in one location, each hard link has its own size field.  Because of this, a single file can be added twice to the running total.  Another thing to know is that hard links are not synchronized, so it is possible that only one link will show the actual file size (see the example below).

    Unfortunately, there are few options available for detecting hard link paradoxes, but it is something to consider when the file scan shows more usage than the bitmap.  Since we have the opposite situation here, hard links are not a significant factor.

     

    Step 6.

    Is Volume Shadow Copy Service maintaining diff area files for snapshots?  Use VSSADMIN LIST SHADOWSTORAGE to find out.  Add shadow storage to the running total.

     

    VSSVC pre-allocates space for making volume snapshots.  In order to support this feature, diff area files are kept in the “\System Volume Information” folder.  This pre-allocated space is used to maintain point-in-time information for the “Previous Versions” feature and for the “System Restore” application.  If you are the type of user who prefers to minimize the impact of such features, then you can resize your shadow storage with VSSADMIN so it has less impact on disk usage.  I prefer to leave these features at their default settings (and just make a note of how much disk space it is using).

    Step 7.

     

    If your numbers still don’t make sense, then it’s time to check for actively hidden files / folders.  There are many different rootkit scanners that can help you identify the presence of actively hidden files & folders.   Try using a scanner like Rootkit Revealer.  If you suspect that your machine has been compromised by a rootkit, refer to http://www.microsoft.com/uk/business/security/security-malware.mspx.

     

    Seeing that I did not have a significant amount of mysterious usage, I was satisfied that there was no malicious logic hiding my files.  After taking into account all of the factors, there was only a 1.57GB difference remaining (which was accounted for by the contents of a user’s recycle bin).

     

    Whenever you see a difference between the “Used space”, and “Size on disk”, always ask yourself “What am I not seeing?”

     

     

    Best regards,

     

    Dennis Middleton “The NTFS Doctor”

     

  • Ntdebugging Blog

    Tracking Down a Multi-Process Deadlock

    • 9 Comments

     

    Hello, my name is Louis Shanks and my story starts off with a tricky little deadlock issue that I worked on recently. Let me set the stage by explaining the surface level symptom, and then we will get to the good stuff. Randomly when users running Vista would attempt to launch Internet Explorer nothing would happen. So they would again double-click the icon, and still nothing happened. So the users would reboot and go back to their web activities.

     

    After some basic trouble shooting it was discovered that Iexplore.exe was running multiple times and every attempt to open the browser would result in another Iexplore.exe instance.  Since no UI was ever presented I went ahead and collected a few Iexplore.exe process dumps along with a complete system dump.

     

     The process dumps didn’t prove to be terribly useful so I decided to move over to the complete memory dump.

     

    Since I knew we had multiple IExplore.exe processes running I wanted to investigate them first.

     

    0: kd>!process 0 0

     

    <snip>

    PROCESS 87289d90  SessionId: 5  Cid: 0fdc    Peb: 7ffdc000  ParentCid: 0968

        DirBase: 7d66c520  ObjectTable: a203a178  HandleCount: 205.

        Image: iexplore.exe

     

    PROCESS 861a9d90  SessionId: 5  Cid: 0d34    Peb: 7ffd9000  ParentCid: 0fdc

        DirBase: 7d66c7a0  ObjectTable: a5651a98  HandleCount: 170.

        Image: ieuser.exe

     

    PROCESS 86c175b0  SessionId: 5  Cid: 1250    Peb: 7ffd9000  ParentCid: 0968

        DirBase: 7d66c940  ObjectTable: a0853f10  HandleCount:  91.

        Image: iexplore.exe


    PROCESS 861ac7a8  SessionId: 5  Cid: 1024    Peb: 7ffde000  ParentCid: 0968

        DirBase: 7d66c620  ObjectTable: a83f7898  HandleCount:  91.

    Image: iexplore.exe
    </snip>

    I decided to just start with the first one in the list (87289d90)

     

    0: kd> !process 87289d90
        Image: iexplore.exe

        ElapsedTime                       00:24:06.666

     

    It appears this process has been around a while, so I wanted to look at the first thread in this process as it is responsible for the UI in Internet Explorer.  I am looking at this thread because the surface level problem illustrates that when we launch Iexplore.exe the UI is never displayed. Here I used the !thread command to display the specific thread.

     

            THREAD 870b6398  Cid 0fdc.16a4  Teb: 7ffdf000 Win32Thread: fd850b78 WAIT: (UserRequest) UserMode Non-Alertable

                85e86cf0  SynchronizationEvent

                85cb35e0  SynchronizationEvent

            Not impersonating

            Attached Process          87289d90       Image:         iexplore.exe

            Wait Start TickCount      10817197       Ticks: 69571 (0:00:18:05.314)

            UserTime                  00:00:00.015

            KernelTime                00:00:00.093  

     

    Based on the tick count this thread has been waiting for 18 minutes.

     

    0: kd> kvn

    # ChildEBP RetAddr  Args to Child             

    00 931208d8 81a940a1 870b6398 803d2120 870b6420 nt!KiSwapContext+0x26

    01 9312091c 81a8dacc 870b6398 00000000 00000002 nt!KiSwapThread+0x44f

    02 93120970 81bedb4f 00000002 93120aa8 00000001 nt!KeWaitForMultipleObjects+0x53d

    03 93120bfc 81bee2e3 00000002 00000001 00000000 nt!ObpWaitForMultipleObjects+0x256

    04 93120d48 81a30b3a 00000002 0027e9c4 00000001 nt!NtWaitForMultipleObjects+0xcc

    05 93120d48 77568254 00000002 0027e9c4 00000001 nt!KiFastCallEntry+0x12a

    06 0027e970 775679fc 7708c973 00000002 0027e9c4 ntdll!KiFastSystemCallRet

    07 0027e974 7708c973 00000002 0027e9c4 00000001 ntdll!NtWaitForMultipleObjects+0xc

    08 0027ea10 7608ff95 0027e9c4 0027ea38 00000000 kernel32!WaitForMultipleObjectsEx+0x11d

    09 0027ea64 773d3a46 00000028 0027eaac ffffffff USER32!RealMsgWaitForMultipleObjectsEx+0x13c

    0a 0027ea8c 773d3b7f 0027eaac ffffffff 0027eabc ole32!CCliModalLoop::BlockFn+0x97

    0b 0027eab4 774e75be ffffffff 003ddf90 0027ebc0 ole32!ModalLoop+0x5b

    0c 0027ead0 774e9294 00000000 0027ebd4 00000000 ole32!ThreadSendReceive+0x12c

    0d 0027eaf8 774e9100 0027ebc0 003d3538 0027ec1c ole32!CRpcChannelBuffer::SwitchAptAndDispatchCall+0x194

    0e 0027ebd8 773d3c25 003d3538 0027ecfc 0027ece4 ole32!CRpcChannelBuffer::SendReceive2+0xef

    0f 0027ebf4 773d3bd7 0027ecfc 0027ece4 003d3538 ole32!CCliModalLoop::SendReceive+0x1e

    10 0027ec6c 77403efb 003d3538 0027ecfc 0027ece4 ole32!CAptRpcChnl::SendReceive+0x73

    11 0027ecc0 76052ff4 003d3538 0027ecfc 0027ece4 ole32!CCtxComChnl::SendReceive+0x1c5

    12 0027ecd8 76053055 003e50a4 0027ede4 76052e0d RPCRT4!NdrProxySendReceive+0x43

    13 0027ece4 76052e0d 09ea0d59 0027f1fc 070001f3 RPCRT4!NdrpProxySendReceive+0xc

     

    Based on the call stack it appears this thread made an OLE call and is now waiting for some objects to signal.

     

    What is IExplore doing in this thread?  It is waiting on another process to service this OLE request.   We can find the PID of the other process by digging into the ole32 SendReceive call.

     

    I looked at the first argument:

     

    0: kd> dd 003d3538

    003d3538  77403a50 773f57b4 00000003 00000002

    003d3548  00000000 00000000 003d2ef0 003d8840

    003d3558  0038b1c0 003e05f8 77403980 00070005

     

    This gives us a pointer to another data structure that contains the information we are attempting to locate.


    0: kd> dd 003d2ef0

    003d2ef0  774ee600 003d2e70 00000d34 00000000

    003d2f00  862941fa c94cde09 55a88424 801a2601

    003d2f10  55a88424 801a2601 0000c400 00000d34

     

    Ah there it is! 0d34 is the process id servicing our OLE request.

     

    I went back and reviewed the !process 0 0 output to locate the process containing PID 0D34. I’m dumping it here.

     

    PROCESS 861a9d90  SessionId: 5  Cid: 0d34    Peb: 7ffd9000  ParentCid: 0fdc

        DirBase: 7d66c7a0  ObjectTable: a5651a98  HandleCount: 170.

        Image: ieuser.exe

     

    Ok, so the process handing this request is IEUSER.exe, which is the broker process used by Internet Explorer in Vista when running in protect mode.  This is new in Vista.

     

    More on IEUser and Protected mode can be seen at the following location:
    http://msdn.microsoft.com/en-us/library/bb250462.aspx

     

    Now I needed to dig into the IEUser.exe process to find out what it was working on causing Iexplore to wait.

     

    0: kd> !process 861a9d90 

    PROCESS 861a9d90  SessionId: 5  Cid: 0d34    Peb: 7ffd9000  ParentCid: 0fdc

        DirBase: 7d66c7a0  ObjectTable: a5651a98  HandleCount: 170.

        Image: ieuser.exe


    I discovered the thread in the ieuser.exe process working on our call from Iexplore.exe. However this thread is also waiting on access to something before it can complete.


    0: kd> kvn+200

     # ChildEBP RetAddr  Args to Child             

    00 a41b0c50 81a940a1 85ecbac0 85ecbb48 81af9920 nt!KiSwapContext+0x26

    01 a41b0c94 81a2ddc8 85ecbac0 00000000 85e18510 nt!KiSwapThread+0x44f

    02 a41b0ce8 81bf0d9f 85e18510 00000006 00000001 nt!KeWaitForSingleObject+0x492

    03 a41b0d50 81a30b3a 00000224 00000000 00000000 nt!NtWaitForSingleObject+0xbe

    04 a41b0d50 77568254 00000224 00000000 00000000 nt!KiFastCallEntry+0x12a

    05 0163e8a4 77567a0c 7708c7c4 00000224 00000000 ntdll!KiFastSystemCallRet

    06 0163e8a8 7708c7c4 00000224 00000000 00000000 ntdll!NtWaitForSingleObject+0xc

    07 0163e918 7708c732 00000224 ffffffff 00000000 kernel32!WaitForSingleObjectEx+0xbe

    08 0163e92c 747e60aa 00000224 ffffffff 00000000 kernel32!WaitForSingleObject+0x12

    09 0163e948 747e5fe0 0163ec48 00497b38 00000000 RASAPI32!ReadPhonebookFileEx+0x1a4

    0a 0163e968 747e4e9d 0163ec48 00000000 00000000 RASAPI32!ReadPhonebookFile+0x1b

    0b 0163e9a8 747e52d3 0163ec48 0048f660 0163e9e0 RASAPI32!DwEnumEntriesFromPhonebook+0x35

    0c 0163ee58 747e4fd4 0163ee9c 00000001 0048f660 RASAPI32!DwEnumEntriesInDir+0x1cc

    0d 0163f0ac 747e677e 00000001 0048f660 0163f0f4 RASAPI32!DwEnumEntriesForPbkMode+0xab

    0e 0163f0e0 76e534b9 00000000 00004180 0048f660 RASAPI32!RasEnumEntriesW+0xb0

    0f 0163f10c 76e5342a 76ecc2dc 00000000 00000000 WININET!RasEnumHelp::RasEnumHelp+0x55

    10 0163f120 76e533a3 00000001 00000000 00000001 WININET!DoConnectoidsExist+0x2b

    11 0163f14c 76e534ff 76ecf02c 76ecc2dc 00000000 WININET!GetRasConnections+0x34

    12 0163f168 76e4196e 00000000 0163f180 6ee3228c WININET!IsDialUpConnection+0xa9

    13 0163f184 76e70564 00000001 00000000 00000000 WININET!FixProxySettingsForCurrentConnection+0x31

    14 0163f1a8 6ec2db3b 00000000 0163f1bc 0163f3d4 WININET!InternetInitializeAutoProxyDll+0x6c

    15 0163f1b8 75fe9142 004884a0 00000202 0047e900 ieframe!CShdocvwBroker::StartAutoProxyDetection+0x4d

    16 0163f1d0 76054294 6ec2daf8 0163f3d8 00000001 RPCRT4!Invoke+0x2a

    17 0163f5fc 76054a49 004858d0 0046e4e0 0047e900 RPCRT4!NdrStubCall2+0x27b

    18 0163f64c 774e86d7 004858d0 0047e900 0046e4e0 RPCRT4!CStdStubBuffer_Invoke+0xa0

    19 0163f694 774e877d 0047e900 0047f968 00480b38 ole32!SyncStubInvoke+0x3c

    1a 0163f6e0 7740249e 0047e900 0047f860 004858d0 ole32!StubInvoke+0xb9

    1b 0163f7bc 774023af 0046e4e0 00000000 004858d0 ole32!CCtxComChnl::ContextInvoke+0xfa

    1c 0163f7d8 774e85df 0047e900 00000001 004858d0 ole32!MTAInvoke+0x1a

    1d 0163f808 774e8a18 d0908070 0046e4e0 004858d0 ole32!AppInvoke+0xaa

    1e 0163f8e4 774e85ef 0047e8a8 00474118 0047ebc0 ole32!ComInvokeWithLockAndIPID+0x32c

    1f 0163f930 75fe94b0 0047ebc0 08ac70b7 0047ebc0 ole32!ThreadInvoke+0x2fd

     

    I’m dumping the first parameter passed to KeWaitForSingleObject to determine the object type.

     

    0: kd> !object 85e18510

    Object: 85e18510  Type: (85484d40) Mutant

        ObjectHeader: 85e184f8 (old version)

        HandleCount: 3  PointerCount: 5

        Directory Object: 881f3030  Name: RasPbFile


    Ok, we are waiting on a Mutex named RasPbFile. The big question in your mind now is, “who owns this lock”, right?

    To determine this I’ll dump the structure for the mutex object:

     

    0: kd> dt _KMUTANT 85e18510

    nt!_KMUTANT

       +0x000 Header           : _DISPATCHER_HEADER

       +0x010 MutantListEntry  : _LIST_ENTRY [ 0x870b6564 - 0x870b6564 ]

       +0x018 OwnerThread      : 0x870b6398 _KTHREAD

       +0x01c Abandoned        : 0 ''

       +0x01d ApcDisable       : 0 ''

     

    The owner of the mutex is 870b6398. The bad news is this is the same thread I started with in the Iexplore.exe process so we’re looking at a classic deadlock scenario. Thread 870b6398 running under Iexplore.exe has made an OLE call over to the IEuser.exe broker process. The IEuser process needs the mutex owned by the thread that called it in the first place.

    This turned out to be a bug in the RasApi code, not the browser or IEuser process.  The RasAPI code was written in such a way that it would wait for an infinite period of time to gain the Mutex.

     

    0163e92c 747e60aa 00000224 ffffffff 00000000 kernel32!WaitForSingleObject+0x12

     

     The good news is that we have created hotfix 951738 to address this issue.  The official KB article is still pending.

  • Ntdebugging Blog

    Join us on Facebook

    • 0 Comments

    We’ve set up a new Facebook group named “Windows Debugging” as a community for Windows Debugging industry experts.  If Windows debugging is an interest of yours, please consider joining the group!  Many of the regular contributors to this blog will be there as well.  Just login to Facebook and search for “Windows Debugging” to find us, or go here:

    http://www.facebook.com/n/?group.php&gid=23775552268.

  • Ntdebugging Blog

    How it Works: DLL Injection

    • 7 Comments

     

    Introduction

     

    Hi everyone, this is Bob again.  I recently worked on an issue where the interaction of two threads in Winlogon led to a bugcheck.  One thread was a Winlogon thread initializing GDI.  The interesting thing about this scenario is how the other thread ended up in this process.

     

     

     

    What was the thread doing?

     

    Below is the user half of the thread stack.  The thread attempted to load a DLL.  

     

    ChildEBP RetAddr  Args to Child

    0058eaec 773901ad 773901d9 0058eafc 00240022 ntdll!KiFastSystemCallRet

    0058eb0c 775d96f3 775d1808 00000000 77e6f032 USER32!NtUserRegisterWindowMessage+0xc

    0058ed24 775e4755 00000000 00000001 7c837512 comctl32!InitGlobalMetrics+0x44

    0058ed3c 775e426a 00000031 0058ed68 7763490c comctl32!_ProcessAttach+0x98

    0058ed48 7763490c 775d0000 00000001 00000000 comctl32!DllMain+0x21

    0058ed68 7c81a352 775d0000 00000001 00000000 comctl32!_DllMainCRTStartup+0x52

    0058ed88 7c833465 776348ba 775d0000 00000001 ntdll!LdrpCallInitRoutine+0x14

    0058ee90 7c834311 00000000 00000000 7c8e2e58 ntdll!LdrpRunInitializeRoutines+0x367

    0058f124 7c834065 00000000 00080e98 0058f3ec ntdll!LdrpLoadDll+0x3cd

    0058f3a0 77e41bf3 00080e98 0058f3ec 0058f3cc ntdll!LdrLoadDll+0x198

    0058f408 77e5c70b 7c8e2e58 00000000 00000000 kernel32!LoadLibraryExW+0x1b2

    0058f41c 7c92a6a1 7c8e2e58 00000000 7c8e2e58 kernel32!LoadLibraryW+0x11

    0058f454 7c92a65f 7c8e2e58 7c8d0000 7c9297b6 SHELL32!SHFusionLoadLibrary+0x2a

    0058f460 7c9297b6 00000020 00000008 0058f6a8 SHELL32!DelayLoadCC+0x15

    0058f694 7c929728 0058f6a8 0000007c 00000001 SHELL32!SHFusionInitializeIDCC+0x92

    0058f8b4 7c92966f 7c8d0000 0000007c 00000001 SHELL32!SHFusionInitializeFromModuleID+0x3a

    0058f8c8 7c92962c 7c8d0000 00000001 0058f8f8 SHELL32!_ProcessAttach+0x34

    0058f8d8 7c92bb63 7c8d0000 00000001 00000000 SHELL32!DllMain+0x27

    0058f8f8 7c81a352 7c8d0000 00000001 00000000 SHELL32!_DllMainCRTStartup+0x52

    0058f918 7c833465 7c92bb1b 7c8d0000 00000001 ntdll!LdrpCallInitRoutine+0x14

    0058fa20 7c834311 00000000 00000000 00000004 ntdll!LdrpRunInitializeRoutines+0x367    ß This function is

          loading and calling init

          functions of dependent DLL’s

    0058fcb4 7c834065 00000000 00080760 0058ff7c ntdll!LdrpLoadDll+0x3cd

    0058ff30 77e41bf3 00080760 0058ff7c 0058ff5c ntdll!LdrLoadDll+0x198        ß 0058ff5c is the Unicode string

       pointer to DLL name

    0058ff98 77e5c70b 00570254 00000000 00000000 kernel32!LoadLibraryExW+0x1b2

    0058ffac 0057017e 00570254 00000000 00200008 kernel32!LoadLibraryW+0x11

    WARNING: Frame IP not in any known module. Following frames may be wrong.

    0058fff4 00000000 00570228 00905a4d 00000003 0x57017e

     

     

    The DLL being loaded depends on other DLLs.  These DLLs are loaded and initialized when the first DLL is loaded.  So if DLL ‘A’ has a call to DLL ‘B’, then DLL ‘B’ is loaded by the loader when DLL ‘A’ is loaded.

     

     

    What is so unusual about this thread?

    If you examine the IP for that start address taken from !thread and where it is calling LoadLibraryW, the IP is not in a range of any loaded module.

     

    1: kd> !thread

    THREAD 86edd020  Cid 7884.7528  Teb: 7ffdc000 Win32Thread: bc1adb48 RUNNING on processor 1

    Not impersonating

    DeviceMap                 e10018c0

    Owning Process            87c51d88       Image:         winlogon.exe

    Wait Start TickCount      2567944        Ticks: 0

    Context Switch Count      4                 LargeStack

    UserTime                  00:00:00.015

    KernelTime                00:00:00.000

    Start Address 0x00570000   ß Starting address.  This is not in any module displayed by “!peb”

     

    The !PEB extension will display the loaded module list and address range for the process.  It is not shown here because of space.  However this address is not in any loaded module.

     

    Let’s look at the function:

     

    00570000 55              push    ebp

    00570001 8bec            mov     ebp,esp

    00570003 83ec3c          sub     esp,3Ch

    00570006 8365e800        and     dword ptr [ebp-18h],0

    0057000a 8365ec00        and     dword ptr [ebp-14h],0

    0057000e 8365f800        and     dword ptr [ebp-8],0

    00570012 8365dc00        and     dword ptr [ebp-24h],0

    00570016 8365f000        and     dword ptr [ebp-10h],0

     

    1: kd> u

    0057001a 8365e000        and     dword ptr [ebp-20h],0

    0057001e 8365f400        and     dword ptr [ebp-0Ch],0

    00570022 6a01            push    1

    00570024 8b4508          mov     eax,dword ptr [ebp+8]        ß The 1st argument is a pointer to a list of functions.

    00570027 ff5004          call    dword ptr [eax+4]

    0057002a 8945fc          mov     dword ptr [ebp-4],eax

    0057002d 8b4508          mov     eax,dword ptr [ebp+8]        ß Function block.

    00570030 ff5010          call    dword ptr [eax+10h]

     

    1: kd> u

    00570033 8945e4          mov     dword ptr [ebp-1Ch],eax

    00570036 837de400        cmp     dword ptr [ebp-1Ch],0

    0057003a 0f84c0010000    je      00570200

     

    The first argument is a block of functions.  That is what was passed as the initial parameter.  What functions were passed?

     

    1: kd> dds 570228 l 5

    00570228  77e5c6fa kernel32!LoadLibraryW

    0057022c  77e6c2dc kernel32!SetErrorMode

    00570230  77e70531 kernel32!GetCurrentDirectoryW

    00570234  77e70d67 kernel32!SetCurrentDirectoryW

    00570238  77e63ec7 kernel32!GetProcessHeap

     

    These functions are standard kernel32 calls.  So, the question is why is it doing that?

     

     

    What is this thread doing?

    Based on the fact that the IP is not in any module, the IP is page aligned, and the thread was passed function addresses as it initial parameter, it looks like this thread was “injected” into this process. 

     

     

    How was this thread injected?

    Another process, (I do not know which one) allocated a block of memory in the Winlogon process via VirtualAllocEx.  This function takes a process handle as input and it can be a different process.  Then code can be moved into the process via WriteProcessMemory.  Then a thread can be created using the starting address of the memory address returned from VirtualAllocEx.

     

     

    Are we done?

    Nope – remember the block of addresses?  This is needed because the module was not loaded by the loader.  So the functions did not get resolved by the linker.  So the addresses of functions outside of the code moved are unknown.  Since in Windows Server 2003 the functions of certain DLLs stay at the same address, they can be passed to another process.  Vista and beyond does not do this, so this method will not work.

     

     

    Conclusion

    Hope you found this interesting.  More on DLL injection can be found here: http://www.codeproject.com/dll/DLL_Injection_tutorial.asp and here: http://en.wikipedia.org/wiki/DLL_injection.

     

    AppInit_DLLs is another method of getting a DLL to load (in all process that rely on user32.dll), documented in KB 197571.  This can lead to problems though, as Raymond Chen discussed here.

     

  • Ntdebugging Blog

    Designing the Perfect Breakpoint

    • 4 Comments

     

    Written by Jeff Dailey.

     

     

    When it comes to live debugging, the breakpoint is king.  Oftentimes solving a very complex problem in a production environment involves doing a local, non-production debug one of my own test machines.  I’ll typically debug the process or code in question to get a good idea of how it works and what data I need to collect when I break in.   This normally involves some reverse engineering and doing code review of the process or module in question.    We always want to minimize impact on production environments when it’s time to do the live debug.  I call it designing the perfect breakpoint.

     

     

    There are several types of breakpoints we typically use. 

     

     

    The most common breakpoint is bp, and setting it in windbg is simple:  bp 0XADDRESS.   Note that when you break into a user mode process it halts execution of the process.  When you break into the kernel debugger it halts the entire machine.  This being the case, we like to figure out what commands we would issue once the breakpoint hits.  You can then set the breakpoint to automatically issue the commands at the time of break in.   

     

     

    The syntax for running commands upon hitting a breakpoint is straightforward:  bp 0XADDRESS “comand1;command2;…”

     

    As an example - bp 0xMYADDRESS “kv;dd g_myGlobal;g”    This command will break on 0xMYADDRESS, dump the callstack via kv; dump memory as DWORDs starting at the g_MyGlobal address; and then continue execution with the g command.

     

     

    Another cool trick is to have a breakpoint enable or disable another breakpoint or disable itself.     Say you have a breakpoint that is hit far too often to break in EVERY single time that code executes.    However, once a particular state or a condition is met, you may want every call stack associated with that break point.    All you need do is set a breakpoint that is waiting on your key conditional code to execute.   This breakpoint can then enable the more expensive breakpoint.

     

     

    Here is the example:

     

    bp ntdll!RtlAllocateHeap “kv;g”    << This will be breakpoint 1, set on a very common call.

    bd 1    << Disable break point 1, as we don’t want to see its output until breakpoint 2 executes.

    bp notepad!OpenFile “be 1;g”  << This will enable breakpoint 1 when NotePad!OpenFile is executed. 

     

     

    The most expensive type of break point is a ba - Break on access.  This break point allows you to break on access to either memory or I/O ports.  There are several variations : ba w (break on write), ba r (break on read), ba e (break on execute), and ba i  (for I/O ports)

     

     

    Let’s use break on write for our sample.   The syntax is simple.  ba w4 0xNNNNnnnn.  This means break on access, access type WRITE, with a width of 4 bytes, followed by the target address.   However, because the processor has to watch for access to this memory, it generally slows down execution time.  This might be a breakpoint that you enable conditionally using BE during another condition check. 

     

     

    Ba type breakpoints are very handy when you have a resource changing and you don’t know what code path is touching it.    Imagine you had a corrupted critical section.  It’s not likely one of the critical section APIs is causing  the corruption.  It’s more likely that something overwrote that memory.  That being the case you could simply do a ba w4 on the critical section address followed by a “kv;g” and let the process run.  You will see all the places the critical section APIs touches the critical section; however, you will also see any offender that writes over that memory.

     

     

    Refer to the debugger.chm file in the debugger tools directory for details on the BA variations.

     

     

    The conditional break

     

    You can also do conditional breaks or checks during the breakpoint execution in the debugger by using the j command in the section following the breakpoint: bp Address “j EXPRESSION ‘IfTrueCommands1;IfTruecommand2’  ; ‘ElseCommand1;ElseCommand2’”

     

     

    The following is an example of checking a call to a file open operation.  The breakpoint is set on the instruction following a CALL.

    bp notepad!SomeNotePadCodeOpeningAFile+0x132 "j @eax >0 '.echo all is good file opened ok;gc';'!gle;kv;lsa eip;gc'“

     

     

    If eax (the default register used to store function return values) is nonzero, we echo ‘all is good, file opened ok’ to the command output window.  If eax is zero (indicating failure), we display the last error with !gle, dump the call stack via kv, and list the source at the eip address using LSA EIP.

     

     

    Good luck and Happy debugging.

  • Ntdebugging Blog

    Windbg Tip: KN, .Frame , DV, and DT - It's so easy

    • 3 Comments

    Written by Jeff Dailey.

    Hello NTDebuggers, many of us take for granted some of the simple commands in the debugger that make life easy. I was thinking of several in particular that go great together.  The first command would be kn.  Kn will show the current call stack and includes the stack frame number to the far left of the individual call.  

    I loaded up a copy of BadWindow.EXE in Windbg, set my thread to thread Zero via ~0s.  Then I simply type kn.  This dumps out the call stack, and as you can see, to the far left each call there is the stack frame number.

    Using stack frame 2 from the above stack let’s look at the locals it uses.  First we must get in context using .frame and the frame number.

    Now to dump out the locals we can just use the dv (Display Local Variables) command.  (Note you need private symbols for the binary you are debugging for this to work).

    As you can see, we have several locals here.  Now, because I have private symbols for this binary, the debugger already understands the structures and data types for each variable.

    Furthermore because I’m already in the stack frames context I can simply dt (display type) with the name of the variable and the debugger figures out the address.  Let’s dt msg and see what data it contains.

    As you can see  have a couple of structures that are part of this structure.    Look at the right in the above output.  You will notice a data type name of HWND_ and tagPOINT to the right.  These are structures.  The good news is that dt will work recursively and will dump out structures under your current structure.   All you need to do is issue a DT –r.  You can specify the depth by appending a number to –r.

    For more information about the dt command and others mentioned in this blog make sure you take a look at the debugger.chm file that comes with the Windows Debugging Tools.

    Good luck and happy debugging.

  • Ntdebugging Blog

    How to debug WOW64 applications?

    • 3 Comments

     

    Hello, my name is Venkatesh Ganga, and I’m an Escalation Engineer on the Microsoft Platform Global Escalation Services team.  Recently, I worked on an issue where registry reflection was not happening while installing 32bit Office under the system account. This required looking into both the 32bit Office code and the Wow64 code where the registry reflection is implemented.  When attaching to the Wow64 process using the 32bit debugger it’s like debugging a 32bit process on the 32bit machine; there are no 64bit binaries in the process. However, we needed to debug Wow64 to debug the registry reflection code. To do this we attached to the Wow64 process using the 64bit debugger which allows you to see the Wow64 binaries.

    Ø  lm

                Base TimeStamp                     Module

              400000 42435b2a Mar 24 18:28:26 2005 C:\Program Files (x86)\Internet Explorer\IEXPLORE.EXE

            77ec0000 45d6cc72 Feb 17 03:35:46 2007 C:\WINDOWS\system32\ntdll.dll

            6b000000 45d6943d Feb 16 23:35:57 2007 C:\WINDOWS\system32\wow64.dll

            6b280000 45d695f3 Feb 16 23:43:15 2007 C:\WINDOWS\system32\wow64win.dll

            78b80000 42438b7a Mar 24 21:54:34 2005 C:\WINDOWS\system32\wow64cpu.dll

     

    There are 2 options for debugging Wow64 applications.

    1.       Using the 64bit debugger and Wow64 debugger extension (Wow64exts.dll)

    2.       Using the 32bit debugger

    Using 64bit debugger and Wow64 debugger extension (Wow64exts.dll)

    I ran the 32bit version of Internet Explorer on a 64bit machine and attached to it using the 64bit debugger. Here is the thread 0 call stack when viewed from 64bit debugger.

    0:000> kL

    Child-SP          RetAddr           Call Site

    00000000`0013edf8 00000000`78b8428e wow64cpu!CpupSyscallStub+0x9

    00000000`0013ee00 00000000`6b006a5a wow64cpu!Thunk0Arg+0x5

    00000000`0013ee70 00000000`6b005e0d wow64!RunCpuSimulation+0xa

    00000000`0013eea0 00000000`77ed8030 wow64!Wow64LdrpInitialize+0x2ed

    00000000`0013f6d0 00000000`77ed582f ntdll!LdrpInitializeProcess+0x1538

    00000000`0013f9d0 00000000`77ef30a5 ntdll!_LdrpInitialize+0x18f

    00000000`0013fab0 00000000`77d59620 ntdll!KiUserApcDispatch+0x15

    00000000`0013ffa8 00000000`00000000 0x77d59620

    00000000`0013ffb0 00000000`00000000 0x0

    00000000`0013ffb8 00000000`00000000 0x0

    00000000`0013ffc0 00000000`00000000 0x0

    00000000`0013ffc8 00000000`00000000 0x0

    00000000`0013ffd0 00000000`00000000 0x0

    00000000`0013ffd8 00000000`00000000 0x0

    00000000`0013ffe0 00000000`00000000 0x0

    00000000`0013ffe8 00000000`00000000 0x0

    00000000`0013fff0 00000000`00000000 0x0

    00000000`0013fff8 00000000`00000000 0x0

    00000000`00140000 00000020`78746341 0x0

    00000000`00140008 00005370`00000001 0x20`78746341

     

    The above stack only shows the 64 bit calls; we cannot see what the 32 bit calls are doing. To get the 32bit stack you must use one of the below methods.

    Ø  Option 1 : Run “!wow64exts.k”

    0:000> !wow64exts.k

    Walking 64bit Stack…

    Child-SP          RetAddr           Call Site

    00000000`0013edf8 00000000`78b8428e wow64cpu!CpupSyscallStub+0x9

    00000000`0013ee00 00000000`6b006a5a wow64cpu!Thunk0Arg+0x5

    00000000`0013ee70 00000000`6b005e0d wow64!RunCpuSimulation+0xa

    00000000`0013eea0 00000000`77ed8030 wow64!Wow64LdrpInitialize+0x2ed

    00000000`0013f6d0 00000000`77ed582f ntdll!LdrpInitializeProcess+0x1538

    00000000`0013f9d0 00000000`77ef30a5 ntdll!_LdrpInitialize+0x18f

    00000000`0013fab0 00000000`77d59620 ntdll!KiUserApcDispatch+0x15

    00000000`0013ffa8 00000000`00000000 0x77d59620

    00000000`0013ffb0 00000000`00000000 0x0

    00000000`0013ffb8 00000000`00000000 0x0

    00000000`0013ffc0 00000000`00000000 0x0

    00000000`0013ffc8 00000000`00000000 0x0

    00000000`0013ffd0 00000000`00000000 0x0

    00000000`0013ffd8 00000000`00000000 0x0

    00000000`0013ffe0 00000000`00000000 0x0

    00000000`0013ffe8 00000000`00000000 0x0

    00000000`0013fff0 00000000`00000000 0x0

    00000000`0013fff8 00000000`00000000 0x0

    00000000`00140000 00000020`78746341 0x0

    00000000`00140008 00005370`00000001 0x20`78746341

    Walking 32bit Stack...

    ChildEBP          RetAddr          

    002ded98 75ec1c83 USER32!NtUserWaitMessage+0x15

    002dee24 75ec61ef BROWSEUI!BrowserProtectedThreadProc+0x44

    002dfea8 779ba3a6 BROWSEUI!SHOpenFolderWindow+0x22c

    002dfec8 0040243d SHDOCVW!IEWinMain+0x129

    002dff1c 00402748 IEXPLORE!WinMain+0x316

    002dffc0 7d4e7d2a IEXPLORE!WinMainCRTStartup+0x186

    002dfff0 00000000 KERNEL32!BaseProcessStart+0x28

     

    Ø  Option 2 : Switch to x86 mode (using “!wow64exts.sw”) and do KB.

     

    0:000> !wow64exts.sw

    Switched to 32bit mode

    0:000:x86> kb

    ChildEBP          RetAddr           Args to Child                                        

    002ded98 75ec1c83 002f1be8 002dee50 002f1be8 USER32!NtUserWaitMessage+0x15

    002dee24 75ec61ef 002f1be8 002f1be8 00000000 BROWSEUI!BrowserProtectedThreadProc+0x44

    002dfea8 779ba3a6 002f1be8 00000001 00000000 BROWSEUI!SHOpenFolderWindow+0x22c

    002dfec8 0040243d 002e2508 00000001 ffffffff SHDOCVW!IEWinMain+0x129

    002dff1c 00402748 00400000 00000000 002e2508 IEXPLORE!WinMain+0x316

    002dffc0 7d4e7d2a 00000000 00000000 7efdf000 IEXPLORE!WinMainCRTStartup+0x186

    002dfff0 00000000 004025c2 00000000 000000c8 KERNEL32!BaseProcessStart+0x28

     

    The easiest way to see all of the 32bit call stacks is by switching to  32bit mode (!wow64exts.sw) and doing ~*k.  In addition, you can set breakpoints in 32bit or 64 bit binaries using the 64bit debugger.  Also note  “!peb” will show both the 64bit and 32bit PEB.

     

    Using the 32bit debugger

    As mentioned earlier there is nothing wrong with using the 32 bit debugger.  If you just need to debug the application’s 32bit code, using it is probably the simplest approach.  However, if you need to view Wow64 code or binaries, you must use the 64bit debugger.  Note that these techniques apply to debugging Wow64 dumps and live processes.

    You can find more information about WoW64 applications at http://msdn.microsoft.com/en-us/library/aa384249(VS.85).aspx 

     

  • Ntdebugging Blog

    NTDebugging Puzzler 0x00000007: Interlocked functions

    • 19 Comments

     

    Today, we will have some fun with interlocked functions.

     

    The following section of code is reentrant.  A “well meaning” developer used interlocked functions to avoid serializing on a global table lock.

     

    Initial smoke testing shows that the code works fine.  Sometimes things are not as they appear when doing initial code review.  After several hours of heavy stress testing, the developer finds the machine has bugchecked.  Analysis of the dump showed that the caller of this function had steamrolled through nonpaged pool writing stacks on top of everyone’s pool memory.

     

    The goal of today’s puzzler is to find the bug and describe how it could be fixed with minimal timing impact.

     

    Here are a few details before you begin.

     

    1.       The variable gIndex is an unsigned long global.

    2.       The gLogArray memory was allocated from nonpaged pool and the size of this allocation is correct.

     

    Ready?

     

    00:   PLOG_ENTRY GetNextLogEntry ()

          {

    01:         ULONG IterationCount = MAX_RECORDS;

     

    02:         PLOG_ENTRY pEntry;

     

    03:         do

                {

    04:               if (InterlockedCompareExchange(&gIndex, 0, MAX_RECORDS) == MAX_RECORDS)

     

    05:                     pEntry = &gLogArray[0];

     

    06:               else

     

    07:                     pEntry = &gLogArray[InterlockedIncrement(&gIndex)];

     

    08:               --IterationCount;

     

    09:         } while(InterlockedCompareExchange(&pEntry->Active, 1, 0) != 0 && (IterationCount > 0));

     

    0a:         return (IterationCount > 0) ? pEntry : NULL;

          }

     

     

     

    Happy hunting,

     

    Dennis Middleton “The NTFS Doctor”

     


      [Update: our answer. Posted 6/10/2008]

     

    Thanks for all the great posts!  I saw many interesting answers, and a few unique solutions that I hadn’t considered.

     

    Bug Description

     

    A slight time gap exists between the InterlockedCompareExchange and the InterlockedIncrement.  For this reason, several threads could pass the check for MAX_RECORDS prior to doing the increment.

     

    Assume that N is the number of threads that pass the check for MAX_RECORDS while gIndex is at a particular value.

    If N or more threads are able to pass the check while gIndex is equal to MAX_RECORDS-(N-1), then gIndex would be incremented beyond MAX_RECORDS.

     

    For example, let’s assume that 3 threads passed the check while gIndex was at MAX_RECORDS-2.  Then after the three increments occur, gIndex would be equal to MAX_RECORDS+1.  From that point, invalid pointers would be passed out to the caller.

     

    Possible Solutions

     

    There are several ways to solve this problem.  Some are more efficient than others.  I would avoid doing checks for MAX_RECORDS-1, MAX_RECORDS, or MAX_RECORDS+1 (interlocked or not) since there could potentially be more than two threads involved in the race condition.  Such a solution would only reduce the likelihood of an overflow.

     

    There were a few posts suggesting a lock or critical section for the section of code between the compare and increment.  That would be one solution, but it would also do away with the benefits of using interlocked functions.

     

    In keeping with the philosophy of keeping the code fast and simple, here’s a solution that gives a very good result with minimal impact.

     

    1.       I removed the if () {} else {} and simply allowed gIndex to increment unchecked.  With this change, gIndex can approach its max unsigned long value and possibly wrap - we need to keep the array index in check.

    2.       The modulus operator (added to line 4 below) will divide the incremented gIndex by MAX_RECORDS and use the remainder as the array index.  When dividing, the resultant remainder is always smaller than the divisor (MAX_RECORDS).  For this reason, the array index is guaranteed to be smaller than MAX_RECORDS.  As even multiples of MAX_RECORDS are reached, the array index resets back to zero mathemagically and no interlocked compare is even necessary.

     

     

    00:   PLOG_ENTRY GetNextLogEntry ()

          {

    01:         ULONG IterationCount = MAX_RECORDS;

     

    02:         PLOG_ENTRY pEntry;

     

    03:         do

                {

     

    04:               pEntry = &gLogArray[InterlockedIncrement(&gIndex) % MAX_RECORDS];

     

    05:               --IterationCount;

     

    06:         } while(InterlockedCompareExchange(&pEntry->Active, 1, 0) != 0 && (IterationCount > 0));

     

    07:         return (IterationCount > 0) ? pEntry : NULL;

          }

     

     

    With the fix in place, the code is smaller, faster, easier to read, and most of all - the bug is gone.  When developing code, always try to think small.             

     

    Best Regards,

    NTFS Doctor

     

     

     





Page 18 of 24 (235 items) «1617181920»