Other

  • The Old New Thing

    Tip for trying to boost morale: Don't brag about your overseas trips

    • 8 Comments

    Once upon a time, a senior manager held a team meeting to address low team morale. Attempting to highlight how important the project was, he opened by saying, "I just got back from ⟨faraway country⟩, meeting with some of our important clients, and..."

    This remark had exactly the opposite effect from what the manager intended. Instead of revitalizing the team, the team became even more despondent. "Here we are, working late and coming in on weekends, and this senior manager is telling us about his recent overseas junket."

    (After he left, I heard that he still pulled this stunt with his new team. "Over the past two weeks, I've been all over ⟨continent⟩...")

  • The Old New Thing

    Finding the leaked object reference by scanning memory: Example

    • 16 Comments

    An assertion failure was hit in some code.

        // There should be no additional references to the object at this point
        assert(m_cRef == 1);
    

    But the reference count was 2. That's not good. Where is that extra reference and who took it?

    This was not code I was at all familiar with, so I went back to first principles: Let's hope that the reference was not leaked but rather that the reference was taken and not released. And let's hope that the memory hasn't been paged out. (Because debugging is an exercise in optimism.)

    1: kd> s 0 0fffffff 00 86 ec 00
    04effacc  00 86 ec 00 c0 85 ec 00-00 00 00 00 00 00 00 00  ................ // us
    0532c318  00 86 ec 00 28 05 00 00-80 6d 32 05 03 00 00 00  ....(....m2..... // rogue
    

    The first hit is the reference to the object from the code raising the assertion. The second hit is the interesting one. That's probably the rogue reference. But who is it?

    1: kd> ln 532c318
    1: kd>
    

    It does not report as belong to any module, so it's not a global variable.

    Is it a reference from a stack variable? If so, then a stack trace of the thread with the active reference may tell us who is holding the reference and why.

    1: kd> !process -1 4
    PROCESS 907ef980  SessionId: 2  Cid: 06cc    Peb: 7f4df000  ParentCid: 0298
        DirBase: 9e983000  ObjectTable: a576f560  HandleCount: 330.
        Image: contoso.exe
    
            THREAD 8e840080  Cid 06cc.0b78  Teb: 7f4de000 Win32Thread: 9d04b3e0 WAIT
            THREAD 91e24080  Cid 06cc.08d8  Teb: 7f4dd000 Win32Thread: 00000000 WAIT
            THREAD 8e9a3580  Cid 06cc.09f8  Teb: 7f4dc000 Win32Thread: 9d102cc8 WAIT
            THREAD 8e2be080  Cid 06cc.0878  Teb: 7f4db000 Win32Thread: 9d129978 WAIT
            THREAD 82c08080  Cid 06cc.0480  Teb: 7f4da000 Win32Thread: 00000000 WAIT
            THREAD 90552400  Cid 06cc.0f5c  Teb: 7f4d9000 Win32Thread: 9d129628 WAIT
            THREAD 912c9080  Cid 06cc.02ec  Teb: 7f4d8000 Win32Thread: 00000000 WAIT
            THREAD 8e9e8680  Cid 06cc.0130  Teb: 7f4d7000 Win32Thread: 9d129cc8 READY on processor 0
            THREAD 914b8b80  Cid 06cc.02e8  Teb: 7f4d6000 Win32Thread: 9d12d568 WAIT
            THREAD 9054ab00  Cid 06cc.0294  Teb: 7f4d5000 Win32Thread: 9d12fac0 WAIT
            THREAD 909a2b80  Cid 06cc.0b54  Teb: 7f4d4000 Win32Thread: 00000000 WAIT
            THREAD 90866b80  Cid 06cc.0784  Teb: 7f4d3000 Win32Thread: 93dbb4e0 RUNNING on processor 1
            THREAD 90cfcb80  Cid 06cc.08c4  Teb: 7f3af000 Win32Thread: 93de0cc8 WAIT
            THREAD 90c39a00  Cid 06cc.0914  Teb: 7f3ae000 Win32Thread: 00000000 WAIT
            THREAD 90629480  Cid 06cc.0bc8  Teb: 7f3ad000 Win32Thread: 00000000 WAIT
    

    Now I have to dump the stack boundaries to see whether the address in question lies within the stack range.

    1: kd> dd 7f4de000 l3
    7f4de000  ffffffff 00de0000 00dd0000
    1: kd> dd 7f4dd000 l3
    7f4dd000  ffffffff 01070000 01060000
    ...
    1: kd> dd 7f4d7000 l3
    7f4d7000  ffffffff 04e00000 04df0000 // our stack
    ...
    

    The rogue reference did not land in any of the stack ranges, so it's probably on the heap. Fortunately, since it's on the heap, it's probably part of some larger object. And let's hope (see: optimism) that it's an object with virtual methods.

    0532c298  73617453
    0532c29c  74654d68
    0532c2a0  74616461
    0532c2a4  446e4961
    0532c2a8  00007865
    0532c2ac  00000000
    0532c2b0  76726553 USER32!_NULL_IMPORT_DESCRIPTOR  (USER32+0xb6553)
    0532c2b4  44497265
    0532c2b8  45646e49
    0532c2bc  41745378 contoso!CMumble::CMumble+0x4c
    0532c2c0  00006873
    0532c2c4  00000000
    0532c2c8  4e616843
    0532c2cc  79546567
    0532c2d0  4e496570
    0532c2d4  00786564
    0532c2d8  2856662a
    0532c2dc  080a9b87
    0532c2e0  00f59fa0
    0532c2e4  05326538
    0532c2e8  00000000
    0532c2ec  00000000
    0532c2f0  0000029c
    0532c2f4  00000001
    0532c2f8  00000230
    0532c2fc  fdfdfdfd
    0532c300  45ea1370 contoso!CFrumble::`vftable'
    0532c304  45ea134c contoso!CFrumble::`vftable'
    0532c308  00000000
    0532c30c  05b9a040
    0532c310  00000002
    0532c314  00000001
    0532c318  00ec8600
    

    Hooray, there is a vtable a few bytes before the pointer, and the contents of the memory do appear to match a CFrumble object, so I think we found our culprit.

    I was able to hand off the next stage of the investigation (why is a Frumble being created with a reference to the object?) to another team member with more expertise with Frumbles.

    (In case anybody cared, the conclusion was that this was a variation of a known bug.)

  • The Old New Thing

    No good deed goes unpunished: Marking a document as obsolete

    • 43 Comments

    I was contacted by a customer support liaison who was hoping that I could help them understand Feature X.

    I saw your name on a "Feature X technical specification" document in the Windows specification repository, and I was hoping you could answer a few questions for me, or redirect me to somebody who can.

    I was puzzled why this person saw my name on the "Feature X technical specification" document in the Windows specification repository, because I was not the author of that specification. I went to the specification repository, opened the document in question, and nope, my name appears nowhere in it.

    I asked, "What gave you the impression that I had anything to do with Feature X? XYZ can help you with your questions; he's the one listed as the author of the document."

    The response was, "Oh, I'm sorry. I didn't actually read the specification. I merely did a search through the entire repository for Feature X, and the "Feature X technical specification" is the one that showed up as most recently updated by you. In the past, this technique has been pretty good at finding someone who can help with a feature. Sorry about that."

    I went back and took another look at the document, and then I remembered why I updated it: My duties at the time included reviewing all documents that met certain criteria, such as this particular document. I had some feedback about the document for the author, who told me, "Oh, that's an obsolete version of the document, but it's retained for historical purposes. The current one is over there." To save the next person some time, I edited the obsolete document by inserting in big letters at the top, "TECHNICAL DOCUMENTATION FOR THIS FEATURE HAS MOVED TO ⟨new location⟩. THIS DOCUMENT IS OBSOLETE." I could've asked the author to do this, but I had the document open already, so I figured I'd save a few steps (ask author to update document, wait for reply, reopen document to verify that edit occurred) and just do it myself.

    Boom, no good deed goes unpunished. My update was made long after the real technical specification was completed. As a result, of all the documents on Feature X, not only is it the obsolete one that shows up as most recently updated, but I am the one listed as the person who made that most recent update.

    Next time, I'll try to remember to do things the long way, even though it is big hassle for everybody.

    "Please update the document to indicate that it is obsolete and redirect the reader to the current document."

    — Could you do that? You've already got the document open.

    "No, I used to do that, but it came back and bit me, because I become the person to edit the document last, and then everybody comes to me with questions about the document instead of you."

    — You do realize that in the time you tried to convince me to do it, you could've just done it.

    Follow-up: I tried it, and sometimes the response was "I'm really busy now, I'll get around to it in a few weeks." Now I have to create a reminder task in two weeks to follow up. More hassle for everybody.

    I think the next time this happens, I'll write back, "I'm coming over to your office. I'll make the one-line edit on your computer so that your name is the one attached to the edit."

  • The Old New Thing

    2014 year-end link clearance

    • 14 Comments

    Another round of the semi-annual link clearance.

  • The Old New Thing

    Even the publishing department had its own Year 2000 preparedness plan

    • 7 Comments

    On December 31, 1999, Microsoft Product Support Services were ready in case something horrible happened as the calendar rolled over into the new year.

    I'm told that Microsoft Press also had its own Year 2000 plan. They staffed their helpline continuously from Friday evening December 31, 1999 all the way through Sunday, January 2, 2000. They did this even though Microsoft Press did not normally staff its helpline ouside normal business hours, and even though all sample code in all publications come with a disclaimer that they are provided "as is" with no warranty.

    I do not know if they took any calls, but I suspect not.

  • The Old New Thing

    Debugging walkthrough: Access violation on nonsense instruction

    • 20 Comments

    A colleague of mine asked for help puzzling out a mysterious crash dump which arrived via Windows Error Reporting.

    rax=00007fff219c5000 rbx=00000000023c8380 rcx=00000000023c8380
    rdx=0000000000000000 rsi=00000000043f0148 rdi=0000000000000000
    rip=00007fff21af2d22 rsp=000000000392e518 rbp=000000000392e580
     r8=00000000276e4639  r9=00000000043b2360 r10=00000000ffffffff
    r11=0000000000000000 r12=0000000000000001 r13=0000000000000000
    r14=000000000237cfc0 r15=00000000023d3ea0
    iopl=0         nv up ei pl zr na po nc
    cs=0033  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00010246
    nosebleed!CNosebleed::OnFrimble+0x1f891a:
    00007fff`21af2d22 30488b xor byte ptr [rax-75h],cl ds:00007fff`219c4f8b=41
    

    Well that's a pretty strange instruction. Especially since it doesn't match up with the source code at all.

    void CNosebleed::OnFrimble(...)
    {
        ...
        if (CanFrumble(...))
        {
            ...
        }
        else
        {
            hr = pCereal->AddMilk(pCarton);
            if (SUCCEEDED(hr))
            {
                pCereal->Snap();
                pCereal->Crackle(false);
                if (SUCCEEDED(pCereal->Pop(uId)) // ← crash here
                {
                    ....
                }
            }
        }
        ....
    }
    

    There is no bit-toggling in the actual code. The method calls to Snap, Crackle, and Pop are all interface calls and therefore should be vtable calls. We are clearly in a case of a bogus return address, possibly a stack smash (and therefore cause for concern from a security standpoint).

    My approach was to try to figure out what was happening just before the crash. And that meant figuring out how we ended up in the middle of an instruction.

    Here is the code surrounding the crash point.

    00007fff`21af2d17 ff90d0020000    call    qword ptr [rax+2D0h]
    00007fff`21af2d1d 488b03          mov     rax,qword ptr [rbx]
    00007fff`21af2d20 8b5530          mov     edx,dword ptr [rbp+30h]
    00007fff`21af2d23 488bcb          mov     rcx,rbx
    

    Notice that the code that crashed is actually the last byte of the mov edx, dword ptr [rbp+30h] (the 30) and the first two bytes of the mov rcx, rbx (the 488b).

    Disassembling backward is a tricky business on a processor with variable-length instructions, so to get my bearings, I looked for the call to Can­Frumble:

    0:011> #CanFrumble nosebleed!CNosebleed::OnFrimble
    nosebleed!CNosebleed::OnFrimble+0x1f883b
    00007fff`21af2c43 e8e0e40f00 call nosebleed!CNosebleed::OnFrimble
    

    The # command means "Start disassembling at the specified location and stop when you see the string I passed." This is an automated way of just hitting u until you get to the thing you are looking for.

    Now that I am at some known good code, I can disassemble forward:

    00007fff`21af2c48 488bcb          mov     rcx,rbx
    00007fff`21af2c4b 84c0            test    al,al
    00007fff`21af2c4d 0f849a000000    je      nosebleed!CNosebleed::OnFrimble+0x1f88e5 (00007fff`21af2ced)
    

    The above instructions check whether the Can­Frumble returned true, and if not, it jumps to 00007fff`21af2ced. Since we know that we are in the false path, we follow the jump.

    // Make a vtable call into pCereal->AddMilk()
    00007fff`21af2ced 488b03          mov     rax,qword ptr [rbx] ; vtable
    00007fff`21af2cf0 498bd7          mov     rdx,r15 ; pCarton
    00007fff`21af2cf3 ff9068010000    call    qword ptr [rax+168h] ; call
    00007fff`21af2cf9 8bf8            mov     edi,eax ; save to hr
    00007fff`21af2cfb 85c0            test    eax,eax ; succeeded?
    00007fff`21af2dfd 0f880dffffff    js      nosebleed!CNosebleed::OnFrimble+0x1f8808 (00007fff`21af2c10)
    
    // Now call Snap()
    00007fff`21af2d03 488b03          mov     rax,qword ptr [rbx] ; vtable
    00007fff`21af2d06 488bcb          mov     rcx,rbx ; "this"
    00007fff`21af2d09 ff9070020000    call    qword ptr [rax+270h] ; Snap
    
    / Now call Crackle
    00007fff`21af2d0f 488b03          mov     rax,qword ptr [rbx] ; vtable
    00007fff`21af2d12 33d2            xor     edx,edx ; parameter: false
    00007fff`21af2d14 488bcb          mov     rcx,rbx ; "this"
    00007fff`21af2d17 ff90d0020000    call    qword ptr [rax+2D0h] ; Crackle
    
    // Get ready to Pop
    00007fff`21af2d1d 488b03          mov     rax,qword ptr [rbx] ; vtable
    00007fff`21af2d20 8b5530          mov     edx,dword ptr [rbp+30h] ; uId
    00007fff`21af2d23 488bcb          mov     rcx,rbx ; "this"
    

    But we never got to execute the Pop because our return address from Crackle got messed up.

    Let's follow the call into Crackle.

    0:011> dps @rbx l1
    00000000`02b4b790  00007fff`219c50a0 nosebleed!CCereal::`vftable'
    0:011> dps 00007fff`219c50a0+2d0 l1
    00007fff`219c5370  00007fff`21aa5c28 nosebleed!CCereal::Crackle
    0:011> u 00007fff`21aa5c28
    nosebleed!CCereal::Crackle:
    00007fff`21aa5c28 889163010000    mov     byte ptr [rcx+163h],dl
    00007fff`21aa5c2e c3              ret
    

    So at least the pCereal pointer seems to be okay. It has a vtable and the slot in the vtable points to the function we expect. The Crackle method merely stashes the bool parameter into a member variable. No stack corruption here because rbx is nowhere near rsp.

    0:012> db @rbx+163 l1
    00000000`02b4b8f3  ??                                               ?
    

    Sadly, the byte in question was not captured in the dump, so we cannot verify whether the call actually was made. Similarly, the members of CCereal manipulated by the Snap method were also not captured in the dump, so we can't verify that either. (The only member of CCereal captured in the dump is the vtable itself.)

    So we can't find any evidence one way or the other as to whether any of the calls leading up to Pop actually occurred. Maybe we can try to figure out how many misaligned instructions we managed to execute before we crashed, see if that reveals anything. To do this, I'm going to disassemble at varying incorrect offsets and see which ones lead to the instruction that crashed.

    0:011> u .-1 l2
    nosebleed!CNosebleed::OnFrimble+0x1f8919:
    00007fff`21af2d21 55              push    rbp
    00007fff`21af2d22 30488b          xor     byte ptr [rax-75h],cl
    // ^^ this looks interesting; we'll come back to it
    
    0:011> u .-3 l2
    nosebleed!CNosebleed::OnFrimble+0x1f8917:
    00007fff`21af2d1f 038b5530488b    add     ecx,dword ptr [rbx-74B7CFABh]
    00007fff`21af2d25 cb              retf
    // ^^ this doesn't lead to the crashed instruction
    
    0:011> u .-4 l2
    nosebleed!CNosebleed::OnFrimble+0x1f8916:
    00007fff`21af2d1e 8b03            mov     eax,dword ptr [rbx]
    00007fff`21af2d20 8b5530          mov     edx,dword ptr [rbp+30h]
    // ^^ this doesn't lead to the crashed instruction
    
    0:012> u .-5 l3
    nosebleed!CNosebleed::OnFrimble+0x1f8914:
    00007fff`21af2d1c 00488b          add     byte ptr [rax-75h],cl
    00007fff`21af2d1f 038b5530488b    add     ecx,dword ptr [rbx-74B7CFABh]
    00007fff`21af2d25 cb              retf
    // ^^ this doesn't lead to the crashed instruction
    
    0:012> u .-6 l3
    nosebleed!CNosebleed::OnFrimble+0x1f8913:
    00007fff`21af2d1b 0000            add     byte ptr [rax],al
    00007fff`21af2d1d 488b03          mov     rax,qword ptr [rbx]
    00007fff`21af2d20 8b5530          mov     edx,dword ptr [rbp+30h]
    // ^^ this doesn't lead to the crashed instruction
    

    Exercise: Why didn't I bother checking .-2?

    You only need to test as far back as the maximum instruction length, and in practice you can give up much sooner because the maximimum instruction length involves a lot of prefixes which are unlikely to occur in real code.

    The only single-instruction rewind that makes sense is the push rbp. Let's see if it matches.

    0:011> ?? @rbp
    unsigned int64 0x453e700
    0:011> dps @rsp l1
    00000000`0453e698  00000000`0453e700
    

    Yup, it lines up. This wayward push is also consistent with the stack frame layout for the function.

    nosebleed!CNosebleed::OnFrimble:
    00007fff`218fa408 48895c2410      mov     qword ptr [rsp+10h],rbx
    00007fff`218fa40d 4889742418      mov     qword ptr [rsp+18h],rsi
    00007fff`218fa412 55              push    rbp
    00007fff`218fa413 57              push    rdi
    00007fff`218fa414 4154            push    r12
    00007fff`218fa416 4156            push    r14
    00007fff`218fa418 4157            push    r15
    00007fff`218fa41a 488bec          mov     rbp,rsp
    00007fff`218fa41d 4883ec60        sub     rsp,60h
    

    The values of rbp and rsp should differ by 0x60.

    0:012> ?? @rbp-@rsp
    unsigned int64 0x68
    

    The difference is in error by 8 bytes, exactly the size of the rbp register that was pushed.

    It therefore seems highly likely that the push rbp was executed.

    Repeating the exercise to find the instruction before the push rbp shows that no instruction fell through to the push rbp. Therefore, execution jumped to 00007fff`21af2d21 somehow.

    Another piece of data is that rax matches the value we expect it to have, sort of. Here are some selected lines from earlier in the debug session:

    // What we expected to have executed
    00007fff`21af2d1e 8b03            mov     eax,dword ptr [rbx]
    
    // The value we expected to have fetched
    0:011> dps @rbx l1
    00000000`02b4b790  00007fff`219c50a0 nosebleed!CCereal::`vftable'
    
    // The value in the rax register
    rax=00007fff219c5000 ...
    

    The value we expect is 00007fff`219c50a0, but the value in the register has the bottom eight bits cleared.

    Putting this all together, my theory is that the CPU executed the instruction at 00007fff`21af2d1e, and then due to some sort of hardware failure, instead of incrementing the rip register by two, it (1) incremented it by three, and then (2) as part of its confusion, zeroed out the bottom byte of rax. The erroneous rip led to the rogue push rbp and the crash on the nonsensical xor.

    It's not a great theory, but it's all I got.

    As to what sort of hardware failure could have occurred: This particular failure was reported twice, so a cosmic ray is less likely to be the culprit (because you have to get lightning to strike twice) than overheating or overclocking.

  • The Old New Thing

    It rather involved being on the other side of this airtight hatchway: Account vulnerable to Active Directory administrator

    • 45 Comments

    A security vulnerability report came in that went something like this:

    Disclosure of arbitrary data from any user

    An attacker can obtain arbitrary data from any user by means of the following steps:

    1. Obtain administrative access on the domain controller.
    2. Stop the XYZZY service.
    3. Edit the XYZZY.DAT file in a hex editor and changes the bytes starting at offset 0x4242 as follows:
    4. ...

    There's no point continuing, because the first step assumes that you are on the other side of the airtight hatchway. If you have compromised the domain controller, then you control the domain. From there, all the remaining steps are just piling on style points and cranking up the degree of difficulty.

    A much less roundabout attack is as follows:

    1. Obtain administrative access on the domain controller.
    2. Deploy a logon script to all users that does whatever you want.
    3. Wait for the user to log in next, and your script will DO ANYTHING YOU WANT.

    No, wait, I can make it even easier.

    1. Obtain administrative access on the domain controller.
    2. Change the victim's password.
    3. Log on as that user and DO ANYTHING YOU WANT.

    You are the domain administrator. You already pwn the domain. That you can pwn a domain that you pwn is really not much of a surprise.

    This is why it is important to choose your domain administrators carefully.

  • The Old New Thing

    The psychology of confirmation, or something, I don't know what to call it

    • 46 Comments

    There is probably a name for this phenomenon. I will illustrate it below.

    "Is there a way to configure the system to do X?"

    Go to the Y dialog and select Z.

    "It doesn't work."

    I just tried it. It works for me. I'm using ⟨configuration details⟩.

    "Thanks. It's working."

  • The Old New Thing

    Microspeak: Redlines

    • 30 Comments

    To the outside world, redline can mean to mark something for removal, or it could mean the maximum safe speed of an engine. But in the world of Microsoft design, the term redlines (pronounced as if it were written as the two words red lines, but the accent is on the red) refers to a diagram showing the exact position of visual elements. They typically take the form of a proposed screen shot, with arrows and lines superimposed to indicate the distances between items, which items align with each other, and so on. They also contain indications as to the exact colors to use for different elements.

    Originally the lines and arrows were actually red, hence the name. Here's an example of something that gives you the idea:

    These aren't real redlines because the diagram doesn't contain any indications about the colors to use, and more complete redlines would include diagrams showing the hover, pressed, and disabled states.

  • The Old New Thing

    How to view the stack of a user-mode thread when its kernel stack has been paged out

    • 4 Comments

    Suppose you have a machine that has crashed, and your investigation shows that the reason is that there is a critical section that everybody is waiting for. While waiting for that critical section, work piles up, and eventually the machine keels over. Suppose further that this crash is given to you in the form of a kernel debugger.

    In case it wasn't obvious, by "you" I mean "me".

    Okay, so the critical section that is the cause of the logjam is this one:

    1: kd> !cs CONTOSO!g_csDataLock
    -----------------------------------------
    Critical section   = 0x00007ff7f0ed2f68 (CONTOSO!g_csDataLock+0x0)
    DebugInfo          = 0x0000000022f2efd0
    LOCKED
    LockCount          = 0x5D
    WaiterWoken        = No
    OwningThread       = 0x0000000000004228
    RecursionCount     = 0x1
    LockSemaphore      = 0x17A0
    SpinCount          = 0x00000000020007cb
    

    "Great," you say. "I just need to look at thread 0x4228 to see why it is stuck.

    1: kd> !process -1 4
    PROCESS ffffe000047ae900
        SessionId: 1  Cid: 0604    Peb: 7ff74ecfa000  ParentCid: 05cc
        DirBase: 0eb07000  ObjectTable: ffffc000014c5680  HandleCount: 7003.
        Image: contoso.exe
    
            ...
            THREAD ffffe0000c136080  Cid 0604.4228  Teb: 00007ff74e94c000 Win32Thread: fffff90144edea60 WAIT
            ...
    

    Woo-hoo, there's the thread. Now I just need to switch to its context to see what it is stuck on.

    1: kd> .thread ffffe0000c136080
    Can't retrieve thread context, Win32 error 0n30
    

    Okay, that didn't work out too well. Now what?

    Even though the kernel stack is paged out, the user-mode stack may still be available.

    1: kd> !thread ffffe0000c136080
    THREAD ffffe0000c136080  Cid 0604.4228  Teb: 00007ff74e94c000
           Win32Thread: fffff90144edea60 WAIT: (UserRequest) UserMode Non-Alertable
        ffffe000077a7830  NotificationEvent
    Not impersonating
    DeviceMap                 ffffc00000e89c80
    Owning Process            ffffe000047ae900       Image:         contoso.exe
    Attached Process          N/A            Image:         N/A
    Wait Start TickCount      12735890       Ticks: 328715 (0:01:25:36.171)
    Context Switch Count      75             IdealProcessor: 2
    UserTime                  00:00:00.000
    KernelTime                00:00:00.031
    Kernel stack not resident.
    

    The limits of the user-mode stack are kept in the Teb.

    2: kd> !teb 00007ff74e94c000
    TEB at 00007ff74e94c000
        ExceptionList:        0000000000000000
        StackBase:            00000001027d0000
        StackLimit:           00000001027c2000
        SubSystemTib:         0000000000000000
        FiberData:            0000000000001e00
        ArbitraryUserPointer: 0000000000000000
        Self:                 00007ff74e94c000
        EnvironmentPointer:   0000000000000000
        ClientId:             0000000000000604 . 0000000000004228
        RpcHandle:            0000000000000000
        Tls Storage:          000000010132dfc0
        PEB Address:          00007ff74ecfa000
        LastErrorValue:       1008
        LastStatusValue:      c0000034
        Count Owned Locks:    2
        HardErrorMode:        0
    

    We now use the trick we learned some time ago where we grovel the stack of a thread without knowing where its stack pointer is.

    In this case, the groveling is made easier because we already know that everybody is waiting on the data lock. The data lock is taken in only two functions, so it's a matter ot looking for any occurrences of one of those two functions. And here it is: Data­Wrapper::Verify­Data.

    00000001`027cef88  00007ffe`f1bbac9a NTDLL!NtWaitForSingleObject+0xa
    00000001`027cef90  00000000`00006ac8
    00000001`027cef98  00007ffe`ef1bd085 KERNELBASE!WaitForSingleObjectEx+0xa5
    00000001`027cefa0  00000000`00006ac8
    00000001`027cefa8  00000000`00000000
    00000001`027cefb0  00000000`026d0000
    00000001`027cefb8  00000000`00000000
    00000001`027cefc0  00000001`01739ee0
    00000001`027cefc8  00000000`00000001
    00000001`027cefd0  00000000`00000048
    00000001`027cefd8  00000001`00000001
    00000001`027cefe0  00000000`00000000
    00000001`027cefe8  00000000`00000000
    00000001`027ceff0  00000000`00000000
    00000001`027ceff8  00000000`00000000
    00000001`027cf000  00000000`00000000
    00000001`027cf008  00000000`00000000
    00000001`027cf010  00000000`00000000
    00000001`027cf018  00007ff7`eefe6540 FABRIKAM!Lock::IsInitialized+0xfc
    00000001`027cf020  00000000`00000000
    00000001`027cf028  00000000`3b803fa0
    00000001`027cf030  00000000`00006ac8
    00000001`027cf038  00007ff7`eeff3c43 FABRIKAM!AccessRequest::WaitTimeout+0x87
    00000001`027cf040  00000000`ffffffff
    00000001`027cf048  10f513ec`6a161fb8
    00000001`027cf050  00000000`00000000
    00000001`027cf058  00000000`00006ac8
    00000001`027cf060  00000000`3b803fb8
    00000001`027cf068  00007ff7`eeff3cc0 FABRIKAM!AccessRequest::Wait+0x18
    00000001`027cf070  00000000`8007000e
    00000001`027cf078  00000000`fd416ff0
    00000001`027cf080  4fe80c4a`51236583
    00000001`027cf088  00000000`3b803fb8
    00000001`027cf090  00000000`ffffffff
    00000001`027cf098  00000000`00000000
    00000001`027cf0a0  00000000`3b803fa0
    00000001`027cf0a8  00007ff7`da8375b7 FABRIKAM!DataAccess::RequestAccess+0x93
    00000001`027cf0b0  00000000`3b803fa0
    00000001`027cf0b8  00000000`3b803fb8
    00000001`027cf0c0  4fe80c4a`51236583
    00000001`027cf0c8  10f513ec`6a161fb8
    00000001`027cf0d0  00000000`00000076
    00000001`027cf0d8  00007ff7`f07a8619 CONTOSO!Widget::SetColor+0x9
    00000001`027cf0e0  000095dc`90897985
    00000001`027cf0e8  00000000`00000110
    00000001`027cf0f0  00000000`00000000
    00000001`027cf0f8  00000000`00000000
    00000001`027cf100  00007ff7`4dd8000c
    00000001`027cf108  00007ff7`f07a85e5 CONTOSO!Widget::UpdateColor+0x39
    00000001`027cf110  00000000`ffc03f90
    00000001`027cf118  00000001`027cf1e8
    00000001`027cf120  00000000`ffc03fc8
    00000001`027cf128  00000000`00000000
    00000001`027cf130  00000000`00000000
    00000001`027cf138  00007ff7`4dd8000c
    00000001`027cf140  00007ff7`da5bc420 FABRIKAM!DataAccess::`vftable'+0x18
    00000001`027cf148  00000000`ffc03f90
    00000001`027cf150  00000001`027cf260
    00000001`027cf158  00007ff7`f0b3459c CONTOSO!DataWrapper::VerifyData+0x428
    00000001`027cf160  00000000`3b803fa0
    00000001`027cf168  00007ff7`f0ed1d30 CONTOSO!g_DataManager
    00000001`027cf170  00000000`fe196f10
    00000001`027cf188  00000000`00000001
    00000001`027cf190  00000000`00000000
    00000001`027cf198  00000001`027cf270
    00000001`027cf1a0  00000001`027cf480
    00000001`027cf1a8  00007ff7`00000001
    00000001`027cf1b0  00000001`027cf220 
    00000001`027cf1b8  00000000`00000000 
    00000001`027cf1c0  00000000`00000000 
    

    I left the red herrings in place just to make things a little more interesting.

    The Data­Wrapper::Verify­Data method enters the critical section and then calls Data­Access::Request­Access via a virtual method call:

    00007ff7`f0b3458f mov     dword ptr [rsp+28h],eax
    00007ff7`f0b34593 mov     eax,dword ptr [rsi+10h]
    00007ff7`f0b34596 mov     dword ptr [rsp+20h],eax
    00007ff7`f0b3459a call    qword ptr [rdi] ←
    

    Let's disassemble the start of Data­Access::Request­Access to see how it sets up its stack. This will help us interpret the other values in the stack dump.

    0: kd> u 00007ff7`da8375b7-93
    FABRIKAM!DataAccess::RequestAccess
    ;; function prologue
    00007ff7`da837524 mov     rax,rsp
    00007ff7`da837527 mov     qword ptr [rax+8],rbx
    00007ff7`da83752b mov     qword ptr [rax+20h],r9
    00007ff7`da83752f mov     qword ptr [rax+18h],r8
    00007ff7`da837533 mov     qword ptr [rax+10h],rdx
    00007ff7`da837537 push    rbp
    00007ff7`da837538 push    rsi
    00007ff7`da837539 push    rdi
    00007ff7`da83753a push    r12
    00007ff7`da83753c push    r13
    00007ff7`da83753e push    r14
    00007ff7`da837540 push    r15
    00007ff7`da837542 sub     rsp,70h
    ;; function body
    00007ff7`da837546 lea     rsi,[rcx+18h]
    00007ff7`da83754a mov     rdi,rcx
    00007ff7`da83754d mov     rbp,r9
    00007ff7`da837550 mov     rcx,rsi
    00007ff7`da837553 call    qword ptr [FABRIKAM!_imp_EnterCriticalSection]
    00007ff7`da837559 xor     r13b,r13b
    00007ff7`da83755c mov     ebx,8007000Eh
    ...
    00007ffe`da8375b1 call    FABRIKAM!AccessRequest::Wait
    

    We can replay the above code in our head and annotate the stack trace accordingly. On entry to the function, the stack pointer is 00000001`027cf158 (the return address). The function stashes some registers in the caller-provided spill area and it pushes others onto the stack, and then it subtracts some space for local variables as well as for outbound parameters of functions it intends to call.

    /-00000001`027cf0b0  00000000`3b803fa0
    | 00000001`027cf0b8  00000000`3b803fb8
    | 00000001`027cf0c0  4fe80c4a`51236583
    | 00000001`027cf0c8  10f513ec`6a161fb8
    | 00000001`027cf0d0  00000000`00000076
    | 00000001`027cf0d8  00007ff7`f07a8619 CONTOSO!Widget::SetColor+0x9
    | 00000001`027cf0e0  000095dc`90897985
    | 00000001`027cf0e8  00000000`00000110
    | 00000001`027cf0f0  00000000`00000000
    | 00000001`027cf0f8  00000000`00000000
    | 00000001`027cf100  00007ff7`4dd8000c
    | 00000001`027cf108  00007ff7`f07a85e5 CONTOSO!Widget::UpdateColor+0x39
    | 00000001`027cf110  00000000`ffc03f90
    \-00000001`027cf118  00000001`027cf1e8
      00000001`027cf120  00000000`ffc03fc8 // VerifyData's r15
      00000001`027cf128  00000000`00000000 // VerifyData's r14
      00000001`027cf130  00000000`00000000 // VerifyData's r13
      00000001`027cf138  00007ff7`4dd8000c // VerifyData's r12
      00000001`027cf140  00007ff7`da5bc420 FABRIKAM!DataAccess::`vftable'+0x18 // VerifyData's rdi
      00000001`027cf148  00000000`ffc03f90 // VerifyData's rsi
      00000001`027cf150  00000001`027cf260 // VerifyData's rbp
      00000001`027cf158  00007ff7`f0b3459c CONTOSO!DataWrapper::VerifyData+0x428 ← ESP is here
      00000001`027cf160  00000000`3b803fa0 // VerifyData's rbx
      00000001`027cf168  00007ff7`f0ed1d30 CONTOSO!g_DataManager // VerifyData's rdx
      00000001`027cf170  00000000`fe196f10 // VerifyData's r8
      00000001`027cf188  00000000`00000001 // VerifyData's r9
      00000001`027cf190  00000000`00000000
      00000001`027cf198  00000001`027cf270
      00000001`027cf1a0  00000001`027cf480
      00000001`027cf1a8  00007ff7`00000001
      00000001`027cf1b0  00000001`027cf220 
      00000001`027cf1b8  00000000`00000000 
      00000001`027cf1c0  00000000`00000000 
    

    The region marked in brackets is the 0x70 bytes of space for local variables and outbound parameters. Notice that some red herring function pointers are in that space. Those are probably variables that haven't been initialized yet, and the memory happened previously to have been used to hold some return addresses.

    A reassuring observation is that the rdx coming from Verify­Data is the address of CONTOSO!g_Data­Manager. That is the second function parameter (or first, if you aren't counting the hidden this) to Request­Access.

    Another reassuring observation is that that Verify­Data's rdi points into the vtable for Data­Access, since that matches the code we saw at the call point: call qword ptr [rdi].

    The mov rdi, rcx instruction in the function body tells us that the function stashed its this pointer in rdi. That's good info to keep track of, because that will let us look at the Data­Access object once we figure out what is in rdi.

    The next function on the stack is Access­Request::Wait.

    FABRIKAM!AccessRequest::Wait:
    00007ff7`eeff3ca8 sub     rsp,38h
    00007ffe`eeff3cb3 mov     dword ptr [rsp+20h],0FFFFFFFFh
    00007ffe`eeff3cbb call    FABRIKAM!AccessRequest::WaitTimeout
    00007ffe`eeff3cc0 add     rsp,38h
    00007ffe`eeff3cc4 ret
    

    This function doesn't bother saving any registers; it just reserves space for local variables and outbound parameters. From inspection, you can see that this is a simple wrapper that passes all its parameters onward to Wait­Timeout, with an INFINITE tacked onto the end, so this function has no local variables at all. Everything is just for outbound parameters.

    We can annotate some more entries in our stack trace.

    00000001`027cf070  00000000`8007000e // spill space for WaitTimeout
    00000001`027cf078  00000000`fd416ff0 // spill space for WaitTimeout
    00000001`027cf080  4fe80c4a`51236583 // spill space for WaitTimeout
    00000001`027cf088  00000000`3b803fb8 // spill space for WaitTimeout
    00000001`027cf090  00000000`ffffffff // INFINITE parameter
    00000001`027cf098  00000000`00000000 // unused
    00000001`027cf0a0  00000000`3b803fa0 // unused
    00000001`027cf0a8  00007ff7`da8375b7 FABRIKAM!DataAccess::RequestAccess+0x93
    

    The next function on the list is Access­Request::Wait­Timeout.

    FABRIKAM!AccessRequest::WaitTimeout:
    00007ff7`eeff3bbc mov     qword ptr [rsp+8],rbx
    00007ff7`eeff3bc1 mov     qword ptr [rsp+10h],rbp
    00007ff7`eeff3bc6 push    rsi
    00007ff7`eeff3bc7 sub     rsp,20h
    00007ff7`eeff3bcb mov     ebx,edx
    00007ff7`eeff3bcd mov     rsi,rcx
    00007ff7`eeff3bd0 mov     edx,0Bh
    00007ff7`eeff3bd5 mov     rcx,r8
    

    This function stashes two registers in the parameter spill space, pushes one onto the stack, and reserves another 0x20 bytes for local use (outbound parameters).

    00000001`027cf040  00000000`ffffffff // spill space for WaitForSingleObjectEx
    00000001`027cf048  10f513ec`6a161fb8 // spill space for WaitForSingleObjectEx
    00000001`027cf050  00000000`00000000 // spill space for WaitForSingleObjectEx
    00000001`027cf058  00000000`00006ac8 // spill space for WaitForSingleObjectEx
    00000001`027cf060  00000000`3b803fb8 // Wait's rsi
    00000001`027cf068  00007ff7`eeff3cc0 FABRIKAM!AccessRequest::Wait+0x18
    00000001`027cf070  00000000`8007000e // Wait's rbx
    00000001`027cf078  00000000`fd416ff0
    00000001`027cf080  4fe80c4a`51236583
    00000001`027cf088  00000000`3b803fb8
    

    Notice that the stashed rbx value is 8007000E, which conveniently lines up with the mov ebx,8007000Eh instruction in Data­Access::Request­Access. That's a bit reassuring, since it's another sign that we're on the right track.

    Next up is Wait­For­Single­Object­Ex.

    0: kd> u 00007ffe`ef1bd085 -a5
    KERNELBASE!WaitForSingleObjectEx
    00007ffe`ef1bcfe0 mov     r11,rsp
    00007ffe`ef1bcfe3 mov     qword ptr [r11+8],rbx
    00007ffe`ef1bcfe7 mov     dword ptr [r11+18h],r8d
    00007ffe`ef1bcfeb push    rsi
    00007ffe`ef1bcfec push    rdi
    00007ffe`ef1bcfed push    r14
    00007ffe`ef1bcfef sub     rsp,80h
    00007ffe`ef1bcff6 mov     ebx,r8d
    

    Incorporating this prologue into our stack annotation yields

    /-00000001`027cefa0  00000000`00006ac8 // spill space for NtWaitForSingleObject
    | 00000001`027cefa8  00000000`00000000 // spill space for NtWaitForSingleObject
    | 00000001`027cefb0  00000000`026d0000 // spill space for NtWaitForSingleObject
    | 00000001`027cefb8  00000000`00000000 // spill space for NtWaitForSingleObject
    | 00000001`027cefc0  00000001`01739ee0
    | 00000001`027cefc8  00000000`00000001
    | 00000001`027cefd0  00000000`00000048
    | 00000001`027cefd8  00000001`00000001
    | 00000001`027cefe0  00000000`00000000
    | 00000001`027cefe8  00000000`00000000
    | 00000001`027ceff0  00000000`00000000
    | 00000001`027ceff8  00000000`00000000
    | 00000001`027cf000  00000000`00000000
    | 00000001`027cf008  00000000`00000000
    | 00000001`027cf010  00000000`00000000
    \-00000001`027cf018  00007ff7`eefe6540 FABRIKAM!Lock::IsInitialized+0xfc
      00000001`027cf020  00000000`00000000 // WaitTimeout's r14
      00000001`027cf028  00000000`3b803fa0 // WaitTimeout's rdi
      00000001`027cf030  00000000`00006ac8 // WaitTimeout's rsi
      00000001`027cf038  00007ff7`eeff3c43 FABRIKAM!AccessRequest::WaitTimeout+0x87
      00000001`027cf040  00000000`ffffffff // WaitTimeout's rbx
      00000001`027cf048  10f513ec`6a161fb8
      00000001`027cf050  00000000`00000000 // WaitTimeout's r8
      00000001`027cf058  00000000`00006ac8
    

    Ooh, another red herring function pointer got caught in the local variables.

    Putting everything together results in the following annotated stack, with red herrings removed.

    00000001`027cef88  00007ffe`f1bbac9a NTDLL!NtWaitForSingleObject+0xa
    00000001`027cef90  00000000`00006ac8
    00000001`027cef98  00007ffe`ef1bd085 KERNELBASE!WaitForSingleObjectEx+0xa5
    00000001`027cefa0  00000000`00006ac8
    00000001`027cefa8  00000000`00000000
    00000001`027cefb0  00000000`026d0000
    00000001`027cefb8  00000000`00000000
    00000001`027cefc0  00000001`01739ee0
    00000001`027cefc8  00000000`00000001
    00000001`027cefd0  00000000`00000048
    00000001`027cefd8  00000001`00000001
    00000001`027cefe0  00000000`00000000
    00000001`027cefe8  00000000`00000000
    00000001`027ceff0  00000000`00000000
    00000001`027ceff8  00000000`00000000
    00000001`027cf000  00000000`00000000
    00000001`027cf008  00000000`00000000
    00000001`027cf010  00000000`00000000
    00000001`027cf018  00007ff7`eefe6540
    00000001`027cf020  00000000`00000000 // WaitTimeout's r14
    00000001`027cf028  00000000`3b803fa0 // WaitTimeout's rdi
    00000001`027cf030  00000000`00006ac8 // WaitTimeout's rsi
    00000001`027cf038  00007ff7`eeff3c43 FABRIKAM!AccessRequest::WaitTimeout+0x87
    00000001`027cf040  00000000`ffffffff // WaitTimeout's rbx
    00000001`027cf048  10f513ec`6a161fb8
    00000001`027cf050  00000000`00000000 // WaitTimeout's r8
    00000001`027cf058  00000000`00006ac8
    00000001`027cf060  00000000`3b803fb8 // Wait's rsi
    00000001`027cf068  00007ff7`eeff3cc0 FABRIKAM!AccessRequest::Wait+0x18
    00000001`027cf070  00000000`8007000e // Wait's rbx
    00000001`027cf078  00000000`fd416ff0
    00000001`027cf080  4fe80c4a`51236583
    00000001`027cf088  00000000`3b803fb8
    00000001`027cf090  00000000`ffffffff // INFINITE parameter
    00000001`027cf098  00000000`00000000
    00000001`027cf0a0  00000000`3b803fa0
    00000001`027cf0a8  00007ff7`da8375b7 FABRIKAM!DataAccess::RequestAccess+0x93
    00000001`027cf0b0  00000000`3b803fa0
    00000001`027cf0b8  00000000`3b803fb8
    00000001`027cf0c0  4fe80c4a`51236583
    00000001`027cf0c8  10f513ec`6a161fb8
    00000001`027cf0d0  00000000`00000076
    00000001`027cf0d8  00007ff7`f07a8619
    00000001`027cf0e0  000095dc`90897985
    00000001`027cf0e8  00000000`00000110
    00000001`027cf0f0  00000000`00000000
    00000001`027cf0f8  00000000`00000000
    00000001`027cf100  00007ff7`4dd8000c
    00000001`027cf108  00007ff7`f07a85e5
    00000001`027cf110  00000000`ffc03f90
    00000001`027cf118  00000001`027cf1e8
    00000001`027cf120  00000000`ffc03fc8 // VerifyData's r15
    00000001`027cf128  00000000`00000000 // VerifyData's r14
    00000001`027cf130  00000000`00000000 // VerifyData's r13
    00000001`027cf138  00007ff7`4dd8000c // VerifyData's r12
    00000001`027cf140  00007ff7`da5bc420 FABRIKAM!DataAccess::`vftable'+0x18 // VerifyData's rdi
    00000001`027cf148  00000000`ffc03f90 // VerifyData's rsi
    00000001`027cf150  00000001`027cf260 // VerifyData's rbp
    00000001`027cf158  00007ff7`f0b3459c CONTOSO!DataWrapper::VerifyData+0x428
    00000001`027cf160  00000000`3b803fa0 // VerifyData's rbx
    00000001`027cf168  00007ff7`f0ed1d30 CONTOSO!g_DataManager // VerifyData's rdx
    00000001`027cf170  00000000`fe196f10 // VerifyData's r8
    00000001`027cf188  00000000`00000001 // VerifyData's r9
    00000001`027cf190  00000000`00000000
    00000001`027cf198  00000001`027cf270
    00000001`027cf1a0  00000001`027cf480
    00000001`027cf1a8  00007ff7`00000001
    00000001`027cf1b0  00000001`027cf220 
    00000001`027cf1b8  00000000`00000000 
    00000001`027cf1c0  00000000`00000000 
    

    From this, we can also suck out the this pointer passed to Data­Access::Request­Access. We saw that it was stashed in rdi. The Wait function doesn't use rdi (because if it did, it would have saved the old value), so its rdi is the same as Request­Access's rdi. Similarly, the Wait­Timeout function does not use rdi. Therefore, when Wait­For­Single­Object saves the rdi register, it is saving the value from Data­Access::Request­Access.

    00000001`027cf028  00000000`3b803fa0 // WaitTimeout DataAccess's rdi
    

    And that is the this pointer that lets us study the Data­Access object to figure out why its access request is not completing.

Page 1 of 95 (941 items) 12345»