• The Old New Thing

    Why is there a BSTR cache anyway?

    The Sys­Alloc­String function uses a cache of BSTRs which can mess up your performance tracing. There is a switch for disabling it for debugging purposes, but why does the cache exist at all?

    The BSTR cache is a historical artifact. When BSTRs were originally introduced, performance tracing showed that a significant chunk of time was spent merely allocating and freeing memory in the heap manager. Using a cache reduced the heap allocation overhead significantly.

    In the intervening years, um, decades, the performance of the heap manager has improved to the point where the cache isn't necessary any more. But the SysAllocString people can't get rid of the BSTR cache because so many applications unwittingly rely on it.

    The BSTR cache is now a compatibility constraint.

  • The Old New Thing

    No good deed goes unpunished: Marking a document as obsolete

    I was contacted by a customer support liaison who was hoping that I could help them understand Feature X.

    I saw your name on a "Feature X technical specification" document in the Windows specification repository, and I was hoping you could answer a few questions for me, or redirect me to somebody who can.

    I was puzzled why this person saw my name on the "Feature X technical specification" document in the Windows specification repository, because I was not the author of that specification. I went to the specification repository, opened the document in question, and nope, my name appears nowhere in it.

    I asked, "What gave you the impression that I had anything to do with Feature X? XYZ can help you with your questions; he's the one listed as the author of the document."

    The response was, "Oh, I'm sorry. I didn't actually read the specification. I merely did a search through the entire repository for Feature X, and the "Feature X technical specification" is the one that showed up as most recently updated by you. In the past, this technique has been pretty good at finding someone who can help with a feature. Sorry about that."

    I went back and took another look at the document, and then I remembered why I updated it: My duties at the time included reviewing all documents that met certain criteria, such as this particular document. I had some feedback about the document for the author, who told me, "Oh, that's an obsolete version of the document, but it's retained for historical purposes. The current one is over there." To save the next person some time, I edited the obsolete document by inserting in big letters at the top, "TECHNICAL DOCUMENTATION FOR THIS FEATURE HAS MOVED TO ⟨new location⟩. THIS DOCUMENT IS OBSOLETE." I could've asked the author to do this, but I had the document open already, so I figured I'd save a few steps (ask author to update document, wait for reply, reopen document to verify that edit occurred) and just do it myself.

    Boom, no good deed goes unpunished. My update was made long after the real technical specification was completed. As a result, of all the documents on Feature X, not only is it the obsolete one that shows up as most recently updated, but I am the one listed as the person who made that most recent update.

    Next time, I'll try to remember to do things the long way, even though it is a big hassle for everybody.

    "Please update the document to indicate that it is obsolete and redirect the reader to the current document."

    — Could you do that? You've already got the document open.

    "No, I used to do that, but it came back and bit me, because I become the person to edit the document last, and then everybody comes to me with questions about the document instead of you."

    — You do realize that in the time you tried to convince me to do it, you could've just done it.

    Follow-up: I tried it, and sometimes the response was "I'm really busy now, I'll get around to it in a few weeks." Now I have to create a reminder task in two weeks to follow up. More hassle for everybody.

    I think the next time this happens, I'll write back, "I'm coming over to your office. I'll make the one-line edit on your computer so that your name is the one attached to the edit."

  • The Old New Thing

    More notes on calculating constants in SSE registers

    A few weeks ago I noted some tricks for creating special bit patterns in all lanes, but I forgot to cover the case where you treat the 128-bit register as one giant lane: Setting all of the least significant N bits or all of the most significant N bits.

    This is a variation of the trick for setting a bit pattern in all lanes, but the catch is that the pslldq instruction shifts by bytes, not bits.

    We'll assume that N is not a multiple of eight, because if it were a multiple of eight, then the pslldq or psrldq instruction does the trick (after using pcmpeqd to fill the register with ones).

    One case is if N ≤ 64. This is relatively easy because we can build the value by first building the desired value in both 64-bit lanes, and then finishing with a big pslldq or psrldq to clear the lane we don't like.

    ; set the bottom N bits, where N ≤ 64
    pcmpeqd xmm0, xmm0 ; FFFF FFFF FFFF FFFF FFFF FFFF FFFF FFFF
    psrlq   xmm0, 64 - N ; 0000 0000 0FFF FFFF 0000 0000 0FFF FFFF
    psrldq  xmm0, 8 ; 0000 0000 0000 0000 0000 0000 0FFF FFFF

    ; set the top N bits, where N ≤ 64
    pcmpeqd xmm0, xmm0 ; FFFF FFFF FFFF FFFF FFFF FFFF FFFF FFFF
    psllq   xmm0, 64 - N ; FFFF FFF0 0000 0000 FFFF FFF0 0000 0000
    pslldq  xmm0, 8 ; FFFF FFF0 0000 0000 0000 0000 0000 0000

    If N ≥ 80, then we shift in zeroes into the top and bottom half, but then use a shuffle to patch up the half that needs to stay all-ones.

    ; set the bottom N bits, where N ≥ 80
    pcmpeqd xmm0, xmm0 ; FFFF FFFF FFFF FFFF FFFF FFFF FFFF FFFF
    psrlq   xmm0, 128 - N ; 0000 0000 0FFF FFFF 0000 0000 0FFF FFFF
    pshuflw xmm0, _MM_SHUFFLE(0, 0, 0, 0) ; 0000 0000 0FFF FFFF FFFF FFFF FFFF FFFF

    ; set the top N bits, where N ≥ 80
    pcmpeqd xmm0, xmm0 ; FFFF FFFF FFFF FFFF FFFF FFFF FFFF FFFF
    psllq   xmm0, 128 - N ; FFFF FFF0 0000 0000 FFFF FFF0 0000 0000
    pshufhw xmm0, _MM_SHUFFLE(3, 3, 3, 3) ; FFFF FFFF FFFF FFFF FFFF FFF0 0000 0000

    We have N ≥ 80, which means that 128 - N ≤ 48, which means that there are at least 16 bits of ones left in low-order bits after we shift right. We then use a 4×16-bit shuffle to copy those known-all-ones 16 bits into the other lanes of the lower half. (A similar argument applies to setting the top bits.)
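
    The bit patterns in the comments are easy to sanity-check without SSE hardware. The following is a minimal sketch, not real SSE code: Reg128 and the helper functions are hypothetical names that model the pcmpeqd, psrlq, and pshuflw steps of the bottom-N sequence on two 64-bit halves, and compare the result against a directly computed mask.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar model of a 128-bit register: two 64-bit halves.
struct Reg128 {
    std::uint64_t lo;
    std::uint64_t hi;
};

inline bool operator==(const Reg128& a, const Reg128& b) {
    return a.lo == b.lo && a.hi == b.hi;
}

// Models "pcmpeqd xmm0, xmm0": every bit becomes 1.
inline Reg128 all_ones() { return { ~0ULL, ~0ULL }; }

// Models "psrlq xmm0, count" for 0 <= count <= 63:
// each 64-bit lane shifts right independently, filling with zeroes.
inline Reg128 psrlq(Reg128 r, unsigned count) {
    return { r.lo >> count, r.hi >> count };
}

// Models "pshuflw xmm0, _MM_SHUFFLE(0, 0, 0, 0)": word 0 of the low half
// is copied into all four words of the low half; the high half is untouched.
inline Reg128 pshuflw_broadcast_word0(Reg128 r) {
    const std::uint64_t w = r.lo & 0xFFFF;
    return { w | (w << 16) | (w << 32) | (w << 48), r.hi };
}

// Reference answer: the bottom n bits set, valid for 64 < n < 128.
inline Reg128 bottom_bits_reference(unsigned n) {
    return { ~0ULL, (1ULL << (n - 64)) - 1 };
}

// The three-instruction sequence from the text, run through the model.
inline Reg128 bottom_bits_sse_model(unsigned n) {
    Reg128 r = all_ones();
    r = psrlq(r, 128 - n);
    return pshuflw_broadcast_word0(r);
}

// Checks the sequence against the reference for 80 <= n <= 127.
inline void verify_bottom_bits() {
    for (unsigned n = 80; n < 128; ++n) {
        assert(bottom_bits_sse_model(n) == bottom_bits_reference(n));
    }
}
```

    In this model the sequence matches the reference for every N from 80 through 127. It fails for N = 72 because the low word is no longer all ones after the shift, which is where the N ≥ 80 restriction comes from.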

    This leaves 64 < N < 80. That uses a different trick:

    ; set the bottom N bits, where N ≤ 120
    pcmpeqd xmm0, xmm0 ; FFFF FFFF FFFF FFFF FFFF FFFF FFFF FFFF
    psrldq  xmm0, 1 ; 00FF FFFF FFFF FFFF FFFF FFFF FFFF FFFF
    psrad   xmm0, 120 - N ; 0000 00FF FFFF FFFF FFFF FFFF FFFF FFFF

    The sneaky trick here is that we use a signed shift in order to preserve the bottom half. Unfortunately, there is no corresponding left shift that shifts in ones, so the best I can come up with is four instructions:

    ; set the top N bits, where 64 ≤ N ≤ 96
    pcmpeqd xmm0, xmm0 ; FFFF FFFF FFFF FFFF FFFF FFFF FFFF FFFF
    psllq   xmm0, 96 - N ; FFFF FFFF FFF0 0000 FFFF FFFF FFF0 0000
    pshufd  xmm0, _MM_SHUFFLE(3, 3, 1, 0) ; FFFF FFFF FFFF FFFF FFFF FFFF FFF0 0000
    pslldq  xmm0, 4 ; FFFF FFFF FFFF FFFF FFF0 0000 0000 0000

    We view the 128-bit register as four 32-bit lanes and split the shift into two steps. First, we fill Lane 0 with the value we ultimately want in Lane 1, then we patch up the damage we did to Lane 2, and finally we shift the 128-bit value left 32 places to slide the value into position and zero-fill Lane 0.
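
    The lane shuffling is easier to follow when traced on concrete values. Here is a rough sketch that models the four-instruction sequence on an array of four 32-bit lanes and compares it with a mask built one bit at a time. The Lanes alias and the function names are hypothetical; this is a scalar model of the instructions, not SSE code.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model: lanes[0] is the least significant 32-bit lane.
using Lanes = std::array<std::uint32_t, 4>;

// Models "pcmpeqd xmm0, xmm0".
inline Lanes all_ones() {
    return { 0xFFFFFFFFu, 0xFFFFFFFFu, 0xFFFFFFFFu, 0xFFFFFFFFu };
}

// Models "psllq xmm0, count" for 0 <= count <= 63:
// each 64-bit pair of lanes shifts left independently.
inline Lanes psllq(const Lanes& r, unsigned count) {
    std::uint64_t lo = (static_cast<std::uint64_t>(r[1]) << 32) | r[0];
    std::uint64_t hi = (static_cast<std::uint64_t>(r[3]) << 32) | r[2];
    lo <<= count;
    hi <<= count;
    return { static_cast<std::uint32_t>(lo), static_cast<std::uint32_t>(lo >> 32),
             static_cast<std::uint32_t>(hi), static_cast<std::uint32_t>(hi >> 32) };
}

// Models "pshufd xmm0, _MM_SHUFFLE(3, 3, 1, 0)":
// lanes 0 and 1 stay put, lanes 2 and 3 both receive the old lane 3.
inline Lanes pshufd_3310(const Lanes& r) {
    return { r[0], r[1], r[3], r[3] };
}

// Models "pslldq xmm0, 4": shift the whole register left by one lane.
inline Lanes pslldq_4(const Lanes& r) {
    return { 0u, r[0], r[1], r[2] };
}

// Reference answer: set the top n bits one at a time (0 < n <= 128).
inline Lanes top_bits_reference(unsigned n) {
    Lanes r{};
    for (unsigned bit = 128 - n; bit < 128; ++bit) {
        r[bit / 32] |= 1u << (bit % 32);
    }
    return r;
}

// The four-instruction sequence from the text, valid for 64 <= n <= 96.
inline Lanes top_bits_sse_model(unsigned n) {
    Lanes r = all_ones();
    r = psllq(r, 96 - n);
    r = pshufd_3310(r);
    return pslldq_4(r);
}
```

    For N = 76, the model produces the lanes FFFFFFFF, FFFFFFFF, FFF00000, 00000000 (most significant first), and it agrees with the reference across the whole range 64 ≤ N ≤ 96.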

    Note that a lot of the ranges of N overlap, so you often have a choice of solutions. There are other three-instruction solutions I didn't bother presenting here. The only one I couldn't find a three-instruction solution for was setting the top N bits where 64 < N < 80.

    If you find a three-instruction solution for this last case, share it in the comments.

  • The Old New Thing

    How do you prevent the linker from discarding a function you want to make available for debugging?

    We saw some time ago that you can ask the Windows symbolic debugger engine to call a function directly from the debugger. To do this, of course, the function needs to exist.

    But what if you want a function for the sole purpose of debugging? It never gets called from the main program, so the linker will declare the code dead and remove it.

    One sledgehammer solution is to disable discarding of unused functions. This is a global solution to a local problem, since you are now preventing the discard of any unused function, even though all you care about is one specific function.

    If you are comfortable hard-coding function decorations for specific architectures, you can use the /INCLUDE directive.

    #if defined(_X86_)
    #define DecorateCdeclFunctionName(fn) "_" #fn
    #elif defined(_AMD64_)
    #define DecorateCdeclFunctionName(fn) #fn
    #elif defined(_IA64_)
    #define DecorateCdeclFunctionName(fn) "." #fn
    #elif defined(_ALPHA_)
    #define DecorateCdeclFunctionName(fn) #fn
    #elif defined(_MIPS_)
    #define DecorateCdeclFunctionName(fn) #fn
    #elif defined(_PPC_)
    #define DecorateCdeclFunctionName(fn) ".." #fn
    #else
    #error Unknown architecture - don't know how it decorates cdecl.
    #endif
    #pragma comment(linker, "/include:" DecorateCdeclFunctionName(TestMe))
    EXTERN_C void __cdecl TestMe(int x, int y)
    {
        ...
    }
    

    If you are not comfortable with that (and I don't blame you), you can create a false reference to the debugging function that cannot be optimized out. You do this by passing a pointer to the debugging function to a helper function outside your module that doesn't do anything interesting. Since the helper function is not in your module, the compiler doesn't know that the helper function doesn't do anything, so it cannot optimize out the debugging function.

    struct ForceFunctionToBeLinked
    {
      ForceFunctionToBeLinked(const void *p) { SetLastError(PtrToInt(p)); }
    };
    
    ForceFunctionToBeLinked forceTestMe(TestMe);
    

    The call to SetLastError merely updates the thread's last-error code, but since this is not called at a time where anybody cares about the last-error code, it has no meaningful effect. The compiler doesn't know that, though, so it has to generate the code, and that forces the function to be linked.

    The nice thing about this technique is that the optimizer sees that this class has no data members, so no data gets generated into the module's data segment. The not-nice thing about this technique is that it is kind of opaque.

  • The Old New Thing

    Horrifically nasty gotcha: FindResource and FindResourceEx

    The FindResourceEx function is an extension of the FindResource function in that it allows you to specify a particular language fork in which to search for the resource. Calling the FindResource function is equivalent to calling FindResourceEx and passing zero as the wLanguage.

    Except for the horrible nasty gotcha: The second and third parameters to Find­Resource­Ex are in the opposite order compared to the second and third parameters to Find­Resource!

    In other words, if you are adding custom language support to a program, you cannot just stick a wLanguage parameter on the end when you switch from Find­Resource to Find­Resource­Ex. You also have to flip the second and third parameters.

    Original code: FindResource(hModule, MAKEINTRESOURCE(IDB_MYBITMAP), RT_BITMAP)
    You change it to: FindResourceEx(hModule, MAKEINTRESOURCE(IDB_MYBITMAP), RT_BITMAP, 0)
    You should have changed it to: FindResourceEx(hModule, RT_BITMAP, MAKEINTRESOURCE(IDB_MYBITMAP), 0)

    The nasty part of this is that since the second and third parameters are the same type, the compiler won't notice that you got them backward. The only way you find out is that your resource code suddenly stopped working.

  • The Old New Thing

    2014 year-end link clearance

    Another round of the semi-annual link clearance.

  • The Old New Thing

    Even the publishing department had its own Year 2000 preparedness plan

    On December 31, 1999, Microsoft Product Support Services were ready in case something horrible happened as the calendar rolled over into the new year.

    I'm told that Microsoft Press also had its own Year 2000 plan. They staffed their helpline continuously from Friday evening December 31, 1999 all the way through Sunday, January 2, 2000. They did this even though Microsoft Press did not normally staff its helpline outside normal business hours, and even though all sample code in all publications comes with a disclaimer that it is provided "as is" with no warranty.

    I do not know if they took any calls, but I suspect not.

  • The Old New Thing

    How did that program manage to pin itself to my taskbar when I installed it?

    Occasionally, somebody will notice that upon installing a program, it managed to pin itself to the taskbar. But just like there is no Pin­To­Start­Menu function, there is also no Pin­To­Taskbar function, and for the same reason: Because applications would abuse it and auto-pin themselves because they are so awesome, and so that the developer could get a nice bonus.

    In spite of these roadblocks, some applications manage to pin themselves to the taskbar anyway, typically by programmatically driving the shortcut context menu. The developer then collects their bonus and goes out and gets drunk.

    There is no real way of blocking this behavior other than giving guidance not to do that. Customers who complain to the vendors about their presumptiveness may help. Scornful looks and ignoring them when they walk by the lunch table looking for a place to sit may also work. (But since they're drunk, they may not care.)

  • The Old New Thing

    Integer signum in SSE

    The signum function is defined as follows:

    signum(x) =  −1  if x < 0
    signum(x) =   0  if x = 0
    signum(x) =  +1  if x > 0

    There are a couple of ways of calculating this in SSE integers.

    One way is to convert the C idiom

    int signum(int x) { return (x > 0) - (x < 0); }
    

    The SSE translation of this is mostly straightforward. The quirk is that the SSE comparison functions return −1 to indicate true, whereas C uses +1 to represent true. But this is easy to take into account:

    x > 0  ⇔  − pcmpgt(x, 0)
    x < 0  ⇔  − pcmpgt(0, x)

    Substituting this into the original signum function, we get

    signum(x) = (x > 0) − (x < 0)
              = − pcmpgt(x, 0) − (− pcmpgt(0, x))
              = − pcmpgt(x, 0) + pcmpgt(0, x)
              = pcmpgt(0, x) − pcmpgt(x, 0)
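
    The final expression can be checked on a single lane in plain C++. This is only a scalar sketch of the identity. The names pcmpgt_lane and signum_lane are made up for illustration, and the helper mimics the "−1 means true" convention of the SSE comparison on one 16-bit value.

```cpp
#include <cassert>
#include <cstdint>

// Models one 16-bit lane of pcmpgtw: all ones (-1) when a > b, otherwise 0.
inline std::int16_t pcmpgt_lane(std::int16_t a, std::int16_t b) {
    return a > b ? static_cast<std::int16_t>(-1) : static_cast<std::int16_t>(0);
}

// signum(x) = pcmpgt(0, x) - pcmpgt(x, 0), evaluated on a single lane.
inline std::int16_t signum_lane(std::int16_t x) {
    return static_cast<std::int16_t>(pcmpgt_lane(0, x) - pcmpgt_lane(x, 0));
}

// Compares the lane model with the C idiom for every 16-bit value.
inline void verify_signum_lane() {
    for (int v = -32768; v <= 32767; ++v) {
        const std::int16_t x = static_cast<std::int16_t>(v);
        assert(signum_lane(x) == (v > 0) - (v < 0));
    }
}
```

    Note the operand order: subtracting pcmpgt(x, 0) from pcmpgt(0, x) gives +1 for positive inputs, whereas the reverse order would give the negated result.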

    In assembly:

            ; assume x is in xmm0
    
            pxor    xmm1, xmm1
            pxor    xmm2, xmm2
            pcmpgtw xmm1, xmm0 ; xmm1 = pcmpgt(0, x)
            pcmpgtw xmm0, xmm2 ; xmm0 = pcmpgt(x, 0)
            psubw   xmm1, xmm0 ; xmm1 = pcmpgt(0, x) - pcmpgt(x, 0) = signum
            ; answer is in xmm1
    

    With intrinsics:

    __m128i signum16(__m128i x)
    {
        return _mm_sub_epi16(_mm_cmpgt_epi16(_mm_setzero_si128(), x),
                             _mm_cmpgt_epi16(x, _mm_setzero_si128()));
    }
    

    This pattern extends mutatis mutandis to signum8, signum32, and signum64.

    Another solution is to use the signed minimum and maximum opcodes, using the formula

    signum(x) = min(max(x, −1), +1)

    In assembly:

            ; assume x is in xmm0
    
            pcmpeqw xmm1, xmm1 ; xmm1 = -1 in all lanes
            pmaxsw  xmm0, xmm1
            psrlw   xmm1, 15   ; xmm1 = +1 in all lanes
            pminsw  xmm0, xmm1
            ; answer is in xmm0
    

    With intrinsics:

    __m128i signum16(__m128i x)
    {
        // alternatively: minusones = _mm_set1_epi16(-1);
        __m128i minusones = _mm_cmpeq_epi16(_mm_setzero_si128(),
                                            _mm_setzero_si128());
        x = _mm_max_epi16(x, minusones);
    
        // alternatively: ones = _mm_set1_epi16(1);
        __m128i ones = _mm_srli_epi16(minusones, 15);
        x = _mm_min_epi16(x, ones);
    
        return x;
    }
    

    The catch here is that SSE2 supports only 16-bit signed minimum and maximum; to get other bit sizes, you need to bump up to SSE4. But if you're going to do that, you may as well use the psign instruction. In assembly:

            ; assume x is in xmm0
    
            pcmpeqw xmm1, xmm1 ; xmm1 = -1 in all lanes
            psrlw   xmm1, 15   ; xmm1 = +1 in all lanes
            psignw  xmm1, xmm0 ; apply sign of x to xmm1
            ; answer is in xmm1
    

    With intrinsics:

    __m128i signum16(__m128i x)
    {
        // alternatively: ones = _mm_set1_epi16(1);
        __m128i minusones = _mm_cmpeq_epi16(_mm_setzero_si128(),
                                            _mm_setzero_si128());
        __m128i ones = _mm_srli_epi16(minusones, 15);
        return _mm_sign_epi16(ones, x);
    }
    

    The psign instruction applies the sign of its second argument to its first argument. We load up the first argument with the value +1 in all lanes, then apply the sign of x, which negates the value if the corresponding lane of x is negative, sets the value to zero if the lane is zero, and leaves it alone if the corresponding lane is positive.
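
    The three cases described above can be written out for a single 16-bit lane. The sketch below is a scalar model rather than the instruction itself, and psign_lane and signum_psign are hypothetical names chosen for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Models one 16-bit lane of psignw as described in the text:
// negate a when b is negative, zero it when b is zero, keep it when b is positive.
inline std::int16_t psign_lane(std::int16_t a, std::int16_t b) {
    if (b < 0) return static_cast<std::int16_t>(-a);
    if (b == 0) return 0;
    return a;
}

// signum via psign: start from +1 and apply the sign of x.
inline std::int16_t signum_psign(std::int16_t x) {
    return psign_lane(1, x);
}
```

    Because the first argument is always +1 here, the negation never overflows, so the model does not need to deal with the edge case of negating the most negative lane value.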

  • The Old New Thing

    Debugging walkthrough: Access violation on nonsense instruction

    A colleague of mine asked for help puzzling out a mysterious crash dump which arrived via Windows Error Reporting.

    rax=00007fff219c5000 rbx=00000000023c8380 rcx=00000000023c8380
    rdx=0000000000000000 rsi=00000000043f0148 rdi=0000000000000000
    rip=00007fff21af2d22 rsp=000000000392e518 rbp=000000000392e580
     r8=00000000276e4639  r9=00000000043b2360 r10=00000000ffffffff
    r11=0000000000000000 r12=0000000000000001 r13=0000000000000000
    r14=000000000237cfc0 r15=00000000023d3ea0
    iopl=0         nv up ei pl zr na po nc
    cs=0033  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00010246
    nosebleed!CNosebleed::OnFrimble+0x1f891a:
    00007fff`21af2d22 30488b xor byte ptr [rax-75h],cl ds:00007fff`219c4f8b=41
    

    Well that's a pretty strange instruction. Especially since it doesn't match up with the source code at all.

    void CNosebleed::OnFrimble(...)
    {
        ...
        if (CanFrumble(...))
        {
            ...
        }
        else
        {
            hr = pCereal->AddMilk(pCarton);
            if (SUCCEEDED(hr))
            {
                pCereal->Snap();
                pCereal->Crackle(false);
                if (SUCCEEDED(pCereal->Pop(uId))) // ← crash here
                {
                    ....
                }
            }
        }
        ....
    }
    

    There is no bit-toggling in the actual code. The method calls to Snap, Crackle, and Pop are all interface calls and therefore should be vtable calls. We are clearly in a case of a bogus return address, possibly a stack smash (and therefore cause for concern from a security standpoint).

    My approach was to try to figure out what was happening just before the crash. And that meant figuring out how we ended up in the middle of an instruction.

    Here is the code surrounding the crash point.

    00007fff`21af2d17 ff90d0020000    call    qword ptr [rax+2D0h]
    00007fff`21af2d1d 488b03          mov     rax,qword ptr [rbx]
    00007fff`21af2d20 8b5530          mov     edx,dword ptr [rbp+30h]
    00007fff`21af2d23 488bcb          mov     rcx,rbx
    

    Notice that the code that crashed is actually the last byte of the mov edx, dword ptr [rbp+30h] (the 30) and the first two bytes of the mov rcx, rbx (the 488b).
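
    To make the overlap concrete, here is a small sketch that lays out the opcode bytes copied from the listing above and reads them starting at the crashing rip. The constant and function names are made up for this illustration; only the byte values and addresses come from the dump.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Address of the first byte in the table below (taken from the listing).
constexpr std::uint64_t kListingStart = 0x00007fff21af2d1dULL;

// Opcode bytes copied from the disassembly, in address order.
constexpr std::array<std::uint8_t, 9> kCodeBytes = {
    0x48, 0x8b, 0x03,   // 2d1d: mov rax,qword ptr [rbx]
    0x8b, 0x55, 0x30,   // 2d20: mov edx,dword ptr [rbp+30h]
    0x48, 0x8b, 0xcb,   // 2d23: mov rcx,rbx
};

// The rip value from the register dump.
constexpr std::uint64_t kCrashRip = 0x00007fff21af2d22ULL;

// Returns the byte that the listing places at the given address.
inline std::uint8_t byte_at(std::uint64_t address) {
    return kCodeBytes.at(static_cast<std::size_t>(address - kListingStart));
}
```

    Reading three bytes from kCrashRip yields 30 48 8b, which is exactly the encoding the debugger showed for the bogus xor: the tail of one instruction followed by the head of the next.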

    Disassembling backward is a tricky business on a processor with variable-length instructions, so to get my bearings, I looked for the call to Can­Frumble:

    0:011> #CanFrumble nosebleed!CNosebleed::OnFrimble
    nosebleed!CNosebleed::OnFrimble+0x1f883b
    00007fff`21af2c43 e8e0e40f00 call nosebleed!CNosebleed::CanFrumble
    

    The # command means "Start disassembling at the specified location and stop when you see the string I passed." This is an automated way of just hitting u until you get to the thing you are looking for.

    Now that I am at some known good code, I can disassemble forward:

    00007fff`21af2c48 488bcb          mov     rcx,rbx
    00007fff`21af2c4b 84c0            test    al,al
    00007fff`21af2c4d 0f849a000000    je      nosebleed!CNosebleed::OnFrimble+0x1f88e5 (00007fff`21af2ced)
    

    The above instructions check whether CanFrumble returned true, and if not, jump to 00007fff`21af2ced. Since we know that we are in the false path, we follow the jump.

    // Make a vtable call into pCereal->AddMilk()
    00007fff`21af2ced 488b03          mov     rax,qword ptr [rbx] ; vtable
    00007fff`21af2cf0 498bd7          mov     rdx,r15 ; pCarton
    00007fff`21af2cf3 ff9068010000    call    qword ptr [rax+168h] ; call
    00007fff`21af2cf9 8bf8            mov     edi,eax ; save to hr
    00007fff`21af2cfb 85c0            test    eax,eax ; succeeded?
    00007fff`21af2cfd 0f880dffffff    js      nosebleed!CNosebleed::OnFrimble+0x1f8808 (00007fff`21af2c10)
    
    // Now call Snap()
    00007fff`21af2d03 488b03          mov     rax,qword ptr [rbx] ; vtable
    00007fff`21af2d06 488bcb          mov     rcx,rbx ; "this"
    00007fff`21af2d09 ff9070020000    call    qword ptr [rax+270h] ; Snap
    
    // Now call Crackle
    00007fff`21af2d0f 488b03          mov     rax,qword ptr [rbx] ; vtable
    00007fff`21af2d12 33d2            xor     edx,edx ; parameter: false
    00007fff`21af2d14 488bcb          mov     rcx,rbx ; "this"
    00007fff`21af2d17 ff90d0020000    call    qword ptr [rax+2D0h] ; Crackle
    
    // Get ready to Pop
    00007fff`21af2d1d 488b03          mov     rax,qword ptr [rbx] ; vtable
    00007fff`21af2d20 8b5530          mov     edx,dword ptr [rbp+30h] ; uId
    00007fff`21af2d23 488bcb          mov     rcx,rbx ; "this"
    

    But we never got to execute the Pop because our return address from Crackle got messed up.

    Let's follow the call into Crackle.

    0:011> dps @rbx l1
    00000000`02b4b790  00007fff`219c50a0 nosebleed!CCereal::`vftable'
    0:011> dps 00007fff`219c50a0+2d0 l1
    00007fff`219c5370  00007fff`21aa5c28 nosebleed!CCereal::Crackle
    0:011> u 00007fff`21aa5c28
    nosebleed!CCereal::Crackle:
    00007fff`21aa5c28 889163010000    mov     byte ptr [rcx+163h],dl
    00007fff`21aa5c2e c3              ret
    

    So at least the pCereal pointer seems to be okay. It has a vtable and the slot in the vtable points to the function we expect. The Crackle method merely stashes the bool parameter into a member variable. No stack corruption here because rbx is nowhere near rsp.

    0:012> db @rbx+163 l1
    00000000`02b4b8f3  ??                                               ?
    

    Sadly, the byte in question was not captured in the dump, so we cannot verify whether the call actually was made. Similarly, the members of CCereal manipulated by the Snap method were also not captured in the dump, so we can't verify that either. (The only member of CCereal captured in the dump is the vtable itself.)

    So we can't find any evidence one way or the other as to whether any of the calls leading up to Pop actually occurred. Maybe we can try to figure out how many misaligned instructions we managed to execute before we crashed, see if that reveals anything. To do this, I'm going to disassemble at varying incorrect offsets and see which ones lead to the instruction that crashed.

    0:011> u .-1 l2
    nosebleed!CNosebleed::OnFrimble+0x1f8919:
    00007fff`21af2d21 55              push    rbp
    00007fff`21af2d22 30488b          xor     byte ptr [rax-75h],cl
    // ^^ this looks interesting; we'll come back to it
    
    0:011> u .-3 l2
    nosebleed!CNosebleed::OnFrimble+0x1f8917:
    00007fff`21af2d1f 038b5530488b    add     ecx,dword ptr [rbx-74B7CFABh]
    00007fff`21af2d25 cb              retf
    // ^^ this doesn't lead to the crashed instruction
    
    0:011> u .-4 l2
    nosebleed!CNosebleed::OnFrimble+0x1f8916:
    00007fff`21af2d1e 8b03            mov     eax,dword ptr [rbx]
    00007fff`21af2d20 8b5530          mov     edx,dword ptr [rbp+30h]
    // ^^ this doesn't lead to the crashed instruction
    
    0:012> u .-6 l3
    nosebleed!CNosebleed::OnFrimble+0x1f8914:
    00007fff`21af2d1c 00488b          add     byte ptr [rax-75h],cl
    00007fff`21af2d1f 038b5530488b    add     ecx,dword ptr [rbx-74B7CFABh]
    00007fff`21af2d25 cb              retf
    // ^^ this doesn't lead to the crashed instruction
    
    0:012> u .-7 l3
    nosebleed!CNosebleed::OnFrimble+0x1f8913:
    00007fff`21af2d1b 0000            add     byte ptr [rax],al
    00007fff`21af2d1d 488b03          mov     rax,qword ptr [rbx]
    00007fff`21af2d20 8b5530          mov     edx,dword ptr [rbp+30h]
    // ^^ this doesn't lead to the crashed instruction
    

    Exercise: Why didn't I bother checking .-2?

    You only need to test as far back as the maximum instruction length, and in practice you can give up much sooner because the maximum instruction length involves a lot of prefixes which are unlikely to occur in real code.

    The only single-instruction rewind that makes sense is the push rbp. Let's see if it matches.

    0:011> ?? @rbp
    unsigned int64 0x453e700
    0:011> dps @rsp l1
    00000000`0453e698  00000000`0453e700
    

    Yup, it lines up. This wayward push is also consistent with the stack frame layout for the function.

    nosebleed!CNosebleed::OnFrimble:
    00007fff`218fa408 48895c2410      mov     qword ptr [rsp+10h],rbx
    00007fff`218fa40d 4889742418      mov     qword ptr [rsp+18h],rsi
    00007fff`218fa412 55              push    rbp
    00007fff`218fa413 57              push    rdi
    00007fff`218fa414 4154            push    r12
    00007fff`218fa416 4156            push    r14
    00007fff`218fa418 4157            push    r15
    00007fff`218fa41a 488bec          mov     rbp,rsp
    00007fff`218fa41d 4883ec60        sub     rsp,60h
    

    The values of rbp and rsp should differ by 0x60.

    0:012> ?? @rbp-@rsp
    unsigned int64 0x68
    

    The difference is in error by 8 bytes, exactly the size of the rbp register that was pushed.

    It therefore seems highly likely that the push rbp was executed.
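
    The stack arithmetic can be spelled out with the values from the register dump. This is just a sketch of the calculation, and the constant and function names are invented for the example.

```cpp
#include <cassert>
#include <cstdint>

// Register values from the crash dump.
constexpr std::uint64_t kRbp = 0x000000000392e580ULL;
constexpr std::uint64_t kRsp = 0x000000000392e518ULL;

// The prologue does "mov rbp,rsp" followed by "sub rsp,60h".
constexpr std::uint64_t kFrameAllocation = 0x60;

// Each 64-bit push moves rsp down by 8 bytes.
constexpr std::uint64_t kPushSize = 8;

// Bytes between rbp and rsp that the prologue does not account for.
constexpr std::uint64_t unexplained_stack_bytes() {
    return (kRbp - kRsp) - kFrameAllocation;
}

// Number of extra 64-bit pushes that would explain the discrepancy.
constexpr std::uint64_t extra_pushes() {
    return unexplained_stack_bytes() / kPushSize;
}
```

    With these inputs the gap is 0x68, leaving 8 unexplained bytes, which corresponds to exactly one stray 64-bit push.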

    Repeating the exercise to find the instruction before the push rbp shows that no instruction fell through to the push rbp. Therefore, execution jumped to 00007fff`21af2d21 somehow.

    Another piece of data is that rax matches the value we expect it to have, sort of. Here are some selected lines from earlier in the debug session:

    // What we expected to have executed
    00007fff`21af2d1e 8b03            mov     eax,dword ptr [rbx]
    
    // The value we expected to have fetched
    0:011> dps @rbx l1
    00000000`02b4b790  00007fff`219c50a0 nosebleed!CCereal::`vftable'
    
    // The value in the rax register
    rax=00007fff219c5000 ...
    

    The value we expect is 00007fff`219c50a0, but the value in the register has the bottom eight bits cleared.

    Putting this all together, my theory is that the CPU executed the instruction at 00007fff`21af2d1e, and then due to some sort of hardware failure, instead of incrementing the rip register by two, it (1) incremented it by three, and then (2) as part of its confusion, zeroed out the bottom byte of rax. The erroneous rip led to the rogue push rbp and the crash on the nonsensical xor.

    It's not a great theory, but it's all I got.

    As to what sort of hardware failure could have occurred: This particular failure was reported twice, so a cosmic ray is less likely to be the culprit (because you have to get lightning to strike twice) than overheating or overclocking.
