Larry Osterman's WebLog

Confessions of an Old Fogey
Blog - Title

The case of the inconsistent right shift results…

The case of the inconsistent right shift results…

Rate This
  • Comments 17

One of our testers just filed a bug against something I’m working on.  They reported that if they compiled code which calculated: 1130149156 >> –05701653 it generated different results on 32bit and 64bit operating systems.  On 32bit machines it reported 0 but on 64bit machines, it reported 0x21a.

I realized that I could produce a simple reproduction for the scenario to dig into it a bit deeper:

int _tmain(int argc, _TCHAR* argv[])
{
    __int64 shift = 0x435cb524;
    __int64 amount = 0x55;
    __int64 result = shift >> amount;
    std::cout << shift << " >> " << amount << " = " << result << std::endl;
    return 0;
}

That’s pretty straightforward and it *does* reproduce the behavior.  On x86 it reports 0 and on x64 it reports 0x21a.  I can understand the x86 result (you’re shifting right more than the processor size, it shifts off the end and you get 0) but not the x64. What’s going on?

Well, for starters I asked our C language folks.  I know I’m shifting by more than the processor word size (85), but the results should be the same, right?

Well no.  The immediate answer I got was:

From C++ 03, 5.8/1: The behavior is undefined if the right operand is negative, or greater than or equal to the length in bits of the promoted left operand.

Ok.  It’s undefined behavior.  But that doesn’t really explain the difference.  When in doubt, let’s go to the assembly….

000000013F5215D3  mov         rax,qword ptr [amount]  
000000013F5215D8  movzx       ecx,al  
000000013F5215DB  mov         rax,qword ptr [shift]  
000000013F5215E0  sar         rax,cl  
000000013F5215E3  mov         qword ptr [result],rax  
000000013F5215E8  mov         rdx,qword ptr [shift] 

The relevant instruction is highlighted.  It’s doing a shift arithmetic right of “shift” by “amount”.

What about the x86 version?

00CC14CA  mov         ecx,dword ptr [amount]  
00CC14CD  mov         eax,dword ptr [shift]  
00CC14D0  mov         edx,dword ptr [ebp-8]  
00CC14D3  call        @ILT+85(__allshr) (0CC105Ah)  
00CC14D8  mov         dword ptr [result],eax  
00CC14DB  mov         dword ptr [ebp-28h],edx  

Now that’s interesting.  The x64 version is using a processor shift function but on 32bit machines, it’s using a C runtime library function (__allshr).  And the one that’s weird is the x64 version.

While I don’t have an x64 processor manual, I *do* have a 286 processor manual from back in the day (I have all sorts of stuff in my office).  And in my 80286 manual, I found:

“If a shift count greater than 31 is attempted, only the bottom five bits of the shift count are used. (the iAPX 86 uses all eight bits of the shift count.)”

A co-worker gave me the current text:

The destination operand can be a register or a memory location. The count operand can be an immediate value or the CL register. The count is masked to 5 bits (or 6 bits if in 64-bit mode and REX.W is used). The count range is limited to 0 to 31 (or 63 if 64-bit mode and REX.W is used). A special opcode encoding is provided for a count of 1.

So the mystery is now solved.  The shift of 0x55 only considers the low 6 bits.  The low 6 bits of 0x55 is 0x15 or 21.  0x435cb524 >> 21 is 0x21a.

One could argue that this is a bug in the __allshr function on x86 but you really can’t argue with “the behavior is undefined”.  Both scenarios are doing the “right thing”.  That’s the beauty of the “behavior is undefined” wording.  The compiler would be perfectly within spec if it decided to reformat my hard drive when it encountered this (although I’m happy it doesn’t Smile).

Now our feature crew just needs to figure out how best to resolve the bug.

  • I remember that change from way back when. The 286 (actually I think it was the 186 that started it) used this as an optimization because shifts/rotates of more than 16 zeroed out/wrapped around any operand. However, I would have thought that the 386 would have made this 6 bits and the x64, 7 bits! Is this a bug in the spec, or is there some other reason for this behaviour that I've missed?

  • @ErikF: 5 bits is 31, 6 bits is 63.  Shift right 63 bits doesn't make sense on a 32 bit platform.

  • Ahhh, my brain wasn't doing math today. Never mind....

  • I'm not too familiar with x86 instructions, but I'm guessing the reason the x86 version uses a C runtime function is because x86 has no instruction for right-shifting a 64 bit integer?

    That the function doesn't match the behaviour of the x64 instruction is unfortunate, but as you say, it's hard to argue with undefined behaviour. :)

  • @Sven: Yup - the 32bit x86 processor doesn't have instructions to shift 64bit values (most 32bit machines have extremely limited support for 64bit instructions).

  • Presumably the spec is the way it is for performance reasons, but it's a shame. It means that a>>x and a>>(y+z) are not equivalent even when x==y+z.

    I am surprised the compiler doesn't emit a warning. (I checked with VS2008, in debug and release, warning level 4.) I don't expect the compiler to warn in all cases (the amount value could be the result of a complex calculation, sub-routines, user-input, etc.) but in this case the compiler can easily know the value of amount; it's set once and never changed.

    You do get a compiler warning if you make amount const. Before now I had assumed making a variable like that const was more to avoid programmer error (i.e. tell you if you then modify it) and that the compiler could work it out itself, but now I'll use it even more often. I guess/hope the optimizer works things like that out at a later stage. Maybe that assumption is wrong, too.

    I could swear there are cases where the compiler has warned me about things like this without the variable being const. Maybe I'm imagining things!

  • You can see exactly the same problem using a 32-bit program. If you compile this in release it'll tell you the answer is zero, and compiled in debug it'll tell you it's 538. This is because in release the compiler calculates the answer.

    int _tmain(int argc, _TCHAR* argv[])

    {

       int shift = 0x435cb524;

       int amount = 0x55;

       int result = shift >> amount;

       std::cout << shift << " >> " << amount << " = " << result << std::endl;

       return 0;

    }

  • The behaviour is undefined probably because the specification expects it to be directly mapped to the assembly shift instruction of the CPU. And while Intel masks 5 bits another architecture could not mask anything. I think this is what most people expect (always treating C like a high level assembly).

    I would expect the compiler to warn me, when I use >> on a type that can't be directly mapped to a 'CPU type' (like __int64 on 32-bits, but no warning when compiling __int64 on 64-bits) and the provider of the C-library to provide documentation how >> works with such types. For __int64 >> I would expect the implementation to use either a 5-bit or 6-bit mask. Probably the 6-bit mask would be better, because it would behave the same way on 32-bit and 64-bit CPUs - this is the probably most common scenario now, writing for both 32 and 64-bits. Shifting to the end like __allshr would be odd to me.

  • @pgliznie: Actually the compiler *will* warn you if it can determine the result of the operation.  If I had 0x435cb524 >> 0x55, the compiler prints out a warning.  

  • I hit this on SafeInt, and it is documented on my blog somewhere. Basically, while the standard says this is undefined, what the Microsoft compilers will do is to shift by bits % (sizeof(void*)*8)

    If you'd been using SafeInt, it would have thrown an assert if you did this....

  • @David: I think the other comments have established that what the Microsoft compilers do is a mixture of (a) shift by the requested amount at compile time, leaving 0 (b) leaving it up to the processor (which reduces the shift amount modulo) and (c) call a library function, depending on the type of the argument and the optimization level.

  • @Larry

    > @Sven: Yup - the 32bit x86 processor doesn't have instructions to shift 64bit values (most 32bit machines have extremely limited support for 64bit instructions).

    It has SHLD and SHRD for 64-bit shifts. However, they are so slow that calling dedicated function could actually be faster.

  • @MazeGen:

    That was my thought as well, but I realised it wasn't that simple. It should work for arithmetic as well as logical shifts, however the shift count is still limited to 31, so some extra work is needed there.

    That said, for all we know, __allshr might actually use the double precision shift instructions, but special-case large shift counts. Maybe it was decided that it was easier to wrap it all into a function than have the compiler generate the code every time.

    Another way to implement the function would be to use a loop to SAR the upper half, which would shift the LSB into the CF flag, then RCR the lower half, which would shift the CF flag into the MSB. If the count was left unchanged, or clipped to 64 rather than masked, it would produce the results reported here.

  • @David LeBlanc

    I think the intersection of use cases for SafeInt and use cases for shift ops is almost zero; most of the time shifts are used for hand-optimized code for either bit manipulation (for example extracting ARGB components, etc) or as a substitute of multiplications and divisions with (some) known factors. Not only SafeInt would slow performance down, but code using shifts often uses the overflow wrap around as a feature.

  • Thank you Larry, very interesting!

Page 1 of 2 (17 items) 12