This is an intellectual exercise: when shifts a 32-bit unsigned integer in C++, how to detect whether the calculation overflows efficiently?

Here is the function prototype. shl_overflow will return true if v << cl overflows (cl is between 0 and 31. And we assume that sizeof(unsigned long) == 4 and sizeof(unsigned long long) == 8).

bool shl_overflow(unsigned long v, int cl)

The most natural way to implement this function is to extend v to 64-bit integer:

bool shl_overflow(unsigned long v, int cl)
{
    unsigned long long vl = v;
    return (vl << cl >> 32) != 0;
}

Now, let’s dig into the assembly world. We’ll limit the discussion on x86.

mov     eax, DWORD PTR _v$[esp-4]
mov     ecx, DWORD PTR _cl$[esp-4]
xor     edx, edx
call    __allshl
xor     eax, eax
or      eax, edx
jne     overflow

The implementation has to use three specific registers: eax, edx and ecx. And there is an expensive external function call.

If you step into __allshl in the debugger, you can find that it will use shld to shift 64-bit integer. VC provides some intrinsics which map to CPU instructions. For example, __ll_lshift will map to shld.

Because the high dword of vl is 0, we can simplify our code:

bool shl_overflow(unsigned long v, int cl)
{
    unsigned long long vl = __ll_lshift(v, cl);
    return (static_cast<unsigned long>(vl >> 32)) != 0;
}

The assembly looks like:

mov     eax, DWORD PTR _v$[esp-4]
mov     ecx, DWORD PTR _cl$[esp-4]
xor     edx, edx
shld    edx, eax, cl
test    edx
jne     overflow

Much better now.

Another approach is based on bit representation.

bool shl_overflow(unsigned long v, int cl)
{
    v = _rotl(v, cl);
    unsigned long index;
    return _BitScanForward(&index, v) ? index >= cl : false;
}

The idea is simple. If v << cl overflows, that means the most significant cl bits of v should contains "1".

There are two ways to test that.

1. Scan v from the least significant bits to the most, and test the index against 32 – cl. However, we have to handle the case when cl = 0.

2. Rotate v cl bits left first, so the most significant cl bits will be the least significant cl bits. Then we can scan and test the index against cl directly.

Notice that, the scan may fail if v is 0. The second way is simpler and more efficient.

The assembly looks like:

mov     ecx, DWORD PTR _cl$[esp-4]
mov     eax, DWORD PTR _v$[esp-4]
rol     eax, cl
bsf     eax, eax
je      notoverflow
cmp     eax, ecx
jl      overflow

It only uses two registers. It can also be extended to handle 64-bit shift. One drawback is an extra conditional jump (The extra jump can be replaced by "cmovz eax, ecx", but there is no way to ask the compiler to generate that)