I was looking at assembly code, trying to improve an important performance scenario, when I found a strange call to _chkstk:

    011E100E push  10h
    011E1010 pop   eax
    011E1011 call  _chkstk (011E1810h)
    011E1016 mov   ecx,esp

My understanding is that a _chkstk call is generated by the C++ compiler when a function allocates more than 4 KB of local variables. But here the code is allocating only 16 bytes. This matters because _chkstk is not cheap:

    011E1810  push        ecx
    011E1811  lea         ecx,[esp+4]
    011E1815  sub         ecx,eax
    011E1817  sbb         eax,eax
    011E1819  not         eax
    011E181B  and         ecx,eax
    011E181D  mov         eax,esp
    011E181F  and         eax,0FFFFF000h
    011E1824  cmp         ecx,eax
    011E1826  jb          cs20 (011E1832h)
    011E1828  mov         eax,ecx
    011E182A  pop         ecx
    011E182B  xchg        eax,esp
    011E182C  mov         eax,dword ptr [eax]
    011E182E  mov         dword ptr [esp],eax
    011E1831  ret
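
To make the cost easier to see, here is a rough C++ model of the arithmetic in the listing above. It is only a sketch of my reading of those instructions: the names ChkstkResult and model_chkstk are hypothetical, and the probing loop at cs20 is not part of the listing, so it is not modeled.

```cpp
#include <cstdint>

// Hypothetical model of the check performed by the listing; the names are
// illustrative and are not part of the CRT.
struct ChkstkResult {
    std::uint32_t new_sp;  // value the routine loads into esp before ret
    bool needs_probe;      // true when the jb to cs20 would be taken
};

ChkstkResult model_chkstk(std::uint32_t sp_at_entry, std::uint32_t size)
{
    // lea ecx,[esp+4] after push ecx: ecx holds esp as it was on entry.
    // sub / sbb / not / and: subtract the size, clamping to 0 on underflow.
    std::uint32_t new_sp = (sp_at_entry >= size) ? sp_at_entry - size : 0;

    // mov eax,esp / and eax,0FFFFF000h: base of the 4 KB page holding the
    // current stack top (esp is 4 lower than on entry because of push ecx).
    std::uint32_t page_base = (sp_at_entry - 4) & 0xFFFFF000u;

    // cmp ecx,eax / jb cs20: branch only when the new stack pointer falls
    // below the current page.
    return { new_sp, new_sp < page_base };
}
```

With a 16-byte request and a stack pointer well inside a page, needs_probe is false, so the call runs every instruction shown above just to subtract 16 from esp.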

 

After digging through layers of macros, I finally found out that this _chkstk call is generated by an _alloca call for 16 bytes of storage. Here is a repro case:

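    #include <windows.h>  // RECT
    #include <tchar.h>    // _tmain, _TCHAR
    #include <malloc.h>   // _alloca

    int _tmain(int argc, _TCHAR* argv[])
    {
        // A 16-byte _alloca is enough to produce the _chkstk call.
        RECT * pRect = (RECT *) _alloca(sizeof(RECT));

        pRect->left = 10;

        return pRect->left;
    }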

Because the object is always needed, the fix is easy: declare it as an ordinary local variable, so that it is allocated on the stack together with the other locals (for free).
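
A minimal sketch of that fix, applied to the repro. To keep it compilable without <windows.h>, it uses a hypothetical Rect struct as a stand-in for RECT, and fixed_repro is an illustrative name; in the real code you would simply declare a RECT local.

```cpp
#include <cstdint>

// Hypothetical stand-in for the Win32 RECT structure (four 32-bit fields),
// so the sketch compiles without <windows.h>.
struct Rect {
    std::int32_t left, top, right, bottom;
};
static_assert(sizeof(Rect) == 16, "same 16 bytes as in the repro");

// The repro rewritten without _alloca: rect is an ordinary local, so its
// space is part of the function's fixed stack frame.
int fixed_repro()
{
    Rect rect = {};       // plain local instead of _alloca(sizeof(RECT))
    Rect* pRect = &rect;  // keep the pointer so the remaining code is unchanged

    pRect->left = 10;

    return pRect->left;
}
```

Keeping pRect as a pointer to the local means code hidden behind the macros can stay as it is; only the allocation changes.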