Optimizing C++ Code : Constant-Folding

Optimizing C++ Code : Constant-Folding

Rate This
  • Comments 25

If you have arrived in the middle of this blog series, you might want instead to begin at the beginning.

This post examines Constant-Folding – one of the simplest optimizations performed by the VC++ compiler.  In this optimization, the compiler works out the result of an expression while it is compiling (at “compile-time”), and inserts the answer directly into the generated code.  This avoids the cost of performing those same calculations when the program runs (at “runtime”).

Here is an example.  A tiny main function, stored in the file App.cpp:

int main() { return 7 + 8; }

First, some admin about this blog: 

  • We will build programs from the command-line (rather than from within Visual Studio)
  • We will use Visual Studio 2012.  In particular, the version of the compiler that generates x64 code (rather than code for the aging x86 architecture), and that compiles on an x64 computer (i.e., “x64 on x64”)

If you want to follow along, please follow the instructions found here:  essentially, you just need to choose the right variant from a list of possible “Visual Studio Tools”.

(Note that if you are using the free compiler from Visual Studio Express, it runs only on x86, but will happily generate code for x64: “x64 on x86”.  This is equally good for experiments)

We can build our sample program with the command:  CL /FA App.cpp.  The /FA switch creates an output file holding the assembly code that the compiler generates for us.  You can display it with: type App.asm to show:

 

PUBLIC  main
_TEXT   SEGMENT
main    PROC
        mov     eax, 15
        ret     0
main    ENDP
_TEXT   ENDS
END

The point of interest is the mov eax, 15 instruction – it just insert the value 15 into the EAX register (which, by definition of the x64 calling standard, is the way that an x64 function sets the int value it will return, as the function’s result, to its caller).  The compiler does not emit instructions that would add 7 to 8 at runtime.  They would have gone something like:

PUBLIC  main
_TEXT   SEGMENT
main    PROC
        mov     eax, 7
        add     eax, 8
        ret     0
main    ENDP
_TEXT   ENDS
END

(Note carefully, the last instruction, in both snippets, ret 0.  This means: return control to the caller, and pop zero bytes from the stack.  Do not be misled into thinking this means return the value 0 to the caller!)

Let me guess: you are probably thinking “this is all very well, but really!  What kind of idiot would ever write code that includes arithmetic like 7 + 8”?  Of course, you are right.  But the compiler does see such constructs, often as a side-effect of macros.  Here is an example to persuade you that Constant-Folding is a worthwhile optimization:

#define SECS_PER_MINUTE  60
#define MINUTES_PER_HOUR 60
#define HOURS_PER_DAY    24

enum Event { Started, Stopped, LostData, ParityError };

struct {
    int        clock_time;
    enum Event ev;
    char*      reason;
}   Record;

int main() {
    const int table_size = SECS_PER_MINUTE * MINUTES_PER_HOUR * HOURS_PER_DAY * sizeof Record;
    // rest of program
}

Here we are going to create a table big enough to hold a Record for each second of an entire day.  So table_size would be the size, in bytes, of that table.  It’s easy to check the assembly instruction that sets the variable table_size :

        mov     DWORD PTR table_size$[rsp], 1382400     ; 00151800H

No multiply instructions here! – it’s all calculated at compile-time: 60 * 60 * 24 * 16 = 1382400.

In fact, if we could peek inside the compiler, we would find that this level of Constant-Folding is so simple, it’s performed by the FrontEnd.  It does not require the heavy lifting power of the BackEnd Optimizer.  And therefore, it’s always on.  It makes no difference whether you request optimization (with /O2) or disable optimization (with /Od) – this optimization always cuts in.

How complicated can the expression be, and yet we still fold those constants together at compile-time?  In fact, the FrontEnd will cope with pretty much any arithmetic expression involving constants (even values such as sizeof, as above, so long as they can be evaluated at compile-time) and operators + - * / % << >> ++ and --.  You can even throw in bools, logical operators, ifs and ?:  

Are there any cases of Constant-Folding, then, that do require the power of the BackEnd Optimizer?  Yes.  Consider this example:

int bump(int n) { return n + 1; }

int main() { return 3 + bump(6); }

With the commands, cl /FA /Od App.cpp which says "no optimizations, thank you" and type App.asm, we get:

mov     ecx, 6
call    ?bump@@YAHH@Z                           ; bump
add     eax, 3

Just as we would expect: load 6 into ECX – which holds the first argument, in the x64 calling convention, to our function bump.  Then call bump.  Its result is returned in EAX.  Finally, add 3 into EAX.

Let’s see what happens if we request optimization, with: cl /FA /O2 App.cpp

mov     eax, 10

Here the BackEnd Optimizer has recognized that the bump function is so small that its body should simply be included into its caller (a sophisticated optimization called “function inlining” that we will examine later in the series).  It then realizes it can evaluated the entire expression at compile time, and so ends up with a single instruction.  Quite impressive, right?

  • Hi Jim!

    Thanks for continuing the series! :-)

    BTW, out of curiosity, a (possibly forward looking) question -- how sophisticated are the modification of the BackEnd Optimizer required to implement something like N3652: open-std.org/.../n3652.html // C++14's constexpr, which enhanced C++11's constexpr compile-time function evaluation (CTFE) capabilities

    In the light of the last example -- is this something that can be (relatively) non-intrusively plugged-in into the existing infrastructure, or rather a more involved change (requiring, say, a major rewrite)?

  • Matt, if I remember correctly from Herb Sutter's recent build talk, constexpr requires rewriting the compiler to use an AST, which is something that they're working on.

  • Excuse me?

    In my opinion one should use enum as possible as being allowed. After all, the kind of things your artical introduced relies on the behaviour of the compiler decision. Which means a clever compiler and another stupid one may generate binaries in huge different way.

    With the help of the enumerators, however, these kind of 'optimization' will always be positive states and, you can use any non-function involved expressions to generate constants as you will. like mathematical/bit mask&shift expressions, you can even use the meta-programing templates to generate constants. Members in an enumerator can't be initialized with variable and any expressions for initialization has to be 'constantable'. Especially with the classed enumerators, It can be a good at making regulary constant integers.

  • @MattPD

    The support for constexpr will be entirely contained within the FrontEnd of the VC++ compiler.  So no changes to the BackEnd here.  (This is similar to C++ templates - an abstraction provided-by, and entirely processed-by, the FrontEnd)  

    The work to support compile-time evaluation of constexpr function is quite a big job: in effect, the compiler must evaluate - essentially interpret - what might be a quite complex C++ function.  For example, it can be even be recursive.

  • @GregM

    Building ASTs in the FrontEnd would enable some useful features - including better static analyses.  (At the moment the "ANALYZE | Run Code Analysis" feature in Visual Studio uses a slightly different FrontEnd).  I believe we can get part of constexpr without ASTs, but that would be a question for the FrontEnd team.

  • @Simon Dan

    Sorry - not sure I follow.  I was not advocating any particular usage pattern for C++ enums in my example - I just chose an enum as one way to illustrate that even "sizeof" can be constant-folded.  On your second paragraph: yes, I agree that C++ will constant-fold (ie, evaluate at compile-time) quite complex expressions.

  • Hi Jim!

    What is amazing is that when using /Ob2 (=inlining only) instead of /O2, bump *wont* be inlined unless bump is declared inlined - rather weird, isn't it?

  • Just one request:  Pick up the pace!  At 2-3 weeks/post it's going to be the end of the year before this series is done and I don't want to wait that long :)

    P.S. Hi Jim - we actually met at an MVP summit years ago when you were on the Phoenix team.  Maybe 2004 or so?

  • @ Martin

    No - with /Ob2 (automatically turned on if you select /O2 anyway) I see 'bump' being inlined and the constant-fold happening.  I'm using the compiler from "Microsoft Visual Studio 11.0".  Can you double-check your result?

  • @ Carl D

    Yes, I had aimed to post once each week.  But another project came along and it's stealing my cycles.  Hopefully, I can speed things up once that project stabilizes, by end of summer.

  • Blogs are my business.  My bread and butter.  If it were not for blogs, the internet would not be in a state rip for my plucking.  And that state is good for my business.  Now how I can tie this into a VS add-in . . . working on it.

    Yours,

    The Slum Lord

  • @TSL

    Welcome aboard!  Glad you are investigating those aspects others so often forget.

  • @Jim Hogg, Thank you! I am hoping that soon you guys would bring loop-interchange optimization (like Intel compiler does).

    OAN, I wish if Microsoft teams had friendly collaboration (beyond SharePoint).. yet easy means to share wisdom among them, Charka (javascript engine) for IE9/IE10/IE11 would definitely perform better in try-catch performance test:

    newilk.com/.../SpeedTest1.

    In ideal case, everything in this test should be under 10ms which exist in (of course) Chrome! Chrome and others have strategy of making ONE perfect base and emerging all of their products from it.

    Then compare the performance of rendering engines: Opera, Gecko and Webkit with latest Trident with this test: newilk.com/.../DomManipulation

    At Microsoft every product has its own performance team and its own definition of "fair-enough". When we brought these performance issues to the attention of IE team, their stance was "there is no real-world scenario which uses so many calls for try-catch and others.. its just a stress test". Then why other vendors are paying attention to these so-called "irrelevant" tiny details and make themselves at wining position over Microsoft while you guys still chose to ignore whatever feedback coming your way!

    Come on... this must be an exciting challenge... a hobby for performance savvy Assembly/C/C++ geeks at Microsoft . A day job ? may be two.. to figure out the bottle neck in Chakra and Trident and unkink these nasty issues!! And this chapter is closed for all upcoming released of IE.. You can access the source code of those, and im craving for it :-(

  • @Nate

    Yes, Microsoft teams do collaborate - especially compiler folks.  I have passed on your comments to the Chakra team (some of whom used to work in the C++ backend team)

  • @Jim - Sorry, I failed to say that I tested with VS2010:

    Here's my test: (I just hope the formatting won't get messed up too much)

    + + + +

    ***** /Ob2 without inline declaration *****

    *** File: App-not_inline.cpp

    // bump is *not* inline

    int bump(int n) { return n + 1; }

    int main() { return 3 + bump(6); }

    *** cl /FA /Ob2 App-not_inline.cpp

    *** Result: App-not_inline.asm

    ; Listing generated by Microsoft (R) Optimizing Compiler Version 16.00.40219.01

    TITLE C:\Users\trappel\Documents\Visual Studio 2010\Projects\asm\App-not_inline.cpp

    .686P

    .XMM

    include listing.inc

    .model flat

    INCLUDELIB LIBCMT

    INCLUDELIB OLDNAMES

    PUBLIC ?bump@@YAHH@Z ; bump

    ; Function compile flags: /Odtp

    _TEXT SEGMENT

    _n$ = 8 ; size = 4

    ?bump@@YAHH@Z PROC ; bump

    ; File c:\users\trappel\documents\visual studio 2010\projects\asm\app-not_inline.cpp

    ; Line 2

    push ebp

    mov ebp, esp

    mov eax, DWORD PTR _n$[ebp]

    add eax, 1

    pop ebp

    ret 0

    ?bump@@YAHH@Z ENDP ; bump

    _TEXT ENDS

    PUBLIC _main

    ; Function compile flags: /Odtp

    _TEXT SEGMENT

    _main PROC

    ; Line 4

    push ebp

    mov ebp, esp

    push 6

    call ?bump@@YAHH@Z ; bump

    add esp, 4

    add eax, 3

    pop ebp

    ret 0

    _main ENDP

    _TEXT ENDS

    END

    ***** /Ob2 WITH inline declaration *****

    *** File: App-inline.cpp

    // bump *is* inline

    inline int bump(int n) { return n + 1; }

    int main() { return 3 + bump(6); }

    *** cl /FA /Ob2 App-inline.cpp

    App-inline.cpp

    *** Result: App-inline.asm

    ; Listing generated by Microsoft (R) Optimizing Compiler Version 16.00.40219.01

    TITLE C:\Users\trappel\Documents\Visual Studio 2010\Projects\asm\App-inline.cpp

    .686P

    .XMM

    include listing.inc

    .model flat

    INCLUDELIB LIBCMT

    INCLUDELIB OLDNAMES

    PUBLIC _main

    ; Function compile flags: /Odtp

    _TEXT SEGMENT

    _main PROC

    ; File c:\users\trappel\documents\visual studio 2010\projects\asm\app-inline.cpp

    ; Line 4

    push ebp

    mov ebp, esp

    mov eax, 6

    add eax, 4

    pop ebp

    ret 0

    _main ENDP

    _TEXT ENDS

    END

    ***** /O2 - full opt - without inline declaration *****

    *** File: App-not_inline.cpp

    // bump is *not* inline

    int bump(int n) { return n + 1; }

    int main() { return 3 + bump(6); }

    *** cl /FA /O2 App-not_inline.cpp

    App-not_inline.cpp

    *** Result: App-not_inline.asm

    ; Listing generated by Microsoft (R) Optimizing Compiler Version 16.00.40219.01

    TITLE C:\Users\trappel\Documents\Visual Studio 2010\Projects\asm\App-not_inline.cpp

    .686P

    .XMM

    include listing.inc

    .model flat

    INCLUDELIB LIBCMT

    INCLUDELIB OLDNAMES

    PUBLIC ?bump@@YAHH@Z ; bump

    ; Function compile flags: /Ogtpy

    ; COMDAT ?bump@@YAHH@Z

    _TEXT SEGMENT

    _n$ = 8 ; size = 4

    ?bump@@YAHH@Z PROC ; bump, COMDAT

    ; File c:\users\trappel\documents\visual studio 2010\projects\asm\app-not_inline.cpp

    ; Line 2

    mov eax, DWORD PTR _n$[esp-4]

    inc eax

    ret 0

    ?bump@@YAHH@Z ENDP ; bump

    _TEXT ENDS

    PUBLIC _main

    ; Function compile flags: /Ogtpy

    ; COMDAT _main

    _TEXT SEGMENT

    _main PROC ; COMDAT

    ; Line 4

    mov eax, 10 ; 0000000aH

    ret 0

    _main ENDP

    _TEXT ENDS

    END

    + + + +

Page 1 of 2 (25 items) 12