Optimizing C++ Code : Constant-Folding


If you have arrived in the middle of this blog series, you might want instead to begin at the beginning.

This post examines Constant-Folding – one of the simplest optimizations performed by the VC++ compiler.  In this optimization, the compiler works out the result of an expression while it is compiling (at “compile-time”), and inserts the answer directly into the generated code.  This avoids the cost of performing those same calculations when the program runs (at “runtime”).

Here is an example.  A tiny main function, stored in the file App.cpp:

int main() { return 7 + 8; }

First, some admin about this blog: 

  • We will build programs from the command-line (rather than from within Visual Studio)
  • We will use Visual Studio 2012.  In particular, the version of the compiler that generates x64 code (rather than code for the aging x86 architecture), and that compiles on an x64 computer (i.e., “x64 on x64”)

If you want to follow along, please follow the instructions found here:  essentially, you just need to choose the right variant from a list of possible “Visual Studio Tools”.

(Note that if you are using the free compiler from Visual Studio Express, it runs only on x86, but will happily generate code for x64: “x64 on x86”.  This is equally good for experiments)

We can build our sample program with the command:  CL /FA App.cpp.  The /FA switch creates an output file holding the assembly code that the compiler generates for us.  You can display it with: type App.asm to show:


PUBLIC  main
main    PROC
        mov     eax, 15
        ret     0
main    ENDP

The point of interest is the mov eax, 15 instruction – it simply inserts the value 15 into the EAX register (which, by the x64 calling convention, is where a function places the int value it returns to its caller).  The compiler does not emit instructions that would add 7 to 8 at runtime.  Those would have gone something like:

PUBLIC  main
main    PROC
        mov     eax, 7
        add     eax, 8
        ret     0
main    ENDP

(Note carefully, the last instruction, in both snippets, ret 0.  This means: return control to the caller, and pop zero bytes from the stack.  Do not be misled into thinking this means return the value 0 to the caller!)

Let me guess: you are probably thinking “this is all very well, but really!  What kind of idiot would ever write code that includes arithmetic like 7 + 8?”  Of course, you are right.  But the compiler does see such constructs, often as a side-effect of macros.  Here is an example to persuade you that Constant-Folding is a worthwhile optimization:

#define SECS_PER_MINUTE  60
#define MINUTES_PER_HOUR 60
#define HOURS_PER_DAY    24

enum Event { Started, Stopped, LostData, ParityError };

struct {
    int        clock_time;
    enum Event ev;
    char*      reason;
}   Record;

int main() {
    const int table_size = SECS_PER_MINUTE * MINUTES_PER_HOUR * HOURS_PER_DAY * sizeof Record;
    // rest of program

Here we are going to create a table big enough to hold a Record for each second of an entire day.  So table_size would be the size, in bytes, of that table.  It’s easy to check the assembly instruction that sets the variable table_size:

        mov     DWORD PTR table_size$[rsp], 1382400     ; 00151800H

No multiply instructions here! – it’s all calculated at compile-time: 60 * 60 * 24 * 16 = 1382400.

In fact, if we could peek inside the compiler, we would find that this level of Constant-Folding is so simple, it’s performed by the FrontEnd.  It does not require the heavy lifting power of the BackEnd Optimizer.  And therefore, it’s always on.  It makes no difference whether you request optimization (with /O2) or disable optimization (with /Od) – this optimization always cuts in.

How complicated can the expression be, and still have its constants folded together at compile-time?  In fact, the FrontEnd will cope with pretty much any arithmetic expression involving constants (even values such as sizeof, as above, so long as they can be evaluated at compile-time) and the operators + - * / % << >> ++ and --.  You can even throw in bools, logical operators, ifs and the ?: operator.

Are there any cases of Constant-Folding, then, that do require the power of the BackEnd Optimizer?  Yes.  Consider this example:

int bump(int n) { return n + 1; }

int main() { return 3 + bump(6); }

With the commands, cl /FA /Od App.cpp which says "no optimizations, thank you" and type App.asm, we get:

mov     ecx, 6
call    ?bump@@YAHH@Z                           ; bump
add     eax, 3

Just as we would expect: load 6 into ECX – the register that, in the x64 calling convention, holds the first argument to our function bump.  Then call bump.  Its result is returned in EAX.  Finally, add 3 into EAX.

Let’s see what happens if we request optimization, with: cl /FA /O2 App.cpp

mov     eax, 10

Here the BackEnd Optimizer has recognized that the bump function is so small that its body should simply be included into its caller (a sophisticated optimization called “function inlining” that we will examine later in the series).  It then realizes it can evaluate the entire expression at compile-time, and so ends up with a single instruction.  Quite impressive, right?

  • @Jim - and, yes, the listing I posted was for the x86 run - but the x64 compiler of VS2010 has the same behaviour. (when using /Ob2 (=inlining only) instead of /O2, {bump} won't be inlined unless bump is declared inline)

  • @Martin Ba.

    Seriously? You can use http://pastebin.com/ or similar for such long dumps!

    Please keep the comments space tidy.

  • @Martin

    Ah! - you need to tell the compiler to optimize (typically, with the /O2 flag) before the /Ob flag (which controls function-inlining, a specific optimization) behaves as you would expect.  See msdn.microsoft.com/.../47238hez(v=vs.100).aspx

  • @Jim

    Thanks for your reply. I hope Chakra team will consider it in this release of IE.

    Furthermore, I have a question about branch prediction optimization in JavaScript (and in general). I have submitted the question here channel9.msdn.com/.../Compiler-branch-prediction-object-oriented-design-and-JavaScript- Would you please take a look? I haven't found any related TechNet or MSDN forum, where we can ask about code optimization in various languages.

  • @Jim - Thanks. Your comment wrt. the /Ob switch is much appreciated.

    I get it now:

    /Ob1 works even if optimizations are disabled (/Od). (inlines functions declared 'inline')

    /Ob2 only works with /O* (1,2,x,g) - if the other optimizations are not enabled (/Od), /Ob2 behaves like /Ob1

  • @Nate

    I can't comment on the JavaScript compiler.  But for the native C++ compiler, we emit CMOV instructions where possible (eg to implement C's " ? : " ternary operator).  Otherwise, we leave it up to the hardware's sophisticated branch prediction logic to do its best.

  • Since which version does VC++ support Constant Folding?

    I thought it was a widely implemented technique when they taught this in university, and I have been relying on the compiler to optimize my code.

  • The case of integral constants is pretty boring; please write about floating point constants.

  • @Afriza N. Arief

    I don't know when VC++ first started to perform constant-folding - it's such a simple optimization, I would guess it was there from day one - and that dates back to the early 1990s.

  • @Bob

    Yes, integer constant-folding is pretty straightforward.  Floating-point is much more interesting, as you probably know - for example, making sure that the results calculated at compile-time will exactly match the results that would be calculated at run-time: especially in the face of changing the environment at run-time - for example, to request a different rounding mode (via the Floating Point Control Register on the chip)

    I'll add this topic to the list of possible future blog posts.
