Quick Tips On Using Whole Program Optimization

Hi, I’m Jerry Goodwin from the Visual C++ code generation and optimization team, with a couple of quick tips on using Whole Program Optimization, also referred to as Link Time Code Generation (LTCG).

 

If you’re writing native C++ code, you can typically speed up your optimized code by about another 3-4% by adding the /GL flag to your compiles.  This flag tells the compiler to defer code generation until you link your program. Then at link time the linker calls back to the compiler to finish compilation. If you compile all your sources this way, the compiler optimizes your program as a whole rather than one source file at a time. For users building with the IDE, this option is found under the C/C++ optimization settings, and is already on by default for retail builds in new projects.

 

Using Whole Program Optimization provides the optimizer with a number of extra optimization opportunities, but I’ll give just one example. Many people are already familiar with the benefits of inlining a called function into the caller. We can only do inlining when we are generating code for both the calling function and the called function at the same time. With Link Time Code Gen we can inline functions from one source file into callers defined in another source file, as long as both source files were compiled with /GL.
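
To make the cross-file inlining case concrete, here is a minimal sketch. The file names (math_utils.cpp, app.cpp), the function names, and the exact build commands in the comments are assumptions chosen for illustration, and both "files" are shown in one listing so it compiles as a single unit. Whether the optimizer actually inlines the call depends on its own heuristics.

```cpp
// Illustrative only: the file names, function names, and the split into two
// source files are assumptions made for this sketch.
//
// Hypothetical build from a Visual Studio command prompt:
//   cl /c /O2 /GL math_utils.cpp app.cpp
//   link /LTCG math_utils.obj app.obj /OUT:app.exe

// ---- would live in math_utils.cpp ----
int square(int x) {
    return x * x;
}

// ---- would live in app.cpp ----
int square(int x);  // declaration only; the body is in the other file

// Without /GL, the compiler sees only the declaration while compiling this
// file, so it has to emit a real call. With /GL on both files and /LTCG at
// link time, the optimizer can see the body of square() and may inline it
// into this loop.
int sum_of_squares(int n) {
    int total = 0;
    for (int i = 1; i <= n; ++i) {
        total += square(i);
    }
    return total;
}
```

If only one of the two files were compiled with /GL, the call would stay an ordinary call, since the optimizer needs both the caller and the callee available at link time.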

 

If you do use /GL, here are four caveats to keep in mind:

 

1.       When building from the command line or via makefiles, you need to add the /LTCG switch to the link command line to tell the linker to expect one or more object files that were compiled with /GL. If you don’t, some build time will be wasted because the linker will have to start over when it gets to the first module compiled with /GL. If you build through the IDE, this setting is on the Linker optimization page of your project configuration settings.

2.       Using /GL reduces your compile times, but your link time will increase, because code generation work is being moved from compile time to link time. Overall build time might increase a little, but shouldn’t increase a lot.

3.       Don’t compile managed code with /GL. Link time code gen provides little or no benefit to managed code, and this option combination (/GL /clr) is being removed in the next compiler release, so you can future-proof your build by using link time code generation only for native code. If you’re building managed code using the IDE, the default setting is to use /GL in release builds, and I recommend you disable it for managed code. For mixed managed and native code, compiling only the native code with /GL and linking with /LTCG gives best results.

4.       Never use /GL for code you intend to put in a library and ship to your customers. Doing so means that your customers will be doing the code gen for your library when they link their application. Since some of your customers could have different versions of the compiler, shipping a lib built this way could cause various maintenance problems for you. If your customer’s compiler is from a prior release, their link may fail. If their version is newer than yours, the code they generate won’t be exactly equal to what you’ve tested, and could behave differently for them than when you tested it. In VS 2008, the IDE default for the class library template release configuration is to build using /GL, and I strongly encourage everyone to reset that.

 

Here are links for more information on this topic:

 

The /GL compiler switch (http://msdn.microsoft.com/en-us/library/0zza0de8.aspx)

The /LTCG linker switch (http://msdn.microsoft.com/en-us/library/xbf3tbeh.aspx)

A detailed article about Link Time Code Generation (http://msdn.microsoft.com/en-us/magazine/cc301698.aspx)

 

  • I agree with the lean and mean sentiment.  But for optimized release builds I prefer faster generated code at the expense of slower code generation. (e.g. faster runtime with slower compilation)

  • @Dollyann: it seems you are one of those folks that would rather complain than help.

    If you have this problem for so many different configurations, then you should have no trouble submitting a sample.

    If it's really causing you a 30% performance hit, then this might be worth your time to do.  Plus you'll be helping out everyone else.

  • @Dollyann, if you can't submit a sample, here's a couple of things you can do.

    1. Add -P to the command line arguments for both VS2005 and VS2008.  How much bigger is the resulting .i file for VS2008 vs. VS2005?  This would help us determine if the compiler is slower, or is just compiling more source code.  That will help us know what we're dealing with.

    2. I didn't see any precompiled header flags in your sample command line.  Use of precompiled headers could slash your compile times by 2x (or 10x, depending on the size of your headers).  Using precompiled headers will usually offset (or at least amortize) any losses from release to release in compile time due to larger header files.

  • I have compiled with -P, and VC2008 is generating .i files about 3 times the size of the VC2005 ones. So it's not the compiler. How can that happen? Maybe it's because VC2008 uses the Windows SDK? Maybe the Windows headers are including more and more headers than the old ones.

    And yes, I usually don't use precompiled headers because the compile time was always fine.

  • Cool, Dollyann, now we're getting somewhere.

    As a person about to switch from 05 to 08, I'm interested.

    Are you including windows.h by any chance?

    I wonder where the header bloat is coming from...

  • Yes, windows.h and standard C++ headers.

  • I can understand how compiling code to CIL makes LTCG almost useless, but can't #pragma unmanaged code in source files compiled with /clr still benefit from LTCG?

  • You are right, allowing LTCG for #pragma unmanaged code even when building with /clr could be beneficial.

    However, we evaluated how many customers were using this feature (using SQM data) and found it to be a very marginal scenario.  We also discussed this change with some major customers and none reported it as an important feature for them.

    We came to the conclusion that the cost of maintaining and moving forward a very complex implementation outweighed the benefit of this feature, so it has been removed.

    We instead want to focus on improving LTCG codegen for the 99.99% case where the binary is completely native.
