Quick Tips On Using Whole Program Optimization


Hi, I’m Jerry Goodwin from the Visual C++ code generation and optimization team, with a couple quick tips on using Whole Program Optimization, also referred to as Link Time Code Generation (LTCG).


If you’re writing native C++ code, you can typically speed up your optimized code by about another 3-4% by adding the /GL flag to your compiles.  This flag tells the compiler to defer code generation until you link your program. Then at link time the linker calls back to the compiler to finish compilation. If you compile all your sources this way, the compiler optimizes your program as a whole rather than one source file at a time. For users building with the IDE, this option is found under the C/C++ optimization settings, and is already on by default for retail builds in new projects.
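
For command-line builds, the basic pattern looks like this. This is a minimal sketch; the source file names here are hypothetical, and a real build would pass more options:

    cl /O2 /GL /c foo.cpp bar.cpp
    link /LTCG foo.obj bar.obj /out:app.exe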


Using Whole Program Optimization provides the optimizer with a number of extra optimization opportunities, but I’ll give just one example. Many people are already familiar with the benefits of inlining a called function into the caller. We can only do inlining when we are generating code for both the calling function and the called function at the same time. With Link Time Code Gen we can inline functions from one source file into callers defined in another source file, as long as both source files were compiled with /GL.
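
As a small illustration (the file and function names are made up), suppose a trivial function is defined in one source file and called from another. Compiled separately without /GL, the compiler never sees both bodies at once, so it can't inline the call; with /GL on both compiles and /LTCG on the link, it can:

    // mathutil.cpp -- compile with: cl /O2 /GL /c mathutil.cpp
    int Square(int x) { return x * x; }

    // main.cpp -- compile with: cl /O2 /GL /c main.cpp
    int Square(int x);  // declaration only; the definition is in mathutil.cpp

    int main()
    {
        // With link /LTCG, this call can be inlined even though Square
        // is defined in a different translation unit.
        return Square(21);
    }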


If you do use /GL, here are four caveats to keep in mind:

1. When building from the command line or via makefiles, you need to add the /LTCG switch to the link command line to tell the linker to expect one or more object files that were compiled with /GL. If you don't, some build time will be wasted because the linker will have to start over when it gets to the module compiled with /GL. If you build through the IDE, this setting is in your project configuration on the Linker optimization page.

2. Using /GL reduces your compile times, but your link time will increase because work is moved to link time. Overall build time might increase a little, but shouldn't increase a lot.

3. Don't compile managed code with /GL. Link time code gen provides little or no benefit to managed code, and this option combination (/GL /clr) is being removed in the next compiler release, so you can future-proof your build by using link time code generation only for native code. If you're building managed code using the IDE, the default setting is to use /GL in release builds, and I recommend you disable it for managed code. For mixed managed and native code, compiling only the native code with /GL and linking with /LTCG gives the best results (see the sketch after this list).

4. Never use /GL for code you intend to put in a library and ship to your customers. Doing so means that your customers will be doing the code gen for your library when they link their application. Since some of your customers could have different versions of the compiler, shipping a lib built this way could cause various maintenance problems for you. If your customer's compiler is from a prior release, their link may fail. If their version is newer than yours, the code they generate won't be exactly what you've tested, and could behave differently for them than it did when you tested it. In VS 2008, the IDE default for the class library template release configuration is to build using /GL, and I strongly encourage everyone to reset that.
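
Here's a minimal sketch of the mixed-mode pattern from caveat 3; the file names are hypothetical, and a real project would pass more switches:

    rem Native code gets /GL; managed code compiles with /clr but without /GL.
    cl /O2 /GL /c native_helpers.cpp
    cl /O2 /clr /c managed_app.cpp
    rem /LTCG lets the linker finish code generation for the /GL objects.
    link /LTCG managed_app.obj native_helpers.obj /out:app.exe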


Here are links for more information on this topic:


The /GL compiler switch (http://msdn.microsoft.com/en-us/library/0zza0de8.aspx)

The /LTCG linker switch (http://msdn.microsoft.com/en-us/library/xbf3tbeh.aspx)

A detailed article about Link Time Code Generation (http://msdn.microsoft.com/en-us/magazine/cc301698.aspx)


  • @Dollyan, it's not clear whether you're talking about compiler throughput (compile time) or compiler code quality (run time of the compiled code). We continuously benchmark our compiler against a range of performance suites and I'm sure we didn't see any 30% drops.

    If you're talking about code quality I suspect you may be comparing an optimized build from VS 2005 to a non-optimized build from VS 2008. The issue linked above has something to do with losing an optimization setting in the project during conversion from VS 2005 to VS 2008, but I'm not up on the details of that issue. That problem was fixed in the VS 2008 SP1 update, though you may need to reset some of your project settings for the projects you have already converted.

    If you suspect that is the problem, you can eliminate the IDE and its build system from the variables by building from the VS Command Prompt. If you can reproduce this performance drop with sources you are willing to share with us we'd request that you open an MSConnect issue, because we'd be happy to be able to look at the issue more closely than we can via vcblog.

  • @Dollyan, I run VC++ 6, VS 2005 and VS 2008 all side-by-side (and use all three!). VC6 is a speed demon, but I've found so little difference between VS 2005 and VS 2008 performance that it would be easy to confuse which one is being used. This doesn't mean you aren't seeing a performance drop, just that it's not a global thing.

  • By the way, in reference to the above link: I reported a bug last year which turned out to be the same thing. That this issue was closed with "by design" is EXACTLY what pisses me off about the Visual Studio team. This is a bug--anyone who designed such a thing would be a complete moron. I can't count the number of such bug submissions I've made where the VS team has just dismissed them out-of-hand.

    A perfect example is my posting above about release performance and size. Executables are measurably more bloated with 2005/2008/2010 than with VC++6. The CRT startup code has always been horrible, but I've yet to figure out the rest. VC++ 6 apps are also faster. If you really want to make VS2010 the next 6, fix these two problems. (If I want bloat, I'll use .NET; I use C++ precisely because I want lean and mean; give that back, please. Better yet, make it so the executable code is leaner and meaner.)

  • vote+1 on mean and lean, rather than bloated and slower (a la everything .NET).

  • vote+2 on lean, mean and small.

  • @Jerry Goodwin

    I am talking about compile time, not generated code quality. As I posted before, using the following command line (from the VS 2005 and 2008 command prompts) shows a compile time for VS 2008 about 30% longer than with VS 2005's compiler. I have tested it with a lot of compiler options and different projects, and the VS 2008 C++ compiler is always a lot slower.

    cl /O2 /I "G:\Boost-1.38.0\include" /I "G:\zlib-1.2.3\include" /I "../Include" /D "WIN32" /D "NDEBUG" /D "_LIB" /D "_UNICODE" /D "UNICODE" /FD /EHsc /MD /Fo"Release\\" /Fd"..\..\Release\vc90.pdb" /W3 /c /Zi /TP /MP "file1.cpp" "file2.cpp" "file3.cpp" "file4.cpp" "file5.cpp"  "file6.cpp" "file7.cpp" "file8.cpp"

    I have a Core 2 Duo and tested both compilers with and without /MP.

  • It's the link time that kills me.  There's a serious problem when it takes over 20 minutes to link a DLL.  Incremental linking is faster but we often get an error that we've run out of memory for the ILK file.

  • So if VS2010 is the new VC6, why doesn't it make code as tight as VC6?

    VC6 is an outdated compiler that needs better language support, but code generation should be better, not worse. And it is getting worse and worse; templates are not handled properly and lead to bloated code when they never should.

    Same goes for CRT. Make it decent for once.

  • VC6 and the CRT from that era didn't have the level of extra pointer checking and argument validation that VS 2005 and later do. That means slightly more bloated and slower code with fewer buffer overruns.

  • Compile time (compiler throughput) has to be traded off against code quality (better, tighter, faster generated code). Some customers are going to be more concerned about one, some about the other, depending on what they're doing. We have to try to find a good balance. Obviously for /Od (non-optimized) compiles we strongly favor throughput over code quality. For optimized compiles it’s a much more difficult balance.

    I wasn't here during the VC6 to VC7 development cycle, but in general we're constantly trying to improve the compiler by adding new optimizations. Before any new optimization goes in, the programmer has to measure and report the impact on code quality and runtime performance. If the change isn't a win, it doesn't go in. We also do continuous measurement of the compiler to detect any performance regressions. Occasionally we discover a bug in the compiler that has to be fixed in a way that impacts performance. Correctness trumps performance.

    There's also tuning that goes on: each release of the compiler is targeted at the versions of the processors that are in production at the time we release the compiler. The processor manufacturers are constantly tweaking their designs, so that's something of a moving target.

    With all these "moving parts", it’s very easy to pick a single measurement (e.g. throughput) and show a single non-representative case where the new compiler is worse than the previous one. And for a given customer, that one case may be the only case that's important to them, which is unfortunate.

    The message I'm trying to give is that we pay close attention to all these issues, work hard to constantly improve, continuously measure and report internally, and only accept performance hits for correctness or security issues.

    I'm not well positioned to respond to the whole issue related to the MSConnect bug linked above, because it's about the IDE not the code generator, but I've been talking to the folks that are. They've been doing some re-investigation, and I'm hoping you'll see something here on that topic soon. The bottom line as I see it is that some users were accidentally changed from an optimized compile using VC 8 to a non-optimized compile using VC 9. That's the only way anyone is going to see a 30% runtime performance degradation between the two releases, and it’s due to a change in the command line parameters passed to the compiler, not the compiler itself. If you are seeing a significant performance reduction between any two releases, I would encourage you to first validate that you're comparing apples to apples by building the same code from the command line with the same command line flags. If that shows a performance regression, then please open an issue with MSConnect if you can provide us with something we can reproduce here.

    Jerry Goodwin

  • PS:

    Similarly, if you are seeing that VC 6 generates better, tighter, faster code than VC 9, and you’re willing to share part of your code with us, please open an issue with MSConnect. We would certainly like to understand places where our compiler’s performance isn’t as good as it used to be.

  • @Jerry Goodwin

    Dude, didn't you read what I wrote? IT'S NOT ABOUT RUNTIME PERFORMANCE, IT'S ABOUT COMPILE TIME. And I have used CMD PROMPTS AS I SAID ABOVE, with the same parameters, with only simple parameters, and with a lot of different parameter configurations. Compiler version 15.00.30729.01 is always about 30% slower than 14.00.50727.762.

    Repeating: I MEAN COMPILE TIME, NOT RUNTIME PERFORMANCE. I have more than 10 projects, and all of them compile 30% slower with the new VS 2008 C++ compiler. And yes, I have checked all the parameters; they are all correct and valid. I have tested with tons of parameter configurations and it's always the same.

  • As Jerry mentioned, we make tradeoffs between compile time and the quality of the code we generate. /Od is about keeping build time quick at the expense of code quality, while /O2 builds generate better code at the cost of longer builds. As we strive to improve our code quality for /O2 builds, you should see build times increase with newer compilers. I noticed /O2 was one of the parameters you specified on your command line, so it would be interesting to know if you get the same performance changes with /Od instead.

    The compiler team does spend a good amount of time investigating compiler throughput.  We understand it's critical to many customers, not only in /Od builds, but also /O2 builds.  We have been focusing our performance analysis on comparing VS2010 to VS 2008.  Given your feedback, we'll need to broaden our comparisons to include older versions of the compiler as well.

    For anyone posting about problems with the compiler, or any tool for that matter, sharing detailed information is critical. Dollyan, the compiler flags were quite useful, but having some idea as to the size of the projects and how many libraries they pull in would be helpful in understanding the cause of the regression.

    In the coming months we plan on sharing more information about the enhancements we've made to the compiler - especially in the area of code generation.  We'll be sure to also share information about throughput.

  • /Od does not make any difference. The VS 2008 C++ compiler is still much slower than VS 2005's. I have projects of all sizes. It seems the shipped VS 2008 C++ compiler was built in debug mode :)

    We C++ programmers want lean and mean and fast. Leave the bloated stuff to the .NET folks.
