Quick Tips On Using Whole Program Optimization

Hi, I’m Jerry Goodwin from the Visual C++ code generation and optimization team, with a couple of quick tips on using Whole Program Optimization, also referred to as Link Time Code Generation (LTCG).

If you’re writing native C++ code, you can typically speed up your optimized code by about another 3-4% by adding the /GL flag to your compiles. This flag tells the compiler to defer code generation until you link your program. Then at link time the linker calls back to the compiler to finish compilation. If you compile all your sources this way, the compiler optimizes your program as a whole rather than one source file at a time. For users building with the IDE, this option is found under the C/C++ optimization settings, and is already on by default for retail builds in new projects.

Using Whole Program Optimization provides the optimizer with a number of extra optimization opportunities, but I’ll give just one example. Many people are already familiar with the benefits of inlining a called function into the caller. We can only do inlining when we are generating code for both the calling function and the called function at the same time. With Link Time Code Gen we can inline functions from one source file into callers defined in another source file, as long as both source files were compiled with /GL.
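To make the cross-file inlining point concrete, here is a minimal sketch. The file names (math_util.cpp, caller.cpp) and the functions scale and sum_scaled are hypothetical. The two "files" are shown in one listing so it compiles on its own, and the build commands in the comments use only the /c, /O2, /GL and /LTCG switches discussed in this post.

```cpp
// ---- math_util.cpp (hypothetical) ----------------------------------
// Build: cl /c /O2 /GL math_util.cpp
int scale(int x) {
    return x * 3;
}

// ---- caller.cpp (hypothetical) -------------------------------------
// Build: cl /c /O2 /GL caller.cpp
// In a real two-file build, caller.cpp sees only this declaration.
int scale(int x);

// Without /GL, the compiler working on caller.cpp has no body for scale,
// so it must emit an ordinary call on every loop iteration.
// With /GL on both files and /LTCG on the link line, code generation
// happens at link time with both bodies available, so the optimizer is
// free to inline scale here (it is allowed to, not required to).
int sum_scaled(const int* values, int count) {
    int total = 0;
    for (int i = 0; i < count; ++i) {
        total += scale(values[i]);
    }
    return total;
}

// Link step, together with the object file that defines main():
//   link /LTCG caller.obj math_util.obj
```

Because both definitions sit in one listing here, an optimizing compiler could inline scale even without /GL; it is the split into separately compiled source files that LTCG addresses.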

 

If you do use /GL, here are four caveats to keep in mind:

1. When building from the command line or via makefiles, you need to add the /LTCG switch to the link command line to tell the linker to expect to see one or more object files that were compiled with /GL. If you don’t, some build time will be wasted because the linker will have to start over when it gets to the module compiled with /GL. If you build through the IDE this is in your project configuration settings on the Linker optimization page.

2. Using /GL reduces your compile times, but your link time will increase, because work is moved from the compile step into the link step. Overall build time might increase a little, but shouldn’t increase a lot.

3. Don’t compile managed code with /GL. Link time code gen provides little or no benefit to managed code, and this option combination (/GL /clr) is being removed in the next compiler release, so you can future-proof your build by using link time code generation only for native code. If you’re building managed code using the IDE, the default setting is to use /GL in release builds, and I recommend you disable it for managed code. For mixed managed and native code, compiling only the native code with /GL and linking with /LTCG gives best results.

4. Never use /GL for code you intend to put in a library and ship to your customers. Doing so means that your customers will be doing the code gen for your library when they link their application. Since some of your customers could have different versions of the compiler, shipping a lib built this way could cause various maintenance problems for you. If your customer’s compiler is from a prior release, their link may fail. If their version is newer than yours, the code they generate won’t be exactly equal to what you’ve tested, and could behave differently for them than when you tested it. In VS 2008, the IDE default for the class library template release configuration is to build using /GL, and I strongly encourage everyone to reset that.

Here are links for more information on this topic:

The /GL compiler switch (http://msdn.microsoft.com/en-us/library/0zza0de8.aspx)

The /LTCG linker switch (http://msdn.microsoft.com/en-us/library/xbf3tbeh.aspx)

A detailed article about Link Time Code Generation (http://msdn.microsoft.com/en-us/magazine/cc301698.aspx)

Published Tuesday, February 24, 2009 8:52 AM by vcblog

Comments

Tuesday, February 24, 2009 1:26 PM by Roger Lipscombe

# re: Quick Tips On Using Whole Program Optimization

That, and there's a bug with /GL and /LTCG that occasionally causes the build to hang when combined with /MP (use multiple processors).

I reported this on Connect a while ago, and a fix is now apparently available via PSS.

Tuesday, February 24, 2009 4:15 PM by Ageq

# re: Quick Tips On Using Whole Program Optimization

"Source file" here is not qualified.

Does it mean both .h and .cpp files?

I was pretty sure one of the options didn't like header only implementation.

Tuesday, February 24, 2009 4:40 PM by Jerry Goodwin

# re: Quick Tips On Using Whole Program Optimization

It means whatever is listed on the command line when passed to the compiler, typically a .c or .cpp file (though any file type can be compiled using the /Tp or /TP flags). The compiler produces one .obj file for each source file compiled.

Wednesday, February 25, 2009 5:11 AM by Ageq

# re: Quick Tips On Using Whole Program Optimization

Then I must have confused it with another company that said you would only benefit from some XYZ Company VC++ optimisation feature if you separated your headers and implementation into .cpp files.

To be frank, I don't see many .obj files produced for header files with implementation/templates/etc. (read: any).

Wednesday, February 25, 2009 12:42 PM by Les

# re: Quick Tips On Using Whole Program Optimization

in #4 "shipping a lib built this way" by lib I am assuming you mean a static library, not an import library. Correct?

Wednesday, February 25, 2009 4:34 PM by Greg

# Link/compiler emit message on inlining

Can we get the compiler/linker to emit a message stating that function X was automatically inlined?

Wednesday, February 25, 2009 8:49 PM by Jerry Goodwin

# re: Quick Tips On Using Whole Program Optimization

re #4, yes I was talking about static libs. If you are shipping a .dll with an import library, you can use LTCG to build the .dll and it should be beneficial. The import lib will be fine, too. Thanks for the clarification.

Wednesday, February 25, 2009 9:05 PM by Jerry Goodwin

# re: Link/Compiler emit message on inlining

In a non-LTCG compile, we already have W4-level warnings that indicate when a function you didn't ask to be inlined was inlined (C4711) and for the case where you did ask for something to be inlined but it wasn't (C4710). There's also the C4714 case where a function you marked __forceinline can't be inlined for some reason.

These messages are emitted during code generation, so in the LTCG case you'd see the same messages emitted during the link step. But to enable them you have to turn them on when compiling, because they're compiler command line options, not linker command line options. All the options the compiler would normally pass to the code generator in a regular compile are saved in the /GL-format .obj file and then the code generator honors them when the linker calls the code generator at link time. Another option like this is /wdNNNN (which you might want to use along with /W4 to filter warnings you aren't interested in).

Wednesday, February 25, 2009 9:26 PM by Jerry Goodwin

# re: Quick Tips On Using Whole Program Optimization

A slight correction: C4711 is off by default, so you'd need to use /Wall to get it turned on from the cl.exe command line. Another way would be to add this to your source files:

#pragma warning ( 1 : 4711 )
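A sketch of where that pragma might go, assuming MSVC and an optimized build such as /O2 (an /Od build does not inline automatically, so there should be nothing to report). The function names are hypothetical, and the _MSC_VER guard keeps other compilers from tripping over an MSVC-only pragma. Whether square is actually inlined is the optimizer's decision, so C4711 may or may not appear.

```cpp
// Enable C4711 ("function selected for automatic inline expansion") as a
// level-1 warning for this file. The pragma is MSVC-specific, hence the guard.
#ifdef _MSC_VER
#pragma warning(1 : 4711)
#endif

// Not declared inline; in an optimized build the compiler may inline it
// anyway, which is exactly the event C4711 reports.
static int square(int x) {
    return x * x;
}

int sum_of_squares(int n) {
    int total = 0;
    for (int i = 1; i <= n; ++i) {
        total += square(i);
    }
    return total;
}
```

With /GL, code-generation messages like this one are emitted during the link step rather than the compile step, as described in the previous comment.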

Thursday, February 26, 2009 6:28 PM by Jon

# re: Quick Tips On Using Whole Program Optimization

Of course, any 3-4% performance gain is swamped by the possible 30-40% performance loss resulting from a bug introduced in Visual Studio 2008 SP1. The last response to reports of this bug via Connect was that the performance loss is "by design" (Connect bug ID 389232). Also see IDs 383764 and 402589.

Friday, February 27, 2009 2:28 PM by o[po

# re: Quick Tips On Using Whole Program Optimization

LOL. Do you have the URLs to the Connect items?

I've noticed this, and other compilers really can generate better code, especially CRT-related code.

And what if we really cannot retype, repeat and repeat all the template qualifications and use only a header file?

Friday, February 27, 2009 4:51 PM by Pavel Minaev

# re: Quick Tips On Using Whole Program Optimization

Saturday, February 28, 2009 10:36 PM by Joe

# re: Quick Tips On Using Whole Program Optimization

Sounds good in theory, but in reality VC++ 6.0 still compiles faster, smaller code than 2005/2008/2010 in almost all situations (I've yet to find a situation where it doesn't, but that doesn't mean one doesn't exist.)

Monday, March 02, 2009 8:52 AM by Dollyan

# VS 2008 C++ compiler is SLOW!

I have just installed VS 2008 alongside my old VS 2005, all of them fully updated (service packs) and it seems the VS 2008 cl.exe (Compiler Version 15.00.30729.01) is slower than the VS 2005 one (Compiler Version 14.00.50727.762). Is it correct? I have tested it with a lot of projects and it seems the 2008 one is about 30% slower than the old compiler. I am really disappointed :(

Example of cmd line (have tested tons of options but the new compiler is always slow):

cl /O2 /I "G:\Boost-1.38.0\include" /I "G:\zlib-1.2.3\include" /I "../Include" /D "WIN32" /D "NDEBUG" /D "_LIB" /D "_UNICODE" /D "UNICODE" /FD /EHsc /MD /Fo"Release\\" /Fd"..\..\Release\vc90.pdb" /W3 /c /Zi /TP /MP "file1.cpp" "file2.cpp" "file3.cpp" "file4.cpp" "file5.cpp"  "file6.cpp" "file7.cpp" "file8.cpp"

Monday, March 02, 2009 8:07 PM by Jerry Goodwin

# re: Quick Tips On Using Whole Program Optimization

@Dollyan, it's not clear whether you're talking about compiler throughput (compile time) or compiler code quality (run time of the compiled code). We continuously benchmark our compiler against a range of performance suites and I'm sure we didn't see any 30% drops.

If you're talking about code quality I suspect you may be comparing an optimized build from VS 2005 to a non-optimized build from VS 2008. The issue linked above has something to do with losing an optimization setting in the project during conversion from VS 2005 to VS 2008, but I'm not up on the details of that issue. That problem was fixed in the VS 2008 SP1 update, though you may need to reset some of your project settings for the projects you have already converted.

If you suspect that is the problem, you can eliminate the IDE and its build system from the variables by building from the VS Command Prompt. If you can reproduce this performance drop with sources you are willing to share with us we'd request that you open an MSConnect issue, because we'd be happy to be able to look at the issue more closely than we can via vcblog.

Tuesday, March 03, 2009 11:34 AM by Joe

# re: Quick Tips On Using Whole Program Optimization

@Dollyan, I run VC++ 5, VS 2005 and VS 2008 all side-by-side (and use all three!) VC6 is a speed demon, but I've found so little difference between VS 2005 and VS 2008 performance, it would be easy to confuse which one is being used. This doesn't mean you aren't seeing a performance drop, but just that it's not a global thing.

Tuesday, March 03, 2009 11:43 AM by Joe

# re: Quick Tips On Using Whole Program Optimization

By the way, in reference to the above link: I reported a bug last year which turned out to be the same thing. That this issue was closed with "by design" is EXACTLY what pisses me off about the Visual Studio team. This is a bug--anyone who designed such a thing would be a complete moron. I can't count the number of such bug submissions I've made where the VS team has just dismissed them out-of-hand.

A perfect example is my posting above about release performance and size. Executables are measurably more bloated with 2005/2008/2010 than with VC++6. The CRT startup code has always been horrible, but I've yet to figure out the rest. VC++ 6 apps are also faster. If you really want to make VS2010 the next 6, fix these two problems. (If I want bloat, I'll use .NET; I use C++ precisely because I want lean and mean; give that back, please. Better yet, make it so the executable code is leaner and meaner.)

Tuesday, March 03, 2009 2:54 PM by Oa

# re: Quick Tips On Using Whole Program Optimization

vote+1 on mean and lean, rather than bloated and slower ( a la everything .NET ).

Tuesday, March 03, 2009 3:59 PM by longtime c++/Mfc dev

# re: Quick Tips On Using Whole Program Optimization

vote+2 on lean, mean and small.

Tuesday, March 03, 2009 4:20 PM by Dollyan

# re: VS 2008 C++ compiler is SLOW!

@Jerry Goodwin

I am talking about compile time, not generated code quality. As I posted before, using the following cmd line (from the VS 2005 and 2008 cmd prompts) shows a compile time for VS 2008 about 30% longer than with VS 2005's compiler. I have tested it with a lot of compiler options and different projects, and the VS 2008 C++ compiler is always a lot slower.

cl /O2 /I "G:\Boost-1.38.0\include" /I "G:\zlib-1.2.3\include" /I "../Include" /D "WIN32" /D "NDEBUG" /D "_LIB" /D "_UNICODE" /D "UNICODE" /FD /EHsc /MD /Fo"Release\\" /Fd"..\..\Release\vc90.pdb" /W3 /c /Zi /TP /MP "file1.cpp" "file2.cpp" "file3.cpp" "file4.cpp" "file5.cpp"  "file6.cpp" "file7.cpp" "file8.cpp"

I have a Core 2 Duo and tested both compilers with and without /MP.

Wednesday, March 04, 2009 9:29 AM by JK

# re: Quick Tips On Using Whole Program Optimization

It's the link time that kills me.  There's a serious problem when it takes over 20 minutes to link a DLL.  Incremental linking is faster but we often get an error that we've run out of memory for the ILK file.

Thursday, March 05, 2009 4:25 AM by YTQ

# re: Quick Tips On Using Whole Program Optimization

So if VS2010 is the new VC6, why doesn't it generate code as tight as VC6's?

VC6 is an outdated compiler requiring better language support, but code generation should be better, not worse. And it is getting worse and worse: templates are not parsed properly and lead to bloated code when they never should.

Same goes for CRT. Make it decent for once.

Tuesday, March 10, 2009 4:53 PM by greg

# VS / CRT / 6 vs 10

VC6 and the CRT from that era didn't have the level of extra pointer checking and argument validation that VS 2005 and later does. That means slightly more bloated and slower code with fewer buffer overruns.

Wednesday, March 11, 2009 4:07 PM by Jerry Goodwin

# re: Quick Tips On Using Whole Program Optimization

Compile time (compiler throughput) has to be traded off against code quality (better, tighter, faster generated code). Some customers are going to be more concerned about one, some about the other, depending on what they're doing. We have to try to find a good balance. Obviously for /Od (non-optimized) compiles we strongly favor throughput over code quality. For optimized compiles it’s a much more difficult balance.

I wasn't here during the VC6 to VC7 development cycle, but in general we're constantly trying to improve the compiler by adding new optimizations. Before any new optimization goes in the programmer has to measure and report the impact on code quality and runtime performance. If the change isn't a win, it doesn't go in. We also do continuous measurement of the compiler to detect any performance regressions. Occasionally we discover a bug in the compiler that has to be fixed in a way that impacts performance. Correctness trumps performance.

There's also tuning that goes on, each release of the compiler is targeted at the versions of the processors that are in production at the time we release the compiler. The processor manufacturers are constantly tweaking their designs so that's something of a moving target.

With all these "moving parts", it’s very easy to pick a single measurement (e.g. throughput) and show a single non-representative case where the new compiler is worse than the previous one. And for a given customer, that one case may be the only case that's important to them, which is unfortunate.

The message I'm trying to give is that we pay close attention to all these issues, work hard to constantly improve, continuously measure and report internally, and only accept performance hits for correctness or security issues.

I'm not well positioned to respond to the whole issue related to the MSConnect bug linked above, because it's about the IDE not the code generator, but I've been talking to the folks that are. They've been doing some re-investigation, and I'm hoping you'll see something here on that topic soon. The bottom line as I see it is that some users were accidentally changed from an optimized compile using VC 8 to a non-optimized compile using VC 9. That's the only way anyone is going to see a 30% runtime performance degradation between the two releases, and it’s due to a change in the command line parameters passed to the compiler, not the compiler itself. If you are seeing a significant performance reduction between any two releases, I would encourage you to first validate that you're comparing apples to apples by building the same code from the command line with the same command line flags. If that shows a performance regression, then please open an issue with MSConnect if you can provide us with something we can reproduce here.

Jerry Goodwin

Wednesday, March 11, 2009 4:11 PM by Jerry Goodwin

# re: Quick Tips On Using Whole Program Optimization

PS:

Similarly, if you are seeing that VC 6 generates better, tighter, faster code than VC 9, and you’re willing to share part of your code with us, please open an issue with MSConnect. We would certainly like to understand places where our compiler’s performance isn’t as good as it used to be.

Saturday, March 14, 2009 1:07 AM by Dollyan

# re: Quick Tips On Using Whole Program Optimization

@Jerry Goodwin

Dude, didn't you read what I wrote? IT'S NOT ABOUT RUNTIME PERFORMANCE, IT'S ABOUT COMPILE TIME. And I have used CMD PROMPTS AS I SAID ABOVE, with the same parameters, with only simple parameters, and with a lot of different parameter configurations. Compiler Version 15.00.30729.01 is always about 30% slower than 14.00.50727.762.

Repeating: I MEAN COMPILE TIME, NOT RUNTIME PERFORMANCE. I have more than 10 projects, and all of them compile 30% slower with the new VS 2008 C++ compiler. And yes, I have checked all the parameters; they are all correct and all valid. I have tested with tons of parameter configurations and it's always the same.

Sunday, March 15, 2009 1:41 AM by Andre Vachon

# re: VS 2008 C++ compiler is SLOW!

As Jerry mentioned, we make tradeoffs between compile time and the quality of the code we generate. /Od is about keeping build time quick at the expense of code quality, while /O2 builds generate better code at the cost of longer builds. As we strive to improve our code quality for /O2 builds, you should see build times increase with newer compilers. I noticed /O2 was one of the parameters you specified on your command line, so it would be interesting to know if you get the same performance changes with /Od instead.

The compiler team does spend a good amount of time investigating compiler throughput.  We understand it's critical to many customers, not only in /Od builds, but also /O2 builds.  We have been focusing our performance analysis on comparing VS2010 to VS 2008.  Given your feedback, we'll need to broaden our comparisons to include older versions of the compiler as well.

For anyone posting about problems with the compiler, or any tool for that matter, sharing detailed information is critical. Dollyan, the compiler flags were quite useful, but having some idea of the size of the projects and how many libraries they pull in would be helpful in understanding the cause of the regression.

In the coming months we plan on sharing more information about the enhancements we've made to the compiler - especially in the area of code generation.  We'll be sure to also share information about throughput.

Sunday, March 15, 2009 1:02 PM by Dollyan

# re: Quick Tips On Using Whole Program Optimization

/Od does not make any difference. VS 2008 c++ compiler is still much slower than VS 2005's. I have projects of all sizes. It seems the shipped VS 2008 c++ compiler was built in debug mode :)

We C++ programmers want lean and mean and fast. Leave the bloated stuff to the .NET folks.

Monday, March 16, 2009 10:47 AM by longtime c++/Mfc dev

# re: Quick Tips On Using Whole Program Optimization

I agree with the lean and mean sentiment. But for optimized release builds I prefer faster generated code at the expense of slower code generation (i.e., faster runtime with slower compilation).

Tuesday, March 17, 2009 7:55 PM by observer

# "30% compile time hit"

@Dollyan: it seems you are one of those folks that would rather complain than help.

If you have this problem for so many different configurations, then you should have no trouble submitting a sample.

If it's really causing you a 30% performance hit, then this might be worth your time to do.  Plus you'll be helping out everyone else.

Wednesday, March 18, 2009 12:20 PM by Mark Hall (VC++)

# re: "30% compile time hit"

@Dollyan, if you can't submit a sample, here are a couple of things you can do.

1. Add -P to the command line arguments for both VS2005 and VS2008.  How much bigger is the resulting .i file for VS2008 vs. VS2005?  This would help us determine if the compiler is slower, or is just compiling more source code.  That will help us know what we're dealing with.

2. I didn't see any precompiled header flags in your sample command line.  Use of precompiled headers could slash your compile times by 2x (or 10x, depending on the size of your headers).  Using precompiled headers will usually offset (or at least amortize) any losses from release to release in compile time due to larger header files.
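For the first suggestion, once both preprocessed outputs exist, comparing them is simple arithmetic. Note that cl /P names its output after the source file with a .i extension, so both compilers produce the same file name; rename one output before running the other compiler. The helper below is a hypothetical sketch in portable C++; the names measure_preprocessed and growth_ratio, and any .i file names you pass in, are placeholders.

```cpp
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>

// Byte and line counts for one preprocessed (.i) file.
struct PreprocessedStats {
    std::size_t bytes = 0;
    std::size_t lines = 0;
};

// Reads the file in binary mode and counts bytes and '\n' characters.
PreprocessedStats measure_preprocessed(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        throw std::runtime_error("cannot open " + path);
    }
    PreprocessedStats stats;
    char c;
    while (in.get(c)) {
        ++stats.bytes;
        if (c == '\n') {
            ++stats.lines;
        }
    }
    return stats;
}

// Size of the newer output relative to the older one, in bytes.
// A value well above 1.0 means the compiler is being handed more source.
double growth_ratio(const PreprocessedStats& older, const PreprocessedStats& newer) {
    if (older.bytes == 0) {
        throw std::invalid_argument("baseline file is empty");
    }
    return static_cast<double>(newer.bytes) / static_cast<double>(older.bytes);
}
```

A large ratio points at the headers rather than at the compiler's own speed, which is the distinction this suggestion is meant to draw.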

Saturday, March 21, 2009 12:47 PM by Dollyan

# re: Quick Tips On Using Whole Program Optimization

I have compiled with -P and VC2008 is generating .i files about 3 times the size of the VC2005 ones. So it's not the compiler. How can that happen? Maybe it's because VC2008 uses the Windows SDK? Maybe the Windows headers include many more headers than the old ones.

And yes, I usually don't use precompiled headers because the compile time was always fine.

Tuesday, March 24, 2009 2:50 PM by observer

# re: Quick Tips On Using Whole Program Optimization

Cool Dollyan, now we're getting somewhere.

As a person about to switch from 05 to 08, I'm interested.

Are you including windows.h by any chance?

I wonder where the header bloat is coming from...

Tuesday, March 24, 2009 11:55 PM by Dollyan

# re: Quick Tips On Using Whole Program Optimization

Yes, windows.h and standard c++ headers.

Thursday, April 16, 2009 3:14 PM by Yuhong Bao

# re: Quick Tips On Using Whole Program Optimization

I can understand how code compiled to CIL makes LTCG almost useless, but can't #pragma unmanaged code in source files compiled with /clr still benefit from LTCG?

Friday, April 17, 2009 7:40 PM by Andre Vachon

# re: Quick Tips On Using Whole Program Optimization

You are right, allowing LTCG for #pragma unmanaged code even when building with /clr could be beneficial.

However, we evaluated how many customers were using this feature (using SQM data) and found it to be a very marginal scenario.  We also discussed this change with some major customers and none reported it as an important feature for them.

We came to the conclusion that the cost of maintaining and moving forward a very complex implementation outweighed the benefit of this feature, so it has been removed.

We instead want to focus on improving LTCG codegen for the 99.99% case where the binary is completely native.
