/GL and PGO


Hi, I’m Lin Xu, a Program Manager working on the C++ compiler.

Recently, we collated performance numbers from our testing passes over this release cycle. We track many different benchmarks closely for all of the architectures and switch options (/O1, /O2, /GL, PGO). We also track these across multiple CPU models. (Yes, this is quite a big matrix. Look for an upcoming blog post from the QA team to learn more.)

We’re pretty excited about the code quality improvements we made for this release. (Read Ten’s recent post about it here.) As I looked at the numbers, one thing jumped out at me: to really take advantage of these improvements, applications need to be compiled with /GL and, if possible, PGO.

If you aren’t familiar with PGO, you can read Lawrence’s blog post on Profile Guided Optimization here.

I’ve summarized some of our data comparing VS2010 Beta2 with VS2008 SP1. Here is a comparison of integer benchmark performance with various switches on x86 and x64:

These particular graphs are based on a benchmark suite similar to SPEC CPU 2006. But our benchmarks include real world code as well. We build and measure performance of many Microsoft products, including SQL, Windows and Office.

Let’s say you currently build release builds with the /O2 switch in VS2008. If you moved to VS2010, you might see on x64:

· 10% faster code if you turned on /GL,
· 16% faster code if you turned on /GL and PGO

and on x86,

· 7% faster code if you turned on /GL,
· 13% faster code if you turned on /GL and PGO.

Now, for the last couple of releases, a new VC++ project will have /GL on for release builds. However, settings for upgraded projects are not changed. So whether you use the Visual Studio build system or your own custom build system, go ahead and check that you are specifying /GL for your release builds!

 The other recommendation I have is to use PGO. Doing so requires a larger investment (it requires you to figure out scenarios and create training data) – but it can improve the performance of your app above and beyond /GL. PGO works best on medium or larger applications. Small applications may see little benefit from PGO, depending on the application’s workload.

We recently created training data and turned on PGO for part of the C++ IntelliSense engine in Visual Studio 2010 and saw ~25% better performance on some scenarios. When we turned on PGO for the compiler, we measured a ~10% speedup in compiler throughput. Again, you can learn about how to turn on PGO in your own builds in Lawrence’s blog post here.

/GL shouldn’t increase your build time significantly, but note that /GL is not compatible with Edit and Continue (/ZI) or incremental linking (the linker option /INCREMENTAL). You can read some quick tips about /GL in a previous blog post here.

As Lawrence describes in his blog post, with PGO your application is built twice – once for the instrumented build and once for the final optimized build. This means build times increase more significantly, but as I’ve noted above, the performance increase is also more significant.

So, if your application is CPU bound, I hope that these numbers will convince you to take a second look at your release build settings and turn on /GL and PGO!


  • Those graphs are very misleading. Note the squiggle on the x-axis.

  • I'd say that the text is misleading as well. The way I read the charts is that LTCG without PGO is not worth it, as it only gives about 2.5% on x86 and 1% on x64 compared to a non-LTCG build on the same compiler. This has been my experience as well and is the reason I always end up turning off LTCG, because the very small performance increase is not worth the rather large increase in build time, which hits hard if you are doing continuous builds. I think when we looked at it, /GL caused our compile+link time to skyrocket from about 20 minutes to 45 minutes.

    LTCG with PGO looks slightly more advantageous -- about 9% -- but with an even greater hit to build time and the added hassle of creating and managing the profile data, it looks even less palatable to me than plain /GL.

  • First off, if your application is not CPU intensive, generated code quality will have little impact on overall performance.  If your app spends all its time waiting on IO operations or other OS operations, LTCG and PGO will have marginal benefit.  Profiling your application is always a must in order to understand what needs to be optimized.

    But if your app is CPU intensive, then these compiler options are worth looking at.

    /GL (aka LTCG aka Link Time Code Generation) enables, among other things, much better inlining across the whole program.  Code that is written with many very small functions that get called often will see great improvements from using /GL.

    PGO is all about hot/cold code separation.  So large programs that have lots of control flow, with lots of error handling code, will see the best benefit.

    Phaeron, I’m happy you guys tried the feature and did your own performance analysis.  I hope other people take the time to do that analysis and see if these options provide additional performance for their applications.  These flags do require more compile time, and they also make code harder to debug.  Many customers choose to do day-to-day development using  /Od, which compiles very quickly and provides the best debugging experience, but generates very poor code quality.  Later in the cycle, when most development is complete, then they turn on more optimizations to ship the best performing application possible.  This is again not appropriate in all cases, but a commonly used practice to reduce the development time.

    Andre Vachon

    Lead Program Manager

    C++ compiler

  • Agreed with Phaeron, the text is misleading. Please, treat your customers with just a minimum of respect, write something honest, rather than trying to make your product look better than it is. (Especially since your product is actually pretty decent already, it doesn't need to be wrapped in misleading statements) Leave that to the marketing people.

    If you want to tell us that "VC10 generates really good code", then compare *the same flags* for VC9 and VC10. Saying "VC10 with all optimizations enabled is faster than VC9 without optimizations" is not news. It's not impressive. It's so goddamn obvious it makes me wonder what you're trying to hide.

    I'm sure what you meant to say instead, which would be honest, was:

    Let’s say you currently build release builds with VS2008. If you moved to VS2010, you might see on x64:

    · 8% faster code with just /O2,
    · 9% faster code if you use /GL,
    · 8% faster code if you use /GL and PGO

    and on x86,

    · 4% faster code with just /O2,
    · 5% faster code if you use /GL,
    · 2% faster code if you use /GL and PGO.

    The silly thing is that these numbers are still pretty respectable. So why couldn't you just give us those, rather than faking some ridiculously high numbers by comparing VC9 with few optimizations against VC10 with everything enabled?

    Of course, if the point of the post was instead to convince us that /GL and PGO are awesome optimization settings, then compare apples to apples, and show us that by listing the numbers for *one* compiler, not by comparing VC9 vs VC10.

  • Faster in SPEC-like code is nice (Intel always manages to get its compiler to do that well, even if it cheats to do it).  If my code ran CPUs up 100%, all the time, then I can see this being useful for more than marketing.  Alas, my code runs CPUs about 1% typically.  Making that even twice as fast (100% faster) wouldn't amount to anything anyone would notice.  A better UI would be much more welcome, but I suppose you don't do UI.  The VS UI needs to be thrown out and started from scratch.  I don't mean the nuts and bolts that hold it together, but the whole concept of how it works.  Start that over.

  • Well, I never trust exact figures for these kinds of tests because they are system dependent. But the biggest thing that they are saying is that they have been spending time improving their code quality and things like that are always welcome.

    @jalf: If you notice the graphs show

    VS: 2008


    /O2 /GL

    /O2 /GL /LTCG:PGO

    VS: 2010


    /O2 /GL

    /O2 /GL /LTCG:PGO

    This is a bit of a stretch but it does show somewhat that they gave data for like builds. I too would have preferred that they give the full compiler/linker command lines but I'm sure if we nag enough they will give them.


    Remember that the VC compiler team has nothing to do with the VS IDE.

  • What I'm more interested in is the size of compiled code, especially for ARM processors.

  • We do actively track code size as part of our metrics, but smaller code is not always faster.

    Separating hot code from cold code (which is what PGO does) can drastically reduce working set, even if raw code size grows in certain cases.

    Making very hot code larger - possibly by unrolling loops - can often lead to faster execution at the cost of slightly larger code.

    These are unfortunately all trade offs the compiler must make at code generation time.  By using PGO, the compiler can rely on real profile information to make the best possible code generation decision.

    Visual Studio 2010 does not include an ARM targeting compiler.

    -Andre Vachon

    Lead Program Manager

    C++ Compiler

  • >> "Visual Studio 2010 does not include an ARM targeting compiler."

    Does this mean no Windows CE support in Visual Studio 2010?

  • Did these tests include math or array intensive processing such as image processing, numerical analysis, etc?

  • Greg,

    These two links tell you the contents of the SPEC benchmarks



    Andre Vachon

    Lead Program Manager

    C++ Compiler

  • Jeremy

    That is correct.  There is no WinCE support in Visual Studio 2010.

    Visual Studio 2008 SP1 is the platform for CE7 development

    Andre Vachon

    Lead Program Manager

    C++ Compiler

  • http://msdn.microsoft.com/en-us/library/sa69he4t(VS.100).aspx

  • I'm developing a CPU intensive program, so the performance advances would be more than welcome.

    However, turning just /GL on on VC9 SP1 makes it die of internal error (submitted bug, heard nothing back). Without /GL the compiler or linker has never crashed.

    The debugger and IntelliSense, however, are crashing left and right. The more difficult the bug you are trying to debug, the more crash-happy the debugger will be. Track a bug down for hours and finally just about to nail it - crash. Wrt IntelliSense, I'm pretty good at avoiding ctrl-space (at least without ctrl-s first), but I occasionally do it and 50% of the time, it crashes.

    Also, if you leave the debugger at a breakpoint and go away for a few hours (meetings, ugh), when you come back and continue, stop debugging or anything, the whole machine hangs. Occasionally, it comes back after tens of minutes, but often I have to powercycle.

    Please, before spending man-years optimizing a few 1/1000s of the running time, make the damn thing stable. Not everyone codes C#; the C++ side of VC is just plain rotten as it is.

    (Disclaimer: You can always suspect hardware, but this is ECC memory, and the actual program just about never encounters *unexplainable* crashes. The OS doesn't BSOD either.)

  • vherva,

    Sorry you had such a bad experience with VS9 with LTCG.  I'd be very interested in hearing about your experience with this feature in VS 2010.

    -Andre Vachon

    Lead Program Manager

    C++ Compiler
