Pogo aka PGO aka Profile Guided Optimization
My name is Lawrence Joel and I am a Software Developer Engineer in Testing working with the C/C++ Backend Compiler group. Today I want to blog about a pretty cool compiler optimization called Profile Guided Optimization (PGO, or Pogo as we on the C/C++ team like to call it). It is available in Microsoft Visual C/C++ 2005 and up. In this post I will describe what PGO is, how it can improve your application, and how to use it.
What is PGO?
PGO is an approach to optimization where the compiler uses profile information to make better optimization decisions for the program. Profiling is the process of gathering information about how the program is used at run time. In a nutshell, PGO optimizes based on user scenarios, whereas static optimization relies only on the structure of the source code.
PGO has a three-phase approach. The first phase is the instrumentation phase (see Figure 1). In this phase, the linker takes the CIL files (these are produced by the frontend compiler with the /GL flag, e.g. cl.exe foo.cpp /GL) and passes the modules to the C/C++ Backend Compiler. The Backend Compiler then inserts probe instructions wherever necessary. A .pgd file is created alongside the executable; this is a database file that is used in the later phases. Note that the instrumented executable is bloated because of the probes.
Figure 1: Instrumentation Phase
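To make the idea of a probe a little more concrete, here is a hand-written sketch of what instrumentation conceptually does: every path through a function bumps a counter. The names ProbeCounts, clamp_to_zero and run_scenario are made up for illustration, and the real probes and the .pgd/.pgc formats are internal to the compiler, so treat this only as a picture of the kind of data that gets collected.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the counters an instrumented build maintains.
struct ProbeCounts {
    std::uint64_t negative_path = 0;  // times the "x < 0" branch ran
    std::uint64_t other_path = 0;     // times the fall-through path ran
};

// Hand-"instrumented" function: each path bumps its counter
// before doing the real work.
inline int clamp_to_zero(int x, ProbeCounts& probes) {
    if (x < 0) {
        ++probes.negative_path;
        return 0;
    }
    ++probes.other_path;
    return x;
}

// One "training scenario": run the function over a sample input
// and hand back the counts that were gathered.
inline ProbeCounts run_scenario(const int* values, int n) {
    assert(n >= 0);
    ProbeCounts probes;
    for (int i = 0; i < n; ++i) {
        clamp_to_zero(values[i], probes);
    }
    return probes;
}
```

Calling run_scenario on an input where negative values are rare would show a small negative_path count next to a large other_path count, which is the sort of evidence the later phases rely on.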
The second phase is the training phase (see Figure 2). This is where you run the instrumented executable under different scenarios. The probes record runtime information and save the data to a .pgc file. After each run an appname!#.pgc file is created (where appname is the name of the running application and # is 1 + the number of appname!#.pgc files already in the directory). In Figure 2, for example, each scenario run of the executable collects method call information and records it in a .pgc file. Note: to use PGO effectively, make sure that your scenarios give good coverage of your application.
Figure 2: Training Phase
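The .pgc naming rule described above can be written down in a few lines. This is only a sketch of the rule as stated in this post, not the runtime's actual code: next_pgc_name is a hypothetical helper, the directory listing is passed in by the caller, and for simplicity it does not check that the part between "!" and ".pgc" is a number.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returns the name the next training run would get under the rule above:
// '#' is 1 + the number of existing "appname!#.pgc" files.
// 'existing' is a hypothetical directory listing supplied by the caller.
inline std::string next_pgc_name(const std::string& appname,
                                 const std::vector<std::string>& existing) {
    assert(!appname.empty());
    const std::string prefix = appname + "!";
    const std::string suffix = ".pgc";
    std::size_t count = 0;
    for (const std::string& name : existing) {
        // Needs the prefix, the suffix, and at least one character between.
        const bool long_enough = name.size() > prefix.size() + suffix.size();
        if (long_enough &&
            name.compare(0, prefix.size(), prefix) == 0 &&
            name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0) {
            ++count;
        }
    }
    return prefix + std::to_string(count + 1) + suffix;
}
```

With an empty directory the first run of "app" would be named app!1.pgc; with app!1.pgc and app!2.pgc already present, the next one would be app!3.pgc.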
The third phase is the PG Optimization phase (see Figure 3). In this phase the .pgc files are merged into the .pgd file, which the C/C++ Backend Compiler then uses to make better optimization decisions and thus produce a more efficient executable.
Figure 3: PG Optimization Phase
What advantages does Pogo provide?
Pogo optimizes the most commonly touched areas in a program. The compiler has a better idea as to what are the common inputs and control flow for the application. Here is a partial list of the optimizations that PGO provides:
· Inlining – By weighing method calls with the number of calls per execution, the compiler can make better inlining decisions.
· Virtual Call Speculation – If a particular derived type is often passed into a method, then its override of a virtual method can be called directly and inlined. This helps by limiting the number of calls made through the vtable.
· Basic Block Reordering – This optimization finds the most executed paths and places the basic blocks of those paths spatially closer together. This helps locality by optimizing instruction cache usage and branch prediction. Also, code that is not used during the training phase is moved to the bottom-most section. Doing this together with “function layout” described below can significantly reduce the working set (number of pages used in one time interval) of sizeable applications.
· Size/Speed Optimization - With profile information, the compiler can find out the frequency of function usage. With this information the compiler can optimize for speed on the functions that are more frequently used and optimize for size on the functions that are less frequently used.
· Function Layout - Place functions in the same sections if they are mainly used together in the profiled scenarios.
· Conditional Branch Optimization - An example is an if/else block. If the condition is more often false than true, it would be better to have the else block before the if block.
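As an illustration of the Virtual Call Speculation item in the list above, here is a source-level sketch of the transformation. The types Shape, Square and Triangle are made up for the example, and the typeid test is just a readable stand-in: the compiler does its own type check on the generated code, and nothing here should be read as the code it actually emits.

```cpp
#include <cassert>
#include <typeinfo>

struct Shape {
    virtual ~Shape() = default;
    virtual int sides() const = 0;
};

struct Square final : Shape {
    int sides() const override { return 4; }
};

struct Triangle final : Shape {
    int sides() const override { return 3; }
};

// What the programmer writes: an ordinary virtual call.
inline int sides_plain(const Shape& s) {
    return s.sides();
}

// Roughly what speculation amounts to if training showed that 's' is
// almost always a Square: test for the common type, make a direct
// (inlinable) call on a hit, and fall back to the virtual call otherwise.
inline int sides_speculated(const Shape& s) {
    if (typeid(s) == typeid(Square)) {
        return static_cast<const Square&>(s).Square::sides();  // direct call
    }
    return s.sides();  // rare case: normal vtable dispatch
}
```

Both functions return the same result for every Shape; the speculated version only changes how the common case is reached.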
How to use PGO?
Here are the steps for standard usage of PGO:
1. Compile the source code files that you want to be profiled with flag /GL.
2. Link all the files with /LTCG:PGINSTRUMENT (or /LTCG:PGI). This will create a .PGD file alongside your executable file. Note that when you link with /LTCG:PGI, some optimizations may be overridden to make way for the instrumentation. This affects the optimizations you request with /Ob, /Os or /Ot.
3. Train the application by running it with different scenarios.
4. Re-Link the files with /LTCG:PGOPTIMIZE (or /LTCG:PGO) to produce an optimized image of the application.
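The four steps above might look like this for a single-file program. The file name app.cpp and the function below are assumptions made for the example; the commands in the leading comment use only the /GL and /LTCG switches named in the steps, plus cl's /c (compile without linking).

```cpp
// Assumed build sequence for a file named app.cpp (name is illustrative):
//   cl /c /GL app.cpp                  step 1: compile with /GL
//   link /LTCG:PGINSTRUMENT app.obj    step 2: instrumented app.exe plus app.pgd
//   app.exe                            step 3: run once per training scenario
//   link /LTCG:PGOPTIMIZE app.obj      step 4: optimized app.exe
#include <cassert>
#include <vector>

struct Totals {
    long sum = 0;      // sum of valid (non-negative) readings
    int rejected = 0;  // number of negative readings that were skipped
};

// Work that a training scenario would drive from main() with
// representative input.
inline Totals process(const std::vector<int>& readings) {
    Totals t;
    for (int r : readings) {
        if (r < 0) {
            ++t.rejected;  // rare in typical input: the cold path
        } else {
            t.sum += r;    // typical input lands here: the hot path
        }
    }
    return t;
}
```

If the training input rarely contains negative readings, the profile gives the compiler the evidence it needs to favor the else branch when it lays out and optimizes process.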
You might find yourself in a situation where you updated the source files after the .PGD file was created. If you were to re-link the object files with /LTCG:PGO, the profile information would be ignored. If you made only small changes to the source, it would be too costly to repeat the whole process of creating a new .PGD file and new .pgc files. To overcome this problem you can instead re-link the files with /LTCG:PGUPDATE (or /LTCG:PGU). This flag allows the linker to build the new source code using the original .PGD file.
Another useful part of PGO is the ability to manage your .pgc files. Visual Studio provides a tool called Pgomgr which allows you to set priorities on the trained scenarios. For example, an ATM software company notices that the most common transactions performed on their software are withdrawals and deposits. It would be in their best interest to give such transactions a higher priority than the other transactions made on their software. This can be done by running the following: pgomgr /merge:2 appname!1.pgc appname.pgd. This gives appname!1.pgc a weight of 2; the default weight for a .pgc file is 1. When the files are re-linked with /ltcg:pgo or /ltcg:pgu, appname!1.pgc will have higher priority than the other .pgc scenarios.
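One way to picture what a weight does is as a weighted sum of counts. The sketch below is a simplified model, assuming a weight of n behaves like merging that run's data n times; Scenario and merge_counts are hypothetical names, and the real contents of .pgc and .pgd files are internal to the tools.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One training run: a hypothetical list of per-probe execution counts
// plus the weight given to pgomgr (1 when no weight is specified).
struct Scenario {
    std::vector<std::uint64_t> counts;
    unsigned weight;
};

// Simplified model of merging: each probe's merged total is the
// weighted sum of that probe's count over all scenarios.
inline std::vector<std::uint64_t> merge_counts(const std::vector<Scenario>& runs,
                                               std::size_t probe_count) {
    std::vector<std::uint64_t> merged(probe_count, 0);
    for (const Scenario& run : runs) {
        assert(run.counts.size() == probe_count);
        for (std::size_t i = 0; i < probe_count; ++i) {
            merged[i] += run.weight * run.counts[i];
        }
    }
    return merged;
}
```

With one run of counts {10, 0} at weight 2 and another of {1, 5} at weight 1, the merged totals are {21, 5}, so the paths exercised by the weighted run dominate the decisions made at re-link time.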
If you want to only gather profile information within an interval of execution or time then there are a couple of ways to go about it. There is a tool called Pgosweep that interrupts a running program and stores the current profile information to a new .pgc file and clears the information from the runtime data structure. For example, if you have an application that does not end and you want to differentiate between its daytime behavior vs its nighttime behavior you can do the following: pgosweep app.exe daytime.pgc. Another approach you can use is a helper method called PgoAutoSweep. PgoAutoSweep will aid when trying to partition profile information within execution. The example below was taken from MSDN’s Walkthroughs in Visual C++ 2008, “Walkthrough: Using Profile-Guided Optimizations” (current link: http://msdn.microsoft.com/en-us/library/xct6db7f.aspx). The example below will create two .PGC files. The first contains data that describes the runtime behavior until count is equal to 3, and the second contains the data collected after this point until application termination.
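#include <stdio.h>
#include <pgobootrun.h>  // declares PgoAutoSweep
int count = 10;
int g = 0;
void func2(void) { printf("hello from func2 %d\n", count); }
void func1(void) { printf("hello from func1 %d\n", count); }
int main(void) {
    while (count--) {
        if (g) func2(); else func1();
        if (count == 3) { PgoAutoSweep("func1"); g = 1; }  // closes the first .PGC
    }
    PgoAutoSweep("func2");  // closes the second .PGC
}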
Note: To build the example I had to write cl app.cpp /GL "%VSPATH%\VC\lib\pgobootrun.lib", where %VSPATH% is the path to your latest Microsoft Visual Studio program directory.
For more information on PGO please read Kang Su’s excellent article under MSDN’s Unmanaged C++ Articles titled “Profile-Guided Optimization with Microsoft Visual C++ 2005”. Current link: http://msdn.microsoft.com/en-us/library/aa289170.aspx. For information on PGO usage you can look at MSDN’s C/C++ Build Tools section “Profile-Guided Optimizations” page. Current link: http://msdn.microsoft.com/en-us/library/e7k32f4k.aspx .
Does PGO work with VC2008 Express Edition?
Interesting and Useful.
Nowadays VC blog is active and hot.
Will PGO work with OpenMP in VC10?
Very nice article. Can be very useful for some of our current project. Thanks a lot...
Profile-guided optimization is nice, but I would like to see more support for static hint annotations to the compiler to guide these optimizations. In particular, I'd like to be able to indicate in source the likelihood of a branch and which code paths are very cold or hot. My reasoning is:
1) It is difficult to ensure that the profiles are representative, especially with varying use cases and machine profiles, and with non-automated UI paths.
2) Static source-based annotations can be more portable between compilers and different versions of the same compiler. For instance, if VC++ supported likely/unlikely, I could use the same annotations for GCC and VC++. I also wouldn't have to worry about not being able to carry my hints forward due to an incompatible binary profiling format change in a future version of VC++.
3) The profile may undesirably bias the compiler toward an unwanted optimization path. For instance, the compiler may see that a particular branch biases 90% toward one side and strongly reorganize the code for an overall win, but I may want to combat that because the 10% case is along the critical path in a real-time scenario. On the other hand, I may not want to bloat my executable to save 20ms in a UI error path.
4) The probes may unacceptably change the performance profile of the application and thus invalidate the generated profiling data.
One of the frustrating aspects of the C++ compiler over the releases after VS 6.0 was that little performance improvements in generated code appeared. We do heavy batch processing with long running mostly integer computations and have not seen run time performance improvements.
I agree with Phaeron's thoughts on this matter. PGO is indeed a nice capability, but it can be difficult to create profiles that are consistently representative of critical-path use cases.
Optimization hints via source annotation would be most welcome.
How does this work if you have multiple DLLs? We have a .net exe that uses multiple c++ dlls. I haven't figured out a way to use PGO in a multiple-project way.
Couldn't static annotations suffer from misuse and other problems that usually plague optimising code?
At compile time it would be very hard to predict which way the code would be used and you could then end up with a case where your annotations cause someone's build to keep hitting the wrong branch prediction or the slowest code paths.
Of course, I'm not saying that PGO solves these problems, I am just saying that it does at least get information from the execution of the program rather than how you think the program will behave based on your design.
In the end any kind of optimisations of this type can be tricky to use since you always run the risk of training it the wrong way, or you get it annotated in the wrong way and one size doesn't always fit all.
Static annotations don't mean non-profile-guided. One of the advantages of having them is that you could manually tune based on profiling data other than that generated by Microsoft's tools. How am I supposed to influence POGO based on a sampling run I've done in VTune? Or better yet, what about from my own profiling logic, since I don't want to give instrumented builds to end users or possibly even testing?
The 90/10 rule still applies here. It'd be stupid to try manually tuning every single branch in a program, but not all branches are hard to predict -- I can pretty safely say that I don't want my out of memory handler to be the predicted likely case. And if I've got a critical inner loop like a video decoder, you'd better believe I'd be willing to spend the time to tune and maintain those annotations.
Source code annotations have one additional advantage: they version, diff, and merge along with the rest of the source code. As part of source, they'll work with existing source mechanisms. With .pgc files, you've got binary blobs that require special attention in your build and submission processes.
Now, if the POGO tools could turn the profiling data into annotations... THAT could be something I could really use....
>Does PGO work with VC2008 Express Edition?
Thanks for your feedback. To answer your direct question, static annotations are one of the features that have been under discussion for quite a while inside the team. There are pros and cons to having such a feature, and we are considering what to do in this space for future releases. I think in a broader sense you are asking about how to understand and control PGO. How can I understand the decisions that PGO makes? And how do I understand and visualize the data that PGO collects? There are many features that we are considering for the future, such as a UI for exposing the PGO data, using other profile database information, incremental tuning to profile information, etc., all of which aim to increase PGO’s usability.
I think run-time assertions also can be exploited for optimization if they're language features, not library part.
void foo( int* x, int* y, int* z )
{
    *x = 100;
    *z = 200;
    *y = *x + 100;
}
The compiler can't optimize third statement (unless foo() function itself is inlined) as x may be equal to z thus *z=200 affects *x. If there's a way to tell compiler that x can never be equal to z, that optimization is possible.
__assert( x != z );
*y = *x + 100; // will be optimized to *y = 200;
So '__assert' will be used for debugging purposes in debug builds, but also as a hint for optimization in release builds.
Yesyes, I know there's 'restricted' or kinda thing for this particular case but assertions will give more flexibility.
__assert( x < 3 );
switch( x )
{
case 0: ... break;
case 1: ... break;
case 2: ... break;
}
What do you think about it?
>>Does PGO work with VC2008 Express Edition?
Also, if you want to find out which features are a part of which edition you can look at http://msdn.microsoft.com/en-us/vstudio/products/cc149003.aspx.
If you already have Visual Studio installed on your machine, a quick way to find out whether it has PGO support is to look for pgort.dll under \Program Files\Microsoft Visual Studio 9.0\VC\bin\. If it exists then you have PGO support.
> One of the frustrating aspects of the C++ compiler over the releases after VS 6.0 was that little performance improvements in generated code appeared.
> We do heavy batch processing with long running mostly integer computations and have not seen run time performance improvements.
I don't think you can totally rely on compiler improvements to make your code magically run faster. Sometimes you might actually need to dust off a profiler and figure out why your code is slow.