To introduce myself: I am Ankit Asthana, the program manager for the backend C++ compiler. In my last blog I introduced Profile Guided Optimization (PGO) with an exercise that involved PGO’izing the NBody sample application. In this blog I would like to talk about how PGO is used within SAP to deliver next-generation performance for SAP NetWeaver. A little history first: the effort to PGO’ize SAP NetWeaver was undertaken and completed by SAP’s Microsoft Platforms Group, which is responsible for Microsoft Windows-specific technology within SAP’s kernel. The effort was led by Mike Gusev and Jason Kafka, both senior members of SAP’s Microsoft Platforms Group, who were also kind enough to provide content for this blog. With that said, let us start by building some domain knowledge about SAP NetWeaver before we get into how and why PGO was introduced for it.
SAP NetWeaver is an open platform that offers a comprehensive set of technologies for running mission-critical business applications and integrating people, processes, and information. In addition, SAP NetWeaver serves as the technical foundation of SAP's Business Process Platform offerings by providing capabilities for service provisioning, composition (service consumption), and governance.
At the heart of applications running on SAP NetWeaver is SAP NetWeaver Application Server (AS), which serves as the central foundation for the entire SAP software stack. SAP NetWeaver AS comes in two flavors: ABAP and Java. AS serves as an abstraction layer and handles calls from ABAP/Java applications to the underlying combination of operating system and database. Each Application Server consists of a set of processes, and each process has one of three roles: Dispatcher, Worker or Background. An Application Server has exactly one Dispatcher process and several Worker processes, which are needed for load distribution. The Dispatcher process is responsible for spawning additional processes and for dispatching requests from the frontend to free Worker processes. The Background process is a special Worker process. All of these processes share the same application code (i.e., they are spawned from the same binary), which is called the SAP kernel.
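To make the Dispatcher/Worker split more concrete, here is a toy C++ model; it is not SAP code. The class and method names are hypothetical, each work process is a plain struct rather than an OS process spawned from the kernel binary, and queuing requests while every Worker is busy is a modeling assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <optional>
#include <string>
#include <vector>

// Toy, single-threaded model: one Dispatcher hands frontend requests to free
// Worker processes and (by assumption) queues them while all Workers are busy.
struct WorkProcess {
    bool busy = false;
    std::string current;  // request being handled; empty while idle
};

class Dispatcher {
public:
    explicit Dispatcher(std::size_t workers) : procs_(workers) {
        assert(workers > 0 && "an application server needs at least one worker");
    }

    // Give the request to the first free Worker, or queue it if none is free.
    // Returns the index of the Worker that took it, if any.
    std::optional<std::size_t> submit(const std::string& request) {
        for (std::size_t i = 0; i < procs_.size(); ++i) {
            if (!procs_[i].busy) {
                assign(i, request);
                return i;
            }
        }
        pending_.push_back(request);
        return std::nullopt;
    }

    // A Worker finished its request; hand it the oldest queued one, if any.
    void complete(std::size_t idx) {
        WorkProcess& p = procs_.at(idx);
        p.busy = false;
        p.current.clear();
        if (!pending_.empty()) {
            assign(idx, pending_.front());  // copies before the pop below
            pending_.pop_front();
        }
    }

    std::size_t queued() const { return pending_.size(); }
    const WorkProcess& process(std::size_t idx) const { return procs_.at(idx); }

private:
    void assign(std::size_t idx, const std::string& request) {
        procs_[idx].busy = true;
        procs_[idx].current = request;
    }

    std::vector<WorkProcess> procs_;
    std::deque<std::string> pending_;
};
```

With two Workers, a third request waits in the queue until `complete()` frees a Worker, which then picks that request up.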
The primary data loop of the SAP kernel is single-threaded by design. Although there are dedicated threads for signal handling and high-precision timing, the core itself is single-threaded. The core consists of several modules, mainly written in C, spanning several million lines of code. Newer kernel generations are designed differently, especially with multi-core machines in mind. Nevertheless, the majority of the customer base runs SAP NetWeaver products on the kernel architecture described above. Furthermore, the single-threaded kernel’s codebase is extremely hard to refactor, and given the very slow (if any) growth in single-core performance of modern CPUs, PGO was a logical, low-cost way to gain more performance without redesigning, and possibly breaking, the current software.
Before we dig into the how, let us recap how Profile Guided Optimization (PGO) works. There are essentially three steps required to PGO'ize an application (Instrument, Train and Optimize), as shown in Figure 1 below. Remember that the 'Instrument' and 'Optimize' steps require an /LTCG:PGINSTRUMENT and an /LTCG:PGOPTIMIZE build, respectively (for more information please take a look at my last blog post).
Figure 1: Steps involved in PGO'izing an application
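As a rough reference, the three steps map onto the VS2010-era cl/link switches as in this sketch, which only assembles command lines and runs nothing. The file names and training arguments are hypothetical, and the preceding /GL compile (whole-program compilation, which PGO requires) is included as an assumption about how the objects are built.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One step of the PGO pipeline together with the command that performs it.
struct PgoStep {
    std::string name;
    std::string command;
};

// Assemble the Instrument / Train / Optimize sequence for a VS2010-style toolset.
// 'sources', 'objects', 'exe' and 'trainingArgs' are hypothetical inputs.
std::vector<PgoStep> pgoSteps(const std::string& sources,
                              const std::string& objects,
                              const std::string& exe,
                              const std::string& trainingArgs) {
    assert(!objects.empty() && !exe.empty());
    return {
        // Whole-program compilation (/GL) is a prerequisite for both PGO link modes.
        {"Compile", "cl /c /O2 /GL " + sources},
        // Instrument: link an instrumented binary; the linker also creates the .pgd file.
        {"Instrument", "link /LTCG:PGINSTRUMENT /OUT:" + exe + " " + objects},
        // Train: run representative workloads; each run writes a .pgc count file.
        {"Train", exe + " " + trainingArgs},
        // Optimize: relink the same objects using the collected profile counts.
        {"Optimize", "link /LTCG:PGOPTIMIZE /OUT:" + exe + " " + objects},
    };
}
```

Keeping the commands as data makes it easy for a build script to log, retry, or skip individual steps.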
When incorporating PGO as a part of their build process the SAP NetWeaver team had the following two goals in mind:
Maintain the current build process!
This first goal was essential, and it applies to any product. The SAP NetWeaver team has a huge developer base that relies heavily on a working kernel containing the latest changes, produced by automated nightly builds. The PGO process was therefore enabled as part of this nightly build. As a failsafe, if any of the PGO steps (Instrument, Train or Optimize) failed, the build would revert to providing an /O2 build. Alternatively, the build could fall back to an /LTCG build, which, compared with an /O2 build, retains more of the optimized performance.
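The failsafe rule above can be sketched as follows, assuming each step is wrapped in a callable that reports success. `nightlyBuild`, `BuildKind` and the builder hooks are hypothetical names, not part of SAP's build system.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Which kind of kernel the nightly build ended up producing.
enum class BuildKind { Pgo, Ltcg, O2, Failed };

// A build step (Instrument, Train, Optimize, or a fallback build) modeled as a
// callable that reports success; in a real pipeline it would invoke the toolchain.
using Step = std::function<bool()>;

// Run the PGO steps in order and stop at the first failure, then fall back to
// /LTCG (if preferred) and finally to /O2, so developers still get a kernel.
BuildKind nightlyBuild(const std::vector<Step>& pgoSteps,
                       const Step& buildLtcg,
                       const Step& buildO2,
                       bool preferLtcg) {
    assert(buildO2 && "an /O2 fallback must always be provided");

    bool pgoOk = true;
    for (const Step& step : pgoSteps) {
        if (!step()) {  // later steps depend on earlier ones, so stop here
            pgoOk = false;
            break;
        }
    }
    if (pgoOk) return BuildKind::Pgo;
    if (preferLtcg && buildLtcg && buildLtcg()) return BuildKind::Ltcg;
    return buildO2() ? BuildKind::O2 : BuildKind::Failed;
}
```

Stopping at the first failure matters: a failed Instrument step leaves nothing to train, and a failed Train step leaves no profile to optimize with.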
Automate! Automate! Automate!
As I mentioned, the PGO process for SAP NetWeaver was encapsulated within the automated nightly build process for the SAP kernel. Although this is not required, the SAP NetWeaver team chose to automate the whole PGO process. The main reason was that the team did not want a dedicated person or team responsible for training the Application Server and moving the resulting performance profiles (.pgd files) into the build process. Adding weight to this goal, the build processes were supervised by a different department, one not versed in compiler/linker error analysis or in maintaining application server training scenarios. To clarify: unless your code base changes significantly on a daily basis (i.e., there is a lot of churn), you do not need to perform the 'instrument' and 'train' steps for every build. An optimized PGO build can be produced repeatedly (without the instrumentation and training steps) until the source base has diverged substantially from the version that was last PGO-trained. For large development teams, an ideal workflow is for one developer to do the PGO training and check the training data (.pgd file) into the source repository. Other developers on the team can then sync their repositories and use that training data file to produce PGO-optimized builds directly. When the profile counts go stale over time, the application is PGO-retrained.
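Deciding when a checked-in .pgd file has gone stale is a team policy. One possible heuristic, shown purely as a hypothetical sketch, compares per-function source checksums recorded at training time with the current ones and retrains once the changed fraction crosses a threshold. How the checksums are produced and the 10% default are assumptions, not an MSVC or SAP rule.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Function name -> checksum of its source (how it is computed is an assumption).
using Checksums = std::map<std::string, std::size_t>;

// Fraction of current functions whose training-time profile no longer applies.
double staleFraction(const Checksums& trained, const Checksums& current) {
    if (current.empty()) return 0.0;
    std::size_t stale = 0;
    for (const auto& [name, sum] : current) {
        auto it = trained.find(name);
        // New functions and edited functions have no usable profile counts.
        if (it == trained.end() || it->second != sum) ++stale;
    }
    return static_cast<double>(stale) / static_cast<double>(current.size());
}

// True when someone should rerun Instrument + Train and check in a new .pgd.
bool needsRetraining(const Checksums& trained, const Checksums& current,
                     double threshold = 0.10) {
    assert(threshold >= 0.0 && threshold <= 1.0);
    return staleFraction(trained, current) > threshold;
}
```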
The performance gains from PGO’izing an application depend directly on how well the application was PGO-trained. In other words, the secret sauce for getting the most out of PGO is choosing the right training scenario. Good training scenarios are typically modeled after real-life, performance-centric user scenarios. There is a fine balance between a training scenario that is too specialized and one that is too general: if it is too specialized, you will only optimize code used in a few use cases; if it is too general, you might miss the sweet spot that is relevant for most of your customers.
SAP’s Microsoft Platforms Group chose one of SAP’s Standard Application Benchmarks as the training scenario. The benchmark simulates thousands of users performing typical business transactions in parallel. It puts a massive, CPU-intensive workload onto the Application Server and executes the relevant code paths in the SAP kernel, thereby PGO-training them.
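One way to turn a real-life usage model into a repeatable training run is a weighted transaction mix. This sketch is not SAP's benchmark: it uses smooth weighted round-robin to build a deterministic schedule whose frequencies follow hypothetical usage weights, so the run is identical every night and the collected profiles stay reproducible.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One business transaction type in the training mix, weighted by how often
// real users run it (the weights are assumptions that would need measuring).
struct Transaction {
    std::string name;
    int weight;
};

// Deterministic schedule of 'runs' transactions whose frequencies follow the
// weights, produced with smooth weighted round-robin.
std::vector<std::string> trainingSchedule(const std::vector<Transaction>& mix,
                                          std::size_t runs) {
    int total = 0;
    for (const auto& t : mix) total += t.weight;
    assert(total > 0 && "training mix needs at least one positive weight");

    std::vector<int> current(mix.size(), 0);
    std::vector<std::string> schedule;
    schedule.reserve(runs);
    for (std::size_t r = 0; r < runs; ++r) {
        std::size_t best = 0;
        for (std::size_t i = 0; i < mix.size(); ++i) {
            current[i] += mix[i].weight;             // every entry gains its weight
            if (current[i] > current[best]) best = i;
        }
        current[best] -= total;                      // the chosen entry pays the total
        schedule.push_back(mix[best].name);
    }
    return schedule;
}
```

A mix weighted 3:1 yields three of the first transaction for every one of the second, interleaved rather than in blocks.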
Performance is measured in SAP Application Performance Standard (SAPS) units, which essentially describe how many business transactions the Application Server can process per hour.
Using an SAP kernel built with /LTCG and the VS2010 toolset as the baseline for comparison, PGO’izing the SAP kernel yielded a performance gain of up to 20%, and in certain scenarios even higher.
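The arithmetic behind these numbers is simple. SAP defines 100 SAPS as 2,000 fully business-processed order line items per hour in its Sales and Distribution (SD) benchmark, and the gain is the ratio of PGO throughput to baseline throughput. The helper names and the sample figures below are illustrative only, not SAP measurements.

```cpp
#include <cassert>

// 100 SAPS = 2,000 fully business-processed order line items per hour,
// so 1 SAPS corresponds to 20 order line items per hour.
double saps(double orderLineItemsPerHour) {
    assert(orderLineItemsPerHour >= 0.0);
    return orderLineItemsPerHour / 20.0;
}

// Relative throughput gain of a PGO build over a baseline (e.g. /LTCG), in percent.
double gainPercent(double baselineSaps, double pgoSaps) {
    assert(baselineSaps > 0.0);
    return (pgoSaps / baselineSaps - 1.0) * 100.0;
}
```

For instance, a hypothetical baseline of 10,000 SAPS and a PGO build reaching 12,000 SAPS would be a 20% gain.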
This should give you an idea of how PGO is used to make SAP NetWeaver more performant. To emphasize again: if your product’s core performance-centric scenarios are native and CPU-bound, PGO is worth a shot. In such scenarios PGO can improve performance for a subset of important user-centric scenarios without any changes to your application’s source code. In future blogs I will try to cover more case studies of products using PGO and follow up with a blog on ‘PGO under the hood’. So stay tuned! Additionally, if you would like us to blog about other PGO-related content, please let me know.