Hello everyone, I’m Asmaa Taha; I’m a test developer in the Visual C++ compiler optimization test team.
Our team has various testing systems that automate the testing and reporting processes to convey the state of the product to our feature teams. Some of them are useful in automating the feature testing we have for the compiler (and making it go faster!). Others are useful in automating builds of real-world code like Windows using the latest versions of our compiler in our labs. Last but not least, we have automation to measure the performance of our compiler.
In this blog I will mainly talk about the system that measures the performance of our compiler. We call this system VCBench, the Visual C++ Compiler Benchmark (see footnote 1 below) system.
The purpose of VCBench is to measure and tune the performance of our compiler and compare it with other compilers. Two of the major benchmarks measured in VCBench are the Spec2000 and Spec2006 (see footnote 2 below) benchmarks. They report code size and execution time of a number of primarily integer algorithms as well as a number of floating point algorithms; for this reason we call these “Code Quality” benchmarks. Code Quality benchmarks measure the speed and size of binaries generated by our compiler. We also have “Code Throughput” benchmarks that measure the time it takes our compiler to generate binaries. These are the two major performance concerns of the Visual C++ compiler.
VCBench is run on daily builds of the VC++ compiler to monitor performance changes. When a performance regression occurs, a bug is opened and our test and developer teams coordinate to find and fix the performance loss. In addition to regression prevention, VCBench is used by our developers working on new features. The developers iterate compiler prototype builds several times through the VCBench system in order to tune the heuristics of new optimizations. Before they can add a feature to the product, they must present their performance deltas to their peers and management for sign-off. Oftentimes there are trade-offs between compiler throughput, generated code speed, and generated code size that are (sometimes heatedly) debated.
Tests are run for multiple iterations to minimize noise while still maintaining acceptable throughput, and the system reports the median. We report the median rather than the mean because performance results generally do not follow a normal distribution: they are skewed! Tests can run with different optimization switches; in general we run a matrix to exercise as many of the optimization code paths as possible with the machine resources that we have. The outputs of these tests are saved to a SQL database, and results are made available to developers through an internal website so they are easy to track.
To reduce noise on the benchmarking machines, we take several steps:
1. Stop as many services and processes as possible.
2. Disable the network driver: this turns off NIC interrupts caused by broadcast packets.
3. Set the test’s processor affinity to run on one processor/core only.
4. Set the run to high priority, which decreases the number of context switches.
5. Run the test for several iterations.
VCBench also allows private runs, so any developer can submit a custom-built compiler with any configuration. The set of configurations available for private submissions is a superset of the configurations that we run on the daily builds of our product. The results for these runs are inserted into the database and can be compared against any other run, whether private or automated daily. VCBench notifies developers when their runs finish and whether they succeeded or failed, at which point the developers can see the performance impact of their changes.
I hope this blog gave you an idea of how performance testing for the compiler back end is done. A future blog will compare Dev10 and VC6.
1. Benchmark: "A standard of measurement or evaluation." A computer benchmark is typically a computer program that performs a strictly defined set of operations (a workload) and returns some form of result (a metric) describing how the tested computer performed. Computer benchmark metrics usually measure speed (how fast the workload was completed) or throughput (how many workloads were completed per unit time). Running the same computer benchmark on multiple computers allows a comparison to be made.
2. SPEC is an acronym for the Standard Performance Evaluation Corporation. SPEC is a non-profit organization composed of computer vendors, systems integrators, universities, research organizations, publishers and consultants whose goal is to establish, maintain and endorse a standardized set of relevant benchmarks for computer systems. Although no one set of tests can fully characterize overall system performance, SPEC believes that the user community will benefit from an objective series of tests which can serve as a common reference point.
3. Noise is variation in the output that is not accounted for by changes in the compiler.
Is the next release going to fix the appalling floating point bug in the VC libraries reported here : http://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=299211 ?
As a numerical programmer, I would far rather see the team sweat this, than sweat the last percentage out of SPEC benchmarks, because testing different floating point rounding modes is (i) by far the most cost effective way of checking for rounding induced errors in algorithms and (ii) already fully supported by all standard hardware. The lack of software support for key hardware standards over 20 years old, now virtually universally adopted, is very disappointing.
Is this written correctly: /fp:fast provides superior precision than /fp: precise. Really?
re: Performance Tests
while profiling I see a lot of performance go up in smoke using string operators (comparison, concatenation, etc). Is this area covered by VCBench? Does Dev10 show performance improvement over vc6 in these areas?
A few simple tests that convert between different compressed file formats using widely used open-source code would help greatly, instead of the synthetic Spec benchmarks. For example, compare WAV-to-MP3 conversion times using the LAME command line compiled with MSVC and with GCC, at a few different bit rates. Verify that the input and output files are identical between the different compilers. Additional tests could use ImageMagick's command line to open a file; blur, sharpen, transform, and rotate it; and save it as a PNG file.
These well-known open-source tools already compile with MSVC and will, for a small effort, provide many tens of repeatable and useful unit/performance tests. These tests can be done for integer-only operations or for operations using floating point.
These tests will also exercise many different coding styles and internal program architectures, as the open-source conversion tools (e.g., ImageMagick) have had many different developers working on them at different times. The packages mentioned (LAME, ImageMagick, netpbm, libtiff/tifftools, MediaInfo) all have well-defined, simple command-line interfaces and take standard file formats as input. This should make them simple to script into unit/performance tests.
Greg, thanks for the pointers here. We are always looking for better Real-World performance sensitive code to use as tuning benchmarks.
While we do look at the spec benchmarks, we also look at a variety of other tests, benchmarks and metrics, including various Windows and SQL performance tests which validate a wide variety of coding constructs and features.
How much attention is being paid to the linker? The linker's inability to incrementally link with changes to static libraries is absolutely crippling, and the workaround of linking the .obj files both changes behavior and isn't that effective. I'm not sure that whether the compiler is 20% faster or slower matters that much when the link stage takes 40 seconds or more.
I agree with Phaeron.
I have to wait 60 minutes in release builds for link.exe to complete LTCG linking in VS 2005.
Compile time can easily be reduced by parallelizing builds with the /MP switch; link times cannot.
asdf, please be aware that LTCG really does mean Link-Time-Code-Gen. We recompile all of the code (every single function), bottom up, at link time.
So any LTCG recompile, even if you just changed one function, is a clean compile and link.
LTCG generates much better code quality, but if you want a quick edit-compile-debug cycle, you'll need to turn off LTCG.
This blog is not the forum for support on bugs. Please see the Support Forum for assistance on general questions (under Helpful links to the left). Please report problems through Microsoft Connect (also under Helpful Links). For this particular instance we have replied privately to this poster as our goal is customer satisfaction.
I am curious about the steps outlined under "To reduce noise on the benchmarking machines". I agree that we need to ensure we are not reporting noise while generating nightly builds.
But in general, the machines devs use VC++ on have a lot of background noise going on. So does your team have any data on the delta between a clean box and a dev box? Spec benchmarks are good but not sufficient when the product is used by so many people to build applications :)
Ketan, this depends on how isolated the dev boxes are. And even if they do the same set of things that our system does, we have not really measured what the difference in noise levels is.