Recently, many people have asked me about the performance of LINQ. The questions have ranged from the broad, "How can I analyze the performance of code using LINQ?", to the very specific, "Why is the performance of this code sample not what I expected?"
I want to address these questions, but before I dive into some of them individually, I want to set up a framework for discussions about performance. The framework consists of ideas that are well known and well understood, yet people often forget them when they talk or think about performance.
Performance is critical to the success of any application, and engineering for performance is a fundamental part of what developers should be doing on a day-to-day basis. Great performance doesn't just happen. Like any other feature, it must have requirements and be designed, implemented, and tested. Individuals and teams must foster a culture of performance.
Rico Mariani wrote a great post about why performance must be measured in context. That means that performance goals should be set in relation to what the user sees and cares about. When possible, these goals should then be translated into lower-level goals that are easier to quantify and measure directly, but never take your eye off the original goals. Many times when performance is tuned in one dimension, it degrades in other dimensions. It is like trying to cram a whole bunch of stuff into a tiny space: you may push it in here, but it pops out over there. So how does a programmer know which dimension is most important? Consider the performance in context. Furthermore, the tuning that a programmer does might not even matter when considered in context, so spend your time where it counts. Finally, it is much easier to weigh trade-offs between performance requirements and other requirements when the performance requirements are in context. Set up a performance budget in terms that are material and important to the customer.
Designing for Performance
Designing for performance is important. Many performance problems can be fixed as they are identified, but some are rooted in poor design choices that were made early on and are difficult to resolve later. Care must be taken, however, because these poor design choices were often made for the sake of "better performance". For example, in an app with a lot of data parallelism, it doesn't make sense to complicate the design with stateful operations to the point that it precludes concurrency.
Most of these poor design choices come about either because the design did not solve the right problem (the actual performance requirements) or because the requirements changed over time and the earlier design made it difficult to adapt to the new requirements.
There are at least two basic tools for understanding performance.
1. Asymptotic Analysis
I am sure that my readers know and love asymptotic analysis. It is important that developers know what they are doing when they call a System.* function that has O(n) time complexity n times: the result is O(n^2) behavior. Know what is going on in terms of both space and time.
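To make the O(n) call inside a loop concrete, here is a minimal sketch. The class and method names are hypothetical and chosen only for illustration. It relies on List<T>.Contains being a linear search, while HashSet<T>.Contains is O(1) on average.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, for illustration only.
public static class SharedItemCounter
{
    // O(n^2) overall: List<T>.Contains is a linear scan, and it runs once per element of 'left'.
    public static int CountSharedQuadratic(List<int> left, List<int> right)
    {
        int shared = 0;
        foreach (int item in left)
        {
            if (right.Contains(item))
            {
                shared++;
            }
        }
        return shared;
    }

    // O(n) expected time: build the set once, then each lookup is O(1) on average.
    // The price is O(n) extra space for the set.
    public static int CountSharedLinear(List<int> left, List<int> right)
    {
        var lookup = new HashSet<int>(right);
        int shared = 0;
        foreach (int item in left)
        {
            if (lookup.Contains(item))
            {
                shared++;
            }
        }
        return shared;
    }
}
```

Both methods return the same count. The second trades memory for time, which is exactly the kind of space-versus-time decision that should be made on purpose rather than by accident.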
2. Benchmarks / Profilers
So often when a performance problem exists, developers will convince themselves that they know what the problem is without accurately measuring the behavior of the system. They will proceed to "fix" the problem only to discover that what they thought would be of great benefit is in fact unimportant.
I've done that. When C# 1.0 was in beta back in 2000, I was working at a small company that decided to try out this new .NET stuff. My first application was a good-sized application that included a custom scripting engine. The performance wasn't great, so I set about fixing it. I was sure that I knew what the problem was, so I began improving some algorithms in various complex parts of the system. But I found that it didn't help matters at all.
At this point, I broke out a profiler and began measuring the actual behavior of the system. The surprising result (at the time) was that pretty much all of the performance problems came from a small loop that was building up the text buffer for the scripting engine. It was using the + operator on strings. Since this was my first .NET application, I had no idea that concatenating hundreds of thousands of strings was a bad idea.
When I changed the code to use a StringBuilder instead, it sailed. I made a few other improvements (all of which were targeted and very small), and the application was running fine.
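The original loop is long gone, so the following is only a sketch of the general shape of the problem and the fix, with hypothetical names. Strings in .NET are immutable, so each concatenation allocates a new string and copies everything accumulated so far, while StringBuilder appends into a growable buffer.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical reconstruction, not the code from the original application.
public static class ScriptBuffer
{
    // Each += creates a new string and copies the whole buffer built so far,
    // so the total copying grows quadratically with the amount of text.
    public static string BuildWithConcat(IEnumerable<string> lines)
    {
        string buffer = "";
        foreach (string line in lines)
        {
            buffer += line + "\n";
        }
        return buffer;
    }

    // StringBuilder appends into an internal buffer that grows as needed,
    // so the total work is roughly linear in the amount of text.
    public static string BuildWithStringBuilder(IEnumerable<string> lines)
    {
        var builder = new StringBuilder();
        foreach (string line in lines)
        {
            builder.Append(line).Append('\n');
        }
        return builder.ToString();
    }
}
```

Both methods produce the same text. For a handful of strings the difference is unlikely to matter. For hundreds of thousands, it dominated everything else in that application.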
Now the point of all of this is not that the + operator on strings should not be used. If that were true then we would make the C# compiler issue a warning when you used it. The point is that a programmer should be aware of the costs and tradeoffs involved with a programming decision and act accordingly. Measurement is a powerful tool. Knowing where the problems lie is the key to success. Measure early and often.
The result of profiling is a large set of various statistics about an application. If the measurements are taken in context then they are a powerful tool. As with any statistics, what is being measured is as important as the actual measurement.
Analyzing an application is very rewarding. You are able to see the material benefits of any improvement in a very quantitative way. Apply the scientific method when analyzing a problem. Once you suspect you know what the problem is, treat that as the hypothesis and then test it by observing the data (profiling data) from an operation (the suspect code) across various trials.
Try to think big and understand not only what the problem is but why it is a problem. The best solutions attack not only the symptoms but also the fundamental causes of the problem. Look to understand the nature of the problem.
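A real profiler is the right tool for finding where the time goes, but for testing one specific hypothesis, a small timing harness can be enough. The sketch below assumes the suspect code can be wrapped in an Action. The class and method names are hypothetical. Stopwatch comes from System.Diagnostics.

```csharp
using System;
using System.Diagnostics;

// Hypothetical timing harness, for illustration only.
public static class Trials
{
    // Runs the operation once untimed as a warm-up (so JIT compilation is not
    // included in the numbers), then times each of the requested trials.
    public static double[] MeasureMilliseconds(Action operation, int trials)
    {
        operation(); // warm-up run, not recorded

        var results = new double[trials];
        for (int i = 0; i < trials; i++)
        {
            Stopwatch watch = Stopwatch.StartNew();
            operation();
            watch.Stop();
            results[i] = watch.Elapsed.TotalMilliseconds;
        }
        return results;
    }
}
```

Run the same harness against the code before and after a change, and look at the spread across trials rather than a single number. If the measurements do not move, the hypothesis was wrong, no matter how convincing it seemed.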
Some Good Performance Resources
Rico Mariani's (Visual Studio Performance Architect) Blog
Vance Morrison's (CLR Performance Architect) Blog