Colin Thomsen's Microsoft Blog

I'm a developer working on the code profiler that ships with Visual Studio 2010 Premium and Ultimate editions. At a previous company I worked on computer vision software for face and gaze tracking.

    Link: Beginners Guide to Performance Profiling


    The Visual Studio 2010 MSDN documentation includes some more detailed examples (including screenshots) than previous versions. Here's a decent intro to profiling:
    Beginners Guide to Performance Profiling

    Visual Studio 2008, Beta 2 (now with some of my code)


    Today we released Beta 2 of VS2008. This is the first public release from Microsoft that contains a nontrivial amount of code that I wrote (even though I haven't written too much code just yet). I had barely synched up the source tree and only fixed a couple of bugs when we released Beta 1 but now I've found my feet and am contributing more.

    The major release announcements have focussed on the flashier (and admittedly very cool) aspects of the Beta like LINQ and some of the HTML editing and JavaScript debugging features. However, we Profiler folks have also been toiling away adding new features and fixing bugs. Look out for things like the following (some of these already featured in Beta 1, but they just keep getting better):

    • A promotion to the new Developer menu
    • Hot path - find the critical path/paths through your call trees
    • Noise reduction - trim and/or fold your call trees so that they are easier to examine. See above for folding example.
    • Comparison reports - compare subsequent profiler runs to determine if code changes are improving performance
    • x64 OS support - profile on x64 Vista or Windows Server 2003

    If you can, please download it and let us know what you think. If you don't have the time at least take a look at the overview video showing some of the major features. You should also check out Ian's entry about controlling data collection while profiling. Hopefully I'll have time to go through some of the new profiler-specific features soon.

    Basic Profiler Scenarios


    This post was going to cover some basic scenarios discussing the differences between sampling and instrumentation and when you would choose to switch methods, but then I found there is already something like that in MSDN. If you haven't already, go and take a look. See if you can improve the performance of the PeopleTrax app.

    Instead I'll discuss sampling and instrumentation from a user's perspective. There are already many definitions of sampling vs instrumentation so I won't repeat them.

    For some background reading on the sampling aspect, take a look at David Gray's post. There are a few things that he hasn't covered in that post. The main question I had was: should I use sampling or instrumentation?

    A generic answer to that would be:

    • If you know your performance problem is CPU-related (i.e. you see the CPU is running at or near 100% in task manager) then you should probably start with sampling.
    • If you suspect your problem may be related to resource contention (e.g. locks, network, disk etc), instrumentation would be a better starting point.

    Sometimes you may not be sure what type of performance issue you are facing or you may be trying to resolve several types of issues. Read on for more details.


    Why use sampling instead of instrumentation?

    Sampling is lighter weight than instrumentation (see below for reasons why instrumentation is more resource intensive) and you don't need to change your executable/binaries to use sampling.

    What events do you sample with?

    By default the profiler samples on clock cycles. These should be familiar to most users because they relate to the commonly quoted frequency of the machine. For example, 1 GHz is 1 billion clock cycles per second. With the default profiler setting of one sample every 10,000,000 clock cycles, that means 100 samples every second on a 1 GHz machine.
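
    The arithmetic is easy to check with a small sketch. The 10,000,000-cycle interval is the default quoted elsewhere in this post; the function name is made up for illustration and is not part of the profiler.

```cpp
#include <cstdint>

// Hypothetical helper: number of samples taken per second when the profiler
// samples once every `cycles_per_sample` clock cycles on a CPU running at
// `clock_hz` cycles per second.
constexpr double samples_per_second(std::uint64_t clock_hz,
                                    std::uint64_t cycles_per_sample) {
    return static_cast<double>(clock_hz) /
           static_cast<double>(cycles_per_sample);
}
```

    Calling samples_per_second(1000000000, 10000000) gives 100 for the 1 GHz machine above. A 3 GHz machine with the same interval would take 300 samples per second.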

    Alternatively, you could choose to sample using Page Faults, which might occur frequently if you are allocating/deallocating memory a lot. You could also choose to profile using system calls or some lower level counter.

    How many samples is enough to accurately represent my program profile?

    This is not a simple question to answer. By default we only sample once every 10,000,000 clock cycles, which might seem like a long time between samples. In that time, your problematic code might block waiting on a lock or some other construct, and the thread it is running in might be pre-empted, allowing another thread to run. When the next sample is taken the other thread could still be running, which means the problematic code is not included in the sample.

    The risk of missing the key data is something that is inherent in any sample-based data collection. In statistics the approach is to minimize the risk of missing key information by making the number of samples large enough relative to the general population. For example, if you have a demographic that includes 10000 people, taking only 1 sample is unlikely to be representative. Taking a sample of 1000 people might be considered representative. There are more links about this on Wikipedia.
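
    To put a rough number on "large enough", one option is the textbook standard error of a proportion. This is only a sketch: it assumes each sample is an independent draw, which real profiler samples are not exactly, and the function name is hypothetical.

```cpp
#include <cmath>
#include <cstddef>

// Standard error of an estimated proportion from n independent samples.
// p is the true fraction of samples that land in some function.
// Assumption: samples are treated as independent, which is a simplification.
inline double proportion_std_error(double p, std::size_t n) {
    return std::sqrt(p * (1.0 - p) / static_cast<double>(n));
}
```

    Under that assumption, a function that really accounts for half of the samples is estimated to within about 5 percentage points (one standard error) after 100 samples, and to within about 0.5 percentage points after 10,000 samples.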

    Won't this slow down my app?

    No, not really. When a sample is taken the current thread is suspended (other application threads continue to run) so that the current call stack can be collected. When the stack walk is finished, execution returns to the application thread. Sampling should have a limited effect on most applications.

    Sounds good, why use instrumentation?

    See below.


    Why use instrumentation?

    As discussed above, sampling doesn't always give you the whole picture. If you really want to know what is going on with a program the most complete way is to keep track of every single call to every function.

    How does instrumentation work (briefly)?

    Unlike sampling, with instrumentation the profiler changes the binary by inserting special pieces of code called probes at the start and end of each function. This process is called 'instrumenting the binary' and it works by taking a binary (dll or exe) along with its PDB and making a new 'instrumented binary'. By comparing a counter at the end of the function with the start, it is easy to determine how long a function took to execute.
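
    The idea of enter and exit probes can be imitated by hand. The sketch below is not how the profiler rewrites binaries; it is a hand-written C++ stand-in where a scoped object plays the role of the two probes, and all of the names (ProbeStats, probe_table, ScopedProbe, sum_of_roots) are invented for illustration.

```cpp
#include <chrono>
#include <cmath>
#include <map>
#include <string>
#include <utility>

// Per-function totals collected by the hand-written probes.
struct ProbeStats {
    long long calls = 0;
    std::chrono::nanoseconds total{0};
};

// Hypothetical global table keyed by function name.
inline std::map<std::string, ProbeStats>& probe_table() {
    static std::map<std::string, ProbeStats> table;
    return table;
}

// The constructor acts as the "enter" probe and the destructor as the
// "exit" probe; the difference between the two timestamps is the time
// spent in the function, including anything it calls.
class ScopedProbe {
public:
    explicit ScopedProbe(std::string name)
        : name_(std::move(name)),
          start_(std::chrono::steady_clock::now()) {}

    ~ScopedProbe() {
        const auto elapsed = std::chrono::steady_clock::now() - start_;
        ProbeStats& stats = probe_table()[name_];
        stats.calls += 1;
        stats.total +=
            std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed);
    }

private:
    std::string name_;
    std::chrono::steady_clock::time_point start_;
};

// Example of an "instrumented" function.
inline double sum_of_roots(int n) {
    ScopedProbe probe("sum_of_roots");
    double d = 0;
    for (int i = 0; i < n; ++i) {
        d += std::sqrt(static_cast<double>(i));
    }
    return d;
}
```

    After a few calls to sum_of_roots, probe_table()["sum_of_roots"] holds the call count and the accumulated time. A real instrumenting profiler gets the same kind of data without any source changes.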

    What if I call other people's code?

    Usually you don't have access to the PDB files for other people's code which means you can't instrument it. Fortunately as part of the instrumentation process the profiler inserts special probes around each call to an external function so that you can track these calls (although not any functions that they might call).

    Why not just use Instrumentation all the time?

    Computers execute a lot of instructions in 10,000,000 clock cycles, so instrumentation can generate a LOT of data compared with sampling. Calling the probe functions in an application thread can also degrade performance more than sampling would.

    Why Performance Matters


    Everybody likes to think that what they're working on is important so this is why I think performance (and measuring performance) matters and why it matters now more than ever.

    In the past chip manufacturers like Intel and AMD did a lot of the performance work for us software guys by consistently delivering faster chips that often made performance issues just disappear. This has meant that many developers treat performance issues as secondary concerns that will get better all by themselves if they wait long enough. Unfortunately, free lunches don't last forever, as Herb Sutter discusses.

    Chip manufacturers can no longer keep increasing the clock speeds of their chips to boost performance, so they are reducing the clock speed and increasing the number of cores to take computing to new levels of energy-efficient performance. The result, which is discussed on one of Intel's blogs, is that some applications will actually run less quickly on newer multicore hardware. This will surely shock some software consumers when they upgrade to a better machine and find it running software slower than before.

    Clearly we need to change our applications to allow them to take advantage of multiple cores. Unfortunately this introduces a lot of complexity and there are many competing opinions about how we should do this. Some seasoned developers are pretty negative about using multithreaded development in any application. Some academics suggest that we use different language constructs altogether to avoid the inherent nondeterminism associated with developing using threads.

    Consider also that the types of applications we are developing are changing. Sure, the traditional rich-client applications are still very popular, but there is also a demand for light-weight web-based clients that communicate with a central server (or servers). The software running on the servers must cater for many users and will have very strict performance requirements.

    So how does all this fit in with performance measurement? Well now developers have to write concurrent applications that are difficult to understand and develop. They write web-delivered applications that must respond promptly to many concurrent users and it is unlikely that just upgrading hardware is going to fix any performance problems that crop up. The free lunch is basically over, so now we have to pay. One way to minimize the cost of our lunches is to be able to 'debug' or resolve dynamic software issues using a profiler just like you would use a debugger to fix issues with program correctness.

    One of the main benefits of using a profiler instead of manually inspecting the code is that it avoids the 'gut feel' approach to performance optimization that is common. For example, a developer sees a loop like:

    for (int i=0; i < some_vector.size(); ++i)

    So they decide to optimize by making a temporary so that the size() function doesn't get called for every iteration of the loop:

    const int some_vector_length = some_vector.size();
    for (int i=0; i < some_vector_length; ++i)

    The number of lines of code has now increased by 1. If the length of the vector is always small, it is unlikely this buys much in the way of performance. Even worse, a developer may start to do things like loop unrolling when the real cause of the performance problems is something they don't notice. As the complexity of the code goes up the maintenance costs increase. If a profiler were used, it would be much easier to isolate the cause of a performance problem without wasting time optimizing code that is barely impacting the performance of an application.
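
    Short of running a profiler, even a crude measurement beats guessing. The sketch below times both loop forms with std::chrono. It is a hand-rolled stopwatch rather than a profiler, and the function names and the time_us helper are made up for illustration.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Loop that calls size() on every iteration.
inline long long sum_calling_size(const std::vector<int>& v) {
    long long total = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        total += v[i];
    }
    return total;
}

// Loop with the size hoisted into a temporary.
inline long long sum_hoisted_size(const std::vector<int>& v) {
    long long total = 0;
    const std::size_t n = v.size();
    for (std::size_t i = 0; i < n; ++i) {
        total += v[i];
    }
    return total;
}

// Hypothetical helper: wall-clock time of one call to f, in microseconds.
template <typename F>
double time_us(F&& f) {
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::micro>(stop - start).count();
}
```

    A call such as time_us([&] { result = sum_calling_size(v); }) can then be compared against the hoisted version. With optimizations enabled the two often compile to very similar code, so it is worth keeping the results in use (as with the assignment to result here) so the compiler cannot discard the loops, and repeating the measurement several times.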

    Before I get too carried away, I should clarify that performance matters, but only if the performance is poor. For example, if you're working on a User Interface (UI), Jakob Nielsen's guideline is that a response time of less than 0.1 seconds makes the user feel that the system is reacting immediately to their action. If you're working on a computer game, the performance requirement might be a frame rate of at least 30 Hz. In both cases the user will notice if the requirement isn't met, but they will probably not notice or care about performance if it is.

    If you haven't used a profiler before, go and try out a Community Technology Preview (CTP) of Orcas which will be the next version of Visual Studio. For the full experience you should avoid using the VPC images which have reduced profiler functionality. Some day, maybe soon if not already, you'll have to fix a performance problem with your code and using a profiler might help.

    Performance: Find Application Bottlenecks With Visual Studio Profiler


    If you're a subscriber to MSDN Magazine, take a look at the article on page 81 of the March 2008 issue (Vol 23, No 4), which describes how to use the Visual Studio 2008 profiler to improve the performance of an application. A couple of members of the profiler team examine a Mandelbrot fractal drawing program in some detail. They isolate and fix several performance problems in the code, speeding up program execution approximately tenfold.

    UPDATE: You can read the article here.

    Sysinternals is Live


    I use a bunch of Sysinternals tools for diagnosing problems while developing. My two favorites are:

    • Process Explorer, a more fully-featured version of Task Manager that can report environment variables for running processes, show loaded DLLs and even display callstacks. It can also tell you which process is currently accessing a certain file or DLL, which is useful if you're trying to delete a file and getting a 'file is in use and cannot be deleted' error.
    • Process Monitor, which can record all accesses to files, disks and the registry. Very useful for diagnosing complicated scenarios with multi-process development.

    Recently the Sysinternals tools have been hosted on a new live site that can be accessed via the web, or as a file share. Now I can easily run a Sysinternals tool and be sure that it is the newest version:


    I can also update my own local cache of useful tools by periodically copying from the file share.

    Tech-Ed 2007


    Tech-Ed 2007 is starting tomorrow and the Profiler Team is sending a few people to sunny Orlando for the event. This is great news for me because my boss, Steve Carroll, is away for the week (just kidding Steve), but it is really great news for folks at Tech Ed because he'll be there presenting with Marc Popkin-Paine:

    DEV313 - Improving Code Performance with Microsoft Visual Studio Team System [N210 E], June 07, 9:45 AM - 11:00 AM

    I believe they'll be demoing a few new Orcas features and giving a pretty good introduction to profiling. If you didn't know Visual Studio Team System has a profiler, or you don't think performance is important, you should definitely check this out.

    If you're not lucky enough to be able to make it to Orlando this year, be sure to take a look at Virtual Tech Ed, which will include webcasts and other content from some of the sessions. One that jumps out at me is MSDN Webcast: A Lap around Microsoft Visual Studio Code Name "Orcas" (Level 200).

    UPDATE: Steve is already helping people at Tech Ed. If you're there and you're interested in performance go and have a chat with him in the Technical Learning Center.

    VS2010: Just My Code


    The ‘Just My Code’ feature in the profiler has a few differences to the ‘Just My Code’ feature in the debugger so this post should provide a useful introduction.

    Example Program

    Here’s a very simple program I’ll use in this post.

    using System;
    namespace ConsoleApplication1
    {
        class Program
        {
            static void Main(string[] args) { Foo(); }
            // Tight loop that spends most of its time in Math.Sqrt.
            private static void Foo()
            {
                double d = 0;
                for (int i = 0; i < 100000000; ++i)
                    d += Math.Sqrt(i);
            }
        }
    }


    Why ‘Just My Code’?

    Typically when profiling you are most interested in optimizing code that you either wrote or you have control over. Sure, sometimes there will be issues in the frameworks that you are using or in other binaries, but even then you often control the calls into those frameworks.  Just My Code or JMC is intended to filter the data that is displayed in profiler reports so that more of the code you control shows up in the reports and the reports are more manageable.

    For example, the Call Tree after collecting sampling data for the simple program above, with JMC off, is shown below:

    With the default JMC options, this reduces down to:

    What is ‘My Code’?

    There are two conditions for code being considered ‘My Code’ by the profiler and they are both at the Module level (Module Name column in the screenshots above). In the example above, this means the checks are made against the clr.dll, mscoree.dll, mscoreei.dll and ConsoleApplication1.exe binaries.

    Modules considered ‘My Code’:

    1. the copyright string for the module does not contain ‘Microsoft’, OR:
    2. the module name is the same as the module name generated by building any project in the currently open Solution in Visual Studio.
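
    The two rules can be written down as a small predicate. This is a sketch of the logic as described above, not the profiler's implementation: the ModuleInfo struct, the is_my_code name and the exact, case-sensitive string comparisons are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical description of a module, holding only what the rules need.
struct ModuleInfo {
    std::string name;       // e.g. "ConsoleApplication1.exe"
    std::string copyright;  // copyright string reported for the module
};

// A module counts as 'My Code' if its copyright string does not contain
// "Microsoft", OR its name matches the output of a project in the
// currently open solution.
inline bool is_my_code(const ModuleInfo& module,
                       const std::vector<std::string>& solution_outputs) {
    const bool non_microsoft =
        module.copyright.find("Microsoft") == std::string::npos;
    const bool built_by_solution =
        std::find(solution_outputs.begin(), solution_outputs.end(),
                  module.name) != solution_outputs.end();
    return non_microsoft || built_by_solution;
}
```

    With a solution that builds ConsoleApplication1.exe, this treats clr.dll (Microsoft copyright, not built by the solution) as non-user code and ConsoleApplication1.exe as user code, even if its copyright string happens to mention Microsoft.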

    How do I turn JMC on or off?

    You can temporarily toggle JMC on or off on the profiler Summary Page in the Notifications area using ‘Show All Code’ or ‘Hide All Code’ (shown in red below):

    The default setting may be configured as discussed in the following section.

    How do I configure JMC?

    Use Tools –> Options –> Performance Tools –> General and set options in the ‘Just My Code’ section:

    The default has JMC on, showing one level of non-user callee functions. In the example above with JMC on, this is why we see the call to COMDouble::Sqrt(double) showing up in the call tree.

    It is also possible to show one level of non-user code calling user code, which in the example above would add one level of the non-user code that calls Main, as shown below:
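
    A rough model of both options, written as a filter over a single call stack, might look like the following. It is an interpretation of the behaviour described in this post, not the profiler's actual algorithm, and the Frame struct and apply_jmc function are hypothetical names.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One entry in a call stack, ordered from outermost caller to innermost callee.
struct Frame {
    std::string function;
    bool is_user;  // true if the frame belongs to a 'My Code' module
};

// Keep user frames, plus (optionally) a non-user frame called directly by
// user code, plus (optionally) a non-user frame that directly calls user code.
inline std::vector<Frame> apply_jmc(const std::vector<Frame>& stack,
                                    bool show_one_callee_level,
                                    bool show_one_caller_level) {
    std::vector<Frame> kept;
    for (std::size_t i = 0; i < stack.size(); ++i) {
        const bool caller_is_user = i > 0 && stack[i - 1].is_user;
        const bool callee_is_user =
            i + 1 < stack.size() && stack[i + 1].is_user;
        if (stack[i].is_user ||
            (show_one_callee_level && caller_is_user) ||
            (show_one_caller_level && callee_is_user)) {
            kept.push_back(stack[i]);
        }
    }
    return kept;
}
```

    For a stack of runtime frames followed by Main, Foo and COMDouble::Sqrt, the default (callee level on, caller level off) keeps Main, Foo and COMDouble::Sqrt. Turning the caller level on also keeps the one runtime frame that calls Main.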

    Why is ‘Just My Code’ only available for sampling?

    When you instrument binaries for profiling, you have already performed some level of JMC. Only binaries that you instrument and first-level calls into other binaries will show up in the instrumentation report, so JMC is not really necessary.

    Tools of the Trade


    I've been thinking about what some of the most important tools are for me while coding. Here are a few:

    • Good IDE - syntax highlighting, integrated builds, source control integration, search facility, debugger and profiler built-in. I use VSTS.
    • Source control/bug tracking system. I use TFS (typically a dogfood version of TFS).
    • Windows Task Manager.
      I use task manager to:
      - View CPU usage
      - Kill processes
      - Start a new explorer.exe if I ever kill explorer.exe. Do this from the Applications tab.
    • Process Explorer.
      I use Process Explorer like task manager, but it can also:
      - Find what process has a handle (e.g. a file) open. This can be handy if you want to delete a file but a process has it locked.
      - Find out what DLLs a process has loaded
      - List the environment variables for a running process
    • Process Monitor
      I use Process Monitor to record file, registry and process activity. This is very useful when debugging issues in complex programs like VSTS which have a lot of registry interactions.
    • DebugView
      Display debugging output from programs without having to attach a debugger. This is very useful if you want to run your program outside a debugger and still want to see all those debug prints.
    • Media Player. I like to listen to music while I code.
    • Outlook. It is somewhat sad that I spend a fair percentage of my day reading emails, scheduling or checking up on meetings and writing notes in an Outlook journal, but I do and so I have Outlook open all the time.
    • Internet Explorer. I need to use MSDN a lot and do web searches. I also read RSS feeds of relevant blogs with IE.
    • Regedit.
    • Remote Desktop. I work on different machines pretty regularly and Remote Desktop makes switching between machines easy.

    Remote Debugging


    Every so often, on days like today, I need to debug on a machine where I don't have a debugger installed. There are a few options in this case, but one of the most convenient of these is to use Visual Studio's remote debugging facility.

    The procedure is pretty convenient with Visual Studio 2010, since all you have to do is:

    1. Log into the machine you wish to debug on (you can use TS). In my case let's use msl-1440087.
    2. Make sure you can navigate to a share on the machine with the debugger installed (galaxy-dev for example).
    3. Run the right version of msvsmon.exe from the share, for example, the 32 bit version, as follows:
      > "\\galaxy-dev\c$\Program Files (x86)\Microsoft Visual Studio 10.0\Common7\IDE\Remote Debugger\x86\msvsmon.exe"
    4. Go back to the machine that the debugger is installed on (galaxy-dev) and choose 'Tools -> Attach To Process...'
    5. Enter the machine name where you started msvsmon.exe in the Qualifier section.
    6. Select the process you wish to debug.
    7. Choose the type of debugging - Native or Managed etc.
    8. Select 'Attach'.

    Debug as you usually would. Thanks to Gregg for writing a blog post reminding me how to do this. You might also be interested in his post about Remote Debugging without domain accounts.
