When I first joined the Concurrency Visualizer team, I thought, “Wouldn’t it be cool to make an application that could write messages on the Threads view?” I wasn’t alone; many of us had thought about it. It turns out to be not so hard.
James wrote about the Threads view three weeks ago, so refresh your memory if you need to. On the Threads view, time is on the horizontal axis, and each thread in your application has its own channel on the vertical axis. By creating a set of threads and rigidly controlling their behavior over time, you can control the graphic on the Threads view. Our goal was to create a tool you can use to understand and optimize what your program is doing; the capability to draw pictures is just a fun consequence of accurate measurement.
Learning to write again
To write a letter, we first need to break it up into a grid, similar to the way letters map to pixels on a monitor. We produce a two-dimensional array, where 1s represent the foreground and 0s the background. Here’s the letter “H” in a 3x4 representation:
new int[][] {
    new int[] { 1, 0, 1 },
    new int[] { 1, 0, 1 },
    new int[] { 1, 1, 1 },
    new int[] { 1, 0, 1 }
}
We need to use the same number of threads as there are rows in the letter representation. There’s no restriction on how many columns a letter uses, as long as the array is rectangular.
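The two constraints above (one thread per row, rectangular array) can be checked before any threads are created. The following is a minimal sketch, not the code from the original application; the class and method names (`LetterGrid`, `RequiredThreads`) are hypothetical.

```csharp
using System;

static class LetterGrid
{
    // Returns the number of writer threads a letter needs (one per row),
    // after checking that every row has the same number of columns.
    // Hypothetical helper, not part of the original application.
    public static int RequiredThreads(int[][] letter)
    {
        if (letter == null || letter.Length == 0)
            throw new ArgumentException("A letter needs at least one row.");

        int width = letter[0].Length;
        foreach (int[] row in letter)
        {
            if (row.Length != width)
                throw new ArgumentException("All rows must have the same number of columns.");
        }
        return letter.Length;
    }
}
```

For the “H” above, this returns 4, so four writer threads are needed.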
Each row of a letter, such as the first row of the “H” (1, 0, 1), must be translated into a sequence of actions for the writer thread to perform. When a writer thread encounters a one, it spins, attempting to keep the CPU busy for a fixed amount of time. We show execution as green on the timeline, so keeping the thread busy, as opposed to waiting or sleeping, results in a green block.
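// Stopwatch lives in System.Diagnostics.
Stopwatch timer = Stopwatch.StartNew();
int sum = 0;
// Spin until one block width (in milliseconds) has elapsed.
while (timer.ElapsedMilliseconds < Program.BlockWidth)
{
    // Throwaway arithmetic; its only purpose is to keep the CPU busy.
    sum += sum * 50 % 17 << 3;
}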
For a zero, a writer thread waits on a shared object using the Monitor class with a timeout equal to the block length (in milliseconds).
lock (this.LetterMonitor)
{
    Monitor.Wait(this.LetterMonitor, Program.BlockWidth);
}
After a writer thread has completed a row, it waits on the same monitor object. All writer threads share this monitor object, which is used to synchronize them before a new letter is started. A separate thread, the monitor worker thread, controls when the writer threads start: it computes the duration of each letter (Block Width * Letter Width), adds some spacing, and pulses the monitor object at those intervals to signal the beginning of a new letter.
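The monitor worker thread's loop might look roughly like the sketch below. This is an illustration under stated assumptions rather than the original code: the names `LetterClock`, `LetterIntervalMs`, and `Run` are hypothetical, and measuring the spacing in whole blocks is an assumption (the post only says that "some spacing" is added).

```csharp
using System;
using System.Threading;

static class LetterClock
{
    // Time budget for one letter: one block per column, plus trailing blank blocks.
    // Expressing the spacing in blocks is an assumption made for this sketch.
    public static int LetterIntervalMs(int blockWidthMs, int letterWidth, int spacingBlocks)
    {
        return blockWidthMs * (letterWidth + spacingBlocks);
    }

    // Wakes every writer waiting on the shared monitor once per letter,
    // then sleeps for the duration of that letter plus its spacing.
    public static void Run(object letterMonitor, int[] letterWidths, int blockWidthMs, int spacingBlocks)
    {
        foreach (int width in letterWidths)
        {
            lock (letterMonitor)
            {
                Monitor.PulseAll(letterMonitor); // signal the start of the next letter
            }
            Thread.Sleep(LetterIntervalMs(blockWidthMs, width, spacingBlocks));
        }
    }
}
```

`Monitor.PulseAll` must be called while holding the lock, which is why the pulse sits inside the `lock` block; the sleep happens outside it so that writers can reacquire the monitor.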
Putting it together

Hopefully you can see quite a lot of what I described above in this picture; I’ll point out some details.
Thread 8072 is our monitor worker thread. Notice that it sleeps while each letter is being written, plus some additional time for a space, before waking up, acquiring the lock, and telling all of the other threads to get busy.
Notice that on the writer threads (4726, 3204, 8560 and 4056), the red segments are solid and contiguous, but the green areas are choppy, and mixed in with yellow. That’s the operating system at work, managing the execution of threads. When there’s a yellow segment in between green, it means we identified a time where the operating system put a thread on hold to let something else run for a while. I took this trace on a 4-core machine, which led me to choose letters four rows high. If I had chosen 8, there would be a lot more yellow, as the 8 writer threads fought over the cores.
Here’s a close up of the letter “H” from the same run:

Notice how my writer threads don’t start at exactly the same instant? Windows and the CLR are doing something there to get my threads ready, but I don’t have any control over how long that takes. Also notice that the red segment in thread 4276 is longer than the others. The Concurrency Visualizer shows me that it’s a full 75 milliseconds longer than I wanted it to be.
Details like this led me to add the shared monitor and monitor worker thread. My initial attempt did not synchronize worker threads after they started, and some threads would get pretty far ahead of others, all for reasons that I couldn’t control. There are so many details to be concerned with in developing concurrent applications; we hope you can use the Concurrency Visualizer to diagnose your performance issues.
-Ryan
In addition to the Concurrency Visualizer feature that we have been blogging about here, there is great debugging support for parallel applications in Visual Studio 2010. I have created a lot of content for the parallel debugger windows (Parallel Tasks and Parallel Stacks) and have gathered all the links in one place.
Check out my blog post titled Parallel Debugging.
Cheers
Daniel
One reason to use the Concurrency Visualizer is to maximally utilize system resources. To aid in this effort, it displays the execution of the program as green segments in its timeline. However, the Visualizer does not distinguish between the user’s work and any other work in the process, so seeing a lot of green doesn’t necessarily mean that the program is working efficiently. For example, I can write a simple program that generates thousands of unnecessary objects, forcing the garbage collector to execute when I would expect my code to be running. The Concurrency Visualizer would display this overhead as green segments which I could mistake for my code’s execution. Consider a simple example where I (very inefficiently) create a list of the even numbers between 0 and n:
int n = 1000000000;
ArrayList l = new ArrayList();
for (int i = 0; i < n; i++)
{
    Object o = new { val = i }; // Allocate an object that wraps an int
    if (i % 2 == 0)
        l.Add(o); // Some longer living objects, objects not in l get collected
}
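For contrast, here is a sketch of a version that avoids the per-iteration allocation, along with a small helper that counts generation-0 collections around a piece of code. Neither is from the original post: the names `Evens`, `Below`, and `Gen0CollectionsDuring` are hypothetical, and `GC.CollectionCount` is used only as a rough in-process cross-check, not as a replacement for the profiler.

```csharp
using System;
using System.Collections.Generic;

static class Evens
{
    // Builds the even numbers in [0, n) without allocating a wrapper object
    // on every iteration; the list is pre-sized to avoid repeated growth.
    public static List<int> Below(int n)
    {
        var result = new List<int>(n > 0 ? n / 2 + n % 2 : 0);
        for (int i = 0; i < n; i += 2)
        {
            result.Add(i);
        }
        return result;
    }

    // Counts the generation-0 garbage collections that occur while an action runs.
    public static int Gen0CollectionsDuring(Action action)
    {
        int before = GC.CollectionCount(0);
        action();
        return GC.CollectionCount(0) - before;
    }
}
```

Wrapping each variant in `Gen0CollectionsDuring` gives a quick indication of how much collector activity the code provokes; the exact counts depend on the runtime and machine.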
Let’s walk through the Concurrency Visualizer profile of this application.


For a single-threaded program, it appears to be quite efficient. It’s almost always running on one core, which means that it’s not blocking on synchronization objects, IO, etc. This is good! Perhaps we’d even consider writing a concurrent version to exploit the extra available resources on the machine. However, if we look at the Execution Report, we see that we’re really not executing my code:

Notice how much time is spent in garbage collections—about 77%! This is clearly not what we want our application to be doing, and had we not looked at the Execution Profile, we might have mistaken the execution on the timeline to be the execution of our code.
While the green shown in the CPU and Threads views is a quick way to gain insight, I recommend also examining the Execution Profile to get the full picture.
-Matt
Welcome to the fourth and final installment of the "beginner's guide" series. In my previous entry, I discussed the "Threads" view of the Concurrency Visualizer. In this entry, I will discuss the "Cores" view.
Using the same set of results as the previous entries, I now navigate to the "Cores" view to find this:
This view shows time along the x-axis and each logical core along the y-axis. While the CPU Utilization view provides no information regarding thread affinity, this view presents just that. The legend on the lower half of the screen illustrates the mapping between colors and threads, along with statistics that give insight into context switches. The chart on the upper half of the screen tells you which threads were running on each logical core at any given moment in time. For example, the legend shows that the thread with id 2880 is represented by purple. Looking at the chart above, I can see that this thread ran on logical core 1 for approximately 4 ms.
By comparing the thread ids listed in the table to the thread ids shown in the threads view, I can see that the four threads which I created rarely cross cores. In fact, only one of the four worker threads, with id 4480, crosses cores. It’s also interesting to see that the percent of context switches for this thread was only 5.62%. All of this is very useful information but doesn’t help me to investigate my assumption as to the cause of preemption. As shown in the CPU Utilization and Threads views, the preemption results from other processes using the CPU.
This view is particularly useful when managing your own thread affinity as it can help to diagnose thread scheduling bugs. If you aren't managing thread affinity, it can be informative (as we saw in this case) but may not always be actionable.
This concludes the "beginner's guide" series! I hope you found these helpful and I encourage you to go download Visual Studio 2010 beta 2 and try this tool out for yourself!
-James
Hello and welcome to the third installment of the "beginner's guide" series. While I discussed the "CPU Utilization" view last time, I will now discuss the "Threads" view in the profiler.
Working with the same code as in my last post, I will now re-examine the performance from a different perspective. I will take a look at each thread's execution history. Launching the performance wizard the same way as before, I now navigate to the "Threads" view to find this:

Each horizontal line represents either an "I/O Channel" or a thread. In this case, the top two represent I/O channels while the rest represent threads. The left column labels each thread by name and thread id. There are nine threads represented on this chart (along with two I/O channels). I am interested only in the main thread and the last four worker threads, which I spawned to do all the busy work. I would therefore like to remove the unimportant threads (and the I/O channels) from view. I can hide threads from view by right-clicking and selecting "hide". After removing from view all uninteresting threads, I am left with this:

To be clear, this chart shows the state of each thread throughout its lifetime, with time increasing along the x-axis. Upon examining the Visible Timeline Profile, I can see that since the main thread is mostly red, it is spending the majority of its time synchronizing. I can also see that the four worker threads spend most of their time executing. This makes sense when considering that I wrote my application such that the main thread spawns four worker threads to do all of the computation. Meanwhile, the main thread simply waits for them to finish via a "join" call. While we're on the subject of the Visible Timeline Profile, I'd also like to point out that the statistics associated with each thread state are now updated since I hid the other threads. They will also update when you pan or zoom in and out. These updated statistics, which reflect all unhidden threads in the visible time range, can become much more meaningful when you hide or zoom to isolate specific areas of interest.
It's also apparent that the four threads eventually finish executing and disappear from the chart. At the moment that the last worker thread finishes, the main thread resumes control.
It is also worth noting that the main thread sleeps (colored blue) briefly. This mostly results from the way the OS and CLR handle thread creation and scheduling, which is beyond the scope of this post.
From this view, it is clear that multiple threads are executing concurrently, but despite having an identical work-load, they don't all finish at the same time. I suspect this results from the visible preemption from other processes, colored yellow (likely my browser, Outlook, and others). Different amounts of preemption for each thread cause them to finish at different times.
There is much more to talk about here but in the interest of keeping this post relatively short, I'll leave the rest for you to discover (go download beta 2 and try it out for yourself!). Once again, the Concurrency Visualizer has painted a very informative and accurate picture of my app's multithreaded behavior!
-James
For those of you who have watched my first screencast, you may be interested in taking a look at this one, which dives deeper into the Threads View of the Concurrency Visualizer.
-James
In my previous post, I described how to profile your multithreaded application using the Visual Studio Performance Wizard. As I mentioned, the summary report presented after profiling yielded three options. In this entry, I will discuss the "CPU Utilization" option. After clicking on this, the following screen appears:

This graph displays time along the x-axis and the number of logical cores along the y-axis. The legend explains what each color represents. The chart is colored mostly green, which represents my app's execution. The higher along the y-axis the green extends, the more CPU power was used at that point in time by my app. It is worth mentioning that the logical cores listed along the y-axis do not map to physical cores; this graph says nothing about thread affinity. For that, use the "Cores" view.
This chart also indicates a noticeable portion of CPU power used by other processes, colored yellow. This can provide a bit of insight into how other processes on the system are affecting your app's behavior. Based on this view alone, it's hard to draw any conclusions with perfect confidence. In the absence of other processes, I would expect my app to use 100% of the CPU the entire time before abruptly finishing. It appears that other processes prevent my app from using all of the CPU power, especially towards the beginning of its execution. I suspect that the "tapering" seen starting near four seconds results from some of the threads finishing earlier than others (which I wouldn't expect given the identical work-load). I can't be sure, however, until I look at the "Threads" view (described in my next post).
Overall, this profiling experience has been a success. By observing my app's CPU utilization, I have gained a lot of insight into its behavior. At the very least, it is using more than one core!
-James
Hello world! This is the first of four "beginner's guide" installments. Though individuals more familiar with concurrency issues may find it easier to get started with our tools, my team and I are interested in making it easier for a broader range of developers to quickly get started with concurrency visualization.
This first post will highlight the basics: how to use the Performance Wizard to profile a multithreaded application built within Visual Studio. The next three installments will dig in to the basics of the Concurrency Visualizer. Let's get started!
I won't focus on the details of my app but it's worth mentioning that I parallelize my code by manually creating four threads (equal to the number of cores on my machine) and I have each thread operate on one fourth of the problem. To be sure that this actually executes in parallel, I want to profile my solution.
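Since the app itself isn't shown, here is a minimal sketch of the pattern described: the work is split into equal contiguous slices, each slice is handed to a manually created thread, and the main thread waits for all of them. Summing an array is a stand-in workload chosen for illustration, and the names `ParallelSum` and `Sum` are hypothetical.

```csharp
using System;
using System.Threading;

static class ParallelSum
{
    // Splits data into threadCount contiguous slices, sums each slice on its
    // own thread, then joins every thread before combining the partial sums.
    public static long Sum(int[] data, int threadCount)
    {
        long[] partial = new long[threadCount];
        Thread[] workers = new Thread[threadCount];
        int chunk = (data.Length + threadCount - 1) / threadCount; // ceiling division

        for (int t = 0; t < threadCount; t++)
        {
            int index = t; // per-thread copy so the lambda does not capture the loop variable
            int start = Math.Min(index * chunk, data.Length);
            int end = Math.Min(start + chunk, data.Length);
            workers[t] = new Thread(() =>
            {
                long local = 0;
                for (int i = start; i < end; i++) local += data[i];
                partial[index] = local; // each thread writes only its own slot
            });
            workers[t].Start();
        }

        long total = 0;
        for (int t = 0; t < threadCount; t++)
        {
            workers[t].Join(); // the main thread blocks until this worker finishes
            total += partial[t];
        }
        return total;
    }
}
```

Calling `ParallelSum.Sum(data, 4)` on a four-core machine mirrors the one-thread-per-core setup described above; whether the threads actually run in parallel is exactly what the profiler is for.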
To do this, I start Visual Studio as an administrator. This is necessary because the Concurrency Visualizer enables kernel-level logging, which requires administrator privileges due to the low-level information exposed. Though there are two ways to get to the Concurrency Visualizer, I will describe how to access it via the Debug Menu (see my screencast for the other way). Once I have my solution ready to run, I click "Start Performance Analysis" under the Debug Menu.

When prompted for a profiling method, I select "Concurrency" and check the box that says "Visualize the behavior of a multithreaded application".

I then click "next" and get to the page shown below. Though I have the option of profiling an executable, I want to profile the solution I currently have open in Visual Studio. By default, my solution is highlighted so I just click "next".

The last page in the wizard asks if I want to launch profiling after the wizard finishes. By default, this box is checked because users will most often want to do this. This is exactly what I want to do so I simply click "finish".

After waiting for a few seconds for the analysis to complete, I am presented with the summary report:
The summary report varies depending on the selected profiling options. In this case, the summary report presents me with three options. Each of these gives you a different way of viewing the multithreaded behavior of your application. I will give a more detailed overview of each of these three options in the following posts.
-James
If you haven't had the chance to check out channel 9, you're missing out.
Our team has put out two videos since the launch of beta 2; I highly recommend you take a look at them when you get a chance. In this screencast, running just under 9 minutes, I give a brief overview of the new profiling capabilities coming to Visual Studio 2010. If you have about 45 minutes free, make sure to take a look at the team interview where members of my team dive deeply into parallel performance profiling advancements.
For a really nice overview, read John Robbins' Blog describing the concurrency visualizer in VS2010.
-James
We are very excited to further improve the parallel capabilities of the Visual Studio 2010 profiler with the release of beta 2. One of the features that we worked on, "demystify", allows you to click anywhere within the tool and get a customized help page explaining what you clicked on. Unfortunately, this didn't make it into beta 2 but it will be available in the final release version of Visual Studio 2010. For now, you can click the image below to preview it via your browser! Click anywhere on the image map for an explanation of the specific feature. You can also switch views by clicking on the three tabs in the upper-left corner of the image in addition to the four tabs on the lower half of the image. Click below to check it out!
-James
Welcome to our new blog!
Many of us on the Parallel Developer Tools team wanted to start this blog in hopes of helping developers tackle the difficulties inherent to parallel programming. We think that analysis tools are especially valuable in this space. We hope this blog will become a valuable resource to anyone trying to analyze the behavior of concurrent programs, to see what really happens under the covers.
As for me, I’m a program manager focusing on parallel profiling capabilities. You will get to know other members of the team as they make posts.
Stay tuned here for regular updates!
-James