Everyone appreciates a fast and responsive UI, and Visual Studio is no exception. Extensions that run in Visual Studio play a significant role in how responsive the IDE will be for its users. Visual Studio has been evolving over the past few cycles to not only improve performance, but also responsiveness during operations that may take a while to execute, offering cancellation or the ability to run these operations in the background while you can interact with the IDE in the meantime.

IDE responsiveness during long-running operations requires these operations to be written asynchronously or off the UI thread, which can be challenging. Although it might be easy to write and maintain async code that uses the C#/VB async keyword for responsiveness during these long-running operations, doing so can cause deadlocks if that async code is ever called by a method that must synchronously block until the async work has completed. For example, code as simple as this would deadlock if run on the UI thread of any GUI app:
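A minimal sketch of the kind of code in question is shown here. The class name and method bodies are placeholders chosen for illustration; only the names DoSomething and DoSomethingAsync come from the discussion that follows.

```csharp
using System.Threading.Tasks;

public class Worker
{
    // Synchronous wrapper: blocks the calling thread until the async work finishes.
    public void DoSomething()
    {
        // When called on the UI thread, this blocks the very thread
        // that the continuation in DoSomethingAsync needs in order to run.
        DoSomethingAsync().Wait();
    }

    public async Task DoSomethingAsync()
    {
        // Yields to the caller. The rest of the method is posted back to the
        // captured SynchronizationContext, which is the UI thread's context
        // when the method was called from the UI thread.
        await Task.Yield();

        // Placeholder for the real work.
    }
}
```

Called from a threadpool thread, DoSomething() completes normally. Called from the UI thread, Wait() never returns because the posted continuation can only run on the thread that is blocked.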

It can be very tempting to write code such as the above so that you can call DoSomethingAsync() most of the time to provide a responsive UI, but call DoSomething() when you have to do it synchronously. In fact, completing something synchronously is quite often necessary in VS to satisfy old IVs* interfaces that were not designed with async in mind. So how do you write async code that won't deadlock when it must synchronously block the UI thread?

In this post, we outline guidelines for managed code developers writing extensions for Visual Studio 2013 on how to use async and multi-threaded code while avoiding pitfalls such as the one above. We'll start with a short history lesson in COM that may help explain why the above code deadlocks. Then we prescribe the tools and coding patterns to use to avoid these pitfalls.

See also MSDN on this topic: Managing Multiple Threads in Managed Code

A small history lesson in COM thread marshaling

With Visual Studio 2010 came the introduction of significant chunks of managed code to the Visual Studio product itself. The Visual C++ project system was the first project system to be (mostly) rewritten in (ironically) managed code. This was also the version when the text editor was rewritten in managed code. Since then Solution Explorer has been rewritten in managed code and the JavaScript project system was introduced as an all-managed project system.

With that managed code came subtle but important differences in how services behaved and interacted, notwithstanding backward compatibility being a firm pillar. Relevant to this post are differences in how threading rules between components were implemented.

When everything was C++ native code, COM ensured that almost everything happened on the main STA thread (i.e. the UI thread). If code running in another apartment (e.g. a background thread) called any of these COM components the background thread would block while the call was re-issued on the main thread. This protected the COM component from having to deal with concurrent execution, but left it open to reentrancy (being invoked while in an outbound call). This technique worked whether the caller was managed (automatically) or native code (via the proxy stub that COM would generate for the caller).

When those same COM components were rewritten in managed code (C# in most cases), some of these automatic thread marshaling behaviors became less certain. For instance, if native code called the rewritten managed COM component, the call would still execute on the main thread, either because the native caller was typically already on the main thread or because native code calls managed code through a COM proxy that could marshal the call. But if the caller and the new service are both written in managed code, the CLR removes the COM marshaling interop boundary between the two components to improve performance. With that boundary gone, the managed COM service could no longer count on always executing on the UI thread. As a result, the conscientious managed code developer writing VS components should either write thread-safe code or make sure that every public entrypoint explicitly marshals to the UI thread before invoking any internal code.

When everything was COM written in native code, marshaling to the UI thread was done by posting a message to the Windows message queue for the main thread and then blocking the calling thread until the call completed. The main thread would pick up the message in the normal course of its message pump, execute the code, and then return to the message pump. In some cases the main thread was busy, and these messages would just wait until the main thread returned to its message pump. In a few cases, the work on the main thread was itself blocked waiting for some background thread to complete its work, and that background thread was in turn blocked waiting for the main thread to do some work. This deadlock would be broken by the main thread waiting using CoWaitForMultipleHandles, which ran a filtered message pump that only processed messages with a matching "COM logical thread ID" that let it know it was related work and presumably necessary to execute to avoid deadlocks.

Switching to the UI thread

When managed code needs to marshal a call to the UI thread in Visual Studio, ultimately the same approach would be taken as in native code. If you're on a background thread, switching to the UI thread required adding a message to the message queue and (usually) waiting for it to be executed and handling the result. But at an actual coding level this tended to surface in either of two ways: SynchronizationContext.Post for asynchronous invocation, or relying on a truly native COM component to marshal the call to the UI thread and then call the managed code back from the new thread.

In fact, one of the simplest ways of getting to the UI thread for a managed developer in VS has been to use ThreadHelper.Invoke. Internally this calls a native COM service in order to get to the UI thread, and then it invokes your delegate.

The problems start when the deadlock resolving code kicks in. The COM logical thread ID doesn't automatically propagate for managed code like it does for native code. So the VS filtered message pump doesn't know which marshaling messages to execute when the main thread is blocked in managed code in order to avoid deadlocks. So it lets them all in. Well, almost. Posted messages (in the SynchronizationContext.Post sense) don't get in, but all the "RPC" level marshaling calls do get in regardless of their actual relevance to what the main thread is waiting on.

The one or two fundamental ways to get to the UI thread notwithstanding, there were at least a dozen ways to get to the UI thread in VS (each having slightly different behaviors, priorities, reentrancy levels, etc.). This made it very difficult for code to choose which method was appropriate, and often required that the code had complete knowledge of what scenario it was called in, which made it impossible to get right when the same code executed in multiple scenarios.

Reentrancy

Because of this nearly wide open policy for executing code from other threads, an evil we call "reentrancy" occurs when the main thread is blocked waiting for something and something unrelated from another thread jumps in and begins executing unrelated work. The main thread may have been blocked on anything (e.g. it could be a contested lock, I/O, or actually a background thread) and suddenly it's executing something completely unrelated. When that work never calls back into your component, the problem is merely annoying and can slow down your own code because it can't resume execution until the offending party gets off your callstack. But if that work eventually calls into the same component that it interrupted, the results can be devastating. Your component may be 'thread safe' in the sense that it always executes on the UI thread, but reentrancy poses another threat to your data integrity. Consider this case:
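A sketch of such a case, written to match the walkthrough that follows. The class name, the field m_openFiles, and the limit of 10 are hypothetical; m_filesOpened and the call to File.Open are the pieces the discussion refers to.

```csharp
using System.IO;

public class FileTracker
{
    // Hypothetical limit, chosen only for illustration.
    private const int MaxFilesAllowed = 10;

    private readonly FileStream[] m_openFiles = new FileStream[MaxFilesAllowed];
    private int m_filesOpened;

    // Intended to be called only on the UI thread.
    public void OpenFile(string path)
    {
        // The file open count test.
        if (m_filesOpened < m_openFiles.Length)
        {
            // File.Open may block the UI thread. While it is blocked, a
            // reentrant call can run this same method to completion.
            FileStream stream = File.Open(path, FileMode.Open);

            // If a reentrant call used the last slot, m_filesOpened now
            // equals the array length and this assignment throws.
            m_openFiles[m_filesOpened] = stream;
            m_filesOpened++;
        }
    }
}
```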

Code inspection may not suggest that this code is vulnerable to threading issues. But the call to File.Open may result in the main thread blocking. During this block an RPC call may re-enter the UI thread and call this same method. This second execution of the method will also satisfy the file open count test and start opening a file. If only one free slot was left, it will assign the result to the last element in the array, increment the field, and exit. Finally, the original call (that was interrupted) will finish its File.Open call, and then throw when assigning the result to the array since m_filesOpened is now out of bounds (beyond the last element of the array). This is very similar to multi-threaded concurrency issues, but remarkably it can be reproduced even though your method only ever ran on the UI thread, because the UI thread allowed reentrancy.

Product crashes and hangs can occur due to reentrancy when code wasn't prepared for it due to the data corruption reentrancy can cause. And these symptoms often are detected long after the reentrancy has occurred, making it very difficult when you're analyzing the results of the devastation to figure out how it was introduced in the first place.

Lions and Tigers and Bears, oh my!

So we have two evils: deadlocks and reentrancy. Without reentrancy to the main thread we have deadlocks when the main thread is blocked on background threads that are in turn blocked on the main thread. And with reentrancy we tend to get too much reentrancy leading to corruption, crashes and hangs. As early as Dev10, VS architects would meet to discuss the current 'balance' between reentrancy and deadlocks to resolve some of the ship-blocking bugs that would plague that version of VS.

Avoiding this reentrancy involves turning two knobs: what kinds of messages are allowed through the message filter and/or the priority those messages themselves come in with. Letting fewer messages in tends to create deadlocks, whereas letting in more messages tends to create more reentrancy with their own crashes and hangs. Since adjusting the overall policy in the message filter was rife with dangers on either side, each cycle we tended to fix the major bugs by some localized code change that relied heavily on the other players' current behavior and was therefore fragile – leading to yet another meeting and code change later that cycle or in the next one.

Clearly there was a need for a systemic fix. The native COM logical thread ID was attractive but wasn't an option for managed code as far as we could see.

Asynchronous programming in managed code

Opportunities to write async code within VS tended to be few and far between, since most of the time the code implemented some IVs* interface that had synchronous method signatures, so postponing the work till later wasn't an option. In a very few cases (such as async project load) it was possible, but required use of the VS Task Library, which was designed to work from native code rather than something that felt more C# friendly. Use of the managed-friendly TPL Tasks library that shipped with .NET 4.0 was possible in some cases, but often led to deadlocks because TPL Tasks lack any dependency chain analysis that would avoid deadlocks with the main thread. While the VS Task Library could avoid the deadlocks, it demanded that the code be VS specific instead of being rehostable inside and outside Visual Studio.

In more recent history a new pattern emerged in code that ran in the Visual Studio process: use of the C#/VB async keyword. This keyword makes writing asynchronous code simple and expressive, but didn't work at all with the VS Task Library. Since so much code has to execute synchronously on the main thread in order to implement some IVs* interface, writing asynchronous code usually meant you'd deadlock in VS when the main thread synchronously blocked on the async Task using Task.Wait() or Task.Result.

Introducing the JoinableTaskFactory

To solve all of these problems (deadlocks, reentrancy, and async), we are pleased to introduce the JoinableTaskFactory and related classes. "Joinable tasks" are tasks that know their dependency chain, which is calculated and updated dynamically with the natural execution of code, and can mitigate deadlocks when the UI thread blocks on their completion. They block unwanted reentrancy by turning off the message pump completely, but avoid deadlocks by knowing their own dependency chain and allowing related work in by a private channel to the UI thread. Code written with the C# async keyword also (mostly) just works unmodified, when originally invoked using the JoinableTaskFactory.

So in essence, we've finally solved the classic deadlock vs. reentrancy problem. There is now just one way we recommend to get to the UI thread that works all the time. And you can now write natural C# async code in VS components, so we can improve responsiveness in the IDE and developers don't see so many "please wait" dialogs. Goodness.

Let's look at some concrete examples. Please note that most of these examples require that you add a reference to Microsoft.VisualStudio.Threading.dll and add the following line to your source file:

using Microsoft.VisualStudio.Threading;

Switch to and from the UI thread in an asynchronous method
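Here is a sketch of what such a method can look like, using the method names that the discussion below refers to. The class name and the two helper bodies are placeholders. ThreadHelper.JoinableTaskFactory is the factory this post recommends, and awaiting TaskScheduler.Default relies on an extension method from Microsoft.VisualStudio.Threading.

```csharp
using System.Threading.Tasks;
using Microsoft.VisualStudio.Shell;
using Microsoft.VisualStudio.Threading;

public class DataAnalyzer
{
    private async Task PerformDataAnalysisAsync()
    {
        // Switch to the main (UI) thread, e.g. to talk to an STA object.
        await ThreadHelper.JoinableTaskFactory.SwitchToMainThreadAsync();

        // We resume on the main thread after this await.
        await DoSomethingAsync();

        // Switch to a threadpool thread for the expensive part.
        await TaskScheduler.Default;

        // We resume on a threadpool thread after this await,
        // whatever threads SaveWorkToDiskAsync uses internally.
        await SaveWorkToDiskAsync();
    }

    // Placeholder helpers so the sketch compiles.
    private async Task DoSomethingAsync()
    {
        await Task.Yield();
    }

    private async Task SaveWorkToDiskAsync()
    {
        await Task.Yield();
    }
}
```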

Notice how the method retains thread affinity across awaits of normal async methods. You can switch to the main thread and it sticks. Then you switch to a threadpool thread and it likewise sticks. If you're already on the kind of thread that your code asks to switch to, the code effectively no-ops and your method continues without yielding.

The implementation of async methods you call (such as DoSomethingAsync or SaveWorkToDiskAsync) does not impact the thread of the calling method. For example suppose in the sample above, SaveWorkToDiskAsync() was implemented to switch to the UI thread for some of its work. When SaveWorkToDiskAsync() completes its work and PerformDataAnalysisAsync() resumes execution, it will be on the same type of thread it was before, which is the threadpool in our case. This is very nice for information hiding. When writing async code, you can use whatever threads you want, and your caller needn't be aware of or impacted by it.

At this point we recommend writing async code whenever you have an opportunity to. If you're writing a method that does I/O (whether disk or network access), take advantage of .NET's async APIs. Obviously if the main thread is doing async I/O the system is more responsive because the message pump can be running while the I/O is in progress. But perhaps less obvious is why async I/O is advantageous even if you're already on a threadpool thread. The threadpool is a scarce resource too. By default the CLR only provides as many threadpool threads as the user has cores. This often means 4 threads, but can be as low as 1 or 2 on netbooks (and yes, some Visual Studio customers develop on netbooks). So blocking a threadpool thread for more than a very brief time can delay the threadpool from serving other requests, sometimes for very long periods. If the main thread, which is active while you're doing I/O on a threadpool thread, processes a message that requires use of the threadpool, you may end up being responsible for the IDE freezing up on the user because you're blocking the threadpool. If you use await whenever you can even on threadpool threads, then those threads can return to the pool during your async operation and serve these other requests, keeping the overall application responsive and your extensions acting snappy, so customers don't uninstall your extension because it degrades the IDE.
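As an illustration of the async I/O advice above, this sketch reads a file without holding a thread while the read is in flight. It uses only .NET 4.5 APIs. The class and method names are made up for the example, and the 4096-byte buffer is an arbitrary choice.

```csharp
using System.IO;
using System.Text;
using System.Threading.Tasks;

public static class AsyncFileReader
{
    public static async Task<string> ReadAllTextAsync(string path)
    {
        // useAsync: true asks the OS for overlapped (asynchronous) I/O,
        // so no thread is blocked while the read is pending.
        using (var stream = new FileStream(
            path, FileMode.Open, FileAccess.Read, FileShare.Read,
            bufferSize: 4096, useAsync: true))
        using (var reader = new StreamReader(stream, Encoding.UTF8))
        {
            // The calling thread is released at this await and the method
            // resumes when the data is available.
            return await reader.ReadToEndAsync();
        }
    }
}
```

A caller on a threadpool thread would simply write `string text = await AsyncFileReader.ReadAllTextAsync(path);` and that thread returns to the pool while the disk does its work.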

Call async methods from synchronous methods without deadlocking

In a pure async world, you're home free just writing "async Task" methods. But what if one of your callers is not async and cannot be changed to be async? There are valid cases for this, such as when your caller is implementing a public interface that has already shipped. If you have ever tried to call an async method from a synchronous one, you may have tried forcing synchronous execution by calling Task.Wait() or Task.Result on the Task or Task<T> returned from the async method. And you probably found that it deadlocked. You can use the JoinableTaskFactory.Run method to avoid deadlocks in these cases:
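A minimal sketch of that pattern. The class name, the synchronous SomeOperation wrapper, and the body of SomeOperationAsync are placeholders; the call shape is JoinableTaskFactory.Run with an async delegate.

```csharp
using System.Threading.Tasks;
using Microsoft.VisualStudio.Shell;
using Microsoft.VisualStudio.Threading;

public class LegacyService
{
    // Synchronous entry point that cannot be changed to be async.
    public void SomeOperation()
    {
        ThreadHelper.JoinableTaskFactory.Run(async delegate
        {
            await SomeOperationAsync();
        });
    }

    // Placeholder for the real asynchronous implementation.
    private async Task SomeOperationAsync()
    {
        await Task.Yield();
    }
}
```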

The above Run method will block the calling thread until SomeOperationAsync() has completed. You can also return a value computed from the async method to your caller with Run<T>:
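For example, under the same assumptions as before (the class and method names are hypothetical, and the value 42 is only a stand-in result):

```csharp
using System.Threading.Tasks;
using Microsoft.VisualStudio.Shell;
using Microsoft.VisualStudio.Threading;

public class LegacyQueryService
{
    // Synchronous entry point that must return a value.
    public int GetItemCount()
    {
        // The delegate returns Task<int>, so Run<T> is inferred with T = int.
        return ThreadHelper.JoinableTaskFactory.Run(async delegate
        {
            return await GetItemCountAsync();
        });
    }

    // Placeholder for the real asynchronous implementation.
    private async Task<int> GetItemCountAsync()
    {
        await Task.Yield();
        return 42;
    }
}
```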

Calling the Run method is equivalent to calling the RunAsync method and then calling JoinableTask.Join on its result. This way, you can potentially kick off work asynchronously and then later block the UI thread if you need to while it completes.
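A sketch of that two-step form, with hypothetical class and method names:

```csharp
using System.Threading.Tasks;
using Microsoft.VisualStudio.Shell;
using Microsoft.VisualStudio.Threading;

public class BackgroundSaver
{
    public void SaveAndContinue()
    {
        // Kick off the work without blocking.
        JoinableTask saveTask = ThreadHelper.JoinableTaskFactory.RunAsync(async delegate
        {
            await SaveAsync();
        });

        // Other synchronous work can happen here while the save proceeds.

        // Block until the save has completed, as Run would have done.
        saveTask.Join();
    }

    // Placeholder for the real asynchronous implementation.
    private async Task SaveAsync()
    {
        await Task.Yield();
    }
}
```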

The 3 threading rules

Using the JoinableTaskFactory requires that you follow three rules in the managed code that you write:

  1. If a method has certain thread apartment requirements (STA or MTA) it must either:
    1. Have an asynchronous signature, and asynchronously marshal to the appropriate thread if it isn't originally invoked on a compatible thread. The recommended means of switching to the main thread is:

      OR

    2. Have a synchronous signature, and throw an exception when called on the wrong thread.

    In particular, no method is allowed to synchronously marshal work to another thread (blocking while that work is done). Synchronous blocks in general are to be avoided whenever possible.
    See the Appendix section for tips on identifying when this is necessary.

  2. When an implementation of an already-shipped public API must call asynchronous code and block for its completion, it must do so by following this simple pattern:
  3. If ever awaiting work that was started earlier, that work must be Joined. For example, one service kicks off some asynchronous work that may later become synchronously blocking:

    Note however that this extra step is not necessary when awaiting is done immediately after kicking off an asynchronous operation.
    In particular, no method should call .Wait() or .Result on an incomplete task.
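The three rules above can be sketched in one class. The class and member names are invented for illustration. ThreadHelper.ThrowIfNotOnUIThread is assumed to be available in the Shell assembly you target; if it is not, checking ThreadHelper.CheckAccess() and throwing yourself has the same effect.

```csharp
using System.Threading.Tasks;
using Microsoft.VisualStudio.Shell;
using Microsoft.VisualStudio.Threading;

public class ThreadingRulesSketch
{
    private JoinableTask pendingSave;

    // Rule 1, option 1: asynchronous signature that switches threads itself.
    public async Task RefreshAsync()
    {
        await ThreadHelper.JoinableTaskFactory.SwitchToMainThreadAsync();
        // Main-thread-only work goes here.
    }

    // Rule 1, option 2: synchronous signature that throws on the wrong thread.
    public void Refresh()
    {
        ThreadHelper.ThrowIfNotOnUIThread();
        // Main-thread-only work goes here.
    }

    // Rule 2: an already-shipped synchronous API that must block on async code.
    public void Save()
    {
        ThreadHelper.JoinableTaskFactory.Run(async delegate
        {
            await SaveAsync();
        });
    }

    // Rule 3: work started earlier...
    public void StartSave()
    {
        this.pendingSave = ThreadHelper.JoinableTaskFactory.RunAsync(async delegate
        {
            await SaveAsync();
        });
    }

    // ...must be joined when it is awaited later.
    public async Task WaitForSaveAsync()
    {
        if (this.pendingSave != null)
        {
            await this.pendingSave.JoinAsync();
        }
    }

    // Placeholder for the real asynchronous implementation.
    private async Task SaveAsync()
    {
        await Task.Yield();
    }
}
```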

A failure to follow any of the above rules may result in your code causing Visual Studio to deadlock. Analyzing deadlocks with the debugger when synchronous code is involved is usually pretty straightforward because there are usually two threads involved (one being the UI thread) and it's easy to see callstacks that tell the story of how it happened. When writing asynchronous code, analyzing deadlocks requires a new set of skills, which we may document in a follow-up post on this blog if there is interest.

You can read more about these and related types on MSDN.

JoinableTask Interop with the VS Task Library (IVsTask)

What is the VS Task Library?

The VS Task Library, if you're not already familiar with it, was introduced in Visual Studio 2012 in order to provide multi-threaded task scheduling to native code running in Visual Studio. This is how ASL (asynchronous solution load) was built for all the project systems that were written in native code. The VS Task Library is itself based on TPL, and the COM interfaces look very similar to the TPL public surface area. In particular, the VS Task Library is based on TPL as it was in .NET 4.0, so strictly speaking the VS library doesn't help you write asynchronous tasks using async methods. But it does let you schedule synchronous methods for execution on background threads with continuations on the main thread, thereby achieving an async-like effect, just with a bit more work.

While we're on the subject, a word of caution: even though TPL and the VS Task Library look similar, they don't always behave the same way. The VS Task Library changes the behavior of some things like cancellation and task completion in subtle but important ways. If you're already familiar with TPL, don't make assumptions about how the VS Task Library works.

One important difference between TPL and IVsTask is that if you block the main thread by calling Task.Wait() or Task.Result on an incomplete task, you'll likely deadlock. But if you call IVsTask.Wait() or IVsTask.GetResult(), VS will intelligently schedule tasks that require the UI thread to avoid deadlocks in most cases. This deadlock-resolving trait is similar to the JoinableTask.Join() method.

JoinableTask and IVsTask working together

Suppose you need to implement a COM interface that requires your method to return an IVsTask. This is an opportunity for you to implement the method asynchronously to improve IDE responsiveness. But creating an IVsTask directly is tedious, especially when the work is actually asynchronous. Now with JoinableTaskFactory it's easy using the RunAsyncAsVsTask method. If all you have is a JoinableTask instance, you can readily port it to an IVsTask using the JoinableTask.AsVsTask() extension method. But let's look at some samples that use RunAsyncAsVsTask, as that is preferable since it supports cancellation.
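A first sketch, assuming the RunAsyncAsVsTask extension method takes a VsTaskRunContext priority and a delegate from CancellationToken to Task<T>. The class name, the helper methods, and the returned value are placeholders.

```csharp
using System.Threading;
using System.Threading.Tasks;
using Microsoft.VisualStudio.Shell;
using Microsoft.VisualStudio.Shell.Interop;
using Microsoft.VisualStudio.Threading;

public class MyVsService
{
    public IVsTask DoAsync()
    {
        return ThreadHelper.JoinableTaskFactory.RunAsyncAsVsTask(
            VsTaskRunContext.UIThreadBackgroundPriority,
            async cancellationToken =>
            {
                await SomethingAsync(cancellationToken);
                await SomethingElseAsync();

                // Placeholder result; becomes the result of the IVsTask.
                return 0;
            });
    }

    // Placeholder helpers so the sketch compiles.
    private async Task SomethingAsync(CancellationToken cancellationToken)
    {
        await Task.Yield();
        cancellationToken.ThrowIfCancellationRequested();
    }

    private async Task SomethingElseAsync()
    {
        await Task.Yield();
    }
}
```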

What we've done here is create an IVsTask using the JoinableTaskFactory. Any work that requires the UI thread will be scheduled using background priority (which means any user input in the message queue is processed before this code starts executing on the UI thread). Several priorities are available and are described on MSDN. The cancellation token passed to the delegate is wired up to the IVsTask.Cancel() method so that if anyone calls it, you can wrap up your work and exit early.

If your async delegate calls other methods that return IVsTask, you can safely await IVsTask itself within your async delegate naturally, as shown here:
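A sketch of that case. IOtherService and its DoOtherWorkAsync method are hypothetical stand-ins for any API that returns an IVsTask; the await relies on the IVsTask awaiter extension in Microsoft.VisualStudio.Shell.

```csharp
using System.Threading.Tasks;
using Microsoft.VisualStudio.Shell;
using Microsoft.VisualStudio.Shell.Interop;
using Microsoft.VisualStudio.Threading;

// Hypothetical service whose method returns an IVsTask.
public interface IOtherService
{
    IVsTask DoOtherWorkAsync();
}

public class MyComposedService
{
    private readonly IOtherService otherService;

    public MyComposedService(IOtherService otherService)
    {
        this.otherService = otherService;
    }

    public IVsTask DoAsync()
    {
        return ThreadHelper.JoinableTaskFactory.RunAsyncAsVsTask(
            VsTaskRunContext.UIThreadBackgroundPriority,
            async cancellationToken =>
            {
                // Await the IVsTask directly, as you would a Task.
                await this.otherService.DoOtherWorkAsync();

                // Placeholder result.
                return 0;
            });
    }
}
```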

Here is a longer, commented sample so you can understand more of the execution flow:
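The sketch below combines the pieces shown so far. The class name, the CountItemsAsync helper, and its result are invented for illustration, and the comments describe the flow as this post explains it rather than guaranteed implementation details.

```csharp
using System.Threading;
using System.Threading.Tasks;
using Microsoft.VisualStudio.Shell;
using Microsoft.VisualStudio.Shell.Interop;
using Microsoft.VisualStudio.Threading;

public class ItemCountService
{
    public IVsTask GetItemCountAsync()
    {
        // The returned IVsTask completes when the async delegate completes.
        return ThreadHelper.JoinableTaskFactory.RunAsyncAsVsTask(
            VsTaskRunContext.UIThreadBackgroundPriority,
            async cancellationToken =>
            {
                // Move to the threadpool so expensive work stays off the UI thread.
                await TaskScheduler.Default;

                // Honor IVsTask.Cancel(), which is wired to this token.
                cancellationToken.ThrowIfCancellationRequested();
                int count = await CountItemsAsync(cancellationToken);

                // Switch to the main thread for anything that needs it.
                // If the caller is blocking the UI thread on the IVsTask,
                // this switch is still allowed through instead of deadlocking.
                await ThreadHelper.JoinableTaskFactory.SwitchToMainThreadAsync(cancellationToken);

                // Main-thread-only work (e.g. calling an IVs* interface) goes here.

                // The value returned here becomes the IVsTask's result.
                return count;
            });
    }

    // Placeholder for real background work.
    private async Task<int> CountItemsAsync(CancellationToken cancellationToken)
    {
        await Task.Yield();
        cancellationToken.ThrowIfCancellationRequested();
        return 0;
    }
}
```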

Which scheduling library should I use?

When you're writing native code, the VS Task Library is your only option. If you're writing managed code in VS, you now have three options, each with some benefits:

  1. System.Threading.Tasks.Task
    1. Pros:
      1. Built into .NET 4.x so code you write can run in any managed process (not just VS).
      2. C# 5 has built-in support for creating and awaiting them in async methods.
    2. Cons:
      1. Quickly deadlocks when the UI thread blocks on a Task's completion.
  2. Microsoft.VisualStudio.Shell.Interop.IVsTask
    1. Pros:
      1. Automatically resolves deadlocks.
      2. Compatible with C++ and COM interfaces and is therefore used for any public IVs* interface that must be accessible to native code.
    2. Cons:
      1. Relatively high scheduling overhead. Usually not significant though if your scheduled work is substantial per IVsTask.
      2. Hardest option for managed coders since tasks and continuations have to be manually stitched together.
      3. No support for async delegates.
  3. Microsoft.VisualStudio.Threading.JoinableTask
    1. Pros:
      1. Automatically resolves deadlocks.
      2. Allows authoring of C# 5 friendly async Task methods that are then adapted to JoinableTask.
      3. Less scheduling overhead than IVsTask.
      4. Bidirectional interop with IVsTask (produce and consume them naturally).
    2. Cons:
      1. More overhead than TPL Task.
      2. Not available to call from native code.

To sum up, if you're writing managed code, any async or scheduled work should probably be written using C# 5 async methods, with JoinableTaskFactory.RunAsync as the root caller. This maximizes the benefit of reusable async code while mitigating deadlocks.

Summary

The JoinableTaskFactory makes asynchronous code easier to write and call for code that runs within Visual Studio. Switching to the UI thread should always be done using JoinableTaskFactory.SwitchToMainThreadAsync(). The JoinableTaskFactory should be obtained from ThreadHelper.JoinableTaskFactory.

As you use these patterns, we'd like to hear your feedback. Please add comments on this post.

Appendix

It is not always obvious whether some VS API you're calling may require marshaling to the UI thread. Here is a list of types, members, or tips to help identify when you should explicitly switch to the UI thread yourself using SwitchToMainThreadAsync() before calling into VS:

  1. Package.GetGlobalService()
  2. Casting an object to any other type (object to IVsHierarchy for example). If the object being cast is a native COM object, the cast may incur a call to IUnknown.QueryInterface, and will most likely require the UI thread.
  3. Almost any call to an IVs* interface.
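As a sketch of how the items above fit together, the method below switches to the UI thread before obtaining a global service, casting it to an IVs* interface, and calling it. The class and method names are hypothetical, and the property queried is just an example.

```csharp
using System.Threading.Tasks;
using Microsoft.VisualStudio;
using Microsoft.VisualStudio.Shell;
using Microsoft.VisualStudio.Shell.Interop;
using Microsoft.VisualStudio.Threading;

public class ShellInfo
{
    public async Task<string> GetInstallDirectoryAsync()
    {
        // Switch first: everything below may require the UI thread.
        await ThreadHelper.JoinableTaskFactory.SwitchToMainThreadAsync();

        // Item 1: Package.GetGlobalService().
        // Item 2: the cast may incur a call to IUnknown.QueryInterface.
        var shell = Package.GetGlobalService(typeof(SVsShell)) as IVsShell;
        if (shell == null)
        {
            return null;
        }

        // Item 3: a call to an IVs* interface.
        object value;
        int hr = shell.GetProperty((int)__VSSPROPID.VSSPROPID_InstallDirectory, out value);
        return ErrorHandler.Succeeded(hr) ? value as string : null;
    }
}
```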