What timeout should I use for my BlockingCollection?

What timeout should I use for my BlockingCollection?

  • Comments 3

I have a tool which opens a file, parses the contents, filters and transforms some of the content and outputs the transformed results. The architecture is:

1. The main thread kicks off async I/O.

2. The I/O worker thread transforms the received byte array into an array of strings and adds that array to a BlockingCollection.

3. Another thread pulls from the BlockingCollection, filters unwanted strings, transforms the rest into the desired output format and adds the individual output strings into another BlockingCollection.

4. The main thread pulls from the final BlockingCollection and outputs the result.

This should keep the different cores on a computer working pretty efficiently, especially since the stage 3 transformations can be done in parallel. Of course if the different stages can consume the input faster than the file system can load it into memory, there won’t be any work to do, but there shouldn’t be a reason to bottleneck as long as there is work to do.

I ran the tool through the Visual Studio 2010 Concurrency Visualizer and was surprised at the results. Looking at the CPU Utilization view it’s obvious the tool is not getting better than single core performance:

clip_image001

That’s certainly not ideal. At least the tool is doing work the whole time, right? Looking at the Threads View of the trace I can look at the work which is actually going on:

clip_image003

It looks like the main thread is doing all of the work, which is odd given that it’s in a method which consists of:

while (!outputLines.IsCompleted)
{
   String line;
   if(outputLines.TryTake(out line))
   {
      writer.WriteLine(line);
   }
}

Another interesting point is the pattern of the pinkish looking blocks. What they represent are periods where the thread is repeatedly in a short Synchronization segment, followed by an even shorter Execution segment. What’s most interesting about these blocks is that only one thread is behaving like that at any given time, and there appears to always be one thread displaying that behavior. So there’s some task being passed around from thread to thread.

Selecting any of the red Synchronization blocks shows that they are always all blocked on the thread that’s expressing the pinkish behavior. The stack trace shows that they are blocked on clr.dll!WKS::GCHeap::WaitUntilGCComplete. Zooming into a pinkish section and selecting a Synchronization block therein shows that the thread is currently in clr.dll!Thread::SysSuspendForGC (you can read up about that method on Tess Ferrandez’s blog). At this point it appears that the GC needs to run and has suspended all but one of the threads:

clip_image005

Selecting one of the Execution blocks on the main thread shows that during this time at the top of its stack is clr.dll!ThreadNative::SpinWait. At this point I determine that the problem is that one of the consumer threads is spinning in the BlockingCollection’s TryTake method, but no data is being produced because the GC is attempting to suspend all of the threads. As a result the tool is getting a few milliseconds of work done for every few seconds of spinning and waiting!

In an attempt to change the behavior I added a timeout of 1 millisecond to the TryTake calls. This should bring the thread back to managed code every millisecond to make sure that the GC can suspend it. Looking at the threads view of a new trace shows that the tool was able to do a lot more work:

clip_image007

To see how much more work was accomplished compare the scenario markers from the first trace to the second trace. In the first trace there are three scenario markers for fifteen seconds, and in the second trace there are nearly a hundred markers for fifteen seconds. Since the same amount of work is being done between each scenario marker, this is roughly a 30x speedup!

In experimenting to find a good timeout value for TryTake I tried Timeout.Infinite (-1) and was surprised to find that it did not have the same behavior as calling TryTake with no parameter. It looks like Infinite results in pretty much the same behavior as calling TryTake with the smallest possible sleep time:

clip_image009

This caught me off guard because I thought that TryTake taking a long time was the source of the long SpinWait times. To continue the experiment I added an else statement to the if statement surrounding TryTake and had the thread sleep for one millisecond.

while (!outputLines.IsCompleted)
{   
   String line;
   if(outputLines.TryTake(out line))
   {
      writer.WriteLine(line);
   }
   else
   {
      Thread.Sleep(1); 
   }
}

I expected that this would look very much like the original due to an identical call to TryTake, but it did not:

clip_image011

Since it appears that having a parameter-less call to TryTake is not the source of the call to ThreadNative::SpinWait, and that adding an else statement to the TryTake call also changes the behavior of the application, I can only conclude that the CLR is making a looping optimization which has gone awry. Thankfully the Concurrency Visualizer allows me to investigate the behavior of my application and find out where the appropriate places to make changes are. As for what the ideal timeout is, that depends on your application. For the tool I’m working on, it certainly is not 0.

Jared Van Leeuwen – Parallel Computing Platform

  • Jared, try to rework your code such that GC calls are minimized. e.g. reduce unnecessary ctor's/dtor's and reuse objects.

    Experiment with this

    {

     String line; // ctor line once, dtor on completion

     while (!outputLines.IsCompleted)

     {

       if(outputLines.TryTake(out line))

       {

         writer.WriteLine(line);

       }

       else

       {

         // be kind to other threads

         _mm_pause(); // or Sleep(0) (not switch_to_thread())

       }

     }

    }

  • @Jim: yeah, because .NET strings have constructors and destructors... not.

  • Reworking code to call the GC less is generally a good idea. The GC is a tool that can be used appropriatly, and a tool that can be abused. There are times when creating object pools make sense and other times it makes sense to rework code to have a singleton, but most of the time the best thing to do (what ROI do I get for my time) is to use the GC as intended.

    I'm sure we can have a lively debate about an appropriate level of GC frequency, but that's not the point. The point is that something unexpected was blocking the GC from functioning, and the profiling tool revealed that.

Page 1 of 1 (3 items)
Leave a Comment
  • Please add 6 and 2 and type the answer here:
  • Post