ThreadPool performance you can see

We've spent a lot of time touting improvements to the .NET Framework in .NET 4 around threading, including core enhancements to the performance of the runtime itself.  Sometimes data is more powerful than words, however, and it's useful to be able to see exactly what kind of difference such improvements can make.  To assist with that, here is code for a small sample you can try compiling and running on your own:

using System;
using System.Linq;
using System.Threading;
using System.Diagnostics;

class Program
{
    static void Main(string[] args)
    {
        Console.WriteLine(
            TimeSpan.FromMilliseconds(
                Enumerable.Range(0, 6).Select(_ =>
                {
                    var sw = Stopwatch.StartNew();
                    CreateAndWaitForWorkItems(10000000);
                    return sw.ElapsedMilliseconds;
                }).Skip(1).Average()
            )
        );
    }

    static void CreateAndWaitForWorkItems(int numWorkItems)
    {
        using (ManualResetEvent mre = new ManualResetEvent(false))
        {
            int itemsRemaining = numWorkItems;
            for (int i = 0; i < numWorkItems; i++)
            {
                ThreadPool.QueueUserWorkItem(delegate
                {
                    if (Interlocked.Decrement(ref itemsRemaining) == 0) mre.Set();
                });
            }
            mre.WaitOne();
        }
    }
}

The CreateAndWaitForWorkItems method simply launches N work items using ThreadPool.QueueUserWorkItem and then waits for all N to complete by atomically decrementing a shared counter.  The main method then times the invocation of this method with N equal to 10 million, doing so several times and taking the average.  This microbenchmark is pure overhead (with a lot of synchronization overhead), as there's no actual work being performed in each work item.  In fact, we should expect that as we add more cores (or at least more threads), the time to complete this operation will increase, as more threads will contend for the data structures employed by both the ThreadPool and my simple test. The hope is that the work done in .NET 4 decreases that overhead, especially at higher core counts, where more and more threads will be contending for the shared data structures employed.
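
As an aside, .NET 4 also introduces the CountdownEvent type, which encapsulates exactly this "decrement a shared counter, then signal" pattern.  The benchmark above deliberately sticks with ManualResetEvent and Interlocked.Decrement so that the identical source compiles against .NET 3.5 as well, but purely as a hedged sketch, the same wait could be expressed on .NET 4 along these lines:

static void CreateAndWaitForWorkItems(int numWorkItems)
{
    // Sketch only: CountdownEvent is new in .NET 4, so this variant would not
    // compile against the .NET 3.5 target used for the comparison below.
    using (CountdownEvent ce = new CountdownEvent(numWorkItems))
    {
        for (int i = 0; i < numWorkItems; i++)
        {
            // Each work item signals once; the final signal releases the waiter.
            ThreadPool.QueueUserWorkItem(delegate { ce.Signal(); });
        }
        ce.Wait();
    }
}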

The following numbers are in no way official benchmarks, but they can give you a sense for how the work that's been done in .NET 4 really does make a difference. These are the numbers I see when I run this microbenchmark informally on .NET 3.5 and on .NET 4 on two laptops I currently have access to while writing this blog post.  The only change I made to go from .NET 3.5 to .NET 4 was modifying the "Target framework" in the project's properties in Visual Studio, taking advantage of Visual Studio 2010's multitargeting support.

Machine             .NET 3.5         .NET 4          Improvement
A dual-core box      5.03 seconds     2.45 seconds    2.05x
A quad-core box     19.39 seconds     3.42 seconds    5.67x

Some pretty awesome performance improvements simply by upgrading to .NET 4.
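
If you try this yourself after retargeting a project as described above, one quick sanity check (not part of the original benchmark, just a hedged suggestion) is to print the CLR version at startup so you can confirm which runtime the binary is actually executing on:

// CLR 2.0 (e.g. 2.0.50727.x) hosts .NET 3.5; CLR 4.0 (e.g. 4.0.30319.x) hosts .NET 4.
Console.WriteLine("CLR version: " + Environment.Version);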

Comments
  • Wooow!

    Congratulations and thanks!!

  • Question: in previous versions of .NET, is the ThreadPool able to distribute operations between cores?

  • Damon, yes, it has always utilized whatever cores the machine has, up to 32 in a 32-bit process and 64 in a 64-bit process.

  • Why is the quad-core value for .NET 3.5 nearly 4 times the dual-core value? This seems to tell me that quad-core for a multithreaded .NET 3.5 program is a bad idea, as it will run faster on a dual-core. This is counter-intuitive. What am I interpreting wrong here?

  • Ok, I realize that this measures overhead only, but is there really that much more overhead for a quad-core box? What about an eight-core box then? Are we talking a quadrupling in overhead there as well?

  • I'm looking forward to receiving Visual Studio 2010 on the MSDN subscription DVD so I can enjoy these numbers.

  • Lasse-

    Good questions.  An analogy might help here.

    Consider a team working on a project.  When there's just one person, there's basically no overhead, as that person can chug along getting the job done.  When another person comes on to the team, now those two people need to start coordinating, which requires some face to face meetings.  Now additional folks come on to the project, and you not only have one-on-ones each week (potentially up to N^2 of them), but also weekly team meetings or daily scrum standups, status reports that need to get written, team reviews with management to understand project progress, people randomly stopping by your office to ask you questions, etc.  The more people on the project, the more overhead there is, and this overhead potentially grows greater than linearly with the number of team members.  If you just catalogued how much time was being spent in this overhead, and ignored the actual work being done by folks, that's similar to what this test is measuring.

    In effect, and continuing with the analogy, .NET 4 hired a good project manager. He ensures there are fewer meetings and that the meetings that are held are kept short.  He also pays keen attention to whether the team as a whole is being efficient, and may remove folks from the project if they're actually decreasing the team's productiveness / work throughput rather than increasing it.

    Again, keep in mind that this example was pure overhead.  As soon as you start adding other "real" work into the work item bodies, you very quickly see these differences level off (a hedged sketch of such a variant follows below).  The key then is that these improved efficiencies in .NET 4 allow you to break your problem apart into more, smaller pieces than you previously could do efficiently, which means you can scale your problem to larger and larger systems.
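
    Purely as an illustrative sketch (not something measured for this post), adding simulated CPU-bound work to each item might look like the following, where workPerItem is a made-up knob for how much work each item does:

    static void CreateAndWaitForWorkItems(int numWorkItems, int workPerItem)
    {
        using (ManualResetEvent mre = new ManualResetEvent(false))
        {
            int itemsRemaining = numWorkItems;
            for (int i = 0; i < numWorkItems; i++)
            {
                ThreadPool.QueueUserWorkItem(delegate
                {
                    // Simulate "real" work; as workPerItem grows, the queueing and
                    // synchronization overhead this benchmark isolates becomes a
                    // smaller fraction of the total time, so the 3.5-vs-4 gap narrows.
                    Thread.SpinWait(workPerItem);
                    if (Interlocked.Decrement(ref itemsRemaining) == 0) mre.Set();
                });
            }
            mre.WaitOne();
        }
    }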

  • A friend of mine just told me that there is a command for those running VS2010 that allows them to recompile their assemblies into code native to their architecture.

    He said it had to do with running NGEN with (a) specific argument(s), and that it would rebuild all of the .NET 4 assemblies (he may have mentioned 3.5 as well; I can't remember, and I ended up not saving the chat).

    Furthermore, once this was performed, any program installed afterwards that used .NET 4 would also undergo an NGEN recompilation.

    Any ideas on this?

  • John -

    Most VS 2010 and .NET 4 assemblies are already pre-compiled to native code using NGen today. Are you referring to application binaries built using VS 2010?

    The .NET Framework and Visual Studio assemblies are pre-compiled in the background when the machine is idle. You can run commands such as "ngen.exe ExecuteQueuedItems" to eagerly generate the images (synchronously) instead of waiting for the machine to be idle. You probably don't want to regenerate existing native images, although there are commands to do that as well.

    There isn't however any command that globally opts in all managed code on a given system to be compiled to native code ahead of execution time.

    If you can help me understand what you may be trying to accomplish with NGen, I may be able to provide a more informative answer.

    BTW, some information about NGen can be found here:

    http://msdn.microsoft.com/en-us/library/6t9t5wcf(VS.80).aspx &

    http://msdn.microsoft.com/en-us/magazine/cc163610.aspx.

  • Would appreciate it if you could share the numbers for .NET 2.0?
