<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://blogs.msdn.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Eric Eilebrecht's blog : ThreadPool</title><link>http://blogs.msdn.com/ericeil/archive/tags/ThreadPool/default.aspx</link><description>Tags: ThreadPool</description><dc:language>en-US</dc:language><generator>CommunityServer 2.1 SP1 (Build: 61025.2)</generator><item><title>ThreadPool on Channel 9</title><link>http://blogs.msdn.com/ericeil/archive/2009/06/01/threadpool-on-channel-9.aspx</link><pubDate>Mon, 01 Jun 2009 20:52:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:9679780</guid><dc:creator>ericeil</dc:creator><slash:comments>3</slash:comments><comments>http://blogs.msdn.com/ericeil/comments/9679780.aspx</comments><wfw:commentRss>http://blogs.msdn.com/ericeil/commentrss.aspx?PostID=9679780</wfw:commentRss><wfw:comment>http://blogs.msdn.com/ericeil/rsscomments.aspx?PostID=9679780</wfw:comment><description>&lt;P&gt;Charles from&amp;nbsp;&lt;A href="http://channel9.msdn.com/" mce_href="http://channel9.msdn.com/"&gt;Channel 9&lt;/A&gt; stopped by my office a couple of weeks ago to chat with Erika Parsons and me about the &lt;A href="http://blogs.msdn.com/ericeil/archive/2009/04/23/clr-4-0-threadpool-improvements-part-1.aspx" mce_href="http://blogs.msdn.com/ericeil/archive/2009/04/23/clr-4-0-threadpool-improvements-part-1.aspx"&gt;ThreadPool work we're doing for&amp;nbsp;.NET 4&lt;/A&gt;.&amp;nbsp; I learned something from this: understanding something, and explaining it in person, for the first time, in front of a camera, are two very different things.&amp;nbsp;:)&amp;nbsp; If you'd like to hear a little more about the .NET 4 ThreadPool, and have about 45 minutes to spare, &lt;A 
href="http://channel9.msdn.com/shows/going+deep/erika-parsons-and-eric-eilebrecht--clr-4-inside-the-new-threadpool/" mce_href="http://channel9.msdn.com/shows/going+deep/erika-parsons-and-eric-eilebrecht--clr-4-inside-the-new-threadpool/"&gt;check out the video&lt;/A&gt;.&amp;nbsp; Then be sure to read a &lt;A href="http://en.wikipedia.org/wiki/ABA_problem" mce_href="http://en.wikipedia.org/wiki/ABA_problem"&gt;&lt;EM&gt;correct&lt;/EM&gt; explanation of the A-B-A problem&lt;/A&gt;.&lt;/P&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=9679780" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/ericeil/archive/tags/ThreadPool/default.aspx">ThreadPool</category></item><item><title>CLR 4.0 ThreadPool Improvements: Part 1</title><link>http://blogs.msdn.com/ericeil/archive/2009/04/23/clr-4-0-threadpool-improvements-part-1.aspx</link><pubDate>Thu, 23 Apr 2009 20:40:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:9565165</guid><dc:creator>ericeil</dc:creator><slash:comments>27</slash:comments><comments>http://blogs.msdn.com/ericeil/comments/9565165.aspx</comments><wfw:commentRss>http://blogs.msdn.com/ericeil/commentrss.aspx?PostID=9565165</wfw:commentRss><wfw:comment>http://blogs.msdn.com/ericeil/rsscomments.aspx?PostID=9565165</wfw:comment><description>&lt;P style="MARGIN: 0in 0in 10pt" class=MsoNormal&gt;&lt;FONT size=3 face=Calibri&gt;This is the first in a series of posts about the improvements we are making to the CLR thread pool for CLR 4.0 (which will ship with Visual Studio 2010).&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;This post will cover changes to the queuing infrastructure in the thread pool, which aim to enable high-performance fine-grained parallelism via the Task Parallel Library.&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;Future posts will cover the “thread injection” algorithm and any other topics that readers would like to 
see.&lt;/FONT&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 10pt 0.5in" class=MsoNormal&gt;&lt;SPAN style="LINE-HEIGHT: 115%; FONT-SIZE: 10pt"&gt;&lt;FONT face=Calibri&gt;Please note that all the usual caveats apply here: I’m discussing pre-release software, and all details are subject to change before final release.&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;In fact, one goal of this post is to solicit feedback, so we can know what changes we &lt;I style="mso-bidi-font-style: normal"&gt;need&lt;/I&gt; to make before we ship. &lt;/FONT&gt;&lt;/SPAN&gt;&lt;SPAN style="LINE-HEIGHT: 115%; FONT-FAMILY: Wingdings; FONT-SIZE: 10pt; mso-ascii-font-family: Calibri; mso-ascii-theme-font: minor-latin; mso-hansi-font-family: Calibri; mso-hansi-theme-font: minor-latin; mso-char-type: symbol; mso-symbol-font-family: Wingdings"&gt;&lt;SPAN style="mso-char-type: symbol; mso-symbol-font-family: Wingdings"&gt;J&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;SPAN style="LINE-HEIGHT: 115%; FONT-SIZE: 10pt"&gt;&lt;?xml:namespace prefix = o ns = "urn:schemas-microsoft-com:office:office" /&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 10pt" class=MsoNormal&gt;&lt;FONT size=3 face=Calibri&gt;A thread pool basically has two functions:&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;It maintains a queue (or queues) of work to be done, and a collection of threads which execute work from the queue(s).&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;So designing a thread pool really comes down to a) finding ways to enqueue and dequeue work items very quickly (to keep the overhead of using the thread pool to a minimum) and b) developing an algorithm for choosing an optimal number of threads to service the queues.&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;This post will cover part (a).&lt;/FONT&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 10pt" class=MsoNormal&gt;&lt;FONT size=3 face=Calibri&gt;In all prior releases of the CLR, the thread pool exposed a single way to queue work: ThreadPool.QueueUserWorkItem (which I will abbreviate as QUWI from here on out).&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;There are a couple of overloads of this method, and also a version called UnsafeQueueUserWorkItem, but these all amount to basically the same thing: you give us a delegate, and we stick it on a queue for later execution.&lt;/FONT&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 10pt 0.5in" class=MsoNormal&gt;&lt;SPAN style="LINE-HEIGHT: 115%; FONT-SIZE: 10pt"&gt;&lt;FONT face=Calibri&gt;(Really there are even more ways to queue work.&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;As mentioned in my &lt;/FONT&gt;&lt;A href="http://blogs.msdn.com/ericeil/archive/2008/06/20/windows-i-o-threads-vs-managed-i-o-threads.aspx" mce_href="http://blogs.msdn.com/ericeil/archive/2008/06/20/windows-i-o-threads-vs-managed-i-o-threads.aspx"&gt;&lt;FONT face=Calibri&gt;previous post&lt;/FONT&gt;&lt;/A&gt;&lt;FONT face=Calibri&gt; we really have &lt;I style="mso-bidi-font-style: normal"&gt;two&lt;/I&gt; pools of threads – the “worker threads” and the “I/O threads.”&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;Work is queued to the I/O threads in response to the completion of asynchronous I/O, or manually via ThreadPool.UnsafeQueueNativeOverlapped.&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;We currently do not plan any significant changes to the I/O pool for CLR 4.0, as our focus for this release is on enabling fine-grained computational parallelism.&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;For the remainder of this post, we will only discuss the mechanisms behind the “worker threads.”)&lt;o:p&gt;&lt;/o:p&gt;&lt;/FONT&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 10pt" class=MsoNormal&gt;&lt;FONT size=3 face=Calibri&gt;QUWI conveys basically zero information about each work item, aside from that it exists.&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;This places some important constraints on the execution of these items.&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;For example, the thread pool does not know whether individual work items are related or not, so it has to assume they are all completely independent, implying that we cannot reorder work to optimize its execution, as independent work items typically must be executed in FIFO order to ensure fairness.&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;(Imagine each work item represents a request from a user - you would not want to keep earlier requests waiting while later requests are processed, as this would result in unacceptably long latency for the users who made their requests first).&lt;/FONT&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 10pt" class=MsoNormal&gt;&lt;FONT size=3&gt;&lt;FONT face=Calibri&gt;This means that we are basically forced to use a single FIFO queue for all work queued via QUWI.&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;In prior versions of the CLR, this queue was a simple linked list, protected by a Monitor lock.&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;This incurs some overhead: we must allocate nodes for the list (and pay the cost of the GC having to traverse the list each time a GC occurs), and we must pay the cost of acquiring that lock every time we enqueue or dequeue a work item.&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;&lt;/FONT&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 10pt 0.5in" class=MsoNormal&gt;&lt;SPAN style="LINE-HEIGHT: 115%; FONT-SIZE: 10pt"&gt;&lt;FONT face=Calibri&gt;(Another aside: please do not take the above to mean that we make any guarantees about strict FIFO ordering of work item execution.&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;In fact, we violate this “rule” already: since .NET 3.5, the CLR thread pool has maintained separate FIFO queues for each AppDomain in the process, and an additional independent FIFO queue for “native” work items such as those queued by a host (ASP.net being the prime user of this feature).&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;We round-robin between these work queues, allowing each to execute work for some time before moving on to the next.&lt;o:p&gt;&lt;/o:p&gt;&lt;/FONT&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 10pt 0.5in" class=MsoNormal&gt;&lt;SPAN style="LINE-HEIGHT: 115%; FONT-SIZE: 10pt"&gt;&lt;FONT face=Calibri&gt;This strategy is motivated by performance concerns, as it greatly reduces the number of transitions between AppDomains, which are fairly expensive.&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;But it is designed to maintain fairness, which is the chief concern which we can never completely abandon.&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;We may make further changes in the future which further deviate from the strict FIFO model, but we are unlikely to ever make QUWI really unfair, which, as we will see, is crucial to achieving good performance for fine-grained workloads.)&lt;o:p&gt;&lt;/o:p&gt;&lt;/FONT&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 10pt" class=MsoNormal&gt;&lt;FONT size=3 face=Calibri&gt;This was fine for the kinds of workloads for which the thread pool was originally designed. These are relatively "coarse" workloads, where each work item represents a fairly large amount of work.&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;The canonical example is an ASP.net web application, where each work item represents the generation of an entire web page.&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;In such workloads, the work itself takes long enough that the overhead of allocating queue nodes and acquiring locks is barely noticeable.&lt;/FONT&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 10pt" class=MsoNormal&gt;&lt;FONT size=3&gt;&lt;FONT face=Calibri&gt;However, in the new world of machines with rapidly increasing core counts, there is increased interest in more "fine grained" work.&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;Where before the job of the thread pool was to take a large number of independent, coarse-grained, tasks and funnel them onto a few threads, we are increasingly being asked to execute many very small tasks representing tiny pieces of some larger operation.&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;&lt;/FONT&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 10pt" class=MsoNormal&gt;&lt;FONT size=3 face=Calibri&gt;Anyone who has tried executing such a workload on the existing CLR thread pool has probably found that it's not a simple matter of calling QUWI for each piece of the calculation; with such tiny work items, the overhead of enqueuing and dequeuing the work can be much greater than the work itself, resulting in slower execution than if we had just run the work on a single thread to begin with!&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;It is possible to make this work, by “batching” work into a smaller number of calls to QUWI.&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;There are many strategies for this, all of which are fairly complex in the general case.&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;We would like to make this easy, but the current QUWI is insufficient for this goal.&lt;/FONT&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 10pt" class=MsoNormal&gt;&lt;FONT size=3 face=Calibri&gt;We can improve this situation in a couple of ways:&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;we can implement a more efficient FIFO queue, and we can enhance the API to allow the user to give us more information, allowing us to turn to even more efficient queuing strategies.&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;For CLR 4.0, we are doing both of these.&lt;/FONT&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 10pt" class=MsoNormal&gt;&lt;B style="mso-bidi-font-weight: normal"&gt;&lt;FONT size=3&gt;&lt;FONT face=Calibri&gt;Faster FIFO&lt;o:p&gt;&lt;/o:p&gt;&lt;/FONT&gt;&lt;/FONT&gt;&lt;/B&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 10pt" class=MsoNormal&gt;&lt;FONT size=3 face=Calibri&gt;Recall that the overhead of the existing FIFO queue comes from the expense of allocating and traversing the data structure, and the cost of acquiring the lock on each enqueue and dequeue operation.&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;For 4.0, we are switching to a lock-free data structure with much lower synchronization overhead.&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;More importantly, this new queue is much friendlier to the GC; we still need to allocate a new object for each call to QUWI, but these objects are smaller, and are tracked in large “chunks” which are much easier for the GC to traverse than the simple linked list used previously.&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;This new queue is virtually identical to System.Collections.Concurrent.ConcurrentQueue&amp;lt;T&amp;gt;, which is also new in 4.0.&lt;/FONT&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 10pt" class=MsoNormal&gt;&lt;FONT size=3 face=Calibri&gt;Improving the performance of QUWI is nice, as it benefits existing applications which use the thread pool without requiring any changes to the application code.&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;How much of a speedup you can expect will depend greatly on many factors, including your application’s workload and the details of the particular hardware on which it executes, but for fine-grained workloads on multi-core hardware the speedup should be significant.&lt;/FONT&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 10pt" class=MsoNormal&gt;&lt;FONT size=3 face=Calibri&gt;However, we are still restricted in what we can do here – we still have very little information about the work we’re executing, and so we still need to use the same basic strategy to execute it.&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;We can trim overhead here and there, but QUWI will probably never be a great way to execute very fine-grained workloads.&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;We need a new API.&lt;/FONT&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 10pt" class=MsoNormal&gt;&lt;B style="mso-bidi-font-weight: normal"&gt;&lt;FONT size=3&gt;&lt;FONT face=Calibri&gt;The Task Parallel Library&lt;o:p&gt;&lt;/o:p&gt;&lt;/FONT&gt;&lt;/FONT&gt;&lt;/B&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 10pt" class=MsoNormal&gt;&lt;FONT size=3 face=Calibri&gt;The Task Parallel Library (TPL) is a collection of new classes specifically designed to make it easier and more efficient to execute very fine-grained parallel workloads on modern hardware.&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;TPL has been available separately as a &lt;/FONT&gt;&lt;A href="http://blogs.msdn.com/somasegar/archive/2007/11/29/parallel-extensions-to-the-net-fx-ctp.aspx" mce_href="http://blogs.msdn.com/somasegar/archive/2007/11/29/parallel-extensions-to-the-net-fx-ctp.aspx"&gt;&lt;FONT size=3 face=Calibri&gt;CTP&lt;/FONT&gt;&lt;/A&gt;&lt;FONT size=3 face=Calibri&gt; for some time now, and was included in the &lt;/FONT&gt;&lt;A href="http://www.microsoft.com/visualstudio/en-us/products/2010/default.mspx" mce_href="http://www.microsoft.com/visualstudio/en-us/products/2010/default.mspx"&gt;&lt;FONT size=3 face=Calibri&gt;Visual Studio 2010 CTP&lt;/FONT&gt;&lt;/A&gt;&lt;FONT size=3 face=Calibri&gt;, but in those releases it was built on its own dedicated work scheduler.&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;For Beta 1 of CLR 4.0, the default scheduler for TPL will be the CLR thread pool, which allows TPL-style workloads to “play nice” with existing, QUWI-based code, and allows us to reuse much of the underlying technology in the thread pool - in particular, the thread-injection algorithm, which we will discuss in a future post. &lt;/FONT&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 10pt" class=MsoNormal&gt;&lt;FONT size=3 face=Calibri&gt;I won’t discuss all of the details of the TPL API, which better covered by &lt;/FONT&gt;&lt;A href="http://blogs.msdn.com/pfxteam/" mce_href="http://blogs.msdn.com/pfxteam/"&gt;&lt;FONT size=3 face=Calibri&gt;its authors&lt;/FONT&gt;&lt;/A&gt;&lt;FONT size=3 face=Calibri&gt;. From the point of view of the performance of the thread pool, the important thing about TPL is that it is a much richer API than QUWI, giving the thread pool much more information about the work being executed.&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;In particular, the new Task type exposes the notion of parent/child relationships, giving us some idea of the &lt;I style="mso-bidi-font-style: normal"&gt;structure&lt;/I&gt; of the overall computation being performed by the individual work items.&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;Having this information opens up possibilities for much more efficient execution of these tasks.&lt;/FONT&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 10pt 0.5in" class=MsoNormal&gt;&lt;SPAN style="LINE-HEIGHT: 115%; FONT-SIZE: 10pt"&gt;&lt;FONT face=Calibri&gt;Even without parent/child relationships, Task is a major improvement over QUWI.&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;QUWI returns nothing of use to the caller; it simply queues a delegate, and leaves it up to the implementation of that delegate to coordinate its activities with the rest of the application.&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;QUWI provides no means of waiting for the completion of the work item, for handling exceptions, or getting the result of a computation.&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;Task provides all of this in a very easy-to-use form, while adding very little overhead vs. QUWI.&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/FONT&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 10pt 0.5in" class=MsoNormal&gt;&lt;SPAN style="LINE-HEIGHT: 115%; FONT-SIZE: 10pt"&gt;&lt;FONT face=Calibri&gt;The fact that Task has a Wait method is not just a convenience; it eliminates one of the most common problems people face when using QUWI.&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;It is fairly common for one work item to need to wait for the execution of another work item to complete.&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;If the second work item has not yet begun executing, it will be sitting in the queue waiting for a worker thread to pick it up.&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;It is possible that there are no available worker threads – maybe they’re all waiting for other work items to complete!&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;This can cause deadlock in the worst case, and very slow execution in the best, as the thread pool may be slow to add more worker threads to pick up these work items.&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;Task.Wait, on the other hand, knows it’s waiting for another task, and is tightly integrated with the thread pool such that it is able to determine whether the task has started executing, and if not &lt;I style="mso-bidi-font-style: normal"&gt;it executes it immediately, in-line on the current thread&lt;/I&gt;.&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;This greatly improves performance and eliminates the possibility of deadlock in this situation.&lt;o:p&gt;&lt;/o:p&gt;&lt;/FONT&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 10pt 0.5in" class=MsoNormal&gt;&lt;SPAN style="LINE-HEIGHT: 115%; FONT-SIZE: 10pt"&gt;&lt;FONT face=Calibri&gt;For new code, &lt;B style="mso-bidi-font-weight: normal"&gt;Task is now the preferred way to queue work to the thread pool&lt;/B&gt;.&lt;o:p&gt;&lt;/o:p&gt;&lt;/FONT&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 10pt" class=MsoNormal&gt;&lt;FONT size=3 face=Calibri&gt;Top-level Tasks have no parent.&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;These are Tasks created by non-thread-pool threads, or with certain options specified at Task-creation time.&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;These tasks are queued to the same FIFO queue we use for QUWI, and thus benefit from the improvements we’ve made there – but they are also subject to the same limitations.&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;Tasks queued in this way are simply a better QUWI – but now the fun starts:&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;A parent task can create child tasks.&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;This happens whenever a Task creates another Task (unless it overrides this behavior).&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;These children are implicitly treated as sub-tasks of the larger task.&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;We assume that sub-tasks can be executed in any order – fairness is not necessary – because all that matters is that the overall operation be completed as fast as possible.&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;This lets us throw those FIFO restrictions out the window, and opens up the possibility for much more efficient work scheduling strategies.&lt;/FONT&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 10pt" class=MsoNormal&gt;&lt;B style="mso-bidi-font-weight: normal"&gt;&lt;FONT size=3&gt;&lt;FONT face=Calibri&gt;Work Stealing&lt;o:p&gt;&lt;/o:p&gt;&lt;/FONT&gt;&lt;/FONT&gt;&lt;/B&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 10pt" class=MsoNormal&gt;&lt;FONT size=3 face=Calibri&gt;Since a child task is just a piece of a larger task, we don’t need to worry about execution order.&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;We just need to execute these things quickly.&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;One well-known strategy for fast execution of unordered work items is “work stealing.” &lt;/FONT&gt;&lt;A href="http://www.bluebytesoftware.com/blog/2008/08/12/BuildingACustomThreadPoolSeriesPart2AWorkStealingQueue.aspx" mce_href="http://www.bluebytesoftware.com/blog/2008/08/12/BuildingACustomThreadPoolSeriesPart2AWorkStealingQueue.aspx"&gt;&lt;FONT size=3&gt;&lt;FONT face=Calibri&gt;&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp;&lt;/SPAN&gt;Joe Duffy&lt;/FONT&gt;&lt;/FONT&gt;&lt;/A&gt;&lt;FONT size=3 face=Calibri&gt; and &lt;/FONT&gt;&lt;A href="http://www.danielmoth.com/Blog/2008/11/new-and-improved-clr-4-thread-pool.html" mce_href="http://www.danielmoth.com/Blog/2008/11/new-and-improved-clr-4-thread-pool.html"&gt;&lt;FONT size=3 face=Calibri&gt;Daniel Moth&lt;/FONT&gt;&lt;/A&gt;&lt;FONT size=3 face=Calibri&gt; explain this very well; click on the links if you’re interested. &lt;/FONT&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 10pt" class=MsoNormal&gt;&lt;FONT size=3 face=Calibri&gt;The most important aspect of work-stealing is that it enables very fast enqueue and dequeue in the typical case, often requiring no synchronization at all.&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;This virtually eliminates a large part of the overhead of QUWI, when working with child tasks.&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;We still do need to allocate memory for the Task itself, and for the work-stealing queue, but like the improvements to the FIFO queue these data structures have been optimized for good GC performance.&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;Parent tasks are fast; child tasks are much faster.&lt;/FONT&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 10pt" class=MsoNormal&gt;&lt;FONT size=3 face=Calibri&gt;There are still some limitations to how quickly tasks can be executed.&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;If all tasks are top-level (non-child) tasks, they are subject to the FIFO ordering constraints of QUWI (albeit with much richer functionality).&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;And even with work-stealing, we need to allocate and queue Task instances for every work item.&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;To get even better performance, we need even more information about the work.&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;Which brings us to…&lt;/FONT&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 10pt" class=MsoNormal&gt;&lt;B style="mso-bidi-font-weight: normal"&gt;&lt;FONT size=3 face=Calibri&gt;Parallel.For and PLINQ&lt;/FONT&gt;&lt;/B&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 10pt" class=MsoNormal&gt;&lt;FONT size=3 face=Calibri&gt;While not, strictly speaking, features of the CLR thread pool, the methods of the new Parallel class and PLINQ are critical new features of the public concurrency APIs in CLR 4.0.&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;In fine-grained parallel applications, it is very common to need to execute the same code, over and over, with different data inputs.&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;With QUWI or Task, this means allocating and queuing separate workitems for each input.&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;The thread pool infrastructure does not know that all of these work items do the same thing, so it literally has to execute them, one at a time, as if they were completely different tasks.&lt;/FONT&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 10pt" class=MsoNormal&gt;&lt;FONT size=3 face=Calibri&gt;Parallel.For, Parallel.ForEach, and PLINQ, provide a better way.&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;These are essentially different ways of expressing the same thing:&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;here is some code that needs to execute N times, as quickly as possible.&lt;/FONT&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 10pt" class=MsoNormal&gt;&lt;FONT size=3&gt;&lt;FONT face=Calibri&gt;Just as with the parent/child relationships that Task provides, this extra information enables more aggressive optimization.&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;These “parallel loops” do not need to be broken down into separate work items for each loop iteration.&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;All that is needed is to break them into enough chunks (“partitions”) that they can be efficiently load-balanced across all available machine resources.&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;A typical scenario might be that 1,000,000 iterations need to be broken into, say, four work items.&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;&lt;/FONT&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 10pt" class=MsoNormal&gt;&lt;FONT size=3 face=Calibri&gt;There is, of course, some overhead introduced by the need to dynamically partition the data (done automatically by the framework). &lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp;&lt;/SPAN&gt;But this pales in comparison to the savings of not having to allocate, queue, and dequeue millions (or more!) work items. &lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp;&lt;/SPAN&gt;In a test I just tried, for a particular work load on one of my machines, Parallel.For executed more than 300 times as fast as the equivalent naïve usage of QUWI.&lt;/FONT&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 10pt" class=MsoNormal&gt;&lt;B style="mso-bidi-font-weight: normal"&gt;&lt;FONT size=3&gt;&lt;FONT face=Calibri&gt;Where do we go from here?&lt;o:p&gt;&lt;/o:p&gt;&lt;/FONT&gt;&lt;/FONT&gt;&lt;/B&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 10pt" class=MsoNormal&gt;&lt;FONT size=3 face=Calibri&gt;By now you’ve probably got the general theme: the more information we have about a workload, the faster we are likely to be able to execute it.&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;I expect this theme to continue in future releases.&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;The challenge is to find new ways to easily express (or automatically extract) useful information about parallel workloads, which can be used by the thread pool (or higher-level abstractions like PLINQ) to enable more optimizations.&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;I think this is going to be an interesting space to watch for quite some time.&lt;/FONT&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 10pt" class=MsoNormal&gt;&lt;FONT size=3 face=Calibri&gt;In the meantime, please try out the new mechanisms we are providing in Visual Studio 2010 Beta 1. &lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp;&lt;/SPAN&gt;Try it with the kinds of workloads you expect to use in production; one of the biggest challenges we face is that we can really only guess at what developers will do with this stuff in the real world, so feedback from our customers is extremely important to ensure that these new mechanisms will meet your needs in the final product.&lt;/FONT&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 10pt" class=MsoNormal&gt;&lt;FONT size=3 face=Calibri&gt;Feel free to post any questions you may have in the comments for this post; I’ll try to answer what I can.&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;My next post will cover the thread pool’s “thread injection” algorithm, which is how we determine how many threads should be servicing the various queues I’ve discussed here.&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;If there’s something else you’d like me to cover, please let me know.&lt;/FONT&gt;&lt;/P&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=9565165" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/ericeil/archive/tags/Concurrency/default.aspx">Concurrency</category><category domain="http://blogs.msdn.com/ericeil/archive/tags/Managed/default.aspx">Managed</category><category domain="http://blogs.msdn.com/ericeil/archive/tags/ThreadPool/default.aspx">ThreadPool</category></item><item><title>CLR 4.0: Parallel Extensions and the CLR ThreadPool</title><link>http://blogs.msdn.com/ericeil/archive/2008/11/07/the-clr-4-0-threadpool.aspx</link><pubDate>Fri, 07 Nov 2008 23:03:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:9052984</guid><dc:creator>ericeil</dc:creator><slash:comments>0</slash:comments><comments>http://blogs.msdn.com/ericeil/comments/9052984.aspx</comments><wfw:commentRss>http://blogs.msdn.com/ericeil/commentrss.aspx?PostID=9052984</wfw:commentRss><wfw:comment>http://blogs.msdn.com/ericeil/rsscomments.aspx?PostID=9052984</wfw:comment><description>&lt;P&gt;It's been a while since I've posted; for the past several months I've been working on deep changes in the CLR ThreadPool to support the Parallel Extensions to the .NET Framework.&lt;/P&gt;
&lt;P&gt;Over the coming months, I hope to say a few things about the changes we're making in the CLR to support this new programming model.&amp;nbsp; For now, though, I encourage everyone to watch &lt;A class="" href="http://channel9.msdn.com/pdc2008/TL26/" mce_href="http://channel9.msdn.com/pdc2008/TL26/"&gt;Daniel Moth's excellent PDC presentation&lt;/A&gt; on this subject.&amp;nbsp; He does a fantastic job of outlining what we're doing and why it matters to you.&lt;/P&gt;
&lt;P&gt;And a quick note about the CTP release of the CLR that was distributed at PDC: this does &lt;EM&gt;not&lt;/EM&gt; include the new ThreadPool.&amp;nbsp; The Parallel Extensions are running on their own task scheduler in this release.&amp;nbsp; However, as Daniel mentions, we do intend to ship CLR 4.0 with the Parallel Extensions deeply integrated with a much-improved CLR thread pool.&amp;nbsp; Stay tuned!&lt;/P&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=9052984" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/ericeil/archive/tags/Concurrency/default.aspx">Concurrency</category><category domain="http://blogs.msdn.com/ericeil/archive/tags/Managed/default.aspx">Managed</category><category domain="http://blogs.msdn.com/ericeil/archive/tags/ThreadPool/default.aspx">ThreadPool</category></item><item><title>Windows I/O threads vs. managed I/O threads</title><link>http://blogs.msdn.com/ericeil/archive/2008/06/20/windows-i-o-threads-vs-managed-i-o-threads.aspx</link><pubDate>Fri, 20 Jun 2008 19:41:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:8625789</guid><dc:creator>ericeil</dc:creator><slash:comments>2</slash:comments><comments>http://blogs.msdn.com/ericeil/comments/8625789.aspx</comments><wfw:commentRss>http://blogs.msdn.com/ericeil/commentrss.aspx?PostID=8625789</wfw:commentRss><wfw:comment>http://blogs.msdn.com/ericeil/rsscomments.aspx?PostID=8625789</wfw:comment><description>&lt;P&gt;A question recently came up on an internal discussion forum, which I'll paraphrase:&amp;nbsp; The Windows QueueUserWorkItem API has an option to queue to an I/O thread.&amp;nbsp; Why doesn't the managed ThreadPool.QueueUserWorkItem support this option?&lt;/P&gt;
&lt;P&gt;First, some background: 
&lt;P&gt;In the Windows thread pool (the &lt;A href="http://msdn.microsoft.com/en-us/library/ms686756(VS.85).aspx" target=_blank mce_href="http://msdn.microsoft.com/en-us/library/ms686756(VS.85).aspx"&gt;old one&lt;/A&gt;, not the &lt;A href="http://msdn.microsoft.com/en-us/library/ms686760.aspx" target=_blank mce_href="http://msdn.microsoft.com/en-us/library/ms686760.aspx"&gt;new Vista thread pool&lt;/A&gt;), an "I/O thread" is one that processes APCs (&lt;A href="http://msdn.microsoft.com/en-us/library/ms681951.aspx" target=_blank mce_href="http://msdn.microsoft.com/en-us/library/ms681951.aspx"&gt;Asynchronous Procedure Calls&lt;/A&gt;) queued by other threads or by I/O initiated from the I/O threads themselves.&amp;nbsp; One example of an I/O function that completes via APCs is &lt;A href="http://msdn.microsoft.com/en-us/library/aa365468(VS.85).aspx" target=_blank mce_href="http://msdn.microsoft.com/en-us/library/aa365468(VS.85).aspx"&gt;ReadFileEx&lt;/A&gt;.&amp;nbsp; The "non-I/O" threads get their work from a completion port, either as a result of QueueUserWorkItem or from I/O initiated on a handle bound to the thread pool with BindIoCompletionCallback.&amp;nbsp; So both kinds of thread are geared toward processing I/O completions; they just use different mechanisms. 
&lt;P&gt;In the managed ThreadPool, we use the terms "worker thread" and "I/O thread."&amp;nbsp; In our case, an I/O thread is one that waits on a completion port; i.e., it's exactly equivalent to Windows' &lt;I&gt;non-I/O thread&lt;/I&gt;.&amp;nbsp; How confusing!&amp;nbsp; Our "worker threads" wait on a simple user-space work queue, and never enter an alertable state (unless user code does so), and so explicitly do &lt;I&gt;not&lt;/I&gt; process APCs.&amp;nbsp; Managed "worker threads" have no equivalent in the Windows thread pool, just as Windows "I/O threads" have no managed equivalent. 
&lt;P&gt;The managed &lt;A href="http://msdn.microsoft.com/en-us/library/system.threading.threadpool.queueuserworkitem.aspx" target=_blank mce_href="http://msdn.microsoft.com/en-us/library/system.threading.threadpool.queueuserworkitem.aspx"&gt;QueueUserWorkItem&lt;/A&gt; queues work to the "worker threads" only.&amp;nbsp; &lt;A href="http://msdn.microsoft.com/en-us/library/system.threading.threadpool.unsafequeuenativeoverlapped.aspx" target=_blank mce_href="http://msdn.microsoft.com/en-us/library/system.threading.threadpool.unsafequeuenativeoverlapped.aspx"&gt;UnsafeQueueNativeOverlapped&lt;/A&gt; queues to the I/O threads, as do completions for asynchronous I/O performed on kernel objects that have been bound to the ThreadPool via &lt;A href="http://msdn.microsoft.com/en-us/library/system.threading.threadpool.bindhandle.aspx" target=_blank mce_href="http://msdn.microsoft.com/en-us/library/system.threading.threadpool.bindhandle.aspx"&gt;BindHandle&lt;/A&gt;. 
&lt;P&gt;Why don't we support APCs as a completion mechanism?&amp;nbsp; APCs are really not a good general-purpose completion mechanism for user code.&amp;nbsp; Managing the reentrancy introduced by APCs is nearly impossible; any time you block on a lock, for example, some arbitrary I/O completion might take over your thread.&amp;nbsp; It might try to acquire locks of its own, which may introduce lock ordering problems and thus deadlock.&amp;nbsp; Preventing this requires meticulous design and the ability to make sure that someone else's code will never run during your alertable wait, and vice versa.&amp;nbsp; This greatly limits the usefulness of APCs. 
&lt;P&gt;And APCs don't scale well, except in certain very constrained scenarios, because there's no load-balancing of completions across threads; all I/O initiated by a given thread always completes with an APC queued to that same thread.&amp;nbsp; You can, of course, implement your own load balancing by using the APC to notify another thread of the completion, but you'll never do better in user space than the kernel does with completion ports.&amp;nbsp; So we provide a rich async I/O infrastructure based on completion ports, and nothing else. 
&lt;P&gt;The real question to answer, then, is: why does Windows allow this option in the first place?&amp;nbsp; I would guess that this is because a lot of existing unmanaged code uses APCs, so the unmanaged thread pool needs to support them in order to run that large body of code.&amp;nbsp; This doesn't apply to managed code; when the managed thread pool was first introduced, there &lt;EM&gt;was no managed code in existence&lt;/EM&gt;, so there was no backward-compatibility requirement to support APCs.&lt;/P&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=8625789" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/ericeil/archive/tags/Managed/default.aspx">Managed</category><category domain="http://blogs.msdn.com/ericeil/archive/tags/Unmanaged/default.aspx">Unmanaged</category><category domain="http://blogs.msdn.com/ericeil/archive/tags/ThreadPool/default.aspx">ThreadPool</category></item><item><title>ThreadPool changes in .NET 3.5 SP1 Beta</title><link>http://blogs.msdn.com/ericeil/archive/2008/05/12/threadpool-changes-in-net-3-5-sp1-beta.aspx</link><pubDate>Mon, 12 May 2008 22:22:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:8496393</guid><dc:creator>ericeil</dc:creator><slash:comments>1</slash:comments><comments>http://blogs.msdn.com/ericeil/comments/8496393.aspx</comments><wfw:commentRss>http://blogs.msdn.com/ericeil/commentrss.aspx?PostID=8496393</wfw:commentRss><wfw:comment>http://blogs.msdn.com/ericeil/rsscomments.aspx?PostID=8496393</wfw:comment><description>&lt;P&gt;&lt;A class="" href="http://blogs.msdn.com/somasegar/archive/2008/05/12/visual-studio-2008-and-net-fx-3-5-sp1-beta-available-now.aspx" mce_href="http://blogs.msdn.com/somasegar/archive/2008/05/12/visual-studio-2008-and-net-fx-3-5-sp1-beta-available-now.aspx"&gt;.NET&amp;nbsp;3.5 SP1 (a.k.a. CLR 2.0 SP2) is now available as a Beta release.&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;This beta release has the fix for the ThreadPool ramp-up issue discussed &lt;A class="" href="http://blogs.msdn.com/ericeil/archive/2008/03/14/threadpool-bug-in-clr-2-0-sp1.aspx" mce_href="http://blogs.msdn.com/ericeil/archive/2008/03/14/threadpool-bug-in-clr-2-0-sp1.aspx"&gt;here&lt;/A&gt; and &lt;A class="" href="http://www.michaelckennedy.net/blog/PermaLink,guid,708ee9c0-a1fd-46e5-8fa0-b1894ad6ce0f.aspx" mce_href="http://www.michaelckennedy.net/blog/PermaLink,guid,708ee9c0-a1fd-46e5-8fa0-b1894ad6ce0f.aspx"&gt;here&lt;/A&gt;.&amp;nbsp; It also has some other improvements to the ThreadPool's thread creation algorithm that can dramatically improve performance in some situations.&lt;/P&gt;
&lt;P&gt;The "thread injection" algorithm in the ThreadPool is a tricky beast.&amp;nbsp; There's simply no perfect solution to the problem - so every change is a complex balancing act between&amp;nbsp;improving performance for a specific customer scenario&amp;nbsp;on the one hand, and not creating more problems for other customers on the other.&lt;/P&gt;
&lt;P&gt;We've tried to make these changes as carefully as possible, running them through all the threadpool scenario tests we've accumulated over the years.&amp;nbsp; But we can't possibly test every real-world workload - we don't even know about them all! -&amp;nbsp;and we necessarily have to depend on things like beta releases to find the problems our tests will never find.&lt;/P&gt;
&lt;P&gt;If you have code that uses the thread pool, I encourage you to download this beta and try it out.&amp;nbsp; Feel free to leave a comment here about any problems you might find, or you can use the official &lt;A class="" href="http://forums.microsoft.com/MSDN/ShowForum.aspx?ForumID=2136&amp;amp;SiteID=1" mce_href="http://forums.microsoft.com/MSDN/ShowForum.aspx?ForumID=2136&amp;amp;SiteID=1"&gt;discussion forum&lt;/A&gt;&amp;nbsp;and &lt;A class="" href="https://connect.microsoft.com/VisualStudio" mce_href="https://connect.microsoft.com/VisualStudio"&gt;feedback center&lt;/A&gt;&amp;nbsp;for the beta.&amp;nbsp; &lt;/P&gt;
&lt;P&gt;And if you have any suggestions for future improvements,&amp;nbsp;and especially if you have examples&amp;nbsp;of workloads that the current ThreadPool does not&amp;nbsp;handle well, please let us know.&amp;nbsp; &lt;/P&gt;
&lt;P&gt;I look forward to any feedback you might have.&lt;/P&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=8496393" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/ericeil/archive/tags/Managed/default.aspx">Managed</category><category domain="http://blogs.msdn.com/ericeil/archive/tags/ThreadPool/default.aspx">ThreadPool</category></item><item><title>ThreadPool bug in CLR 2.0 SP1</title><link>http://blogs.msdn.com/ericeil/archive/2008/03/14/threadpool-bug-in-clr-2-0-sp1.aspx</link><pubDate>Fri, 14 Mar 2008 18:13:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:8204966</guid><dc:creator>ericeil</dc:creator><slash:comments>3</slash:comments><comments>http://blogs.msdn.com/ericeil/comments/8204966.aspx</comments><wfw:commentRss>http://blogs.msdn.com/ericeil/commentrss.aspx?PostID=8204966</wfw:commentRss><wfw:comment>http://blogs.msdn.com/ericeil/rsscomments.aspx?PostID=8204966</wfw:comment><description>&lt;P&gt;Shortly after the release of CLR 2.0 SP1 (a.k.a. Orcas or .NET 3.5), several customers noticed some very odd behavior in the ThreadPool.&amp;nbsp; The ThreadPool is supposed to create threads as fast as possible, up to the current setting for MinThreads.&amp;nbsp; It turns out, however, that if you queue work items very quickly (for example, in a tight loop that just calls QueueUserWorkItem), the pool expands very slowly: we create only one thread every half second, which is the normal behavior once we've reached MinThreads.&lt;/P&gt;
&lt;P&gt;Michael C. Kennedy's blog has &lt;A class="" href="http://www.michaelckennedy.net/blog/PermaLink,guid,708ee9c0-a1fd-46e5-8fa0-b1894ad6ce0f.aspx" mce_href="http://www.michaelckennedy.net/blog/PermaLink,guid,708ee9c0-a1fd-46e5-8fa0-b1894ad6ce0f.aspx"&gt;sample code and some perfmon graphs&lt;/A&gt; demonstrating this behavior.&amp;nbsp; He also posted &lt;A class="" href="http://www.michaelckennedy.net/blog/PermaLink,guid,f57cf127-7bf7-445e-bef4-14c3598f92eb.aspx" mce_href="http://www.michaelckennedy.net/blog/PermaLink,guid,f57cf127-7bf7-445e-bef4-14c3598f92eb.aspx"&gt;our suggested workaround&lt;/A&gt;, which unfortunately boils down to "don't queue the work items so quickly."&amp;nbsp; &lt;/P&gt;
&lt;P&gt;We plan to fix this in CLR 2.0 SP2 (i.e., .NET 3.5 SP1).&amp;nbsp; In the meantime, if you encounter this, please try the workaround.&lt;/P&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=8204966" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/ericeil/archive/tags/Managed/default.aspx">Managed</category><category domain="http://blogs.msdn.com/ericeil/archive/tags/ThreadPool/default.aspx">ThreadPool</category></item><item><title>When should you call RegisteredWaitHandle.Unregister?</title><link>http://blogs.msdn.com/ericeil/archive/2008/03/13/when-should-you-call-registeredwaithandle-unregister.aspx</link><pubDate>Fri, 14 Mar 2008 03:05:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:8190884</guid><dc:creator>ericeil</dc:creator><slash:comments>0</slash:comments><comments>http://blogs.msdn.com/ericeil/comments/8190884.aspx</comments><wfw:commentRss>http://blogs.msdn.com/ericeil/commentrss.aspx?PostID=8190884</wfw:commentRss><wfw:comment>http://blogs.msdn.com/ericeil/rsscomments.aspx?PostID=8190884</wfw:comment><description>&lt;P&gt;The managed ThreadPool provides a way to asynchronously wait for WaitHandles via ThreadPool.RegisterWaitForSingleObject.&amp;nbsp; This method returns a new instance of RegisteredWaitHandle, which has a single method: Unregister.&amp;nbsp; It's obvious from the name that this will cancel a pending wait operation.&amp;nbsp; What's not obvious is that even after a registered wait has completed, you should still call Unregister.&lt;/P&gt;
&lt;P&gt;The reason is that RegisteredWaitHandle holds a reference to your WaitHandle, as well as a reference count on the underlying SafeHandle.&amp;nbsp; If you don't call Unregister, RegisteredWaitHandle can't release the WaitHandle until its finalizer runs at some later time.&amp;nbsp; Calling Unregister prevents the finalizer from running (which helps performance) and may get your handle closed sooner.&lt;/P&gt;
&lt;P&gt;RegisteredWaitHandle really, really should implement IDisposable.&lt;/P&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=8190884" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/ericeil/archive/tags/Managed/default.aspx">Managed</category><category domain="http://blogs.msdn.com/ericeil/archive/tags/ThreadPool/default.aspx">ThreadPool</category></item></channel></rss>