<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://blogs.msdn.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Memory Model</title><link>http://blogs.msdn.com/cbrumme/archive/2003/05/17/51445.aspx</link><description>One of the suggestions for a blog entry was the managed memory model. This is timely, because we’ve just been revising our overall approach to this confusing topic. For the most part, I write about product decisions that have already been made and shipped.</description><dc:language>en-US</dc:language><generator>CommunityServer 2.1 SP1 (Build: 61025.2)</generator><item><title>RE: Memory Model</title><link>http://blogs.msdn.com/cbrumme/archive/2003/05/17/51445.aspx#51446</link><pubDate>Sun, 18 May 2003 11:17:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:51446</guid><dc:creator>Jeroen Frijters</dc:creator><description>Chris you're my hero ;-) I believe that specifying a stronger memory model is absolutely the right thing to do. The disconnect between the memory model subtleties and the relatively highlevel worldview that the CLR otherwise provides is just too big, especially since this doesn't even result in better performance on today's (and probably tomorrow's) mainstream hardware.</description></item><item><title>RE: Memory Model</title><link>http://blogs.msdn.com/cbrumme/archive/2003/05/17/51445.aspx#51447</link><pubDate>Sun, 18 May 2003 17:16:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:51447</guid><dc:creator>Ian Ringrose</dc:creator><description>There is one common case when I would like the full speed benefit of stores being reordered.  This is when I have code that needs to be fast and ALL objects the method use are only accessed from one thread.  E.g. 
think about inverting a matrix: I may call some other methods from within my method, and then the CLR will not know that the array is only accessed by a single thread.

MAYBE what I am asking for is to be able to mark an object instance as being only used by a single thread, and have the CLR give me an error if the object is used by another when running in debug mode.

Remember that for most people, a single thread on a single CPU is the normal case.  It is only a few low-level types like us who ever have to use more than one thread.

Is this a new version of the C/Fortran array rules?  E.g. because arrays cannot overlap in Fortran, some maths code is a LOT faster in Fortran.
</description></item><item><title>RE: Memory Model</title><link>http://blogs.msdn.com/cbrumme/archive/2003/05/17/51445.aspx#51448</link><pubDate>Sun, 18 May 2003 18:11:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:51448</guid><dc:creator>Chris Brumme</dc:creator><description>Ian,

Presumably you are talking about store reordering performed by the native code generator.  The IL generator can follow its language rules in reordering and the CPUs that we care about won't reorder anyway.

We would have to enable store reordering in the native generator at the assembly, type or method level.  We don't have a good way to enable this per-instance.

I'm not opposed to our adding this relaxation in the future.  But I'm not convinced that the performance benefits will justify the risk and extra programming effort for anyone.  Finalization, subtyping, resurrection, statics and many other constructs often leave you exposed to multi-threaded access, even in the &amp;quot;single-threaded&amp;quot; server request case.

You would have to be absolutely sure that none of your instances could escape before you would be able to safely enable this behavior.</description></item><item><title>RE: Memory Model</title><link>http://blogs.msdn.com/cbrumme/archive/2003/05/17/51445.aspx#51449</link><pubDate>Sun, 18 May 2003 20:00:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:51449</guid><dc:creator>Ian Griffiths</dc:creator><description>Did you really mean it when you said that hitting main memory is only 1/10th the speed of hitting disk?  I find this extremely hard to believe.  While I'm totally prepared to believe that the relative cost of a L0 cache hit, and a main memory hit is several orders of magnitude, I can't believe that there is only a single order of magnitude difference between main memory and disk.

Disk takes *milliseconds* to find and load data.  Main memory looks pretty slow from the inside of a 3GHz CPU, but it's not going to take milliseconds, is it?

Or is there something I'm missing?</description></item><item><title>RE: Memory Model</title><link>http://blogs.msdn.com/cbrumme/archive/2003/05/17/51445.aspx#51450</link><pubDate>Sun, 18 May 2003 20:56:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:51450</guid><dc:creator>Chris Brumme</dc:creator><description>Ian,

I tried a few different ways to write that sentence, but still failed to make my point.

Say that the ratio of an L0 hit to a main memory access on your computer is X.  Then the ratio of a main memory access to a disk fetch on your computer might be about 10X.

Developers work hard to avoid paging on virtual memory systems.  Increasingly, they should also work hard to avoid cache misses that cause them to access memory.  And if that memory is in a cache line that is held by a different CPU, the penalty is even worse.
</description></item><item><title>RE: Memory Model</title><link>http://blogs.msdn.com/cbrumme/archive/2003/05/17/51445.aspx#51451</link><pubDate>Mon, 19 May 2003 21:34:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:51451</guid><dc:creator>Phil Hochstetler</dc:creator><description>Chris,

Great info on an issue that careful programmers need.  I might add that the issue on multiprocessors is more complicated than even your description may suggest.  I was at Sequent Computer Systems in the 80's and 90's when we built large (30 proc) multiprocessor systems.  The bus protocol was a &amp;quot;copy back&amp;quot; cache protocol where a particular &amp;quot;cache line&amp;quot; of memory could be owned by a cache.  When an access to memory occurred, a CPU would snoop the bus, abort the transfer from main memory, and send a copy of its cache line instead (and mark it &amp;quot;write shared&amp;quot;) iff its cache owned the line (this could only occur if the CPU had written to the line but had not yet written it back to memory).

You can imagine all the cache-to-cache traffic that can be generated by having two heavily contested locks in the same cache line!  In any case, to get good performance, one had to allocate memory locks at least a cache line apart so they would fall in separate cache lines.  Issues like this are subtle details of the underlying hardware that make it hard to create portable code that performs well across multiple implementations.  Some of these hw (mis)features can be detected at runtime.
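
As a rough sketch of keeping locks a cache line apart in managed code, the struct below pads each lock word out to 128 bytes.  The 128-byte stride is an assumption (pick a value at least as large as the target machine's cache line), and the names PaddedLock and LockTable are hypothetical, not from any library.

```csharp
using System.Runtime.InteropServices;

// Hypothetical padded lock word: Size = 128 forces each element of an
// array of PaddedLock to start 128 bytes after the previous one, so two
// lock words never share a cache line of 128 bytes or less.
[StructLayout(LayoutKind.Explicit, Size = 128)]
public struct PaddedLock
{
    [FieldOffset(0)]
    public int state;   // 0 = free, 1 = held
}

public static class LockTable
{
    // Two heavily contested locks, now one full stride apart in memory.
    public static readonly PaddedLock[] Locks = new PaddedLock[2];
}
```

The cost is wasted memory per lock, which is usually acceptable for a handful of hot locks but not for a lock embedded in every object.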

I'm sure lots of the folks who worked on NT HAL are aware of such issues.  I know that Ken Reneris (kenr@microsoft.com once upon a time), who was Mr HAL, and I talked about such issues in the mid 90's.  Some of this goes back to the old &amp;quot;Code Locking&amp;quot; versus &amp;quot;Data Locking&amp;quot; discussion but that is another discussion entirely.

Keep up the good work!</description></item><item><title>RE: Memory Model</title><link>http://blogs.msdn.com/cbrumme/archive/2003/05/17/51445.aspx#51452</link><pubDate>Tue, 20 May 2003 01:24:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:51452</guid><dc:creator>Dmitriy Zaslavskiy</dc:creator><description>Chris, what about the Interlocked.XXX functions?  They come in different flavors (i.e. acquire/release semantics).  Can I understand your comments to mean that the CLR will always use a full barrier on all those functions?

</description></item><item><title>RE: Memory Model</title><link>http://blogs.msdn.com/cbrumme/archive/2003/05/17/51445.aspx#51453</link><pubDate>Tue, 20 May 2003 03:26:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:51453</guid><dc:creator>Chris Brumme</dc:creator><description>I'm not sure what the official word is here.  It seems to me that each of the various Interlocked operations is a combination of a load.acquire and a store.release.  The combination of these two operations would effectively form a full fence.  If the official word seems to be heading in a different direction, I'll post here to warn you.</description></item><item><title>RE: Memory Model</title><link>http://blogs.msdn.com/cbrumme/archive/2003/05/17/51445.aspx#51454</link><pubDate>Thu, 29 May 2003 03:55:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:51454</guid><dc:creator>Tim Sweeney</dc:creator><description>.NET took a VERY conservative route with these sorts of decisions; the shared heap model is already showing signs of weakness: Getting multithreading right in C++ or C# in real-world programs is just as hard as getting static type safety right in assembly language programs of comparable line counts.

At some point, you realize that synchronization has become the limiting factor in development and debugging, and you move to a deterministic and statically safe model, accepting a 2X or so performance hit for synchronization.

The necessity of this move isn't obvious with today's single-CPU, single-threaded PC's.  But in a few years, when Intel is shipping quad-core, quad-hyperthreaded CPU's and you're trying to manage 16 simultaneous threads in a client application with complex thread communication dependencies, you are just not going to do that kind of thing in C#.</description></item><item><title>RE: Memory Model</title><link>http://blogs.msdn.com/cbrumme/archive/2003/05/17/51445.aspx#51455</link><pubDate>Thu, 29 May 2003 19:33:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:51455</guid><dc:creator>Chris Brumme</dc:creator><description>Tim,

It stands to reason that affinitizing threads and even processes (like web gardens) to specific CPUs will be important in getting good scaling in the future.  SQL Server uses fibers to avoid a trip through the kernel on a context switch, and these fibers are effectively affinitized to CPUs.  But it's still the case that their warmed cache will need refilling when they non-preemptively context switch to a completely different request.  A pipelined server can address this for constrained applications, but we still have a lot to learn on how to efficiently execute general-purpose application code on boxes with 128+ CPUs.

Having said that, we're seeing scaling numbers in the low 20's for a 32-way box, running managed C# code on a simulated business work load.  This includes some XML transforms, so you can be sure the GC is kicking in.

I think that C# is an excellent language for writing most apps, including highly concurrent ones.  And I think we have some opportunities with a managed execution environment like the CLR to achieve high degrees of scaling on behalf of naively written applications.  But we've got a very long way to go.</description></item><item><title>RE: Memory Model</title><link>http://blogs.msdn.com/cbrumme/archive/2003/05/17/51445.aspx#51456</link><pubDate>Mon, 23 Jun 2003 20:01:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:51456</guid><dc:creator>Chris Brumme</dc:creator><description>Since I wrote this, some folks at Intel have shown me a piece of code that reveals out-of-order stores on IA64.  (In fact, the piece of code was a spin lock from some code at Microsoft).  The out-of-order execution was very evident on a 32-way box from one manufacturer -- though it never occurred on a 32-way box from a different manufacturer.  Presumably the difference is in the chip set used for memory management in the caching hierarchy.  Thank goodness that managed developers don't have to be aware of this sort of thing!</description></item><item><title>RE: Memory Model</title><link>http://blogs.msdn.com/cbrumme/archive/2003/05/17/51445.aspx#51457</link><pubDate>Wed, 25 Jun 2003 03:41:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:51457</guid><dc:creator>Hans Boehm</dc:creator><description>I have two concerns with Chris Brumme's suggestion to program to a
stronger memory model than is required by the current language standard:

1) I think that when you look closer, the &amp;quot;simplified&amp;quot; stronger memory model may actually complicate matters.

As Chris stated, if two threads may concurrently access a variable,
and one of them writes it, in almost all cases the accesses should be
protected by a lock.  If you follow this simple rule the detailed
memory model doesn't matter.

Any violations of the above rule are inherently dangerous.  If they are
necessary, I think any reader of the source code deserves at least a hint
that something subtle is going on.  Purely from a legibility perspective,
I don't think that a volatile declaration is unreasonable if a field
is used for thread synchronization.

Just in case you weren't already convinced of the danger of shared
variables without locking, I'll point out some further subtleties with
Chris' double-checked locking example, even with the stronger memory
model.  Consider the slightly completed code:

if (a == null)
{
    lock(obj)
    {
        if (a == null) a = new A();
    }
}
x = a.field;
...

(Clearly we're performing the initialization because we're about to
access a field.  Thus I added the field reference.)

With absolutely no ordering requirement on loads, thread 1 may first
load a.field, and then perform the load of a to do the a == null check.
This will fail if a second thread acquires the lock and initializes a
in between the two operations.

Admittedly, it seems very counterintuitive to load a.field before loading
a.  But some processors may in fact do such &amp;quot;data dependent&amp;quot; reordering.
(Itanium doesn't.  Alpha may. Compilers might, if they have a profile
or some other ways to guess the expected value of a pointer.)

Based on an email discussion with Chris, and on earlier parts of the article,
I believe Chris intends to allow reordering only if there are no
data dependencies.  But now consider the slightly modified example:

if (!a_is_initialized)
{
    lock(obj)
    {
        if (!a_is_initialized)
        { a = new A(); a_is_initialized = true; }
    }
}
x = a.field;
...

This is unambiguously incorrect with Chris' proposal, since the reads of
a_is_initialized may be reordered with the read of a.field, and again
the actual initialization may occur between the two in another thread.
Furthermore, no modern processor of which I'm aware enforces ordering
due to an intervening conditional branch.  If loads may be reordered
at all, they can generally be reordered even if there is an intervening
branch as in this case.  (Modern processors generally predict the
outcome of the branch, and recover if they guessed wrong.  If they guessed
right execution proceeds basically as if the branch weren't there.)
Thus this is likely to fail on many kinds of multiprocessors, Itanium
among them.

Is it really easier for a programmer to understand the distinction between
these two examples than to declare a or a_is_initialized volatile
because they are used for thread communication?

Consider also that it's quite tricky to define &amp;quot;data dependent&amp;quot;, as you
would have to in Chris' proposal.  Are the loads in the first example still
data dependent if the compiler fails to combine the two references to &amp;quot;a&amp;quot;
into a single load instruction, e.g. if you turn off optimization?
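
For comparison, a minimal sketch of the second example with the flag declared volatile, as suggested above.  The wrapper names LazyHolder and ReadField and the value 42 are hypothetical; the sketch relies only on C# volatile semantics, where a volatile read keeps later loads from moving ahead of it and a volatile write keeps earlier stores from moving after it.

```csharp
public sealed class A
{
    public int field = 42;   // hypothetical payload
}

public static class LazyHolder
{
    private static readonly object obj = new object();
    private static A a;
    // volatile: the load of a.field below cannot be reordered ahead of
    // the read of this flag, and the store to a cannot sink below the
    // write of this flag.
    private static volatile bool a_is_initialized;

    public static int ReadField()
    {
        if (!a_is_initialized)
        {
            lock (obj)
            {
                if (!a_is_initialized)
                {
                    a = new A();
                    a_is_initialized = true;
                }
            }
        }
        return a.field;
    }
}
```

The declaration also serves as the hint to the reader that the field is used for thread communication.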

2) Portable Performance.

Many newer processor designs have well-defined, but weaker, memory models, which would make it appreciably more expensive to implement the memory model that Chris proposes.  It's hard to quantify the cost, but in many
cases a slowdown of at least a factor of two seems plausible.  (This is likely to occur for two reasons:  Explicit memory barriers are expensive on some processors.  And such a memory model is likely to seriously inhibit compiler instruction scheduling, which is again important on some processors and not others.)

 These slowdowns are likely to impact even code that accesses no shared data, simply because the compiler will often have a hard time proving that.  As far as I can tell, among the major processor families only X86 variants may get away with a relatively small penalty.

Although it appears that such a slowdown is not necessary for current
X86 processors, even there the official spec seems to state differently.
(See IA-32 Intel Architecture Software Developer's Manual, Volume 3,
section 7.2.2: &amp;quot;Reads can be carried out speculatively and in any order&amp;quot;,
no mention of data dependence, i.e. no guarantee that the original example
works without barriers.)

At best this is likely to make X86 the only really viable execution platform
for C# code.  It is unclear in my mind where this leaves efforts such as Mono.

I agree that the apparently stronger memory ordering properties of X86
make it harder to test code against robustness with respect to a weaker
memory model.  But I believe that can be addressed with a combination
of static tools (notably for race detection) and a debugging interpreter that
tries to reorder aggressively.  There is no real way to avoid the need
for such tools, since even X86 allows some reordering of loads and stores,
and not every runtime implementation will do the same reordering in the
compiler, no matter how strong the ordering properties of the hardware.
</description></item><item><title>RE: Memory Model</title><link>http://blogs.msdn.com/cbrumme/archive/2003/05/17/51445.aspx#51458</link><pubDate>Mon, 30 Jun 2003 15:36:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:51458</guid><dc:creator>Jon Skeet</dc:creator><description>I'd like to take some issue with Chris's presentation of the double-check locking, followed by:

[quote]
I hope that this example has convinced you that you don’t want to try writing reliable code against the documented CLI model.
[/quote]

It doesn't convince me of that at all - it convinces me that I shouldn't try to use any &amp;quot;tricks&amp;quot; to attempt to get lock-free performance unless I'm absolutely sure that I need it, and in that case I need to think very, very hard about it.

It doesn't take much to convince me of that though - I've thought that way for a long time.

I believe that most developers aren't going to suffer significant performance loss due to synchronization operations themselves, compared with how much time it takes to (say) query a database, load a file, write some data to the network, etc. Even fewer developers will see much performance loss due to not being able to access single variables reliably - most of the time multi-threaded applications need effectively &amp;quot;transactional&amp;quot; behaviour - see many changes or none of them, and that's exactly what synchronization gives, expensive as it is. I can't see how that would be helped significantly by a stronger memory model.

I'm not saying the current ECMA model is perfect by any means, but I don't agree that it's unrealistic to write to it - not for most developers, writing applications rather than device drivers etc, where every tiny, tiny bit of performance can make a significant difference.

I realise I'm amongst rather more experienced developers in this discussion, however, so I'm quite prepared to be swayed by argument :)

Jon Skeet
</description></item><item><title>RE: Memory Model</title><link>http://blogs.msdn.com/cbrumme/archive/2003/05/17/51445.aspx#51459</link><pubDate>Sat, 19 Jul 2003 19:30:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:51459</guid><dc:creator>Chris Brumme</dc:creator><description>Hans &amp;amp; I have been over this a few times in private emails.  We're unlikely to persuade each other.  When I look at the typical managed developer, and particularly a typical JScript or VB.NET developer, I cannot envision asking them to sprinkle 'volatile' through their code at the appropriate spots.  This would presumably include after constructing objects but before publishing them.  For general-purpose components that might be used in single-threaded applications or multi-threaded applications, this would also include any updates to potentially shared objects in the GC heap that aren't protected by locks on the read *and* write paths.

If it were necessary to sprinkle 'volatile' for correct execution, the default behavior must be that the system (e.g. compiler, CLR or CPU) does that sprinkling.  Particularly if the developer cannot be expected or trusted to adequately test on what today is exotic hardware.

It's reasonable for developers to opt-in to the more stringent model that Hans prefers.  Such developers are trading off productivity against performance (on some architectures) and reach to those architectures.

This remains a controversial subject.</description></item><item><title>RE: Memory Model</title><link>http://blogs.msdn.com/cbrumme/archive/2003/05/17/51445.aspx#51460</link><pubDate>Tue, 22 Jul 2003 18:42:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:51460</guid><dc:creator>Arch Robison</dc:creator><description>I believe the problem is not the memory model, but the way it is explained.  The explanations usually concentrate on the detailed reordering rules, and don't give the programmer a simple rule of thumb on where to put &amp;quot;volatile&amp;quot;.

My article &amp;quot;Memory Consistency &amp;amp; .NET&amp;quot; in Dr. Dobb's Journal, Apr 2003, Vol. 28 Issue 4, p46-50 gives a simple rule of thumb.  When using shared memory, you are effectively sending a message from one thread to another.  The last write of the message by the sender, and the first read of the message by the receiver, must both be volatile.  That simple rule covers most cases in my experience.
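
A minimal sketch of that rule of thumb follows.  It assumes the System.Threading.Volatile class, which arrived in later .NET versions than the ones under discussion here; the names Mailbox, Send and TryReceive are hypothetical.

```csharp
using System.Threading;

public static class Mailbox
{
    private static int payload;   // the message body, written with plain stores
    private static bool ready;    // the last thing the sender writes

    public static void Send(int value)
    {
        payload = value;
        // Last write of the message by the sender: volatile (release).
        Volatile.Write(ref ready, true);
    }

    public static bool TryReceive(out int value)
    {
        // First read of the message by the receiver: volatile (acquire).
        if (Volatile.Read(ref ready))
        {
            value = payload;
            return true;
        }
        value = 0;
        return false;
    }
}
```

Only the flag needs volatile access; the payload rides along because the release and acquire on the flag order the plain accesses around them.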

The double-check idiom is simple to write in CLI - just declare &amp;quot;a&amp;quot; volatile in the example and it is fixed.  This follows immediately from the rule of thumb: the first thread through the double-check region is the sender; the other threads are the receivers.

A typical JScript or VB.NET developer should use locks, and not conjure their own synchronization primitives.  In that case, the locks in CLI already imply the necessary fences.


</description></item><item><title>RE: Memory Model</title><link>http://blogs.msdn.com/cbrumme/archive/2003/05/17/51445.aspx#51461</link><pubDate>Sun, 14 Sep 2003 10:53:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:51461</guid><dc:creator>RichardH</dc:creator><description>Hi All,
I am a VB6 application developer and beginner to C#/.NET, what I am wondering about is why we need the CLR in the first place, is it like 90% for garbage collection only? I mean is, is the CLR mainly there to give us relief from allocation &amp;amp; deallocation of heap memory, and preventing from 'illegal memory' access like reading outside array bounds etc. </description></item><item><title>Double Check Locking In The News Again</title><link>http://blogs.msdn.com/cbrumme/archive/2003/05/17/51445.aspx#131215</link><pubDate>Thu, 13 May 2004 19:09:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:131215</guid><dc:creator>K. Scott Allen's Blog</dc:creator><description /></item><item><title>re: volatile and MemoryBarrier()... </title><link>http://blogs.msdn.com/cbrumme/archive/2003/05/17/51445.aspx#133016</link><pubDate>Mon, 17 May 2004 08:33:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:133016</guid><dc:creator>Brad Abrams </dc:creator><description /></item><item><title>re: High-performance multithreading is very hard</title><link>http://blogs.msdn.com/cbrumme/archive/2003/05/17/51445.aspx#143888</link><pubDate>Fri, 28 May 2004 19:54:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:143888</guid><dc:creator>The Old New Thing</dc:creator><description /></item><item><title>re: Memory Model</title><link>http://blogs.msdn.com/cbrumme/archive/2003/05/17/51445.aspx#150042</link><pubDate>Mon, 07 Jun 2004 16:43:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:150042</guid><dc:creator>William Stacey</dc:creator><description>Related to memory barriers and concurrency, can anyone say if I implemented this non-blocking queue correctly?&lt;br&gt;&lt;br&gt;/// &amp;lt;summary&amp;gt;&lt;br&gt;/// Summary description for NonBlockingQueue.
&lt;br&gt;/// Modeled after: &lt;a target="_new" href="http://www.cs.rochester.edu/u/scott/synchronization/pseudocode/queues.html"&gt;http://www.cs.rochester.edu/u/scott/synchronization/pseudocode/queues.html&lt;/a&gt;&lt;br&gt;/// &amp;lt;/summary&amp;gt;&lt;br&gt;public class NonBlockingQueue&lt;br&gt;{&lt;br&gt;	Queue Q;&lt;br&gt;	int count;&lt;br&gt;&lt;br&gt;	public int Count&lt;br&gt;	{&lt;br&gt;		get { return count; }&lt;br&gt;	}&lt;br&gt;&lt;br&gt;	internal class Node&lt;br&gt;	{&lt;br&gt;		public object value;		// User object.&lt;br&gt;		public object next;			// Next Node.&lt;br&gt;&lt;br&gt;		public Node(object value, Node next)&lt;br&gt;		{&lt;br&gt;			this.value = value;&lt;br&gt;			this.next = next;&lt;br&gt;		}&lt;br&gt;	}&lt;br&gt;&lt;br&gt;	internal class Queue&lt;br&gt;	{&lt;br&gt;		public object Head = null;	// Dequeue from head.&lt;br&gt;		public object Tail = null;  // Enqueue to tail.&lt;br&gt;	}&lt;br&gt;&lt;br&gt;	public NonBlockingQueue()&lt;br&gt;	{&lt;br&gt;		Q = new Queue();&lt;br&gt;		Node node = new Node(null, null);&lt;br&gt;		Q.Head = node;&lt;br&gt;		Q.Tail = node;&lt;br&gt;	}&lt;br&gt;&lt;br&gt;	public void Enqueue(object value)&lt;br&gt;	{&lt;br&gt;		object node = new Node(value, null);&lt;br&gt;		object tail;&lt;br&gt;		object next;&lt;br&gt;		while(true)&lt;br&gt;		{&lt;br&gt;			tail = Q.Tail;&lt;br&gt;			next = ((Node)tail).next;&lt;br&gt;			if ( tail == Q.Tail )	// Does tail equal Queue Tail.&lt;br&gt;			{&lt;br&gt;				if ( next == null )&lt;br&gt;				{&lt;br&gt;					// Try to link node at the end of the linked list&lt;br&gt;					if ( next == Interlocked.CompareExchange(ref ((Node)tail).next, node, next) )&lt;br&gt;						break;		// enqueue is done; exit.&lt;br&gt;				}&lt;br&gt;				else // Tail was not pointing to null; Swing tail to next node.
&lt;br&gt;					Interlocked.CompareExchange(ref Q.Tail, next, tail);&lt;br&gt;			}&lt;br&gt;		} // End loop&lt;br&gt;		// Enqueue is done.  Try to swing Tail to the inserted node.&lt;br&gt;		Interlocked.CompareExchange(ref Q.Tail, node, tail);&lt;br&gt;		Interlocked.Increment(ref count);&lt;br&gt;	}&lt;br&gt;&lt;br&gt;	public object Dequeue()&lt;br&gt;	{&lt;br&gt;		object value;&lt;br&gt;		object head;&lt;br&gt;		object tail;&lt;br&gt;		object next;&lt;br&gt;		while(true)&lt;br&gt;		{&lt;br&gt;			head = Q.Head;&lt;br&gt;			tail = Q.Tail;&lt;br&gt;			next = ((Node)head).next;&lt;br&gt;			if ( head == Q.Head )&lt;br&gt;			{&lt;br&gt;				if ( head == tail )		// Is queue empty or Tail falling behind?&lt;br&gt;				{&lt;br&gt;					if ( next == null )	// Is queue empty?&lt;br&gt;						return null;	// Queue is empty.&lt;br&gt;					Interlocked.CompareExchange(ref Q.Tail, next, tail);&lt;br&gt;				}&lt;br&gt;				else // No need to deal with Tail.&lt;br&gt;				{&lt;br&gt;					// Read user value before exchange.&lt;br&gt;					// Otherwise, another dequeue might free the next node.&lt;br&gt;					value = ((Node)next).value;&lt;br&gt;					if (Interlocked.CompareExchange(ref Q.Head, next, head) == head)&lt;br&gt;					{&lt;br&gt;						Interlocked.Decrement(ref count);&lt;br&gt;						return value;&lt;br&gt;					}&lt;br&gt;				}&lt;br&gt;			}&lt;br&gt;		} // End loop.&lt;br&gt;	} // End Dequeue&lt;br&gt;} // End Class&lt;br&gt;-- William</description></item><item><title>re: Memory Model</title><link>http://blogs.msdn.com/cbrumme/archive/2003/05/17/51445.aspx#152655</link><pubDate>Thu, 10 Jun 2004 17:59:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:152655</guid><dc:creator>William Stacey</dc:creator><description>Does this spin version work?  Why or why not?  Cheers!&lt;br&gt;&lt;br&gt;public sealed class Singleton&lt;br&gt;{&lt;br&gt;	private static int spinLock = 0;	// lock not owned.
&lt;br&gt;	private static Singleton value = null;&lt;br&gt;	private Singleton() {}&lt;br&gt;&lt;br&gt;	public static Singleton Value()&lt;br&gt;	{&lt;br&gt;		// Get spin lock.&lt;br&gt;		while ( Interlocked.Exchange(ref spinLock, 1) != 0 )&lt;br&gt;			Thread.Sleep(0);&lt;br&gt;&lt;br&gt;		// Do we have any mbarrier issues?&lt;br&gt;		if ( value == null )			&lt;br&gt;			value = new Singleton();	&lt;br&gt;&lt;br&gt;		Interlocked.Exchange(ref spinLock, 0);&lt;br&gt;		return value;&lt;br&gt;	}&lt;br&gt;}&lt;br&gt;&lt;br&gt;This would help answer a few related questions for me on how Interlocked works with mem barriers and cache, etc.  TIA  -- William</description></item><item><title>re: Memory Model</title><link>http://blogs.msdn.com/cbrumme/archive/2003/05/17/51445.aspx#170201</link><pubDate>Wed, 30 Jun 2004 23:43:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:170201</guid><dc:creator>Chris Brumme</dc:creator><description>Doing busy waiting on an Interlocked operation can be a problem.  If multiple threads are piled up on the spin lock, they will cause a lot of bus traffic as they fight for ownership of the cache line containing the spin lock.  In your case, the construction of the singleton object is fairly fast, so perhaps this isn't an issue.&lt;br&gt;&lt;br&gt;In this particular case, there is no strong reason to delay construction of the Singleton beyond first use of the class.  So you might prefer to create the Singleton inside the class constructor -- in which case the CLR is responsible for synchronization.&lt;br&gt;&lt;br&gt;Theoretically you should yield the CPU inside your busy loop in case you are executing on a hyper-threaded CPU.  Thread.SpinWait can be used to ensure this happens.  If you call Interlocked.Exchange inside the loop as you currently do, the SpinWait call may not be necessary.  I would have to check the Intel manuals to be sure.  
But if you perform busy waiting without stealing the bus, as I suggested in the first paragraph, then SpinWait will become important.&lt;br&gt;&lt;br&gt;Since you have a trivial constructor for Singleton, no memory barrier is required before the publishing write to 'value' inside the lock.  But if you had a non-trivial constructor, IA64 would require a barrier in unmanaged code.  That's because assignments you performed inside the constructor might be deferred until after your publishing write.  The memory barrier would prevent this.&lt;br&gt;&lt;br&gt;In managed code the IA64 memory barrier issue is more subtle.  I hope that in Whidbey the memory barrier isn't required, even for IA64, for all the reasons I discussed in this blog.&lt;br&gt;&lt;br&gt;You are asking whether your spinlock sample works or not.  I tried to answer that above.  Beyond the issue of whether it actually works is the issue of whether it's a good way to solve this particular problem.  It's not how I would have coded it.</description></item><item><title>re: Concurrency, Part 13 - Concurrency and the CLR</title><link>http://blogs.msdn.com/cbrumme/archive/2003/05/17/51445.aspx#402317</link><pubDate>Fri, 25 Mar 2005 22:33:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:402317</guid><dc:creator>Larry Osterman's WebLog</dc:creator><description /></item><item><title>mcobrien.org  &amp;raquo; Blog Archive   &amp;raquo; PDC05: Double-check locking fixed in 2.0</title><link>http://blogs.msdn.com/cbrumme/archive/2003/05/17/51445.aspx#4507211</link><pubDate>Wed, 22 Aug 2007 12:20:55 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:4507211</guid><dc:creator>mcobrien.org  » Blog Archive   » PDC05: Double-check locking fixed in 2.0</dc:creator><description>&lt;p&gt;PingBack from &lt;a rel="nofollow" target="_new" 
href="http://mcobrien.org/wordpress/2005/09/pdc05_doubleche"&gt;http://mcobrien.org/wordpress/2005/09/pdc05_doubleche&lt;/a&gt;&lt;/p&gt;
</description></item><item><title>Which managed memory model?</title><link>http://blogs.msdn.com/cbrumme/archive/2003/05/17/51445.aspx#6536096</link><pubDate>Tue, 27 Nov 2007 01:26:46 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:6536096</guid><dc:creator>Eric Eilebrecht's blog</dc:creator><description>&lt;p&gt;In this article , Vance Morrison describes some of the issues involved in writing managed multithreaded&lt;/p&gt;
</description></item><item><title>Memory Model &amp;laquo; Jimmy69&amp;#8217;s Weblog</title><link>http://blogs.msdn.com/cbrumme/archive/2003/05/17/51445.aspx#8332480</link><pubDate>Sun, 23 Mar 2008 21:10:12 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:8332480</guid><dc:creator>Memory Model « Jimmy69’s Weblog</dc:creator><description>&lt;p&gt;PingBack from &lt;a rel="nofollow" target="_new" href="http://jimmy69.wordpress.com/2008/03/23/memory-model/"&gt;http://jimmy69.wordpress.com/2008/03/23/memory-model/&lt;/a&gt;&lt;/p&gt;
</description></item><item><title>The Interlocked on the Edge of Forever - Page 2 | keyongtech</title><link>http://blogs.msdn.com/cbrumme/archive/2003/05/17/51445.aspx#9362238</link><pubDate>Thu, 22 Jan 2009 06:44:40 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:9362238</guid><dc:creator>The Interlocked on the Edge of Forever - Page 2 | keyongtech</dc:creator><description>&lt;p&gt;PingBack from &lt;a rel="nofollow" target="_new" href="http://www.keyongtech.com/432159-the-interlocked-on-the-edge/2"&gt;http://www.keyongtech.com/432159-the-interlocked-on-the-edge/2&lt;/a&gt;&lt;/p&gt;
</description></item></channel></rss>