<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://blogs.msdn.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>volatile, acquire/release, memory fences, and VC2005</title><link>http://blogs.msdn.com/kangsu/archive/2007/07/16/volatile-acquire-release-memory-fences-and-vc2005.aspx</link><description>One of the more common questions I get about VC2005 code generation relates to the code generation of volatile on x86/x64. If we take a look at MSDN we see that it defines the semantics for volatile in VC2005 as : o A write to a volatile object (volatile</description><dc:language>en-US</dc:language><generator>CommunityServer 2.1 SP1 (Build: 61025.2)</generator><item><title>re: volatile, acquire/release, memory fences, and VC2005</title><link>http://blogs.msdn.com/kangsu/archive/2007/07/16/volatile-acquire-release-memory-fences-and-vc2005.aspx#3992551</link><pubDate>Sun, 22 Jul 2007 00:03:06 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:3992551</guid><dc:creator>Peter Ritchie</dc:creator><description>&lt;p&gt;&lt;a rel="nofollow" target="_new" href="http://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=288218"&gt;http://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=288218&lt;/a&gt; seems to suggest that ld.acq/st.rel aren't always generated.&lt;/p&gt;
</description></item><item><title>re: volatile, acquire/release, memory fences, and VC2005</title><link>http://blogs.msdn.com/kangsu/archive/2007/07/16/volatile-acquire-release-memory-fences-and-vc2005.aspx#4030455</link><pubDate>Tue, 24 Jul 2007 19:06:20 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:4030455</guid><dc:creator>lbargaoanu</dc:creator><description>&lt;p&gt;Typo :)&lt;/p&gt;
&lt;p&gt;&amp;quot;must remain about Store V&amp;quot; should be&lt;/p&gt;
&lt;p&gt;&amp;quot;must remain above Store V&amp;quot;&lt;/p&gt;
</description></item><item><title>re: volatile, acquire/release, memory fences, and VC2005</title><link>http://blogs.msdn.com/kangsu/archive/2007/07/16/volatile-acquire-release-memory-fences-and-vc2005.aspx#4195554</link><pubDate>Fri, 03 Aug 2007 02:02:46 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:4195554</guid><dc:creator>kanggatl</dc:creator><description>&lt;p&gt;Peter, you're right. &amp;nbsp;On Itanium there is a bug...&lt;/p&gt;
</description></item><item><title>re: volatile, acquire/release, memory fences, and VC2005</title><link>http://blogs.msdn.com/kangsu/archive/2007/07/16/volatile-acquire-release-memory-fences-and-vc2005.aspx#4295036</link><pubDate>Wed, 08 Aug 2007 19:35:28 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:4295036</guid><dc:creator>JAG</dc:creator><description>&lt;p&gt;From the original post: &amp;quot;But Loads with respect to other Loads will remain ordered.&amp;quot; I don't see how to reconcile this assertion with Intel Vol. 3A, section 7.2.2, &amp;quot;Memory Ordering in P6 and More Recent Processor Families&amp;quot;, which states &amp;quot;1. Reads can be carried out speculatively and in any order.&amp;quot;&lt;/p&gt;</description></item><item><title>re: volatile, acquire/release, memory fences, and VC2005</title><link>http://blogs.msdn.com/kangsu/archive/2007/07/16/volatile-acquire-release-memory-fences-and-vc2005.aspx#4312043</link><pubDate>Thu, 09 Aug 2007 19:11:54 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:4312043</guid><dc:creator>kanggatl</dc:creator><description>&lt;p&gt;JAG, that's a good question. &amp;nbsp;You can think of it this way: if you were monitoring the memory-read traffic (for example, to validate the hardware), you might see the reads issued out of order, but that reordering should never be made apparent to the programmer.&lt;/p&gt;
</description></item><item><title>re: volatile, acquire/release, memory fences, and VC2005</title><link>http://blogs.msdn.com/kangsu/archive/2007/07/16/volatile-acquire-release-memory-fences-and-vc2005.aspx#4313457</link><pubDate>Thu, 09 Aug 2007 23:32:01 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:4313457</guid><dc:creator>JAG</dc:creator><description>&lt;P&gt;Mmm, this seems to imply that Intel’s Rule 1 (cited above) refers to load operations as they might appear on the bus, not to the actual instructions (and their dependents). &amp;nbsp;That loads can be performed out of order yet always appear to the program in order suggests some implicit coupling with any other program/CPU that might be storing. &amp;nbsp;Possibly this is tied to the cache coherence mechanism? &amp;nbsp;That is, assuming “speculative” refers to any instruction beyond the one at “current EIP”, none of these instructions would be committed (retired) until they become the target of “current EIP” AND no intervening cache invalidate has arrived. &amp;nbsp;Presumably, if an invalidate does arrive, the speculated instruction (and its dependents) would be discarded and re-executed. &amp;nbsp;As in (putting aside compiler actions for the moment):&lt;/P&gt;
&lt;P&gt;CPU0 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;CPU1&lt;/P&gt;
&lt;P&gt;a=0; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;... &lt;/P&gt;
&lt;P&gt;b=0; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;...&lt;/P&gt;
&lt;P&gt;... &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; ...&lt;/P&gt;
&lt;P&gt;a=1; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;while (b!=2);&lt;/P&gt;
&lt;P&gt;b=2; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;c=a;&lt;/P&gt;
&lt;P&gt;... &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; e=c+1;&lt;/P&gt;
&lt;P&gt;CPU1 could speculatively process c=a, and possibly e=c+1 (etc.), but could not retire c=a (etc.) until it became the target of “current EIP”. &amp;nbsp;Assume this speculation observes a==0. &amp;nbsp;At a later time, CPU1 observes b==2. &amp;nbsp;Because stores are ordered, this will always have been preceded by CPU0 making an exclusive fetch for the line containing a. &amp;nbsp;CPU1 observes that fetch as an “invalidate”, which discards the speculative observation a==0.&lt;/P&gt;
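&lt;P&gt;A minimal sketch of the same CPU0/CPU1 pattern follows. It is written in Rust purely as an illustration; Rust's std::sync::atomic API is a stand-in here and not anything VC2005 provides, and the helper name message_passing is hypothetical. The store to b uses Release ordering and the spin on b uses Acquire ordering. That pairing guarantees that once CPU1 sees b==2 it also sees a==1, whatever the hardware does speculatively underneath.&lt;/P&gt;
&lt;pre&gt;
```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::thread;

// a and b from the CPU0/CPU1 example, as atomics so both threads may share them.
static A: AtomicU32 = AtomicU32::new(0);
static B: AtomicU32 = AtomicU32::new(0);

// Hypothetical helper: runs one round of the example and returns (c, e).
fn message_passing() -> (u32, u32) {
    // a=0; b=0; (done before either thread is spawned, so both threads see it)
    A.store(0, Ordering::Relaxed);
    B.store(0, Ordering::Relaxed);

    // CPU0: a=1; then b=2 with Release, so the store to a cannot be
    // reordered after the store to b.
    let cpu0 = thread::spawn(|| {
        A.store(1, Ordering::Relaxed);
        B.store(2, Ordering::Release);
    });

    // CPU1: while (b!=2); using Acquire loads, so the later load of a
    // cannot be satisfied ahead of the load that observed b==2.
    let cpu1 = thread::spawn(|| {
        while B.load(Ordering::Acquire) != 2 {
            std::hint::spin_loop();
        }
        let c = A.load(Ordering::Relaxed); // c=a;
        let e = c + 1; // e=c+1;
        (c, e)
    });

    cpu0.join().unwrap();
    cpu1.join().unwrap()
}

fn main() {
    for _ in 0..1000 {
        // Seeing b==2 through an Acquire load guarantees the Release-ordered a=1 is visible.
        assert_eq!(message_passing(), (1, 2));
    }
    println!("c == 1 and e == 2 in every round");
}
```
&lt;/pre&gt;
&lt;P&gt;On x86 the Release store and the Acquire load typically compile to plain mov instructions. Any speculation of the kind described above stays invisible to the program, which matches the hardware behaviour being discussed.&lt;/P&gt;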
&lt;P&gt;Is this the kind of interplay that allows loads to be unordered and ordered at the same time?&lt;/P&gt;</description></item><item><title>re: volatile, acquire/release, memory fences, and VC2005</title><link>http://blogs.msdn.com/kangsu/archive/2007/07/16/volatile-acquire-release-memory-fences-and-vc2005.aspx#4416557</link><pubDate>Thu, 16 Aug 2007 18:02:44 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:4416557</guid><dc:creator>kanggatl</dc:creator><description>&lt;p&gt;JAG, it works something like that. &amp;nbsp;There is an often-cited write-up by Frey that describes the general algorithm. &amp;nbsp;For specific details, though, you'd need to contact each hardware vendor.&lt;/p&gt;
</description></item><item><title>re: volatile, acquire/release, memory fences, and VC2005</title><link>http://blogs.msdn.com/kangsu/archive/2007/07/16/volatile-acquire-release-memory-fences-and-vc2005.aspx#4549305</link><pubDate>Sat, 25 Aug 2007 02:09:36 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:4549305</guid><dc:creator>JAG</dc:creator><description>&lt;p&gt;Okay, thanks. &amp;nbsp;I found some words from Andy Glew on the topic, reinforcing your comments.&lt;/p&gt;
&lt;p&gt;&lt;a rel="nofollow" target="_new" href="http://groups.google.com/group/comp.arch/browse_frm/thread/64bc52823607b2c0/e120cb283153d58a?lnk=st&amp;amp;q=load+order+x86+fence+glew&amp;amp;rnum=3&amp;amp;hl=en#e120cb283153d58a"&gt;http://groups.google.com/group/comp.arch/browse_frm/thread/64bc52823607b2c0/e120cb283153d58a?lnk=st&amp;amp;q=load+order+x86+fence+glew&amp;amp;rnum=3&amp;amp;hl=en#e120cb283153d58a&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I have another related question, this one about the xxxBarrier intrinsics (as in 7.1). &amp;nbsp;The way I read it (which could be wacky, of course), given&lt;/p&gt;
&lt;p&gt; &amp;nbsp; load a&lt;/p&gt;
&lt;p&gt; &amp;nbsp; _ReadBarrier&lt;/p&gt;
&lt;p&gt; &amp;nbsp; load b&lt;/p&gt;
&lt;p&gt;the &amp;quot;load a&amp;quot; is not permitted to be moved below the barrier, but there are no constraints on moving the &amp;quot;load b&amp;quot; above the barrier. &amp;nbsp;Similarly, given&lt;/p&gt;
&lt;p&gt; &amp;nbsp; store a&lt;/p&gt;
&lt;p&gt; &amp;nbsp; _WriteBarrier&lt;/p&gt;
&lt;p&gt; &amp;nbsp; store b&lt;/p&gt;
&lt;p&gt;the &amp;quot;store a&amp;quot; may not move below the barrier, but the &amp;quot;store b&amp;quot; is not constrained.&lt;/p&gt;
&lt;p&gt;This seems odd to me (e.g., it doesn't match acquire/release semantics), so I'm guessing I'm misreading something...&lt;/p&gt;
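&lt;p&gt;For comparison, here is a hedged Rust sketch of what acquire/release looks like when it is expressed as standalone fences. Rust's std::sync::atomic::fence is an assumed stand-in, and fenced_round, DATA and FLAG are hypothetical names. This fence constrains both the compiler and the CPU, so it is not a drop-in equivalent of a compiler-only barrier.&lt;/p&gt;
&lt;p&gt;A Release fence keeps every earlier access above every later store, and an Acquire fence keeps every later access below every earlier load. On the reader side the order is therefore load b, then the fence, then load a, with DATA playing a and FLAG playing b.&lt;/p&gt;
&lt;pre&gt;
```rust
use std::sync::atomic::{fence, AtomicU32, Ordering};
use std::thread;

// DATA plays the role of "a" and FLAG the role of "b" in the examples above.
static DATA: AtomicU32 = AtomicU32::new(0);
static FLAG: AtomicU32 = AtomicU32::new(0);

// Hypothetical helper: one publish/consume round built from standalone fences.
fn fenced_round() -> u32 {
    DATA.store(0, Ordering::Relaxed);
    FLAG.store(0, Ordering::Relaxed);

    let writer = thread::spawn(|| {
        DATA.store(42, Ordering::Relaxed); // store a
        // Release fence: no earlier load or store may move below a later store,
        // so "store a" stays above "store b".
        fence(Ordering::Release);
        FLAG.store(1, Ordering::Relaxed); // store b
    });

    let reader = thread::spawn(|| {
        while FLAG.load(Ordering::Relaxed) != 1 {
            // load b (spin until the writer publishes)
            std::hint::spin_loop();
        }
        // Acquire fence: no later load or store may move above an earlier load,
        // so "load a" stays below "load b".
        fence(Ordering::Acquire);
        DATA.load(Ordering::Relaxed) // load a
    });

    writer.join().unwrap();
    reader.join().unwrap()
}

fn main() {
    for _ in 0..1000 {
        assert_eq!(fenced_round(), 42);
    }
    println!("reader saw 42 in every round");
}
```
&lt;/pre&gt;
&lt;p&gt;The two fences only give the guarantee as a pair: the reader's Acquire fence synchronizes with the writer's Release fence because the reader's relaxed load observed the writer's relaxed store to FLAG.&lt;/p&gt;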
&lt;p&gt;TIA&lt;/p&gt;
&lt;p&gt;JAG&lt;/p&gt;</description></item></channel></rss>