<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://blogs.msdn.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>DirectXMath: F16C and FMA</title><link>http://blogs.msdn.com/b/chuckw/archive/2012/09/11/directxmath-f16c-and-fma.aspx</link><description>In our last installment in this series, we cover a few additional instructions that extend the AVX instruction set. These instructions make use of the VEX prefix and require that the OS implement &amp;ldquo;OSXSAVE&amp;rdquo;. Without this support, these instructions</description><dc:language>en-US</dc:language><generator>Telligent Evolution Platform Developer Build (Build: 5.6.50428.7875)</generator><item><title>re: DirectXMath: F16C and FMA</title><link>http://blogs.msdn.com/b/chuckw/archive/2012/09/11/directxmath-f16c-and-fma.aspx#10368801</link><pubDate>Thu, 15 Nov 2012 08:15:03 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:10368801</guid><dc:creator>Chuck Walbourn - MSFT</dc:creator><description>&lt;p&gt;Any use of intrinsics assumes the programmer knows what they are doing. It&amp;#39;s basically like inline assembly.&lt;/p&gt;
</description></item><item><title>re: DirectXMath: F16C and FMA</title><link>http://blogs.msdn.com/b/chuckw/archive/2012/09/11/directxmath-f16c-and-fma.aspx#10368698</link><pubDate>Wed, 14 Nov 2012 23:26:18 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:10368698</guid><dc:creator>Alecazam</dc:creator><description>&lt;p&gt;What is the recommended safe way to work with intrin.h? Using an intrinsic can cause a crash if I don&amp;#39;t have a processor that supports it, but they&amp;#39;re all defined by that header (AVX, etc.). In VS2012, intrin.h is now also included by &amp;lt;string&amp;gt;, since it wants the InterlockedIncrement/Decrement calls in there. I much preferred explicitly including the SSE headers, but I see no way to mask those out of intrin.h.&lt;/p&gt;
</description></item><item><title>re: DirectXMath: F16C and FMA</title><link>http://blogs.msdn.com/b/chuckw/archive/2012/09/11/directxmath-f16c-and-fma.aspx#10350015</link><pubDate>Mon, 17 Sep 2012 05:49:05 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:10350015</guid><dc:creator>Chuck Walbourn - MSFT</dc:creator><description>&lt;p&gt;Many of them come from Intel codenames. &amp;#39;pmmintrin.h&amp;#39;, which houses SSE3, is named for the original codename &amp;quot;Prescott New Instructions (PNI)&amp;quot;, hence the &amp;#39;p&amp;#39;. As a result, the early ones tend to be a little unintuitive.&lt;/p&gt;
&lt;p&gt;All the more recent stuff (AVX, F16C, FMA3) is in &amp;#39;immintrin.h&amp;#39;, which is the Intel header, while the AMD additions (FMA4, XOP, etc.) are in &amp;#39;ammintrin.h&amp;#39;. Of course, the header name doesn&amp;#39;t tell you which instructions are vendor-specific and which have been adopted by the other vendor.&lt;/p&gt;
&lt;p&gt;In theory, most intrinsics should be available through intrin.h.&lt;/p&gt;
&lt;table border="1"&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;b&gt;Header&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Description&lt;/b&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;intrin.h&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;General intrinsics, notably &lt;code&gt;__cpuid&lt;/code&gt; and intrinsic forms of various CRT routines&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ammintrin.h&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;SSE5, FMA4, and XOP intrinsics&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;xmmintrin.h&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;a href="http://en.wikipedia.org/wiki/Streaming_SIMD_Extensions"&gt;SSE&lt;/a&gt; intrinsics and the &lt;code&gt;__m128&lt;/code&gt; type (single-precision float SIMD)&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;emmintrin.h&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;a href="http://en.wikipedia.org/wiki/SSE2"&gt;SSE2&lt;/a&gt; intrinsics and the &lt;code&gt;__m128d/__m128i&lt;/code&gt; types (double-precision float and integer SIMD)&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;pmmintrin.h&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;a href="http://en.wikipedia.org/wiki/SSE3"&gt;SSE3&lt;/a&gt; intrinsics (horizontal add and subtract operations for float/double, specific &amp;lsquo;dup&amp;rsquo; operations)&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;tmmintrin.h&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;a href="http://en.wikipedia.org/wiki/SSSE3"&gt;SSSE3&lt;/a&gt; intrinsics (more horizontal ops, integer abs, &amp;lsquo;byte&amp;rsquo; shuffle to augment SSE2)&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;smmintrin.h&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;a href="http://en.wikipedia.org/wiki/SSE4"&gt;SSE4&lt;/a&gt;.1 intrinsics (dot-product, rounding, augmented min/max support for SSE2)&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;nmmintrin.h&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;SSE4.2 intrinsics&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;immintrin.h&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;a href="http://en.wikipedia.org/wiki/Advanced_Vector_Extensions"&gt;AVX&lt;/a&gt;, FMA3, F16C/CVT16, and AVX2 intrinsics&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;wmmintrin.h&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;AES intrinsics&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;</description></item><item><title>re: DirectXMath: F16C and FMA</title><link>http://blogs.msdn.com/b/chuckw/archive/2012/09/11/directxmath-f16c-and-fma.aspx#10349986</link><pubDate>Mon, 17 Sep 2012 02:38:52 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:10349986</guid><dc:creator>clayman</dc:creator><description>&lt;p&gt;Nice series!&lt;/p&gt;
</description></item><item><title>re: DirectXMath: F16C and FMA</title><link>http://blogs.msdn.com/b/chuckw/archive/2012/09/11/directxmath-f16c-and-fma.aspx#10349716</link><pubDate>Sat, 15 Sep 2012 07:25:57 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:10349716</guid><dc:creator>barbie</dc:creator><description>&lt;p&gt;Thanks for this series, Chuck. I was wondering, do you know if there is any method to the names of the intrinsic headers? The names seem almost random, which makes remembering them ... hard!&lt;/p&gt;
</description></item></channel></rss>