AVX2 Support in Visual Studio C++ Compiler

AVX2 Support in Visual Studio C++ Compiler

Rate This
  • Comments 7

AVX2 is yet another extension to the venerable x86 line of processors, doubling the width of its SIMD vector registers to 256 bits, and adding dozens of new instructions.  AVX2 shipped with Intel's latest processor micro-architecture, codenamed "Haswell".  (Its official name is "4th generation Intel® Core™ processor family").  As well as AVX2, Haswell supports other features to help make your code run faster:  FMA (Fused Multiply Add) and BMI (Bit Manipulation Instructions), in particular.  Haswell chips appear in many of the latest PCs, laptops and tablets (including our own Surface Pro 2).

This extra silicon opens a new corner of the playing field for compiler-writers - to take your C++ source and generate these new instructions, making your code run faster than before.  And so we are releasing our initial support for AVX2 with the CTP2 of Visual Studio Update.

How do you tell the VC++ compiler to generate AVX2 instruction?  From the command line, include the /arch:AVX2 switch.  If you work within Visual Studio, the screenshot below shows how to set this option.   

 

Yes, we will add an option to enable AVX2 in the drop-down menu at: Project Property Pages | Configuration Properties | C/C++ | Code Generation | Enable Enhanced Instruction Set.  Likewise, we will add brief help to the command line usage for cl /?

Further points worth calling out on AVX2 support:

The compiler will generate code that includes AVX2 and FMA instructions.  The resulting binaries will only run on PCs that support these instructions.  If you run the binaries on an older PC, the program will crash, and display a popup like this:

If you specify /arch:AVX2, then it also enables /arch:AVX - we try to keep those /arch switches 'monotonic':  the capabilities of each switch in the sequence {IA32, SSE, SSE2, AVX, AVX2} subsumes its predecessor (not sure I've explained this well - is it clear?)

The AVX2 support in this CTP is just a start.  We have more work to do!  This really should not come as a surprise - in a sense, compiler optimizations are never "finished".  Compiler engineers have been improving optimizations since they were started in the mid-50s by John Backus in the FORTRAN I project.  But it's worth emphasizing, if only to fend off a little flurry of advice saying we could improve parts of our AVX2 codegen :-)

If you want to write code that checks for whether the machine you are running is "Haswell-capable", then you need to check 5 configuration bits, via the CPUID instruction. This Intel page explains the details. 

As always, we are interested in your feedback. Post your comments below. Thanks!

 

 

 

  • In this CTP, will the autovectorizer use AVX2 as well? Further will it recognize code that could be vectorized using the GATHER instruction?

  • @Chris

    Yes, I forgot to say that explicitly - the auto-vectorizer continues to analyze code, looking for places it can speed execution by using the chip's SIMD registers & instructions.  But now, with /arch:AVX2, it will generate AVX2 instructions, using the YMM (256-bit wide) registers.  And yes, we perform gather operations - but using instructions other than GATHER (which does the job, but doesn't yet seem to benefit performance; this may change with revisions to the micro-architecture).

  • @Kevin Frei, any plans on branch prediction (en.wikipedia.org/.../Branch_prediction) and loop-invariant-hoisting (en.wikipedia.org/.../Loop-invariant_code_motion) optimizations.

    In case you guys didn't noticed the most viewed question on StackOverflow //stackoverflow.com/q/11227809

  • @Kevin Frei, any plans on loop interchange (en.wikipedia.org/.../Loop_interchange) and loop-invariant-hoisting (en.wikipedia.org/.../Loop-invariant_code_motion) optimizations for branch prediction.

    In case you guys didn't noticed the most viewed question on StackOverflow //stackoverflow.com/q/11227809

  • if your using windows 7 you need the SP1 installed or there's no avx avx2 available as per using the highest quality  hand optimised AVX2 x264/x265 etc...

  • @Jim, I find the post confusing.  It sounds like you're implying that 256bit registers are only now available with the introduction of AVX2, but weren't they used before with AVX ?  Or do you mean specifically for the integer instructions that AVX2 now enables ?

  • @Richard

    Yes, fair point.  256-bit SIMD registers have indeed been around a while, since AVX was introduced.  AVX2 uses those same 256-bit registers, but provides extra instructions; so the compiler can now improve performance of 'everyday' code (not just scientific/engineering code that makes heavy use of floating-point data.

Page 1 of 1 (7 items)