SimpleMath - a simplified wrapper for DirectXMath

Feb 2013 update: SimpleMath has now been merged into the DirectX Tool Kit on Codeplex.


SimpleMath, created by my colleague Chuck Walbourn, is a header file that wraps the DirectXMath SIMD vector/matrix math API with an easier-to-use C++ interface.  It provides the following types, whose names, methods, and operator overloads mirror those of the XNA Game Studio math API:

  • Vector2
  • Vector3
  • Vector4
  • Matrix
  • Color
  • Plane
  • Quaternion
  • Ray
  • BoundingSphere
  • BoundingBox

Download SimpleMath here.


Why wrap DirectXMath?

DirectXMath provides highly optimized vector and matrix math functions, which take advantage of SSE SIMD intrinsics when compiled for x86/x64, or the ARM NEON instruction set when compiled for an ARM platform such as Windows RT or Windows Phone.  The downside of being designed for efficient SIMD usage is that DirectXMath can be somewhat complicated to work with.  Developers must be aware of correct type usage (understanding the difference between SIMD register types such as XMVECTOR vs. memory storage types such as XMFLOAT4), must take care to maintain correct alignment for SIMD heap allocations, and must carefully structure their code to avoid accessing individual components from a SIMD register.  This complexity is necessary for optimal SIMD performance, but sometimes you just want to get stuff working without so much hassle!

Enter SimpleMath...

These types derive from the equivalent DirectXMath memory storage types (for instance Vector3 is derived from XMFLOAT3), so they can be stored in arbitrary locations without worrying about SIMD alignment, and individual components can be accessed without bothering to call SIMD accessor functions. But unlike XMFLOAT3, the Vector3 type defines a rich set of methods and overloaded operators, so it can be directly manipulated without having to first load its value into an XMVECTOR.  Vector3 also defines an operator for automatic conversion to XMVECTOR, so it can be passed directly to methods that were written to use the lower level DirectXMath types.

If that sounds horribly confusing, the short version is that the SimpleMath types pretty much Just Work™ the way you would expect them to.
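The arrangement described above (derive from the storage type, add operators, convert implicitly to the register type) can be sketched in a few lines of portable C++. This is not SimpleMath's actual source: `Float3` and `Reg` are hypothetical stand-ins for XMFLOAT3 and XMVECTOR (a real XMVECTOR is an SSE or NEON register type, not a plain struct), and `Load`, `Store`, `Add`, and `Scale` are placeholders for the corresponding DirectXMath functions. It only shows the shape of the technique.

```cpp
#include <cassert>

namespace sketch {

// Hypothetical stand-in for XMFLOAT3: plain storage, no alignment demands.
struct Float3 { float x, y, z; };

// Hypothetical stand-in for XMVECTOR: the "register" type the math runs on.
struct Reg { float v[4]; };

// Placeholders for XMLoadFloat3 / XMStoreFloat3 / XMVectorAdd / XMVectorScale.
inline Reg Load(const Float3& f) { return Reg{{f.x, f.y, f.z, 0.0f}}; }
inline void Store(Float3& f, const Reg& r) { f.x = r.v[0]; f.y = r.v[1]; f.z = r.v[2]; }

inline Reg Add(const Reg& a, const Reg& b)
{
    return Reg{{a.v[0] + b.v[0], a.v[1] + b.v[1], a.v[2] + b.v[2], a.v[3] + b.v[3]}};
}

inline Reg Scale(const Reg& a, float s)
{
    return Reg{{a.v[0] * s, a.v[1] * s, a.v[2] * s, a.v[3] * s}};
}

// SimpleMath-style wrapper: it *is* a Float3, so it can live anywhere and its
// x/y/z fields are directly accessible, but it also carries operators.
struct Vector3 : Float3
{
    Vector3() : Float3{0.0f, 0.0f, 0.0f} {}
    Vector3(float x, float y, float z) : Float3{x, y, z} {}
    Vector3(const Reg& r) : Float3{r.v[0], r.v[1], r.v[2]} {}

    // Implicit conversion: a Vector3 can be passed wherever a Reg is expected.
    operator Reg() const { return Load(*this); }

    // Each operator hides a load, the arithmetic, and a store.
    Vector3& operator+=(const Vector3& o) { Store(*this, Add(Load(*this), Load(o))); return *this; }
    Vector3& operator*=(float s) { Store(*this, Scale(Load(*this), s)); return *this; }
};

// Binary operators can lean on the implicit conversion to Reg.
inline Vector3 operator+(const Vector3& a, const Vector3& b) { return Vector3(Add(a, b)); }

} // namespace sketch
```

With these definitions, `a += b` on two `sketch::Vector3` values reads like ordinary scalar code, and `sketch::Reg r = a;` hands the value to register-based code without an explicit load call.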

By now you must be wondering, where is the catch?  And of course there is one.  SimpleMath hides the complexities of SIMD programming by automatically converting back and forth between memory and SIMD register types, which tends to generate additional load and store instructions.  This can add significant overhead compared to the lower level DirectXMath approach, where SIMD loads and stores are under explicit control of the programmer.


Who is SimpleMath for?

You should use SimpleMath if you are:

  • Looking for a C++ math library with an API similar to the C# Microsoft.Xna.Framework types
  • Porting existing XNA code from C# to C++
  • Wanting to optimize for programmer efficiency (simplicity, readability, development speed) at the expense of runtime efficiency

You should go straight to the underlying DirectXMath API if you:

  • Want to create the fastest possible code
  • Enjoy the lateral thinking sometimes needed to express an algorithm in terms of SIMD operations

This need not be a global either/or decision.  The SimpleMath types know how to convert themselves to and from the corresponding DirectXMath types, so it is easy to mix and match.  You can use SimpleMath for the parts of your program where readability and development time matter most, then drop down to DirectXMath for performance hotspots where runtime efficiency is more important.



Here is a simple object movement calculation, implemented using DirectXMath.  Note the skullduggery to make sure the PlayerCat instance will always be 16-byte aligned (and I didn't even include the implementation of the AlignedNew helper here!).

    #include <DirectXMath.h>

    using namespace DirectX;

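    __declspec(align(16)) class PlayerCat : public AlignedNew<PlayerCat>
    {
    public:
        void Update()
        {
            const float cFriction = 0.99f;

            // Aligned loads: both members are 16-byte aligned XMFLOAT3A values.
            XMVECTOR pos = XMLoadFloat3A(&mPosition);
            XMVECTOR vel = XMLoadFloat3A(&mVelocity);

            // vel is reused from its register; each value is stored exactly once.
            XMStoreFloat3A(&mPosition, pos + vel);
            XMStoreFloat3A(&mVelocity, vel * cFriction);
        }

    private:
        XMFLOAT3A mPosition;
        XMFLOAT3A mVelocity;
    };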
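
The post leaves out the AlignedNew helper. One possible shape for it is sketched below; the name comes from the post, but the body is an assumption, not the author's code. It is a CRTP base class that gives the derived type its own `operator new` and `operator delete`, over-allocating with `std::malloc` and stashing the original pointer just in front of the aligned block, so it does not depend on C++17 aligned `new`. `alignas(16)` is used as the portable spelling of `__declspec(align(16))`, and `Particle` is a hypothetical example type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Hypothetical AlignedNew helper: a CRTP base that makes `new T` return
// memory aligned to alignof(T), without relying on C++17 aligned new.
template <typename T>
struct AlignedNew
{
    static void* operator new(std::size_t size) { return Allocate(size); }
    static void* operator new[](std::size_t size) { return Allocate(size); }
    static void operator delete(void* ptr) noexcept { Release(ptr); }
    static void operator delete[](void* ptr) noexcept { Release(ptr); }

private:
    static std::size_t Alignment()
    {
        // At least pointer alignment, so the stashed pointer is itself aligned.
        return alignof(T) < alignof(void*) ? alignof(void*) : alignof(T);
    }

    static void* Allocate(std::size_t size)
    {
        const std::size_t alignment = Alignment();

        // Payload + worst-case padding + one slot for the original pointer.
        void* raw = std::malloc(size + alignment - 1 + sizeof(void*));
        if (!raw)
            throw std::bad_alloc();

        // Step past the pointer slot, then round up to a multiple of `alignment`.
        std::uintptr_t start = reinterpret_cast<std::uintptr_t>(raw) + sizeof(void*);
        std::uintptr_t aligned = (start + alignment - 1) & ~(static_cast<std::uintptr_t>(alignment) - 1);

        // Remember what malloc returned, just in front of the aligned block.
        reinterpret_cast<void**>(aligned)[-1] = raw;
        return reinterpret_cast<void*>(aligned);
    }

    static void Release(void* ptr) noexcept
    {
        if (ptr)
            std::free(static_cast<void**>(ptr)[-1]);
    }
};

// Hypothetical example type; alignas(16) is the portable form of __declspec(align(16)).
struct alignas(16) Particle : AlignedNew<Particle>
{
    float position[4];
    float velocity[4];
};
```

With a C++17 compiler, a plain `new` expression already honors `alignas` for over-aligned types, so a helper like this mainly matters on older toolchains.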

Using SimpleMath, the same math is, well, a little more simple :-)

    #include "SimpleMath.h"

    using namespace DirectX::SimpleMath;

    class PlayerCat
    {
    public:
        void Update()
        {
            const float cFriction = 0.99f;

            mPosition += mVelocity;
            mVelocity *= cFriction;
        }

    private:
        Vector3 mPosition;
        Vector3 mVelocity;
    };

Here is the x86 SSE code generated for the DirectXMath version of the Update method:

     movaps      xmm2,xmmword ptr [ecx+10h]
     movaps      xmm1,xmmword ptr [ecx]
     andps       xmm2,xmmword ptr [?g_XMMask3@DirectX@@3UXMVECTORI32@1@B]
     andps       xmm1,xmmword ptr [?g_XMMask3@DirectX@@3UXMVECTORI32@1@B]
     movaps      xmm0,xmmword ptr [__xmm@3f7d70a43f7d70a43f7d70a43f7d70a4]
     addps       xmm1,xmm2
     mulps       xmm0,xmm2
     movq        mmword ptr [ecx],xmm1
     shufps      xmm1,xmm1,0AAh
     movss       dword ptr [ecx+8],xmm1
     movq        mmword ptr [ecx+10h],xmm0
     shufps      xmm0,xmm0,0AAh
     movss       dword ptr [ecx+18h],xmm0

The SimpleMath version generates slightly more than twice as many machine instructions:

     movss       xmm2,dword ptr [ecx]
     movss       xmm0,dword ptr [ecx+4]
     movss       xmm1,dword ptr [ecx+0Ch]
     unpcklps    xmm2,xmm0
     movss       xmm0,dword ptr [ecx+8]
     movlhps     xmm2,xmm0
     movss       xmm0,dword ptr [ecx+10h]
     unpcklps    xmm1,xmm0
     movss       xmm0,dword ptr [ecx+14h]
     movlhps     xmm1,xmm0
     addps       xmm2,xmm1
     movss       dword ptr [ecx],xmm2
     movaps      xmm0,xmm2
     shufps      xmm0,xmm2,55h
     movss       dword ptr [ecx+4],xmm0
     shufps      xmm2,xmm2,0AAh
     movss       dword ptr [ecx+8],xmm2
     movss       xmm1,dword ptr [ecx+0Ch]
     movss       xmm0,dword ptr [ecx+10h]
     unpcklps    xmm1,xmm0
     movss       xmm0,dword ptr [ecx+14h]
     movlhps     xmm1,xmm0
     mulps       xmm1,xmmword ptr [__xmm@3f7d70a43f7d70a43f7d70a43f7d70a4]
     movaps      xmm0,xmm1
     movss       dword ptr [ecx+0Ch],xmm1
     shufps      xmm0,xmm1,55h
     shufps      xmm1,xmm1,0AAh
     movss       dword ptr [ecx+10h],xmm0
     movss       dword ptr [ecx+14h],xmm1

Most of this difference is because I was able to use aligned loads and stores in the DirectXMath version, while the SimpleMath code must do extra work to handle memory locations that might not be properly aligned.  Also note how the SimpleMath version loads the mVelocity value from memory into SIMD registers twice, while the extra control offered by DirectXMath allowed me to do this just once.
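
The double load of mVelocity follows directly from each overloaded operator doing its own load and store. As a rough illustration, the sketch below counts conversions instead of reading assembly. Everything in it is hypothetical: `Float3` and `Reg` stand in for XMFLOAT3 and XMVECTOR, `Load` and `Store` stand in for the load and store sequences in the listings above, `UpdateWrapped` mimics the SimpleMath `Update` (one load/compute/store per compound operator), and `UpdateExplicit` mimics the DirectXMath `Update` (each value loaded once and stored once).

```cpp
#include <cassert>

namespace counting {

struct Float3 { float x, y, z; };  // stand-in for XMFLOAT3 (memory type)
struct Reg { float x, y, z; };     // stand-in for XMVECTOR (register type)

int g_loads = 0;   // number of memory -> register conversions
int g_stores = 0;  // number of register -> memory conversions

inline Reg Load(const Float3& f) { ++g_loads; return Reg{f.x, f.y, f.z}; }
inline void Store(Float3& f, const Reg& r) { ++g_stores; f = Float3{r.x, r.y, r.z}; }
inline Reg Add(const Reg& a, const Reg& b) { return Reg{a.x + b.x, a.y + b.y, a.z + b.z}; }
inline Reg Scale(const Reg& a, float s) { return Reg{a.x * s, a.y * s, a.z * s}; }

// SimpleMath style: every compound operator is its own load/compute/store.
inline void AddAssign(Float3& a, const Float3& b) { Store(a, Add(Load(a), Load(b))); }
inline void ScaleAssign(Float3& a, float s) { Store(a, Scale(Load(a), s)); }

inline void UpdateWrapped(Float3& position, Float3& velocity, float friction)
{
    AddAssign(position, velocity);   // loads position and velocity
    ScaleAssign(velocity, friction); // loads velocity a second time
}

// DirectXMath style: the programmer decides when loads and stores happen.
inline void UpdateExplicit(Float3& position, Float3& velocity, float friction)
{
    Reg pos = Load(position);
    Reg vel = Load(velocity);
    Store(position, Add(pos, vel));
    Store(velocity, Scale(vel, friction));
}

} // namespace counting
```

`UpdateWrapped` performs three loads and two stores where `UpdateExplicit` performs two of each, matching the extra load of mVelocity in the SimpleMath listing. In real SIMD code, each load or store of a 12-byte value expands into several scalar moves and shuffles, as the listings above show.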

But hey, sometimes performance isn't the most important goal.  If you care more about optimizing for developer efficiency, SimpleMath could be for you.

Comments

  • Note that SimpleMath assumes you are using a right-handed coordinate system, the same as the XNA Game Studio math library.

    This is unlike DirectXMath, which includes alternative methods for both right- and left-handed computations.

  • I did the same sort of thing a few months back and released it on Github (, although it doesn't use the name DirectXMath.

    Also the coding style is a bit different.

  • Thanks Shawn and Chuck Walbourn for this. I'll check if it is faster than mine;

    if it is, you get full credit for the math lib.

    Have a look at my physics and my real-time mathematical beauty for the Tegra 3 chip in the Surface RT.

    Happy New Year and keep up the good work. I've got something for you all; you will see it in the coming months.

    Best regards


  • Sorry, mine is faster, but I had to check. It is always the simple things you forget

    that will slow your program down.

    I also ran some tests: we are in the year 1998, Riva TNT 128 graphics card = Tegra 3 in a small form factor,

    plus an upscaler and more pixel bandwidth.

    Well, the first version of NT; now there are only 2 years to the break, true, in small form factors.

  • Hi, can someone explain to me why DirectXMath doesn't use move semantics? I feel the urge to edit that .h file to add move constructors and assignment operators.

  • What does it mean to move as opposed to copy the value in a SIMD CPU register?

    Move semantics only make sense when there is some worthwhile optimization that can be done for move but not for copy.  I don't understand what this would be for any of the DirectXMath or SimpleMath types?

  • Nice! When looking at the source code, I ask myself why intermediate results of packed vectors are returned as packed vectors (e.g. adding two Vector2 values results in a Vector2 value). Wouldn't it be more efficient for these intermediate results to be new C++ structs that hold an XMVECTOR and know what the original packed dimension was (i.e. as a template integer argument), or even use a 'swizzle' mask (again a template argument)? And allow implicit conversion from such an intermediate structure to a packed vector? Together with the "auto" C++ keyword, this could result in better performance, since the compiler cannot automatically infer these types of optimizations, I guess.
