Parallel Programming in Native Code

Parallel programming using C++ AMP, PPL and Agents libraries.

Julia fractal using C++ AMP

In this blog post, I will share a C++ AMP implementation of a fractal generator that renders 4-dimensional quaternion Julia fractals. I’ll show you screenshots of the app, then we’ll dive into the code, and finally I’ll point you to where you can get the Visual Studio project.

Credits

The original sample was written using DirectCompute by Jan Vlietinck. The shader code in the DirectCompute version was a port of a Cg version written by Keenan Crane. I have ported the DirectCompute portions to use C++ AMP.

The app

The application presents an animation of Julia fractals by changing the generation parameters by a small amount at each time quantum. Here are some snapshots of the fractal animation.

  • Use the mouse wheel to zoom in and out
  • Hold down the left mouse button to select and rotate the fractal
  • Use ‘space’ to turn the animation ‘on’ or ‘off’

 

[Screenshots: four snapshots of the fractal animation]

 

You can also change the parameters to the Julia fractal during the animation.

  • Use w/x to increase/decrease the x component of the quaternion constant (MuC.x)
  • Use q/z to increase/decrease the y component of the quaternion constant (MuC.y)
  • Use a/d to increase/decrease the z component of the quaternion constant (MuC.z)
  • Use e/c to increase/decrease the w component of the quaternion constant (MuC.w)
  • Use -/+ to increase/decrease the precision of intersection (epsilon)
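To make the role of the quaternion constant concrete, here is a minimal CPU sketch of the iteration z ← z² + mu that underlies a quaternion Julia set. It is not the sample's code: the `Quat` type, the function names, the choice of `x` as the real part, and the escape threshold are all assumptions made for illustration. The sample itself ray-traces the fractal surface, with epsilon controlling the intersection precision, and that part is not reproduced here.

```cpp
// Hypothetical quaternion type for illustration; x is treated as the real part.
struct Quat { float x, y, z, w; };

// Square of a quaternion q = (r, v): q^2 = (r*r - v.v, 2*r*v).
inline Quat quat_square(Quat q) {
    return Quat{ q.x * q.x - (q.y * q.y + q.z * q.z + q.w * q.w),
                 2.0f * q.x * q.y,
                 2.0f * q.x * q.z,
                 2.0f * q.x * q.w };
}

// Squared length of a quaternion.
inline float quat_norm2(Quat q) {
    return q.x * q.x + q.y * q.y + q.z * q.z + q.w * q.w;
}

// Iterates z <- z^2 + mu and returns how many steps were taken before
// |z|^2 exceeded escape_radius2, or max_iter if it never did.
inline int julia_escape_count(Quat z, Quat mu, int max_iter, float escape_radius2) {
    for (int i = 0; i < max_iter; ++i) {
        if (quat_norm2(z) > escape_radius2) {
            return i;
        }
        Quat s = quat_square(z);
        z = Quat{ s.x + mu.x, s.y + mu.y, s.z + mu.z, s.w + mu.w };
    }
    return max_iter;
}
```

Points whose iterates stay bounded belong to the set, so changing any component of mu with the keys above changes which points survive the iteration and therefore the shape of the fractal.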

Porting to C++ AMP

This sample was ported by following the guide “C++ AMP for the DirectCompute Programmer”. The main Win32 application is implemented in QJulia4D.cpp.

We start by removing most of the boilerplate code that creates constant buffers and resource views and that compiles and launches the shader, since none of it is needed with C++ AMP. C++ AMP containers automatically create the buffers and resource views on the accelerator. Constants can be captured directly in the lambda passed to parallel_for_each, which takes care of creating and setting the required constant buffers. Finally, the code that explicitly compiles and launches the shader goes away, because the call to parallel_for_each takes care of that automatically.

The shader was ported to use C++ AMP in julia4DAMP.cpp. The compute shader is replaced with a parallel_for_each call as shown below.

Original Compute Shader:

[numthreads(4, 64, 1)]
//****************************************************************************
void CS_QJulia4D( uint3 Gid : SV_GroupID, uint3 DTid : SV_DispatchThreadID, uint3 GTid : SV_GroupThreadID, uint GI : SV_GroupIndex )
//****************************************************************************
{
    float4 coord = float4((float)DTid.x, (float)DTid.y, 0.0f, 0.0f);

    float2 size     = float2((float)c_width, (float)c_height);
    float scale     = min(size.x, size.y);
    float2 half     = float2(0.5f, 0.5f);

    float2 position = (coord.xy - half * size) / scale * BOUNDING_RADIUS_2 * zoom;

    float3 light = float3(1.5f, 0.5f, 4.0f);
    float3 eye   = float3(0.0f, 0.0f, 4.0f);
    float3 ray   = float3(position.x, position.y, 0.0f);

    // rotate fractal
    light = mul(light, rotation);
    eye   = mul(  eye, rotation);
    ray   = mul(  ray, rotation);

    // ray start and ray direction
    float3 rO =  eye;
    float3 rD =  ray - rO;

    float4 color = QJulia(rO, rD, c_mu, c_epsilon, eye, light, c_selfShadow);
    output[DTid.xy] = color;
}

C++ AMP kernel:

// Launch kernel using C++ AMP
void ampCompute(const writeonly_texture_view<unorm4, 2>& tv, const QJulia4DConstants& mc)
{
    // tile<64, 4> corresponds to numthreads(4, 64, 1): C++ AMP lists the most
    // significant dimension first. pad() rounds the extent up to whole tiles.
    parallel_for_each(tv.accelerator_view, tv.extent.tile<64, 4>().pad(),
                     [&,tv,mc](tiled_index<64,4> ti) restrict(amp)
    {
        // global[1] is the column (x), global[0] is the row (y)
        float4 coord = float4((float)ti.global[1], (float)ti.global[0], 0.0f, 0.0f);

        float2 size     = float2((float)mc.c_width, (float)mc.c_height);
        float scale     = fminf(size.x, size.y);
        float2 half     = float2(0.5f, 0.5f);
        float2 position = (coord.xy - half * size) / scale * BOUNDING_RADIUS_2 * mc.zoom;

        float3 light = float3(1.5f, 0.5f, 4.0f);
        float3 eye   = float3(0.0f, 0.0f, 4.0f);
        float3 ray   = float3(position.x, position.y, 0.0f);

        // rotate fractal
        light = mul(light, mc.orientation);
        eye   = mul(  eye, mc.orientation);
        ray   = mul(  ray, mc.orientation);

        // ray start and ray direction
        float3 rO =  eye;
        float3 rD =  ray - rO;

        bool selfshadow = (mc.selfShadow != 0);
        float4 color = QJulia(rO, rD, mc.mu, mc.epsilon, light, selfshadow, mc);

        // convert the float color to unorm before writing it to the texture view
        unorm_4 u_color(unorm(color.x), unorm(color.y), unorm(color.z), unorm(color.w));
        tv.set(ti.global, u_color);
    });
}
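The kernel above launches over `tv.extent.tile<64, 4>().pad()`, which rounds each dimension of the extent up to a multiple of the corresponding tile dimension. The following plain C++ sketch shows that arithmetic on the CPU. `Extent2D`, `round_up`, `pad_extent` and `in_bounds` are hypothetical names introduced here, not part of C++ AMP or of the sample, and the 600 by 800 window size in the usage note is an assumed example.

```cpp
// Hypothetical 2-D extent for illustration: rows first, then columns,
// matching the order C++ AMP uses (most significant dimension first).
struct Extent2D { int rows; int cols; };

// Round n up to the next multiple of tile (tile must be positive).
inline int round_up(int n, int tile) {
    return ((n + tile - 1) / tile) * tile;
}

// Grow each dimension to a whole number of tiles, as pad() does.
inline Extent2D pad_extent(Extent2D e, int tile_rows, int tile_cols) {
    return Extent2D{ round_up(e.rows, tile_rows), round_up(e.cols, tile_cols) };
}

// True when a thread index falls inside the original, unpadded extent.
inline bool in_bounds(Extent2D e, int row, int col) {
    return row >= 0 && row < e.rows && col >= 0 && col < e.cols;
}
```

For a 600 by 800 texture with 64 by 4 tiles, the padded extent is 640 by 800, so the last 40 rows of threads map to no pixel. The kernel in this post does not show an explicit bounds check; a port that writes to an array rather than a texture would likely want one along the lines of `in_bounds`.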

 

Any exceptions from launching and executing the kernel, including TDR (timeout detection and recovery), are surfaced to the caller as C++ AMP runtime exceptions. The rest of the shader code remains unchanged in this port. Because C++ AMP lets us express the kernel in C++ with some restrictions, you can also use other C++ constructs when porting. You are welcome to try this out; I did a straightforward port rather than writing the code the way I would if I were writing it in C++ from scratch.

The parallel_for_each kernel stores its results in a writeonly_texture_view. The view wraps an interop texture created from the back buffer of the window, which is obtained from the IDXGISwapChain.

...
// get the swap chain's back buffer as a D3D11 texture
ID3D11Texture2D* pTexture;
hr = g_pSwapChain->GetBuffer( 0, __uuidof( ID3D11Texture2D ), ( LPVOID* )&pTexture );

if(FAILED(hr))
{
    throw std::exception("Failed to get buffer");
}

// create amp texture using interop
accelerator_view av = concurrency::direct3d::create_accelerator_view(g_pd3dDevice);
texture<unorm4, 2> tex = make_texture<unorm4, 2>(av, pTexture);

// wrap the texture in a write-only view for the kernel to write into
g_pAmpTextureView.reset(new writeonly_texture_view<unorm4, 2>(tex));
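The kernel converts each float color component to `unorm` before writing it to the texture view. Constructing a `unorm` from a float clamps the value to the range [0, 1]. The sketch below mimics that clamp in plain C++ and, assuming an 8-bit-per-channel UNORM back buffer (the post does not state the format), shows how a clamped value would map to a stored byte. `clamp_unorm` and `to_unorm8` are hypothetical helper names, not C++ AMP functions.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Clamp a float to [0, 1], as constructing a unorm from a float does.
inline float clamp_unorm(float v) {
    return std::min(1.0f, std::max(0.0f, v));
}

// Assumption: 8-bit UNORM channel. Scale the clamped value to 0..255
// and round to the nearest integer.
inline std::uint8_t to_unorm8(float v) {
    return static_cast<std::uint8_t>(std::lround(clamp_unorm(v) * 255.0f));
}
```

In the sample this conversion happens on the accelerator; the sketch is only meant to show why color values outside [0, 1] saturate instead of wrapping around.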

 

And that’s it! We now have a sample that is written entirely in C++.

Download

You can download the zipped file containing the sample. Please note that the sample code in the attached .zip file is released under the BSD license. You will need Visual Studio 2012 to build the sample.

As always, please feel free to share your thoughts and ask questions below or in our MSDN concurrency forum.

Attachment: Julia.zip