Parallel Programming in Native Code

Parallel programming using C++ AMP, PPL and Agents libraries.

Using C++ AMP to Build the Aviary Photo Editor

This is a guest post from Aviary that details their experience using C++ AMP.

 

Hi all,

 

My name is Chevon Christie, and I am the lead Windows and Windows Phone engineer at Aviary. If you haven't heard of us yet, you're in for a treat. Aviary creates the world's best photo editing SDK and offers it to developers for free. We recently released the Windows 8 Aviary Photo Editing SDK, the newest member of the Aviary SDK family, and along with it our showcase application, Aviary's Photo Editor for Windows 8.

 

When we were building the Windows 8 Aviary SDK, it was a priority to keep our image processing as fast as possible, and to do so in a manageable way that would scale across the unpredictable variety of PC hardware configurations. We took advantage of C++ AMP (C++ Accelerated Massive Parallelism) to meet these goals.

 

C++ AMP is an open specification for GPGPU (General Purpose GPU) programming in C++, with implementations currently available for Windows (via DirectX). The technology allows developers to write programs ('GPU kernels') that take advantage of GPU hardware acceleration in a familiar language like C++, as opposed to HLSL or GLSL.

 

The Aviary Photo Editor uses C++ AMP to run the computations for our photo filters and other SDK features on the GPU. C++ AMP accelerates some of our computations by as much as 16 times relative to the same computations performed on the CPU. This means that a user of the Aviary Windows 8 SDK or Photo Editor can edit and apply filters to very high-resolution images without much delay in processing.

 

In our experience, the first snag you hit with C++ AMP is making sure that the data your 'GPU kernels' (the code that processes your data on the GPU) operate on has dimensions that are multiples of the tile dimensions, and that the number of threads in a tile is in turn a multiple of 64, in order to make the most efficient use of current hardware. Now, you're probably wondering why this matters. It's a C++ AMP best practice that allows for optimal usage of hardware resources in GPU parallelization: it permits data to be split, parallelized, and read in predictable blocks. This is a standard practice in GPGPU programming.

 

As an example, consider an image of size 1024 (W) x 1000 (H), which we process using tiles of size 16 (W) x 16 (H) (note that each tile then holds 256 threads, which is a multiple of 64).
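The arithmetic in this example can be checked at compile time. The following is a minimal sketch in standard C++ rather than C++ AMP code; `TileShape`, `threads_per_tile`, and `divides_evenly` are hypothetical helper names introduced here for illustration and are not part of C++ AMP or the Aviary SDK.

```cpp
#include <cstddef>

// Hypothetical helper type: the shape of one tile, in threads.
struct TileShape {
    std::size_t width;
    std::size_t height;
};

// Number of threads that run together in one tile.
constexpr std::size_t threads_per_tile(TileShape t) {
    return t.width * t.height;
}

// True when `size` splits into whole tiles of length `tile` along one dimension.
constexpr bool divides_evenly(std::size_t size, std::size_t tile) {
    return tile != 0 && size % tile == 0;
}

// The numbers from the example: a 1024 x 1000 image and 16 x 16 tiles.
constexpr TileShape kTile{16, 16};

static_assert(threads_per_tile(kTile) == 256, "a 16 x 16 tile holds 256 threads");
static_assert(threads_per_tile(kTile) % 64 == 0, "256 is a multiple of 64");
static_assert(divides_evenly(1024, kTile.width), "the width splits into whole tiles");
static_assert(!divides_evenly(1000, kTile.height), "the height does not");
```

If the checks are written as `static_assert`s like this, a change to the tile shape that breaks the multiple-of-64 guideline fails the build instead of surfacing later at run time.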

 

The width of the source image (1024) is a multiple of the tile width (16), but the image's height (1000) is not evenly divisible by the tile height (16). This can be addressed by simply padding the image along the height dimension so that it becomes a multiple of 16; i.e. instead of creating (ARGB) 4 * 1024 * 1000 bytes of data on the GPU, create a block of size 4 * 1024 * 1024 and copy all the data from the 1024 * 1000 source image into it. (Any multiple of 16 that is at least 1000 works; the smallest is 1008.) The padded region of the block need not be initialized. This region will participate in reading/writing during the kernel's processing, but will be ignored after processing - only the 4 * 1024 * 1000 region of data from the image will be copied back.
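The padding step can be sketched on the CPU side as follows. This is an illustration in standard C++ under stated assumptions, not Aviary's actual implementation: `round_up`, `pad_rows`, and `unpad_rows` are hypothetical names, the image is assumed to be tightly packed row by row with 4 bytes per pixel, and the GPU kernel itself is omitted. Because the padding is added only along the height, the extra rows sit at the end of the buffer and a single contiguous copy is enough in each direction.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kBytesPerPixel = 4;  // ARGB, as in the article

// Round `size` up to the next multiple of `tile` (tile must be non-zero).
constexpr std::size_t round_up(std::size_t size, std::size_t tile) {
    return (size + tile - 1) / tile * tile;
}

// Copy a tightly packed ARGB image into a buffer whose height is padded
// up to a multiple of `tile_height`. std::vector zero-fills the extra rows;
// the article notes that they do not actually need to be initialized.
std::vector<std::uint8_t> pad_rows(const std::vector<std::uint8_t>& image,
                                   std::size_t width, std::size_t height,
                                   std::size_t tile_height) {
    assert(image.size() == kBytesPerPixel * width * height);
    const std::size_t padded_height = round_up(height, tile_height);
    std::vector<std::uint8_t> padded(kBytesPerPixel * width * padded_height);
    std::copy(image.begin(), image.end(), padded.begin());
    return padded;
}

// After processing, copy back only the rows that belong to the real image.
std::vector<std::uint8_t> unpad_rows(const std::vector<std::uint8_t>& padded,
                                     std::size_t width, std::size_t height) {
    const std::size_t bytes = kBytesPerPixel * width * height;
    assert(padded.size() >= bytes);
    return std::vector<std::uint8_t>(
        padded.begin(), padded.begin() + static_cast<std::ptrdiff_t>(bytes));
}
```

For the image above, `round_up(1000, 16)` gives 1008 rows; the article's choice of 1024 rows is also a multiple of 16 and works the same way. C++ AMP's `tiled_extent` also provides `pad()` and `truncate()` members for adjusting an extent to the tile size, which may be worth a look as an alternative to computing the padded size by hand.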

 

Once we had C++ AMP accelerating our image processing, we built the codebase into a Windows Runtime (WinRT) Component so that we (and any developer using our Windows 8 SDK) can call our C++ code directly from a C++/XAML or C#/XAML app. The WinRT component was quite easy to create. However, there are restrictions on what can be publicly exposed, so nicely wrapping up and exposing your C++ code takes a little thinking. For instance, all publicly exposed classes in WinRT components must be marked 'sealed' (cannot be inherited from). Another large plus for WinRT components is that they can be ported to the Windows Phone Runtime with very little work, since the Windows Runtime largely overlaps with its mobile sibling. Developers looking to target both Windows 8 and Windows Phone should consider WinRT as the primary option.

 

Overall, using C++ AMP to power the Aviary Windows 8 SDK and Photo Editor was a solid choice, as the code runs and scales on various GPU hardware without any configuration on the developer's end. For more about C++ AMP, please read this: http://msdn.microsoft.com/en-us/library/vstudio/hh265137.aspx

 

Download the Aviary Windows 8 SDK here: http://aviary.com/w8

Download our free Windows 8 Photo Editor app here: http://apps.microsoft.com/windows/en-US/app/cdd22d88-c0c4-4fff-a741-fe5ea3692b22

 

 

Until next time,

 

Chevon

http://www.aviary.com
