Parallel Programming in Native Code

Parallel programming using C++ AMP, PPL and Agents libraries.

Read-only array_view/array in C++ AMP – Part 2 of 2

In the previous post, we looked at how to specify the read-only restriction for array_view/array data collections in your C++ AMP code. In this part, we will look at the benefits of specifying the read-only restriction for data collections that are only read from (not written to) in a C++ AMP kernel or in parts of your host code.

Benefits of specifying read-only

It is advisable to use read-only array_view/array object types when a data collection is used purely as an input (only read from) in a C++ AMP kernel or in part of your host code. As explained in the previous post, the read-only restriction need not be applied at the point where the array_view/array object is constructed, and the scope of the read-only access restriction for a data collection can be limited to specific C++ AMP kernels or parts of your code.

The following are some of the main benefits of specifying the read-only restriction when a data collection is referenced purely as an input. Most of these benefits apply equally to array_view, array and texture objects, although my code illustrations use only array_view objects.

Self-documenting code

Stating the obvious: attributing the read-only access restriction to input-only data referenced in a C++ AMP kernel or a host function self-documents the intent to use the data purely as an input in that part of the code, and it helps detect inadvertent modifications to the data at compile time. The ability to declare at compile time that some data is read-only is a powerful tool for writing correct programs and detecting errors early, so use it to your advantage.
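To keep it compilable with any standard C++17 compiler, the following sketch models this idea without C++ AMP. `View<T>` is a hypothetical stand-in for `array_view<T>`, not the C++ AMP type: `View<const int>` hands out only `const int&`, so a write through it fails to compile, and a `View<const int>` can be constructed from a writable `View<int>`, mirroring how the code later in this post builds an `array_view<const int>` from an `array_view<int>` only where read-only access is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical stand-in for array_view<T> (not the C++ AMP type): a
// non-owning 1-D view. View<const T> hands out only const references.
template <typename T>
class View {
public:
    View(std::size_t size, T* data) : size_(size), data_(data) {}

    // Allows View<const U> to be built from View<U> (read-write to
    // read-only), but never the reverse.
    template <typename U,
              typename = std::enable_if_t<std::is_same_v<const U, T>>>
    View(const View<U>& other) : size_(other.size()), data_(other.data()) {}

    T& operator[](std::size_t i) const {
        assert(i < size_);
        return data_[i];
    }
    std::size_t size() const { return size_; }
    T* data() const { return data_; }

private:
    std::size_t size_;
    T* data_;
};

// True if an int can be assigned through the view's element reference.
template <typename V>
constexpr bool is_writable_v =
    std::is_assignable_v<decltype(std::declval<V&>()[0]), int>;

// Both properties are verified at compile time.
static_assert(is_writable_v<View<int>>, "read-write view accepts writes");
static_assert(!is_writable_v<View<const int>>, "read-only view rejects writes");

// The signature documents that this function only reads its input.
int sum(View<const int> in) {
    int total = 0;
    for (std::size_t i = 0; i < in.size(); ++i) {
        total += in[i];
        // in[i] = 0;  // would not compile: in[i] is a const int&
    }
    return total;
}

int demo() {
    std::vector<int> data{1, 2, 3, 4};
    View<int> rw(data.size(), data.data());
    rw[0] = 10;       // writes are allowed through the read-write view
    return sum(rw);   // converted to View<const int> only for this call
}
```

Here `demo()` returns 19; uncommenting the write inside `sum` turns the mistake into a compile-time error rather than a silent bug.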

Explicitly casting away constness from a pointer or reference in C++ AMP code may result in undefined behavior, and the C++ AMP compiler issues a warning for any such cast it detects in amp-restricted functions.

Aids compiler optimizations

Knowing that an array_view/array object is read-only can help the compiler generate better code by enabling optimizations that might not otherwise be possible. Granted, the compiler analyzes your code to infer whether a data collection is only read from, but such analysis is not foolproof; explicitly specifying the read-only access restriction through the type system provides a much stronger guarantee for the compiler to leverage.

Avoid unnecessary implicit data movement when using array_view objects

As described in one of our earlier posts, the runtime relies on the type of an array_view to determine whether the contents of an array_view captured in a parallel_for_each may be modified on the target accelerator_view by that parallel_for_each invocation. When you do not specify that a captured array_view is read-only, the runtime may end up assuming that the parallel_for_each invocation modifies its contents on the target accelerator_view, which can result in unnecessary data movement when the array_view is subsequently accessed on the host or on another accelerator.

Note that this also applies to array_view accesses in your host code. Using a writable array_view when you only read its contents on the host causes the runtime to assume that the contents have been modified on the host. This results in an unnecessary data transfer when the array_view is subsequently accessed on an accelerator_view, even though a valid cached copy already exists on that accelerator_view: accessing the array_view on the host through a read-write array_view object needlessly invalidates the cached copy.
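The following sketch is a simplified, hypothetical model of the bookkeeping described above; it is not the C++ AMP runtime's actual implementation. `CoherenceModel` records which locations hold a valid copy of the data: an access from a location without a valid copy costs one transfer, and a writable access invalidates every other copy. Running the same kernel/host/kernel access sequence with read-only and read-write views shows where the extra transfers come from.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical model (NOT the real C++ AMP runtime) of tracking which
// locations hold an up-to-date copy of an array_view's data.
class CoherenceModel {
public:
    explicit CoherenceModel(const std::string& home) : valid_{home} {}

    // Records an access at `location`. Data is transferred only when the
    // location has no valid copy. A writable access leaves `location` as
    // the sole holder of valid data, invalidating every other cached copy.
    void access(const std::string& location, bool writable) {
        assert(!location.empty());
        if (valid_.count(location) == 0) {
            ++transfers_;
            valid_.insert(location);
        }
        if (writable) {
            valid_ = {location};
        }
    }

    int transfers() const { return transfers_; }

private:
    std::set<std::string> valid_;
    int transfers_ = 0;
};

// A kernel reads the data, host code reads it, then a later kernel reads it
// again. Only the declared access mode of the views differs between runs.
int transfersFor(bool declaredReadOnly) {
    CoherenceModel model("host");
    const bool writable = !declaredReadOnly;
    model.access("gpu", writable);   // parallel_for_each captures the view
    model.access("host", writable);  // host code only reads the view
    model.access("gpu", writable);   // a later kernel only reads it
    return model.transfers();
}
```

In this model, `transfersFor(true)` counts a single transfer (the initial copy to the accelerator), while `transfersFor(false)` counts three, because every read-write access discards the copy held at the other location.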

Enables hardware optimizations

Some accelerators have special caching mechanisms for read-only data; caching read-only data is cheaper to implement in parallel architectures because there are no coherence concerns. Specifying that the input array_view/array data in a C++ AMP kernel is read-only may enable these hardware optimizations, which can translate into runtime performance gains.

Limited number of writable array_view/array/texture/writeonly_texture_view objects allowed per kernel

C++ AMP supports a limited number of writable array_view/array/texture/writeonly_texture_view objects per kernel. Specifically, the total number of writable array_view, array, texture and writeonly_texture_view objects per kernel must not exceed 8 on DirectX 11 and 64 on DirectX 11.1. In contrast, up to 128 read-only array_view/array/texture objects are allowed per kernel, so specifying the read-only restriction for input-only data can help you stay within the limit on writable objects.
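The arithmetic behind these limits can be captured in a small check. `DxLevel` and `fitsKernelLimits` are hypothetical names used only for illustration; the numbers are the ones quoted above. Consider a kernel that reads 10 input views and writes 2 output views on DirectX 11: if all 12 are declared read-write they exceed the limit of 8 writable objects, whereas declaring the 10 inputs read-only leaves just 2 writable objects.

```cpp
#include <cassert>

// Hypothetical encoding of the per-kernel limits quoted above.
enum class DxLevel { Dx11_0, Dx11_1 };

// Maximum writable array_view + array + texture + writeonly_texture_view
// objects per kernel.
inline int maxWritable(DxLevel level) {
    return level == DxLevel::Dx11_0 ? 8 : 64;
}

// Maximum read-only array_view/array/texture objects per kernel.
constexpr int kMaxReadOnly = 128;

// Returns true if a kernel capturing `writable` writable objects and
// `readOnly` read-only objects stays within the limits for `level`.
inline bool fitsKernelLimits(int writable, int readOnly, DxLevel level) {
    assert(writable >= 0 && readOnly >= 0);
    return writable <= maxWritable(level) && readOnly <= kMaxReadOnly;
}
```

With these definitions, `fitsKernelLimits(12, 0, DxLevel::Dx11_0)` is false, while `fitsKernelLimits(2, 10, DxLevel::Dx11_0)` is true for the same kernel once its inputs are read-only.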

Increased runtime concurrency opportunities

Specifying the read-only restriction on your C++ AMP data collections can expose additional concurrency opportunities at runtime, which may translate into performance gains. When an array_view/array/texture is captured in a read-only fashion in a kernel, it is safe for other kernels or copy commands to concurrently read the same data. For example, suppose that following a parallel_for_each invocation, you launch a second kernel that only reads the results of the first, and you also initiate a synchronize operation to transfer the contents of the first kernel's output array_view to the CPU. If the array_view is captured as read-only in the second kernel, concurrent execution of the copy operation and the second kernel is safe, and it may happen on accelerators that are capable of performing data transfers and kernel execution concurrently.

#include <amp.h>
#include <vector>
using namespace concurrency;

// The following statements run inside a host function
const int size = 1024;            // example size
const int constantValue = 42;     // example value added to each element

std::vector<int> input(size);
// Initialize input contents
...

std::vector<int> output(size);

array_view<const int> inputArrView(size, input);
array_view<int> outputArrView(size, output);

parallel_for_each(outputArrView.extent, [=](index<1> idx) restrict(amp) {
    outputArrView[idx] = inputArrView[idx] + constantValue;
});

std::vector<int> output2(size);
// Read-only view of the first kernel's results
array_view<const int> inputArrView2(outputArrView);
array_view<int> outputArrView2(size, output2);

parallel_for_each(outputArrView2.extent, [=](index<1> idx) restrict(amp) {
    // The output of the first kernel is captured as read-only in the 2nd kernel
    outputArrView2[idx] = inputArrView2[idx] * inputArrView2[idx];
});

// Synchronizing outputArrView (the data read through inputArrView2) to the
// CPU can execute concurrently with the kernel above, since both the copy
// operation and the kernel access that data in a read-only fashion
outputArrView.synchronize();

A grain of salt

In some specific scenarios (particularly memory-bound kernels), we have empirically found that some GPU hardware exhibits lower performance when reading from read-only array_view/array objects than from read-write array_view/array objects. In our opinion this is a bug, and we are actively working with the hardware vendors to understand and address the issue. So, despite all the benefits of read-only array_view/array/texture objects described above, I advise measuring the performance of your C++ AMP kernels (particularly memory-bound kernels) both with and without the read-only restriction specified for input-only data.
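A simple way to follow this advice is to time both variants with the same harness. The sketch below uses only standard C++; `averageMs` is a hypothetical helper, not part of C++ AMP. In a C++ AMP program, each callable passed to it would launch one variant of the kernel (inputs captured as `array_view<const T>` versus `array_view<T>`) and then block until the work finishes, for example with `accelerator_view::wait()`, because `parallel_for_each` can return before the kernel completes.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper: average wall-clock time in milliseconds of `run`
// over `repetitions` calls, after one untimed warm-up call.
double averageMs(const std::function<void()>& run, int repetitions) {
    assert(repetitions > 0);
    // Warm-up keeps one-time costs (e.g. the first copy to the
    // accelerator) out of the measurement.
    run();
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < repetitions; ++i) {
        run();
    }
    const std::chrono::duration<double, std::milli> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count() / repetitions;
}
```

Comparing, say, `averageMs(runReadOnly, 20)` with `averageMs(runReadWrite, 20)`, where both callables are hypothetical wrappers around the two kernel variants, shows which declaration performs better on a given accelerator.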

In closing

Hopefully, this post gives you enough motivation to adopt the recommended practice of specifying the read-only restriction for input-only array_view/array/texture data whenever possible. It is a powerful tool for writing correct code and catching inadvertent errors early in the development cycle, and, as discussed above, it offers several potential performance benefits.

I would love to hear your thoughts, comments, questions and feedback below or on our MSDN forum.
