Parallel Programming in Native Code

Parallel programming using C++ AMP, PPL and Agents libraries.

concurrent_vector and concurrent_queue explained

As a user of Visual Studio, you might have had a chance to use the concurrent data structures introduced in the Beta 2 release – concurrent_vector<T> and concurrent_queue<T>. A detailed discussion about the design, usage, and methods of concurrent_queue can be found here. A similar discussion of concurrent_vector will be posted in the near future.

In this post, we will go over the inner workings of these two containers, specifically, how they are laid out in memory.

concurrent_vector

The concurrent_vector class can be thought of as a parallel version of std::vector that allows thread-safe growth. Unlike std::vector, however, concurrent_vector does not invalidate existing iterators when its storage grows, and it does not support the pop_back() operation.

In std::vector, storage is dynamically allocated, and its elements follow a strict linear sequence. Because elements are guaranteed to occupy contiguous memory locations, they are all relocated to new locations when the container grows. concurrent_vector, by contrast, does not guarantee that all elements are stored in contiguous memory locations, so existing elements are never relocated when more memory needs to be allocated. Instead, elements are stored in a series of separate contiguous arrays. When an empty concurrent_vector is instantiated, no memory is allocated to start with. As elements are added to the container, arrays of increasing sizes are allocated as and when required. The sizes of the allocated arrays are always powers of 2, and they increase as the container grows.

If you look at the raw view of concurrent_vector in a debugger, you will see that it contains a segment table member called _My_segment, which looks like an array of arrays. (To enable raw view, go to Tools -> Options -> Debugging and check the “Show raw structure of objects in variables windows” checkbox). In the segment table, _My_segment[0] points to the first 2 elements of the container, _My_segment[1] to the next 2 elements, _My_segment[2] to the next 4 elements, _My_segment[3] to the next 8 elements, etc. See Figure 1 (not to scale).

Figure 1

This arrangement of the segment table is independent of the sizes of the actual storage arrays contained within concurrent_vector, and is simply a logical view of the elements. In other words, the sizes of the logical arrays as seen by the segment table are always 2, 2, 4, 8, 16, 32, 64, 128, etc. However, the sizes of the actual arrays allocated from the heap may very well be 16, 16, 32, 64, 128, etc. The size of the first array is determined by the first reservation, growth or assignment operation.

For example, take a concurrent_vector<int> v that is instantiated with an initial size of 10. The actual size of the first array is the lowest power of 2 that can fit 10 elements, i.e., 16. As the container grows, the sizes of the subsequent arrays that are allocated are 16, 32, 64, etc. (increasing powers of 2). The segment table of v is populated such that _My_segment[0] points to the first 2 elements of the first array, _My_segment[1] points to the next 2 elements of the same first array, _My_segment[2] points to the next 4 elements, again in the first array, _My_segment[3] points to the next 8 elements, still in the first array, _My_segment[4] points to the second array (of size 16), _My_segment[5] points to the third array (of size 32), etc. See Figure 2.

Figure 2

When shrink_to_fit() is called, the storage is defragmented and the internal layout is optimized. However, it is not guaranteed that all elements of v are in one contiguous array. When clear() is called, all arrays are deallocated.

This, in short, is how the elements of concurrent_vector are laid out in memory.

concurrent_queue

concurrent_queue can be thought of as a parallel version of std::queue that offers thread-safe addition and removal of elements. The elements contained in a concurrent_queue are held in blocks of memory called pages. As elements are enqueued and dequeued from the container, pages are added and deleted. The size of a single page, and hence the number of elements a page can hold, varies with the size of the element type being stored. See Figure 3.

Figure 3

The concurrent_queue class is implemented as an array of 8 micro-queues, where each micro-queue is a linked list of pages. This layout is illustrated in Figure 4.

Take the example of a concurrent_queue<int> q, to which we enqueue integers in the order 0, 1, 2, 3, etc. such that q[i] is equal to i. When q is instantiated, it is empty and no pages are allocated at first. As elements are enqueued, they are added to the first available spot in the first available page in a specific micro-queue. The formula used to select which micro-queue the ith element is added to is i*3%8. Therefore, element 0 is added to the micro-queue at _Array[0], element 1 to _Array[3], element 2 to _Array[6], element 3 to _Array[1], and so on. If there is no available page in the micro-queue that was selected, a new page is allocated and linked to the end of the micro-queue. The _Page data structure has a member variable that keeps track of the next page in the micro-queue (denoted by N in Figure 4). For a concurrent_queue<int>, the number of integers stored in each page is 32 (from Figure 3). Since the number of micro-queues in a concurrent_queue is always fixed at 8, the first layer of pages in q will hold the first 256 integers that are enqueued (0 through 255). When more integers are added to q, a new layer of pages is allocated.

Figure 4

There are 2 member variables in concurrent_queue that keep track of the number of elements enqueued and dequeued through its lifetime – _Tail_counter and _Head_counter. When an element is enqueued, _Tail_counter is incremented. When an element is dequeued, _Head_counter is incremented. The number of elements in the queue at any given time is the difference between _Tail_counter and _Head_counter.

When enough elements are dequeued from q such that a particular page becomes empty, the page is destroyed and its memory is deallocated. In the above example, when elements 0 – 247 are dequeued, each page in the first layer of pages ends up containing exactly 1 element. When element 248 is dequeued, the first page in _Array[0] becomes empty, and is therefore deallocated. When element 249 is dequeued, the first page in _Array[3] is deallocated. See Figure 5.

Figure 5

As more elements are enqueued and dequeued from q, new pages are added and older pages are deallocated. This, in short, is how concurrent_queue is laid out in memory.

A note on visualizers

So far, we have discussed the internals of concurrent_vector and concurrent_queue. Chances are that when you are using one of them in a program, you are less interested in the implementation details of the data structure, and more interested in seeing the elements it contains, similar to how you would see std::vector or std::queue under a debugger in the locals window. In the Beta 2 release, we have included debugger visualizers for concurrent_vector and concurrent_queue so that they appear like their corresponding STL data structures.

That’s it for now. Feel free to post a comment if there’s something more you would like to know about concurrent data structures or visualizers in Visual Studio 2010.

- Raghu Simha.
