Parallel Programming in Native Code

Parallel programming using C++ AMP, PPL and Agents libraries.

Norms and unorms in C++ AMP textures

Norms and unorms are commonly used as texture element types, especially when each texture element represents 32-bit RGBA color data. Each 8-bit integer component copied in from the host is interpreted on the accelerator as an 8-bit fixed-point number. In this blog post, we will dive deeper into the behavior of norm and unorm when used in textures and learn more about these fixed-point semantics.

Fixed-point semantics inside textures

Outside a texture, variables of type norm and unorm are wrappers over 32-bit floating point numbers clamped to [-1.0, 1.0] and [0.0, 1.0] respectively. For example, if you store floating point values into a unorm on the host, you will notice that the values are clamped but there is no loss of precision.

vector<unorm_4> unorm_vec(1);
unorm_vec[0] = unorm_4(1.0f, 0.5f, 0.3f, 1.3f);
printf("vector data: %1.5f %1.5f %1.5f %1.5f \n",
       (float)unorm_vec[0].x,  // stored as 1.0f
       (float)unorm_vec[0].y,  // stored as 0.5f
       (float)unorm_vec[0].z,  // stored as 0.3f
       (float)unorm_vec[0].w); // clamped to 1.0f

OUTPUT
vector data: 1.00000 0.50000 0.30000 1.00000
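
For readers without a C++ AMP toolchain at hand, the clamp-without-quantization behavior can be imitated in plain C++. This is only a sketch: clamped_unorm is a hypothetical stand-in invented for this illustration, not the real unorm type, and it assumes that clamping happens once, when the value is constructed.

```cpp
#include <algorithm>

// Hypothetical stand-in for a unorm used outside a texture (NOT the C++ AMP
// type): a full 32-bit float that is clamped to [0.0, 1.0] on construction.
struct clamped_unorm {
    float value;

    explicit clamped_unorm(float v)
        : value(std::min(std::max(v, 0.0f), 1.0f)) {}

    // Reading the value back returns the stored float unchanged.
    explicit operator float() const { return value; }
};
```

With this stand-in, 1.3f comes back as 1.0f while 0.3f comes back bit-for-bit as 0.3f, which mirrors the output above.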

This holds true even when norms and unorms are used on the device and in C++ AMP containers other than texture, i.e. array and array_view. There, norms and unorms still occupy 32 bits of storage, just like ordinary floats, and there is no loss of precision.

array<unorm_4, 1> unorm_array(1);
parallel_for_each(unorm_array.extent, [&](index<1> idx) restrict(amp)
{
    unorm_array[idx] = unorm_4(1.0f, 0.5f, 0.3f, 1.3f);
    direct3d_printf("array data: %1.5f %1.5f %1.5f %1.5f \n",
                    (float)unorm_array[idx].x,
                    (float)unorm_array[idx].y,
                    (float)unorm_array[idx].z,
                    (float)unorm_array[idx].w);
});

unorm_array.accelerator_view.wait();

OUTPUT
array data: 1.00000 0.50000 0.30000 1.00000

When written to a texture, norms and unorms have only 8 or 16 bits of storage available per component, depending on the bits-per-scalar-element parameter passed to the texture constructor. In fact, a texture is the only place where they have 8-bit or 16-bit storage. To represent the original 32-bit floating point number, the texture converts it to a lower-precision 8-bit or 16-bit fixed-point number.

texture<unorm_4, 1> unorm_tex(1, 8U); // creating 8-bit texture
writeonly_texture_view<unorm_4, 1> unorm_tex_view(unorm_tex);

parallel_for_each(unorm_tex.extent, [=](index<1> idx) restrict(amp)
{
    unorm_tex_view.set(idx, unorm_4(1.0f, 0.5f, 0.3f, 1.3f));
});

parallel_for_each(unorm_tex.extent, [&](index<1> idx) restrict(amp)
{
    direct3d_printf("texture data: %1.5f %1.5f %1.5f %1.5f \n",
                    (float)unorm_tex[idx].x,
                    (float)unorm_tex[idx].y,
                    (float)unorm_tex[idx].z,
                    (float)unorm_tex[idx].w);
});

unorm_tex.accelerator_view.wait();

OUTPUT
texture data: 1.00000 0.50196 0.30196 1.00000

As we see, 0.5f and 0.3f cannot be represented exactly as 8-bit fixed-point numbers. Instead, the texture stores the closest approximation to the original floating point number. Packing to 8 bits occurs only when writing to a texture, and unpacking to 32 bits occurs only when reading from it. Outside the texture, norms and unorms behave as normal 32-bit floating point numbers.

Rounding semantics

How did the rounding in the previous sample occur? To understand this, let us look at the range of values supported by an 8-bit fixed-point number. An 8-bit unorm allows for 2^8 = 256 unique real values in the range [0.0f, 1.0f]. Each 8-bit pattern maps to one of the 256 unique real values shown below:

Bit pattern          0x00    0x01    0x02    ...   0xFE      0xFF
Fixed-point number   0/255   1/255   2/255   ...   254/255   255/255

When storing a single-precision floating point number as a fixed-point number, the texture maps it to the closest fixed-point representation. This process of mapping a larger set of values to a smaller set is called quantization.

In the code snippet from the previous section, 0.3f is represented as the closest fixed-point number, 77/255 (~0.30196f), with an error of about 0.00196f. See the table below.

Bit pattern          0x00    ...   0x4C        0x4D        0x4E        ...   0xFF
Fixed-point number   0/255   ...   76/255      77/255      78/255      ...   255/255
Real number          0.0f    ...   ~0.29803f   ~0.30196f   ~0.30588f   ...   1.0f

The other value in the code snippet above, 0.5f, is equidistant from two fixed-point numbers, 127/255 and 128/255. The loss of precision would be the same in both cases. In this scenario, the larger of the two, 128/255 (~0.50196f), is chosen to represent 0.5f.

Bit pattern          0x00    ...   0x7F        0x80        ...   0xFF
Fixed-point number   0/255   ...   127/255     128/255     ...   255/255
Real number          0.0f    ...   ~0.49803f   ~0.50196f   ...   1.0f
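
The two cases above (nearest value, and ties going to the larger value) can be reproduced on the host with a few lines of plain C++. This is a minimal sketch, not the texture hardware's implementation: quantize_unorm8 is a hypothetical helper, and it assumes the conversion is clamp, scale by 255, add 0.5 and drop the fraction, which is consistent with the results shown in this post.

```cpp
#include <cstdint>

// Hypothetical helper: quantize a float to an 8-bit unorm bit pattern.
// Assumed rule: clamp to [0, 1], scale by 255, add 0.5, drop the fraction.
// Adding 0.5 before truncating rounds to nearest and sends ties upward.
inline std::uint8_t quantize_unorm8(float f) {
    if (!(f > 0.0f)) return 0;   // negative values (and NaN) map to 0x00
    if (f > 1.0f) f = 1.0f;      // values above 1.0f are clamped
    return static_cast<std::uint8_t>(f * 255.0f + 0.5f);
}
```

Under these assumptions, quantize_unorm8(0.3f) yields 77 (0x4D) and quantize_unorm8(0.5f) yields 128 (0x80), matching the two tables above.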

Rounding semantics for norms are similar to those for unorms, except that the range of representable numbers is different. An 8-bit norm supports 255 unique values. The maximum value of 1.0f maps to 0x7F, and the minimum value of -1.0f has two representations: 0x80 and 0x81. The non-negative bit patterns map to evenly spaced values in the range [0.0f, 1.0f], and a complementary set of two's-complement bit patterns covers the negative values in the range [-1.0f, 0.0f]. The table below shows the range of values represented by 8-bit signed normalized numbers.

Bit pattern          0x80       0x81       0x82       ...   0xFF     0x00    0x01    ...   0x7E      0x7F
Fixed-point number   -127/127   -127/127   -126/127   ...   -1/127   0/127   1/127   ...   126/127   127/127
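
The signed table can also be expressed as a small host-side sketch. Both functions below are hypothetical helpers written for this illustration. The decoder follows the table directly; the encoder additionally assumes that rounding is to nearest with ties away from zero, which this post does not spell out for norms.

```cpp
#include <cstdint>

// Hypothetical helper: decode an 8-bit norm bit pattern as in the table above.
// The pattern is read as a two's-complement integer and divided by 127;
// 0x80 (-128) is treated as a second representation of -1.0f.
inline float decode_norm8(std::uint8_t bits) {
    int v = bits < 0x80 ? static_cast<int>(bits) : static_cast<int>(bits) - 256;
    if (v == -128) v = -127;
    return static_cast<float>(v) / 127.0f;
}

// Hypothetical helper: encode a float as an 8-bit norm bit pattern.
// Assumed rule: clamp to [-1, 1], scale by 127, round half away from zero.
inline std::uint8_t encode_norm8(float f) {
    if (f != f) return 0;        // NaN maps to 0 (an assumption)
    if (f > 1.0f) f = 1.0f;
    if (f < -1.0f) f = -1.0f;
    const float scaled = f * 127.0f;
    const int v = static_cast<int>(scaled >= 0.0f ? scaled + 0.5f : scaled - 0.5f);
    return static_cast<std::uint8_t>(v & 0xFF);  // keep the low 8 bits
}
```

Note that this encoder never produces 0x80: -1.0f always encodes to 0x81, while both 0x80 and 0x81 decode to -1.0f.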

Interpreting fixed-point numbers outside textures

Fixed-point numbers are not natively supported outside textures. So, how should we modify the previous example to copy the texture data back and print it from the host? We cannot use a unorm because its underlying data storage is a 32-bit floating point value. If we copy the texture back to a unorm, the raw fixed-point data might map to a completely different floating point value, as we can see from the code snippet below (and of course we would be copying only 32 bits of data into a 128-bit entity!).

unorm_4 result_on_host;
copy(unorm_tex,        // texture stores 1.00000f 0.50196f 0.30196f 1.00000f
     &result_on_host,  // destination is 32x4 bits long
     // unorm_tex.data_length (8x4 bits) is different
     // from the size of result_on_host.
     // This would work but the copy would be invalid!
     sizeof(result_on_host));

cout << "unorm result on host: "
     << result_on_host.x << " "
     << result_on_host.y << " "
     << result_on_host.z << " "
     << result_on_host.w << endl;

OUTPUT
unorm result on host: -2.73162e+038 0 0 0
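
The first number in this output can be explained with a short host-only sketch. The texel holds the four bytes 0xFF, 0x80, 0x4D and 0xFF (that is, 255, 128, 77 and 255), and the copy drops them into the storage of the first 32-bit float. The helper below is hypothetical and assumes a little-endian host, so the first component occupies the lowest byte.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: reinterpret four 8-bit components as the bits of one
// 32-bit float. Assumes a little-endian layout (x is the lowest byte).
inline float reinterpret_as_float(std::uint8_t x, std::uint8_t y,
                                  std::uint8_t z, std::uint8_t w) {
    const std::uint32_t bits = static_cast<std::uint32_t>(x)
                             | (static_cast<std::uint32_t>(y) << 8)
                             | (static_cast<std::uint32_t>(z) << 16)
                             | (static_cast<std::uint32_t>(w) << 24);
    static_assert(sizeof(float) == sizeof(bits), "float must be 32 bits wide");
    float f;
    std::memcpy(&f, &bits, sizeof f);  // well-defined bit reinterpretation
    return f;
}
```

Calling reinterpret_as_float(0xFF, 0x80, 0x4D, 0xFF) gives the bit pattern 0xFF4D80FF, a negative float with a magnitude of roughly 2.73e+38, which is the value printed for result_on_host.x.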

A common solution is to interpret the bits as integers outside the texture. Since our sample needs to represent an 8-bit unorm with 256 unique values, we could use an unsigned char to read out the data. This will allow us to simulate the same fixed-point arithmetic outside the unorm textures. Let’s change the copy out code to use unsigned char instead of unorm.

unsigned char result_on_host[4];

// texture stores 1.00000f 0.50196f 0.30196f 1.00000f
copy(unorm_tex, &result_on_host, unorm_tex.data_length);

cout << "integral value\tfixed-point value" << endl
     << (unsigned int) result_on_host[0] << "\t\t"
     << (float)result_on_host[0]/255.0f << endl
     << (unsigned int) result_on_host[1] << "\t\t"
     << (float)result_on_host[1]/255.0f << endl
     << (unsigned int) result_on_host[2] << "\t\t"
     << (float)result_on_host[2]/255.0f << endl
     << (unsigned int) result_on_host[3] << "\t\t"
     << (float)result_on_host[3]/255.0f << endl;

OUTPUT
integral value     fixed-point value

255                1
128                0.501961
77                 0.301961
255                1

We now get a more reasonable result, [255, 128, 77, 255] which can be mapped to [255/255, 128/255, 77/255, 255/255] stored by the texture.
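
If this conversion is needed in more than one place, the per-component division can be wrapped in a small function. The sketch below is plain host-side C++; unpack_unorm8x4 is a hypothetical name, and it assumes the raw bytes arrive in x, y, z, w order as in the sample above.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: expand the four raw bytes of an 8-bit unorm_4 texel
// into the floating point values the texture would return when read.
inline std::array<float, 4> unpack_unorm8x4(const std::array<std::uint8_t, 4>& raw) {
    std::array<float, 4> out{};
    for (std::size_t i = 0; i < raw.size(); ++i) {
        out[i] = static_cast<float>(raw[i]) / 255.0f;  // bit pattern / 255
    }
    return out;
}
```

For the bytes {255, 128, 77, 255} this returns approximately {1, 0.501961, 0.301961, 1}, the same values as in the output above.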

The reverse of this experiment also holds true: the texture hardware interprets each bit pattern as the corresponding fixed-point number, as in the tables above. Let’s see how this works with some code. In the code snippet below, we copy a set of four 8-bit unsigned char values into a unorm_4 texture:

// represents 4 fixed-point values 1/255, 2/255, 3/255 and 4/255
unsigned char fixedData[4] = {0x01, 0x02, 0x03, 0x04};

texture<unorm_4, 1> tex(1, 8U);
copy(&fixedData, sizeof(fixedData), tex); // copy in character data

parallel_for_each(tex.extent, [&](index<1> idx) restrict(amp)
{
    direct3d_printf("texture data: %1.5f %1.5f %1.5f %1.5f \n",
                    (float)tex[idx].x,
                    (float)tex[idx].y,
                    (float)tex[idx].z,
                    (float)tex[idx].w);
});

tex.accelerator_view.wait();

OUTPUT
texture data: 0.00392 0.00784 0.01176 0.01569

As we can see, the texture correctly reinterprets the bit patterns 0x01, 0x02, 0x03 and 0x04 as 1/255, 2/255, 3/255 and 4/255, or 0.00392f, 0.00784f, 0.01176f and 0.01569f, on the accelerator.

This concludes our deep dive into norms and unorms when used inside textures. I hope this demystifies the norm and unorm types further. As always, please feel free to share your thoughts and ask questions below or in our MSDN concurrency forum.
