Eric Fleegal's WebLog

. . . .

Method for retaining intermediate results in extended precision

I thought I’d post a brief entry on how to retain intermediate results in extended precision.

 

Consider the following direct summation algorithm:

               

double sum(double[] a, int c)

{

1:    double result = 0.0;

2:    for (int i=0; i<c; i++)

3:        result = result + a[i];

4:    return result;

}

 

Under fp:precise, the intermediate value of the addition operation (line 3) is held in register precision.  However, the assignment operation explicitly forces the compiler to narrow the intermediate result to the target precision of the left-hand-side of the assignment—in this case to “double”. 

 

To retain the extra precision of the in-register result it would be convenient to declare result as:

                long double result = 0.0;

The summation would then be retained in long double precision and narrowed to double precision only when the function returns.  Unfortunately this doesn’t work in VC++ because long doubles are stored at the same precision as doubles.  Although this is not a violation of C++ typing rules[1], it is an inconvenience that many developers find rather annoying (myself included).  Until recently, very few Visual C++ customers requested extended precision long doubles (though that’s begun to change…  I promise I’ll write about this more in a later blog entry).

 

Even though extended precision long doubles aren’t yet supported in VC++, there are some “tricks” we can use to retain intermediate results in extended precision using VC++ in Visual Studio 8.0.  Let’s look at how to do this for the summation algorithm above.

 

For this algorithm what we really want is for the summation to be accumulated in a register—that is, we want the compiler to enregister the variable result.  It would be especially nice if we could explicitly imply this semantic in our source code by adding the register keyword to the declaration of result (see my earlier post).  However, there is a round-about way achieve this.  Under fp:fast semantics, the compiler is allowed to ignore the precision narrowing implication of the assignment operation on line 3; this relaxation of the narrowing rule permits the optimizer to enregister the variable result.   We can enable fp:fast semantics for a single function by wrapping it in float_control pragmas:

 

                #pragma float_control(except, off, push)

      #pragma float_control(precise, off)

double sum(double[] a, int c)

{

      double result = 0.0;

      for (int i=0; i<c; i++)

            result = result + a[i];

      return result;

}

      #pragma float_control(precise, off, pop)

 

 With this change, the variable result is allowed to be retained in a register at the full register precision. 

 

BEWARE: there are caveats to this solution, specific to each target architecture:

 

Targeting x86

When targeting x86 with VC++, the default FPU register precision is set to use 53bit significands (essentially a double).  To obtain a higher precision summation on x86, we need to set the register precision to use the full 64bit FPU significand provided by the x87 FPU.  This is achieved by calling the _controlfp function, as follows, at some point before calling the function sum. 

                _controlfp(_PC_64, _MCW_PC);

This instructs the x87 FPU to use extended precision semantics in-register. 

 

This trick works with the summation algorithm and for many other tight-loop algorithms (e.g. Kahan’s summation algorithm I showed in my FP whitepaper).  Keep in mind that it will not work for every algorithm on x86 because any “spilled” registers will still be truncated to 53bit double precision (more info).  It’s also important to be aware of any unsafe side-effect optimizations that might take place in the body of the loop.  I recommend this trick ONLY for x86 developers who are willing and able to inspect the assembly listing to check for the correct behavior.  This trick subverts the intended semantics of fp:precise and should be used with considerable care.

 

Targeting ia64

We have three advantages when using this trick on ia64.  Firstly, ia64 intermediate results are always in extended precision (64bits of precision in an 80bit result).  This means that the call to _controlfp required on x86 is not required on ia64. The second advantage is that the VC++ compiler for ia64 preserves the full 64bits of precision when spilling FPU registers to memory.  This means we don’t have to worry about the issues involved with register spilling.  The third advantage is that the ia64 has many more floating-point registers than the x86 so there’s more opportunity for the compiler to enregister variables under fp:fast semantics.

 

There are however some distinct disadvantages.  Assembly code on ia64 can be very difficult to read making it extremely hard to verify that the compiler produced the desired code.  The other disadvantage is that the compiler may attempt to perform unsafe optimizations that it wouldn’t under fp:precise.  The scalar reduction optimization in particular is an example of this:

 

The scalar reduction optimization would effectively transform the sum function into

 

double sum(double[] a, int c)

{

      double s0, s1, s2, s3;

      s0 = s1 = s2 = s3 = 0.0;

      int c4 = c & ~0x3;

      int i;

      for (i=0; i<c4; i+=4)
      {

            s0 = s0 + a[i];

            s1 = s1 + a[i+1];

            s2 = s2 + a[i+2];

            s3 = s3 + a[i+3];

      }

      for (; i<c; i++)

            s0+=a[i];

      return s0+s1+s2+s3;

}

 

This optimization essentially reorders the operands.  Its important to note, however, that the extra-precision afforded by enabling fp:fast-semantics for this function may make up for the loss of strict accuracy caused by the reordering incurred with scalar optimization.  Moreover, many datasets and algorithms may not be sensitive to operand reordering so the speed advantage of the scalar optimized code may be compelling.

 

As with x86, ia64 developers need to be careful when mixing in fp:fast semantics. 

 

Targeting amd64

 

Unfortunately, this trick won’t work when targeting amd64 because the compiler will only use SSE registers for floating-point operations (which have a maximum 53-bit significand).

 

ALTERNATIVE TO THE “TRICK“:

This kind of a trick can work pretty well in many cases.  However, if you feel that it’s just too much work to validate the generated assembly code, or if you think it’s too risky, you can always use a compiler from another vendor for those particular functions needing extended precision long doubles; once compiled you can simply link the routines back into your VC++ application.  I much prefer VC++’s toolset over the competition and wouldn't care to abandon it wholesale over its lack of extended precision long doubles.    

 

 

 

 

[1] C/C++ standards only require that long doubles be stored in at least the precision of double (ISO/IEC 9899:1999 §6.2.5.10)

 

Published Wednesday, July 21, 2004 5:30 PM by ericflee

Comments

 

hammad said:

hi i am hammad
August 6, 2004 9:43 PM
Anonymous comments are disabled

© 2009 Microsoft Corporation. All rights reserved. Terms of Use  |  Trademarks  |  Privacy Statement
Microsoft
Page view tracker