Eric Fleegal's WebLog


Fused Multiply Add Question

Sergey asks "Is there any chance that in MSVC++ 2005 Fused Multiply Add (FMA) function will be available as a part of runtime library on all supported platforms?"

To my knowledge, FMAs won't be supported on all platforms, since not all platforms have FMA instructions (some platforms don't even have floating-point units!).  However, I think we're moving in that direction.  I'll try to find out for you with respect to the x86 and x64 platforms.  In VC++, FMAs will be used by default under the fp:precise model when they're available on the architecture.  I believe that the fp:strict model precludes FMAs, since fusing potentially violates strict FPU status semantics and exception semantics.  For instance, there are values of a, b and c for which the fused a*b+c returns a valid answer but the unfused version overflows in the intermediate product.
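A minimal sketch of that overflow case, assuming IEEE 754 doubles and the standard std::fma from <cmath>. The function names are hypothetical, chosen only for illustration, and the volatile temporary is just one way to keep a compiler from fusing the "unfused" version on its own.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Unfused: the product is rounded (and can overflow) before the add.
// The volatile store keeps the compiler from contracting this into an FMA.
double mul_then_add(double a, double b, double c) {
    volatile double product = a * b;
    return product + c;
}

// Fused: a*b + c is rounded once, so the oversized intermediate product
// is never rounded to a double on its own.
double fused_mul_add(double a, double b, double c) {
    return std::fma(a, b, c);
}

// Self-check with values chosen so that only the intermediate overflows:
// big*2 - big is exactly big, which is representable.
void check_fused_vs_unfused() {
    const double big = std::numeric_limits<double>::max();
    assert(std::isinf(mul_then_add(big, 2.0, -big)));  // 2*big overflows to +inf
    assert(fused_mul_add(big, 2.0, -big) == big);      // single rounding, no overflow
}
```

The two versions also differ in which floating-point status flags they raise (the unfused one signals overflow), which is the kind of difference a strict model cares about.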

An interesting side point, since I'm on the subject: on ia64's FPU and many other modern FPU architectures, separate multiply and add instructions aren't provided.  Instead, two floating-point registers are reserved to hold the values 0 and 1 respectively.  Simple addition is accomplished by using the 1-valued register as one of the multiply arguments; similarly, simple multiplication is accomplished by using the 0-valued register as the addition argument.
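The same trick can be mimicked in portable code. This is only a sketch of the idea using std::fma, with hypothetical helper names; it says nothing about how any particular hardware encodes these operations.

```cpp
#include <cassert>
#include <cmath>

// Addition expressed as a fused multiply-add: a*1 + c.
// Multiplying by 1.0 is exact, so this rounds exactly like a + c.
double add_via_fma(double a, double c) {
    return std::fma(a, 1.0, c);
}

// Multiplication expressed as a fused multiply-add: a*b + 0.
// Caveat: with a literal +0.0 addend, a product of -0 comes back as +0
// under the default rounding mode, so this is not bit-identical to a*b
// in that one corner case.
double mul_via_fma(double a, double b) {
    return std::fma(a, b, 0.0);
}

// Self-check on values that are exactly representable.
void check_add_mul_via_fma() {
    assert(add_via_fma(1.5, 2.25) == 3.75);
    assert(mul_via_fma(1.5, 2.0) == 3.0);
}
```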


 

Published Friday, December 17, 2004 5:45 PM by ericflee

Comments

 

Sergey said:

Thank you very much for the answer!
FMA is also quite useful for adaptive-precision robust geometric predicates, etc.
The typical usage is to calculate a multiplication result exactly.
Let a, b and c be floating-point values. fma(a,b,c) returns (a*b) + c, rounded as one ternary operation: it computes the value (as if) to infinite precision and rounds once to the result format.
Therefore the value z=fma(a,b,-a*b) is the difference between a*b calculated to infinite precision and a*b calculated in floating-point arithmetic! If the fma function were available even on platforms without hardware support (x86), it would simplify the implementation of a number of interesting algorithms.
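As a rough illustration of that identity, here is a sketch using std::fma from <cmath>. The struct and function names are hypothetical, and it assumes IEEE 754 doubles evaluated in double precision with no overflow or underflow in the product.

```cpp
#include <cassert>
#include <cmath>

// hi is the rounded product; lo is the rounding error, so that
// hi + lo equals a*b exactly (barring overflow/underflow).
struct ExactProduct {
    double hi;
    double lo;
};

ExactProduct two_product_fma(double a, double b) {
    const double hi = a * b;                // a*b rounded to double
    const double lo = std::fma(a, b, -hi);  // exact a*b minus hi, rounded once
    return {hi, lo};
}

// Worked case: (1 + 2^-30)^2 = 1 + 2^-29 + 2^-60.
// The 2^-60 term does not fit in a double next to 1, so it is the error.
void check_two_product_fma() {
    const double a = 1.0 + std::ldexp(1.0, -30);
    const ExactProduct p = two_product_fma(a, a);
    assert(p.hi == 1.0 + std::ldexp(1.0, -29));
    assert(p.lo == std::ldexp(1.0, -60));
}
```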
Currently, to work around the lack of hardware support, both a and b have to be split into aHi/aLo and bHi/bLo and multiplied in the following way:

x=a*b;
err1 = x-aHi*bHi;
err2 = err1-aLo*bHi;
err3 = err2-aHi*bLo;
return aLo*bLo-err3;

If implemented as part of the runtime library, the algorithm would not suffer from compiler optimization. :(
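For reference, a self-contained sketch of the split-based product above. The comment does not say how aHi/aLo are obtained, so the splitting step here is an assumption: it uses the well-known Veltkamp splitting with the constant 2^27 + 1 for IEEE 754 doubles. All names are hypothetical, and the volatile temporaries are just one hedge against the optimizer reassociating or fusing the intermediate steps.

```cpp
#include <cassert>
#include <cmath>

struct Split {
    double hi;  // upper part of the significand
    double lo;  // remainder, so that hi + lo == a exactly
};

// Veltkamp splitting (assumed here, not stated in the comment above).
// Requires that 134217729.0 * a does not overflow.
Split split(double a) {
    assert(std::isfinite(a));
    const double splitter = 134217729.0;  // 2^27 + 1
    volatile double c = splitter * a;
    volatile double big = c - a;
    const double hi = c - big;
    return {hi, a - hi};
}

struct ProductWithError {
    double x;    // a*b rounded to double
    double err;  // rounding error, so that x + err == a*b exactly
};

// Same sequence of steps as in the comment above.
ProductWithError two_product(double a, double b) {
    const Split sa = split(a);
    const Split sb = split(b);
    volatile double x = a * b;
    volatile double err1 = x - sa.hi * sb.hi;
    volatile double err2 = err1 - sa.lo * sb.hi;
    volatile double err3 = err2 - sa.hi * sb.lo;
    return {x, sa.lo * sb.lo - err3};
}

// Worked case: (1 + 2^-30)^2 = 1 + 2^-29 + 2^-60.
void check_two_product() {
    const double a = 1.0 + std::ldexp(1.0, -30);
    const ProductWithError p = two_product(a, a);
    assert(p.x == 1.0 + std::ldexp(1.0, -29));
    assert(p.err == std::ldexp(1.0, -60));
}
```

On this input the error term matches what fma(a, b, -x) would return, at the cost of two splits and several extra multiplications and subtractions.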
December 21, 2004 12:31 PM