I often think about what I would change in C++ were I given carte blanche to update the language. One thing in particular has been on my mind in recent months: the register keyword.
The register keyword was originally intended to let C programmers hint that particular values be stored in registers rather than in memory. Because early C compilers had fairly primitive optimization heuristics, this hint could sometimes improve performance significantly. Variables marked with the register keyword were semantically the same as those without it, with one exception: C forbids taking the address of a register variable, since it makes no sense to take the address of a register. C++ does not have this restriction.
Modern C++ compilers typically ignore the register keyword entirely. Today’s optimizers are quite capable of deciding for themselves which values to enregister and which to store in memory. With the Visual C++ 8.0 compiler, I have yet to see a case where a programmer’s register hints would improve on the optimizer’s choices. The truth is that modern compiler technology renders the original semantics of the register keyword moot.
The Problem
C++ currently supports three intrinsic binary floating-point (BFP) formats:
float, for single precision numbers
double, for double precision numbers
long double, for double or extended precision numbers
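Which of these formats a given compiler actually provides is implementation-defined. A minimal sketch for checking it uses std::numeric_limits<T>::digits, the number of significand bits; significand_bits and report_precisions are illustrative names. Assuming IEEE 754 formats, float reports 24 bits and double 53. long double is only guaranteed to be at least as precise as double, and on VC++ it reports the same 53 bits as double.

```cpp
#include <cassert>
#include <cstdio>
#include <limits>

// Number of significand bits (including the implicit leading bit) that the
// implementation provides for floating-point type T.
template <typename T>
constexpr int significand_bits() {
    return std::numeric_limits<T>::digits;
}

// Print the precision of the three intrinsic binary floating-point types.
void report_precisions() {
    // The only portable guarantee: long double is at least as precise as double.
    assert(significand_bits<long double>() >= significand_bits<double>());
    std::printf("float:       %d bits\n", significand_bits<float>());
    std::printf("double:      %d bits\n", significand_bits<double>());
    std::printf("long double: %d bits\n", significand_bits<long double>());
}
```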
It is generally desirable to compute intermediate results in as high a precision as is practical. For instance, even when every operand in an expression is single precision, it’s preferable to compute the expression in double or extended precision, rounding to single precision only when assigning the final result to storage. Consider a simple summation expression:
const int n = 50;
double A[n];
double sum = A[0] + A[1] + A[2] + . . . + A[n-1];
When this expression is compiled for IA64 using VC++ 8.0, every intermediate result in the chain of additions is accumulated at extended precision. This avoids the additional error that would be incurred if the result of each sub-expression were narrowed to double. However, writing out the summation like this isn’t very convenient or flexible. It’s much more natural to program summation using a loop.
for (size_t i=0; i<n; i++)
sum = sum + A[i];
Unfortunately, this doesn’t have exactly the same semantics: the assignment inside the loop body forces each intermediate result to be narrowed to double precision. In fact, because VC++ 8.0 doesn’t have an extended precision type (long double maps to double precision), there’s currently no way to write this summation as a loop without losing the extra accuracy provided by the extended precision results.
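Because standard C++ offers no portable type wider than double, the narrowing effect is easiest to demonstrate one level down: in this sketch float plays the role of double and double plays the role of an extended-precision register. The helper names sum_narrowing and sum_wide are hypothetical, and the sketch assumes a target such as SSE2 on x86-64, where float arithmetic is carried out in float.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Accumulator of the same type as the elements: every `sum = sum + a[i]`
// rounds the running total back to float (the analogue of the double loop).
float sum_narrowing(const std::vector<float>& a) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size(); i++)
        sum = sum + a[i];
    return sum;
}

// Wider accumulator (the analogue of an extended-precision register): the
// running total stays in double and is rounded to float only once.
float sum_wide(const std::vector<float>& a) {
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); i++)
        sum = sum + a[i];
    return static_cast<float>(sum);
}
```

With a[0] = 16777216.0f (2^24) followed by 1000 elements equal to 1.0f, sum_narrowing returns 16777216 because 2^24 + 1 rounds back to 2^24 at every step, while sum_wide returns the exact 16778216.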
A Solution
What if we added the register keyword to floating-point declarations? For example, what might a programmer intend with:
register double x;
I propose that this use of the register keyword mean the following: the variable x has register precision, which is at least as precise as double. This would allow us to write the summation loop while retaining intermediate precision:
register double sum = 0.0; // on IA64, this is now an extended precision variable
for (size_t i=0; i<n; i++)
sum = sum + A[i];
These semantics enable programmers to declare functions and variables with the same precision as is used for intermediate expression values. The type of sum would be “register double” instead of “double”. For example, because IA64 has extended precision registers, sizeof(register double) would be ten bytes rather than the eight bytes of sizeof(double). On AMD64, operations with double precision operands are computed using double precision registers, so sizeof(register double) on AMD64 would be eight bytes.
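Later C and C++ standards offer something close to this idea: C99’s double_t, available in C++11 as std::double_t in <cmath>, is at least as wide as double, and when FLT_EVAL_METHOD is 0, 1, or 2 it is double, double, or long double respectively, i.e. the format the implementation evaluates double expressions in. The sketch below uses it as a stand-in for the proposed “register double”. The alias register_double and the function sum_array are hypothetical names, and unlike the proposal this is an ordinary type alias, not a distinct type.

```cpp
#include <cassert>
#include <cmath>    // std::double_t
#include <cstddef>

// Hypothetical stand-in for the proposed "register double": at least as
// precise as double and typically the format the compiler uses to evaluate
// double expressions (long double when FLT_EVAL_METHOD == 2).
using register_double = std::double_t;

// The summation loop from the text. The accumulator is held at "register"
// precision, so `sum = sum + A[i]` never narrows below that precision.
double sum_array(const double* A, std::size_t n) {
    register_double sum = 0.0;
    for (std::size_t i = 0; i < n; i++)
        sum = sum + A[i];
    return static_cast<double>(sum);  // narrow once, when storing the result
}
```

On AMD64 targets compiled with SSE2, FLT_EVAL_METHOD is typically 0, so register_double is simply double and sizeof(register_double) is 8, consistent with the AMD64 case above; a compiler targeting x87 with FLT_EVAL_METHOD 2 would make it long double.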
These new types introduce an interesting question. Consider the following code:
double a, b, c, d;
d = a + b + c;
It seems natural that the types of the intermediate values in the sub-expressions (a+b) and ((a+b)+c) would now implicitly be “register double”. The assignment operator then implicitly narrows the “register double” value to “double”. What should happen if we modify the code as follows?
register double a, b, c;
double d;
d = a + b + c;
In this code, the types of the intermediate expressions are now explicit rather than implicit. Should the assignment from an explicit “register double” value to a “double” variable require an explicit cast? To answer this question, let’s look at the expected behavior of the following code:
double a, b, c;
float d;
d = a + b + c;
With the existing language semantics, the assignment implicitly narrows the result of the expression to single precision without requiring a cast, though the compiler may issue a warning (C4244). This is perfectly reasonable, since it would be quite inconvenient to require an explicit cast in such cases. It’s natural to have the same behavior when we assign “register double” values to double variables. Programmers can easily eliminate any compilation warning either by introducing an explicit cast or by disabling the warning with a compiler switch or pragma.
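Under today’s rules, the three forms below all compile and perform the same double-to-float conversion; they differ only in how the narrowing is signposted to the compiler and the reader. The function names are hypothetical. The pragma form is MSVC-specific and is guarded by _MSC_VER so that other compilers skip it.

```cpp
#include <cassert>

// Implicit narrowing: legal C++; MSVC may report C4244 (possible loss of data).
float assign_implicit(double a, double b, double c) {
    float d;
    d = a + b + c;
    return d;
}

// Explicit cast: the same conversion, but the narrowing is visibly intended,
// which also silences the warning.
float assign_cast(double a, double b, double c) {
    float d;
    d = static_cast<float>(a + b + c);
    return d;
}

// Suppressing the warning for a region of code with an MSVC pragma.
#ifdef _MSC_VER
#pragma warning(push)
#pragma warning(disable : 4244)
#endif
float assign_pragma(double a, double b, double c) {
    float d;
    d = a + b + c;
    return d;
}
#ifdef _MSC_VER
#pragma warning(pop)
#endif
```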
We should also be able to apply the register keyword to floating-point parameters and function return types. Consider a function that computes the squared length of the hypotenuse of a right triangle:
inline double hypotenuse(double a, double b)
{
return a*a + b*b;
}
This simple function has two serious flaws, both caused by the same fact: the values passed into the function and the return value are all narrowed to double precision, even when the function is inlined. This means that the following code
long double c;
double w,x,y,z;
...
c = hypotenuse(w+x,y+z);
is effectively transformed into
c = (double)( double(w+x)*double(w+x)
            + double(y+z)*double(y+z) );
Because the higher precision intermediate results are lost entirely, the overall accuracy of the expression is reduced. Moreover, these semantics can lead to poor performance, because the compiler must introduce explicit operations to ensure that the intermediate results are properly narrowed. The register keyword semantics that I’m proposing ameliorate both problems. By rewriting the function as
inline register double hypotenuse(register double a, register double b)
{
return a*a + b*b;
}
the programmer can now explicitly avoid the implicit narrowing of intermediate results. Thus, the inlined expression effectively becomes
c = (w+x)*(w+x) + (y+z)*(y+z);
wherein all intermediate values are retained at register precision.
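The accuracy difference between the two expansions of the hypotenuse computation can be reproduced in standard C++ one level down, with float playing the role of double and double playing the role of extended precision. In this hypothetical sketch, hyp_sq_narrowed mirrors the transformed expression produced by the double-parameter version, and hyp_sq_wide mirrors the inlined register-precision expression; the names and test values are illustrative only.

```cpp
#include <cassert>

// Analogue of the double-parameter hypotenuse(): float stands in for
// "double". Both the arguments and the return value are rounded to float.
double hyp_sq_narrowed(double w, double x, double y, double z) {
    float a = static_cast<float>(w + x);   // argument narrowed at the call
    float b = static_cast<float>(y + z);
    return static_cast<float>(a * a + b * b);  // return value narrowed
}

// Analogue of the inlined register-double version: every intermediate value
// stays at the wider precision (double here).
double hyp_sq_wide(double w, double x, double y, double z) {
    return (w + x) * (w + x) + (y + z) * (y + z);
}
```

With w = 2^24, x = 1 and y = z = 0, the narrowed version returns 2^48 = 281474976710656, because 2^24 + 1 is not representable in float, while the wide version returns the exact 281475010265089.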
I’m tired now, so I’ll pick up this topic a little later. Please feel free to post comments and criticism.