Eric Fleegal's WebLog


  • Performance as a Design Consideration

    For many years now I've been frustrated by industry development practices that approach software performance in an ad hoc manner rather than as a design issue.  Every time I argue in favor of considering a feature’s potential performance issues up front rather than after the fact, the response is consistently one or more of the following:

     

    1. Moore's law will solve most performance issues; it's not worth our time
    2. All performance problems are bugs; find and fix them at the end
      Or, “premature optimization is the root of all evil” (Hoare’s dictum)
    3. The performance trade-offs are worth the gains

    If you’re a programmer, you’ve no doubt heard these arguments before.  They are especially engrained in the culture of commercial software development where performance need only be “good enough”; where time-to-market and new feature development take priority over all else. 

     

    In my early career at Microsoft, I joined a team building a WYSIWYG editor designed for programming. The concept was very cool, but the performance of the early prototype was so atrocious that it was practically unusable. In early 2000, a few of us on the team suggested that we needed a fundamental redesign to address performance issues.  The immediate response from the “architects” was brutal.  We were called myopic for not recognizing that Moore’s law would more efficiently solve any performance problems we could fix.  They insisted that the performance tradeoffs of their design were worth the benefits they offered.  And they harshly admonished us for not recognizing that performance problems are “bugs” to be discovered and dealt with at the end of a product cycle.  Naturally I left the team within weeks of that meeting.

     

    By the end of 2000, after nearly a decade of very expensive research and development, the project was cancelled owing to a failure to demonstrate a usable prototype.  Performance was important after all, and while I lamented the death of a promising idea, I must admit to a little schadenfreude over its demise.

     

    Customers care about performance.  They care a lot about performance, and consequently we developers should care about it too, and we should do so from the start.  There are critical problems with the three common arguments against an a priori approach to developing efficient software.

     

    Moore’s Law No Longer to the Rescue

    In the 1990s, successful development houses typically didn’t invest much design time on performance improvements because Moore’s law often rendered such efforts superfluous.  In the early 90s I was on a development team that spent months improving the performance of our software. By the time we finished, CPUs had more than doubled in speed and we lost out to competitors who had instead focused their efforts on additional features.  That lesson was hard learned by many in the field, and it accounts for why it’s been so difficult to unlearn.

     

    The problem with depending on Moore’s law is that in recent years we’ve not seen the speed doubling effects of circuit density improvements that we saw in the nineties.  Circuit density continues to double every year and a half, or so, but the concomitant growth of CPU speed no longer seems to hold[1]. We can no longer rely on processor improvements to make up for our poor design choices. Moore’s law can no longer rescue us from bad design.

     

    Hoare’s Dictum Misconstrued

    Tony Hoare’s dictum, “premature optimization is the root of all evil”, has become the excuse of choice for putting off optimization until the end of a product cycle.  To many developers, the dictum has an almost scriptural quality to it, with the power to cut off debate whenever it’s quoted.  Unfortunately it’s been misconstrued.

     

    The dictum was popularized by Donald Knuth’s 1974 article “Structured Programming with go to Statements”[2], in which he explored the use of structured programming to improve program readability and correctness.  Although the Böhm-Jacopini theorem had previously proven that any program can be expressed in an equivalent structured form, in 1974 most programmers still eschewed structured methods because of perceived performance costs.  Knuth’s study showed that in most cases the benefits of readability and correctness outweigh the trivial performance gains of writing unstructured code.

     

    The full version of Hoare’s dictum is "forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil" and I generally agree with this.  But 97% is not 100%, and 3% of code in a two-million line project is still about sixty thousand lines of code.  It’s virtually impossible for a team, at the end of a product cycle, to identify and fix performance problems with sixty thousand lines of code without enormous effort and extensive redesigns.  If they instead consider efficiency during early design and throughout coding, their efforts at the end of a product cycle can be focused on more productive tasks like fixing real bugs.  Charles Cook, the redoubtable English blogger, put it most aptly:

    “A good software developer will do this automatically, having developed a feel for where performance issues will cause problems. An inexperienced developer will not bother, misguidedly believing that a bit of fine tuning at a later stage will fix any problems.” [3]

     

    Put another way, small inefficiencies can really add up and are costly to fix after the fact.  In the mid-nineties I joined an enterprise software team that had developed an excellent product that was to save the company millions of dollars.  Unfortunately its release was delayed because it was too slow on the computers already deployed in the field.  I was hired, in part, to help identify and fix these problems.  Upper management was convinced that there was something wrong with the team.  After a few weeks of careful analysis, it became clear that the performance problems were NOT the result of algorithmic bottlenecks, poor programming, bad programmers, nor of bad overall design.  Indeed, the code was some of the cleanest I’d ever seen, the algorithms solid and the team very competent and bright.  The performance problems were instead caused by hundreds of small coding trade-offs, consciously made by developers to improve readability or structure.  No single trade-off was measurably bad but in aggregate they created substantial performance problems.  It took several months for the team to redesign and rework their individual feature areas to eliminate the overall performance problems. 

     

    A Trade-Off Is Not Always A Trade-Up

    On large projects, I’ve noticed that every feature team thinks that their ideas are essential and that their particular performance tradeoffs are well worth the overall advantages they offer.  While it’s often true that a single design choice may have only a very small cost, in aggregate all those little tradeoffs often add up to huge performance barriers.  It’s critically important to measure the cost of each performance trade-off, not only relative to the feature itself but in the context of overall product execution speed.  A single design choice may cost only a few thousand cycles every now and then, but a few hundred of these trade-offs can quickly compromise overall performance.

     

     

     

    References

    [1] http://wi-fizzle.com/compsci/cpu_speed_Page_2.png

    [2] http://pplab.snu.ac.kr/courses/adv_pl05/papers/p261-knuth.pdf

    [3] http://www.cookcomputing.com/blog/archives/000084.html

     

    Randall Hyde wrote an excellent article for the ACM, titled “The Fallacy of Premature Optimization”.  http://www.acm.org/ubiquity/views/v7i24_fallacy.html

     

    I also offer the following as examples where the modern software design practice of putting off efficiency has resulted in very unhappy customers.

     

     

    • Vista really is slow unless you throw a lot of hardware at it,
      Vista: Slow and Dangerous, Business Week Mar 15th, 2007
    • I have only one question for Bill Gates. Does he recall a book he wrote, called Business@the Speed of Thought. Because the title is suggestive, and if that’s what he indeed believes in, I would like to know whether or how Microsoft’s latest operating system “Vista” subscribes to this vision.
      Why is Vista so slow on uptake?, Business Standard, July 17th 2007
    • Vista is actually fairly bug free. Where it falls down is in the performance category. 
      Top 5 Things Microsoft Must Fix In Windows Vista In 2008,
      Information Week, Nov 28th 2007

     

  • Simplifying C++ NULL terminated string handling

    I wrote the following short article several years ago.  I've reproduced it here by request.

     

    One of the curious features of the C language is its lack of an integrated string type.  Most programming languages developed in the 1960s and 70s included a basic string type.  Strings in C, however, are just a special case of array data, and the only direct language support involves initialization of pointers and array values with string literals.

     

    Fortunately, the C++ standard library introduces a standardized string type, std::string.  Although its implementation is very well thought out, naïve use of std::string can fragment the heap, reduce performance and create unexpected bottlenecks.  However, the same can be said for naïve use of strings in languages like Java and C#, where the String type is fundamental and most of the implementation details are hidden from the programmer.  In C++ there are ways to mitigate potential performance problems. When I employ std::string, I prefer to use a memory model very similar to boost’s segregated storage; this reduces memory fragmentation and keeps all the string data within a sandbox.  It’s fairly easy to do this once you know how to write a standard allocator.  The strstream class in C++ is an efficient and elegant solution to complex string construction, and I prefer it over Java’s somewhat clumsy StringBuilder class.  If you prefer the printf way of formatting strings, the boost library offers an excellent type-safe alternative that’s built to work efficiently with standard stream classes.
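    As a rough illustration of stream-based string construction, here is a minimal sketch.  It assumes std::ostringstream from <sstream>, the non-deprecated successor to strstream, rather than strstream itself; the helper name DescribeItem and its parameters are hypothetical.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: builds a label such as "widget x3 @ 2.5" in one pass.
// Assumption: std::ostringstream stands in for the older strstream class.
inline std::string DescribeItem(const std::string& name, int count, double price)
{
    std::ostringstream out;
    out << name << " x" << count << " @ " << price;  // type-safe, no format string
    return out.str();                                // one final std::string
}
```

    The stream accumulates the pieces in its own buffer, so the caller never juggles intermediate temporaries by hand.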

     

    Systems programmers and developers of high performance applications typically use C style strings.  There are a number of reasons for this, but chief amongst them are efficiency and interoperability.  C strings are efficient precisely because they are simple -- they can be allocated on the stack or as part of a larger structure or in the free-store, and operations on them can be specifically tailored for maximum efficiency.  Moreover, C strings are usually not optional when interacting with operating system APIs, drivers, low level libraries and legacy code. 

     

    Being able to program with C strings is a fairly fundamental skill for most Microsoft developers; indeed, a good number of the coding questions we ask during technical interviews involve some sort of C string manipulation.  Being conversant with C string manipulation and the concomitant standard library functions is a point of pride for many Microsoft programmers, especially among those who cut their teeth in C instead of C++.  Indeed, these C string "fanboys" are sometimes critical of those developers who prefer std::string over character arrays and raw character buffers (I am not one of them).

     

    I find that the worst part of programming with C strings is the string library itself.  Take a very basic function like strcpy, for instance.   In the early days the length of a symbol name was limited to just a few characters, so we can forgive the designers for selecting a somewhat less than human readable name.  When there was only one way to copy a string, the name strcpy wasn’t a bad choice.  However today there are dozens of different variations on the string copy function name available to Visual C++ programmers.  Here are 24 of them from <string.h> and <mbstring.h>:

    strcpy, wcscpy, _mbscpy, _tcscpy,

    strcpy_s, wcscpy_s, _mbscpy_s, _tcscpy_s,

    strncpy, wcsncpy, _mbsncpy, _tcsncpy,

    _strncpy_l, _wcsncpy_l, _mbsncpy_l, _tcsncpy_l,

    strncpy_s, wcsncpy_s, _mbsncpy_s, _tcsncpy_s,

    _strncpy_s_l, _wcsncpy_s_l, _mbsncpy_s_l, _tcsncpy_s_l

    There are versions for four different character sets: ASCII, Mbcs, Unicode and TCHAR.  There are safe and unsafe versions, locale specific versions, and versions with additional semantics.  Now multiply those semantic variations against the dozen or so basic string operations and you have hundreds of different names to try to remember!

     

    The function names in <strsafe.h> are a little more regular, but there are still dozens of names to remember.  There are 24 of them for copying a string:

    StringCbCopy, StringCbCopyA, StringCbCopyW,

    StringCbCopyEx, StringCbCopyExA, StringCbCopyExW,

    StringCbCopyN, StringCbCopyNA, StringCbCopyNW,

    StringCbCopyNEx, StringCbCopyNExA, StringCbCopyNExW,

    StringCchCopy, StringCchCopyA, StringCchCopyW,

    StringCchCopyEx, StringCchCopyExA, StringCchCopyExW,

    StringCchCopyN, StringCchCopyNA, StringCchCopyNW,

    StringCchCopyNEx, StringCchCopyNExA, StringCchCopyNExW

    To complicate matters, Strsafe.h is incomplete; it has neither multi-byte character support nor support for locale specific functions [locale specific functions have been added since the time this article was authored].  Moreover, the strsafe naming style is only available for those functions needing to prevent buffer overruns.  String operations such as comparison and collation, which are already safe, have no implementation in this library.

     

    I think there’s really only one name I should have to remember for each basic string operation.  In this case, that name would be “Copy”—overloaded for each different semantic variation of the operation, but with a uniform scheme for parameterization and return values.  The compiler should do all the work of figuring out which variation to use.  Since the name “Copy” is applicable to more than just strings, we should declare the name within a “Strings” namespace.  Since some variations are unsafe, we should introduce a counterpart namespace “UnsafeStrings” to make it very explicit when choosing to use an unsafe version of a string function.  We declare the functions in a namespace instead of a class so that the library can be extensible.  This also makes factoring the implementation code into different files a little easier.

     

    For an initial example, the basic safe Copy operations for each of three string types would be declared as follows:

     

          namespace LibraryName

          {

          namespace Strings

          {

                errno_t Copy(char* destination, size_t destinationSize, const char* source);

                errno_t Copy(unsigned char* destination, size_t destinationSize, const unsigned char* source);

                errno_t Copy(wchar_t* destination, size_t destinationSize, const wchar_t* source);

          }

          }

     

    These are the basic copy functions for ASCII, Mbcs and Unicode respectively.  Each of these overloaded versions of Copy simply dispatches to its counterpart in the standard library.  For example, the ASCII version is:

     

          inline errno_t Strings::Copy(char* destination, size_t destinationSize, const char* source)

          {

                return ::strcpy_s(destination, destinationSize, source);

          }
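    The overload idea can be tried on any compiler with the sketch below.  It makes two assumptions that differ from the text: the return type is plain int rather than errno_t, and the bodies are portable stand-ins for strcpy_s and wcscpy_s rather than calls to them.  The error codes EINVAL and ERANGE follow the convention those functions use.

```cpp
#include <cerrno>
#include <cstddef>
#include <cstring>
#include <cwchar>

namespace LibraryName {
namespace Strings {

// Bounded copy for narrow strings (portable stand-in for strcpy_s).
// Returns 0 on success, EINVAL for bad arguments, ERANGE if source won't fit.
inline int Copy(char* destination, std::size_t destinationSize, const char* source)
{
    if (!destination || destinationSize == 0)
        return EINVAL;
    if (!source) {
        destination[0] = '\0';
        return EINVAL;
    }
    const std::size_t length = std::strlen(source);
    if (length >= destinationSize) {
        destination[0] = '\0';              // leave a valid empty string behind
        return ERANGE;
    }
    std::memcpy(destination, source, length + 1);   // +1 copies the terminator
    return 0;
}

// Same contract for wide strings; destinationSize is counted in characters.
inline int Copy(wchar_t* destination, std::size_t destinationSize, const wchar_t* source)
{
    if (!destination || destinationSize == 0)
        return EINVAL;
    if (!source) {
        destination[0] = L'\0';
        return EINVAL;
    }
    const std::size_t length = std::wcslen(source);
    if (length >= destinationSize) {
        destination[0] = L'\0';
        return ERANGE;
    }
    std::wmemcpy(destination, source, length + 1);
    return 0;
}

} // namespace Strings
} // namespace LibraryName
```

    The caller writes Strings::Copy in both cases and the compiler selects the overload from the argument types.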

     

    The counterpart unsafe Copy functions would be declared as follows:

     

          namespace LibraryName

          {

          namespace UnsafeStrings

          {

                using Strings::Copy;

                errno_t Copy(char* destination, const char* source);

                errno_t Copy(unsigned char* destination, const unsigned char* source);

                errno_t Copy(wchar_t* destination, const wchar_t* source);

          }

          }

     

    Notice that the safe versions of Copy are composited into the UnsafeStrings namespace with a using declaration.  This is done for convenience and makes both Strings and UnsafeStrings complete name-sets.

     

    The Unsafe definitions of Copy will need additional parameter checking to ensure that the function semantics are uniform with the safe versions.  In practice this doesn’t usually introduce much of a performance barrier since, depending on context, the compiler can often optimize away these additional parameter checks when the function gets expanded inline.  The cost of the parameter checking is trivial compared to the cost of the copy.

     

          inline errno_t UnsafeStrings::Copy(char* destination, const char* source)

          {

                if (!destination || !source)

                      return EINVAL;

                ::strcpy(destination, source);

                return 0;

          }
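    The effect of the using declaration can be seen in a small self-contained sketch.  As assumptions, it returns int rather than errno_t and uses a portable bounded copy in place of strcpy_s.

```cpp
#include <cerrno>
#include <cstddef>
#include <cstring>

namespace LibraryName {

namespace Strings {
// Bounded copy (portable stand-in for strcpy_s).
inline int Copy(char* destination, std::size_t destinationSize, const char* source)
{
    if (!destination || destinationSize == 0 || !source)
        return EINVAL;
    const std::size_t length = std::strlen(source);
    if (length >= destinationSize)
        return ERANGE;
    std::memcpy(destination, source, length + 1);
    return 0;
}
} // namespace Strings

namespace UnsafeStrings {
using Strings::Copy;   // the bounded overload is reachable through this name-set too

// Unbounded copy: the caller promises the destination is large enough.
inline int Copy(char* destination, const char* source)
{
    if (!destination || !source)
        return EINVAL;
    std::strcpy(destination, source);
    return 0;
}
} // namespace UnsafeStrings

} // namespace LibraryName
```

    Code that opts into UnsafeStrings can call either form; the number of arguments decides which one runs, and the namespace name at the call site makes the opt-in visible to reviewers.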

     

    The different semantic variations take on a very regular form.  For instance, to copy a limited number of characters we simply declare additional overloads as follows:

           

           namespace LibraryName

           {

           namespace Strings

           {

                  errno_t Copy(char* destination, size_t destinationSize, const char* source, size_t maxCount);

                  errno_t Copy(unsigned char* destination, size_t destinationSize, const unsigned char* source, size_t maxCount);

                  errno_t Copy(wchar_t* destination, size_t destinationSize, const wchar_t* source, size_t maxCount);

           }

     

           namespace UnsafeStrings

           {

                  using Strings::Copy;

                  errno_t Copy(char* destination, const char* source, size_t maxCount);

                  errno_t Copy(unsigned char* destination, const unsigned char* source, size_t maxCount);

                  errno_t Copy(wchar_t* destination, const wchar_t* source, size_t maxCount);

           }

           }

     

    As with the earlier variations, the implementations of these simply dispatch to the correct counterpart functions in the standard library.  Again, the unsafe versions will need a little additional code to perform some parameter checking.

     

    The locale specific variations of Copy are similarly implemented. 

     

    It’s convenient to add safe versions of Copy specifically for arrays.

     

          namespace Strings

          {

                template <size_t destinationSize>

                inline errno_t Copy(char (&destination)[destinationSize], const char* source)

                {

                      return Copy(destination, destinationSize, source);

                }

          }
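    A minimal sketch of the array overload in use follows.  It assumes an int return type and a portable bounded copy standing in for strcpy_s.  The point is that the array extent is deduced by the compiler, so the caller cannot pass a wrong size.

```cpp
#include <cerrno>
#include <cstddef>
#include <cstring>

namespace Strings {

// Bounded copy (portable stand-in for strcpy_s).
inline int Copy(char* destination, std::size_t destinationSize, const char* source)
{
    if (!destination || destinationSize == 0 || !source)
        return EINVAL;
    const std::size_t length = std::strlen(source);
    if (length >= destinationSize)
        return ERANGE;
    std::memcpy(destination, source, length + 1);
    return 0;
}

// Array form: destinationSize is deduced from the array type itself.
template <std::size_t destinationSize>
inline int Copy(char (&destination)[destinationSize], const char* source)
{
    return Copy(destination, destinationSize, source);
}

} // namespace Strings
```

    With a declaration such as char name[6], the call Strings::Copy(name, "hello") succeeds, while a seven-byte source (six characters plus terminator) is rejected rather than overrunning the array.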

     

    Other string operations are similarly easy to define.  Consider, for instance, the 28 different name variations for functions comparing two strings:

    strcmp, wcscmp, _mbscmp, _tcscmp,

    _stricmp, _wcsicmp, _mbsicmp, _tcsicmp,

    _stricmp_l, _wcsicmp_l, _mbsicmp_l, _tcsicmp_l,

    strncmp, wcsncmp, _mbsncmp, _mbsncmp_l ,

    _tcsnccmp, _tcsncmp, _tccmp,

    _strnicmp, _wcsnicmp, _mbsnicmp,

    _strnicmp_l, _wcsnicmp_l, _mbsnicmp_l,

    _tcsncicmp, _tcsnicmp, _tcsncicmp_l

    As with the copy functions, the compare functions have versions for four different character sets, ASCII, Mbcs, Unicode and TCHAR.  There are locale specific versions, versions with case insensitive comparison semantics and some versions with different names but identical semantics. 

     

    As with Strings::Copy, the string comparison operation should have only one name, Compare.  The declarations for the basic Compare functions are:

     

          namespace LibraryName

          {

          namespace Strings

          {

                int Compare(const char* string1, const char* string2);

                int Compare(const unsigned char* string1, const unsigned char* string2);

                int Compare(const wchar_t* string1, const wchar_t* string2);

          }

     

          namespace UnsafeStrings

          {

                using Strings::Compare;

                // there are no unsafe specific versions of Compare

          }

          }

     

    Each of these overloaded versions of Compare simply dispatches to its counterpart in the standard library.  For example:

     

          inline int Strings::Compare(const char* string1, const char* string2)

          {

                return ::strcmp(string1, string2);

          }

     

    The function strcmp and other standard string comparison functions have undefined behavior when passed bad parameters.  This makes them unsuitable as predicate operations for sorting algorithms and ordered containers.  Historically this was done for performance reasons since parameter checking was considered “expensive” due to the extra branch operations – this defense is somewhat dubious since comparing two strings is relatively much more expensive than the parameter checking.  Accordingly, we redefine the Compare functions with an alternate semantic—one that is well-ordered for any two string arguments, NULL or not.

     

    // function Compare(a,b)

    // Compares two strings lexicographically

    // returns 

    //     <0 when a < b

    //      0 when a == b

    //     >0 when a > b

    // except when either a or b is NULL, then

    //     <0 when a==NULL && b!=NULL

    //      0 when a==NULL && b==NULL

    //     >0 when a!=NULL && b==NULL

    inline int Strings::Compare(const char* a, const char* b)

    {

        if (!a)

            return b ? -1 : 0; 

        if (!b)

            return +1;           

        return ::strcmp(a, b);

    }

     

    The newly added semantics are: two NULL strings are equal, and a NULL string is considered “less than” one that isn’t NULL.  This means that if the Strings::Compare function is used in a predicate operator, the NULL strings will be sorted forward.  The compiler can sometimes optimize away the parameter checking when the function is inlined.
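    To illustrate the well-ordered semantics, the sketch below uses Compare as a sort predicate over an array that contains NULL entries.  The helper name SortNames is hypothetical, and a C++11 compiler is assumed for the lambda.

```cpp
#include <algorithm>
#include <cstring>
#include <vector>

namespace Strings {
// NULL-tolerant comparison: NULL sorts before any non-NULL string.
inline int Compare(const char* a, const char* b)
{
    if (!a)
        return b ? -1 : 0;
    if (!b)
        return +1;
    return std::strcmp(a, b);
}
} // namespace Strings

// Hypothetical helper: sorts C strings in place, tolerating NULL entries.
inline void SortNames(std::vector<const char*>& names)
{
    std::sort(names.begin(), names.end(),
              [](const char* a, const char* b) {
                  return Strings::Compare(a, b) < 0;   // strict "less than"
              });
}
```

    Passing raw strcmp to the same sort would be undefined behavior as soon as one element is NULL; here the NULL entries simply collect at the front.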

     

    Case insensitive comparison requires an additional tag type.  It’s introduced in the following code as enumCaseInsensitive, with the tag names CaseInsensitive and CASE_INSENSITIVE.

     

          namespace LibraryName

          {

          namespace Strings

          {

                enum enumCaseInsensitive { CaseInsensitive, CASE_INSENSITIVE };

     

                int Compare(const char* string1, const char* string2, enumCaseInsensitive);

                int Compare(const unsigned char* string1, const unsigned char* string2, enumCaseInsensitive);

                int Compare(const wchar_t* string1, const wchar_t* string2, enumCaseInsensitive);

          }

     

          namespace UnsafeStrings

          {

                using Strings::Compare;

          }

          }

     

    As with the case sensitive version, this version of Compare simply dispatches to the appropriate counterpart in the standard library, adding the same NULL semantics as before.  For example:

     

    inline int Strings::Compare(const char* a, const char* b, enumCaseInsensitive)

    {

        if (!a)

            return b ? -1 : 0; 

        if (!b)

            return +1;           

        return ::_stricmp(a, b);

    }

     

    To use the case insensitive version of Compare, the calling code simply passes in the Strings::CaseInsensitive tag.  I usually bring the identifiers “CaseInsensitive” or “CASE_INSENSITIVE” into the current namespace with a using declaration.

     

    using Strings::CaseInsensitive;

    . . .

    if ( Strings::Compare(name1, name2, CaseInsensitive) < 0 )

    {

          . . .

    }
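    The tag technique can be tried outside Visual C++ with the sketch below.  As an assumption, the case insensitive body is a portable loop built on std::tolower rather than a call to _stricmp, so it only folds case according to the current C locale.

```cpp
#include <cctype>
#include <cstring>

namespace Strings {

enum enumCaseInsensitive { CaseInsensitive, CASE_INSENSITIVE };

// Case sensitive overload with the NULL ordering described above.
inline int Compare(const char* a, const char* b)
{
    if (!a)
        return b ? -1 : 0;
    if (!b)
        return +1;
    return std::strcmp(a, b);
}

// Case insensitive overload, selected purely by the extra tag argument.
// Portable stand-in for _stricmp: compares characters after lowercasing.
inline int Compare(const char* a, const char* b, enumCaseInsensitive)
{
    if (!a)
        return b ? -1 : 0;
    if (!b)
        return +1;
    for (;; ++a, ++b) {
        const int ca = std::tolower(static_cast<unsigned char>(*a));
        const int cb = std::tolower(static_cast<unsigned char>(*b));
        if (ca != cb)
            return ca < cb ? -1 : +1;
        if (ca == 0)
            return 0;           // both strings ended together
    }
}

} // namespace Strings
```

    The tag carries no data; its type alone steers overload resolution, so the choice between the two behaviors is made at compile time.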

     

    Conclusion

    The completed Strings library contains the following functions: Append, Collate, Compare, CompareOrdinal, Convert, Copy, Find, IsEqual, IsLessThan, IsGreaterThan, Length, PrintF/VPrintF, PrintFLength, Replace, ScanF/VScanF, and Tokenize.  In all, this Strings library has hundreds of functions but only sixteen function names to remember. 

     

    Incidentally, the similarity to the function naming convention in the C# String class is no coincidence. 

  • How to do Object Properties in C++

    One of the many useful features of modern languages like C# is object properties, as they provide a higher level of encapsulation than public fields.  The field-like syntax is far easier to read and write than traditional C++ GetXXX and SetXXX functions.

    It’s surprising how many people don’t know that Visual C++ has properties too.  Microsoft added property fields into C++ as a language extension back in the days when COM programming was all the rage.  As with most C++ language extensions, the syntax is a bit clumsy; this one uses a Microsoft specific __declspec compiler directive.  The syntax is:

    __declspec ( property ( get=nameOfGetFunction, put=nameOfSetFunction ) ) typeExpressing propertyName

    When the compiler sees a data member declared with this attribute on the right of a member-selection operator ("." or "->"), it converts the operation to a get or put function, depending on whether such an expression is an l-value or an r-value. In more complicated contexts, such as "+=", a rewrite is performed by doing both get and put.  A property can also be declared read-only or write-only by specifying only the get or put function respectively. 
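    The rewrite can be pictured with ordinary accessor calls.  The sketch below deliberately uses no compiler extension, so it builds anywhere; the class Counter and a property named Value mapped to GetValue/SetValue are hypothetical.

```cpp
// Hypothetical class with conventional accessors.
class Counter
{
public:
    int  GetValue() const    { return value_; }
    void SetValue(int value) { value_ = value; }

private:
    int value_ = 0;
};

// Assuming a property declared as
//   __declspec(property(get=GetValue, put=SetValue)) int Value;
// each property expression corresponds to the accessor calls on the right:
//   int v = c.Value;   ->  int v = c.GetValue();
//   c.Value = 5;       ->  c.SetValue(5);
//   c.Value += 2;      ->  c.SetValue(c.GetValue() + 2);
inline int DemonstrateRewrites()
{
    Counter c;
    c.SetValue(5);                    // c.Value = 5;
    c.SetValue(c.GetValue() + 2);     // c.Value += 2;
    return c.GetValue();              // return c.Value;
}
```

    A compound assignment therefore costs one get and one put, which is worth remembering when either accessor is expensive.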

    To make life a little easier, I introduce a header file “C++ Properties.h” with the following macros:

    #define PROPERTY(TYPE, NAME) __declspec(property(get=Get##NAME,put=Set##NAME)) TYPE NAME

    #define READONLY_PROPERTY(TYPE, NAME) __declspec(property(get=Get##NAME)) TYPE NAME

    #define WRITEONLY_PROPERTY(TYPE, NAME) __declspec(property(put=Set##NAME)) TYPE NAME

    Notice that these macros use preprocessor token pasting so that the get and put functions always map to GetXXX and SetXXX, where XXX is the name of the property.  This allows us to declare classes with properties in a very readable form; for example:

    class GamePad

    {

    public:

            . . .

            READONLY_PROPERTY(bool, IsConnected);

            bool GetIsConnected() const;

            . . .

    };

    While not quite as elegant as the built in property syntax in C#, it’s not a bad substitute.

    You can declare virtual C++ properties simply by making the getter and/or setter methods virtual.  Similarly, abstract properties can be defined by using pure virtual getter and/or setter methods.  For example:

    class GamePad

    {

    public:

            . . .

            READONLY_PROPERTY(bool, IsConnected);  // virtual property

            virtual bool GetIsConnected() const;

            READONLY_PROPERTY(float, PollingRate);  // abstract property

            virtual float GetPollingRate() const = 0;

            . . .

    };

    Note that the const semantics for the property are determined by the getter or setter method.

    Array semantics are also supported. The syntax is basically the same, but with an added “[]”, as follows:

    __declspec ( property ( get=nameOfGetFunction, put=nameOfSetFunction ) ) typeExpressing propertyName[]

    The accessor function simply needs to take an index argument.  Although we can use our existing macros for arrays, as follows

    class GamePad

    {

    public:

            . . .

            READONLY_PROPERTY(ButtonState, Buttons)[];

            ButtonState GetButtons(size_t buttonIndex);

            . . .

    };

    // And used like:

    GamePad gamePad;

    . . .

    ButtonState buttonState = gamePad.Buttons[3];

    I find it somewhat less confusing to have additional macros in “C++ Properties.h”

    #define ARRAY_PROPERTY(TYPE, NAME) __declspec(property(get=Get##NAME,put=Set##NAME)) TYPE NAME[]

    #define READONLY_ARRAY_PROPERTY(TYPE, NAME) __declspec(property(get=Get##NAME)) TYPE NAME[]

    #define WRITEONLY_ARRAY_PROPERTY(TYPE, NAME) __declspec(property(put=Set##NAME)) TYPE NAME[]

    Changing the above class into:

    class GamePad

    {

    public:

            . . .

            READONLY_ARRAY_PROPERTY(ButtonState, Buttons);

            ButtonState GetButtons(size_t buttonIndex);

            . . .

    };

    The array access functions can also be multidimensional:

    class Picture

    {

    public:

            . . .

            READONLY_ARRAY_PROPERTY(Color, Pixels);

            Color GetPixels(unsigned int x, unsigned int y);

            . . .

    };

    // And used like:

    Picture picture;

    . . .

    Color colorAt = picture.Pixels[x][y];

    Because properties provide a higher level of encapsulation than public fields, I often find myself exposing private fields through const properties.

    class GamePad

    {

    public:

            . . .

            READONLY_PROPERTY(bool, IsConnected);

            bool GetIsConnected() const { return isConnected_; }

            . . .

    private:

            bool isConnected_;

    };

    Just like traditional accessor functions, this enables internal members to change the isConnected_ state while exposing the state to the public scope as a const property.  This pattern is so very common that I introduce an explicit property macro for it:

    #define READONLY_PROPERTY_RVALUE(TYPE, NAME, RVALUE_EXPR) \

    __declspec(property(get=Get##NAME)) TYPE NAME; \

    TYPE Get##NAME() const { return RVALUE_EXPR; }
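    Here is a hedged sketch of the macro applied to the GamePad class.  Since __declspec(property) is a Microsoft extension, the sketch wraps it in a hypothetical helper macro, DECLARE_GET_PROPERTY, that expands to nothing on other compilers; there only the generated GetXXX accessor is available.  The Connect and Disconnect members are also hypothetical.

```cpp
// Emit the property declaration only where the extension exists.
#if defined(_MSC_VER)
#define DECLARE_GET_PROPERTY(TYPE, NAME) __declspec(property(get=Get##NAME)) TYPE NAME;
#else
#define DECLARE_GET_PROPERTY(TYPE, NAME)
#endif

// Token pasting (##) builds the accessor name from the property name.
#define READONLY_PROPERTY_RVALUE(TYPE, NAME, RVALUE_EXPR) \
    DECLARE_GET_PROPERTY(TYPE, NAME)                      \
    TYPE Get##NAME() const { return RVALUE_EXPR; }

class GamePad
{
public:
    // Generates GetIsConnected() and, on Visual C++, the IsConnected property.
    READONLY_PROPERTY_RVALUE(bool, IsConnected, isConnected_)

    // Internal state changes go through ordinary members.
    void Connect()    { isConnected_ = true; }
    void Disconnect() { isConnected_ = false; }

private:
    bool isConnected_ = false;
};
```

    On Visual C++ a caller can write pad.IsConnected; on other compilers the same class is used through pad.GetIsConnected().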

    For symmetry I also add the following two macros, though admittedly they’re rarely used (and many of my colleagues hate them).

    #define PROPERTY_VALUE(TYPE, NAME, RVALUE_EXPR, LVALUE_EXPR) \

    __declspec(property(get=Get##NAME,put=Set##NAME)) TYPE NAME; \

    TYPE Get##NAME() const { return RVALUE_EXPR; } \

    void Set##NAME(TYPE newValue) { LVALUE_EXPR = newValue; }

    #define WRITEONLY_PROPERTY_LVALUE(TYPE, NAME, LVALUE_EXPR) \

    __declspec(property(put=Set##NAME)) TYPE NAME; \

    void Set##NAME(TYPE newValue) { LVALUE_EXPR = newValue; }

    Although it would be preferable for C++ properties to have a cleaner built-in syntax, these macros provide enough of an abstraction to enable use of properties without sacrificing readability.

     

  • #pragma once

    Most C++ compilers now support the non-standard #pragma once compiler directive.  This directive instructs the compiler to #include the file only once in a single compiland, and replaces the old C-style header sentinels (often called #include guards).

    The central problem with preprocessor based header sentinels is that they require the user to create a unique symbol to identify each and every header file that might be #included by a single compiland.  On very large projects, this burden becomes somewhat painful.  Consider also the distinct possibility that two libraries might contain public header files with the exact same name; using the typical __FILENAME_H__ convention, it's very possible to run into a name collision between the two libraries.  Some project teams attempt to avoid this problem by imposing a strict sentinel naming standard, usually including a file's path as part of its sentinel name.  I'm not fond of this solution because if a header file needs to be moved, as occurs when refactoring a library, it requires that the header file be edited to conform to its new location.  The #pragma once directive avoids all this nonsense entirely.
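    The sentinel collision can be simulated in a single file.  In the sketch below the two guarded regions stand for the contents of two same-named headers from different libraries; the library names, function names and the sentinel UTIL_H_INCLUDED are all hypothetical.

```cpp
// --- contents of liba/util.h ---
#ifndef UTIL_H_INCLUDED
#define UTIL_H_INCLUDED
#define LIBA_UTIL_SEEN 1
inline int LibAVersion() { return 1; }
#endif

// --- contents of libb/util.h: different library, same sentinel ---
#ifndef UTIL_H_INCLUDED            // already defined by liba/util.h ...
#define UTIL_H_INCLUDED
#define LIBB_UTIL_SEEN 1
inline int LibBVersion() { return 2; }   // ... so this is silently skipped
#endif

// Record at compile time whether the second header's body was processed.
#ifdef LIBB_UTIL_SEEN
constexpr bool kLibBHeaderSeen = true;
#else
constexpr bool kLibBHeaderSeen = false;
#endif
```

    Nothing is diagnosed at the point of inclusion; the failure only surfaces later as an undeclared LibBVersion, far from the real cause.  A header that starts with #pragma once relies on no shared symbol, so it cannot collide in this way.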

    A secondary problem is one of efficiency.  The #pragma once directive allows the compiler to avoid opening and preprocessing a header file after it's been seen once.  Although it is technically possible for a compiler to implement a similar mechanism when it encounters the header sentinel pattern (GCC can do this for instance), the mechanism is a bit fragile because it depends on a user following an exact coding pattern for the optimization to work correctly.  I’ve encountered code patterns in library header files that appear to be the header-sentinel pattern, but are in fact not.  Moreover, the compiler must account for the fact that preprocessor symbols can be explicitly #undef’d.  I much prefer the explicit directive because it states exactly the intention of the programmer -- "include this header only once".

    Unfortunately for users of GCC, this compiler directive has been deprecated (although it was still supported the last time I checked).  It’s my personal opinion that this is yet another case of GNU’s pervasive NIH syndrome (they got it bad); however, the official reason is that the construct is not portable (which I admit is technically true).  I do not understand why the standards committee hasn’t adopted it into the ISO standard.  It’s a trivial compiler feature to implement.  Although it’s not an official language feature, most C/C++ compilers support it.

  • Eschew Obfuscation

    While some may think that naming conventions are much ado about nothing, no other subject of coding standards evokes as much fervent discourse.

     

    When I first started programming for Windows back in college (ca 1990), I was baffled that all these really bright programmers at Microsoft would use cryptic symbol names like LPCWSTR and crgpcsz (called Hungarian notation).  In short, it’s a naming convention that incorporates a symbol’s type information into its name using a series of short prefix identifiers.  At the time, this seemed anathema to everything I was being taught about producing readable, maintainable code.  Indeed, this notation presented a particularly difficult barrier for me when entering the world of Windows programming, and played no small part in my choosing to be a Unix developer for many years.

     

    In 2000 I had the privilege of working in Microsoft Research for Dr. Charles Simonyi, the inventor of Microsoft Word and yes, the infamous Hungarian notation.  When I asked Charles about his popular notation and why he proposed such a confusing convention, he got this amused look on his face and told me that most people had actually missed the point entirely.  His intention, he explained, was not simply to conflate type information into the name of a symbol; rather, he wanted to free the developer from the burden of name selection, a “frustrating and time consuming task”.  His premise was that if two programmers, using the same convention, would independently choose the same name for the same program text, then both goals of readability and write-ability have been served.  Readability, he argued, becomes a natural artifact of write-ability, and thus emphasis on the latter is rightly placed.  Although I remained skeptical, I had to admit that given the historically weak type safety of C and its lack of name encapsulation, it wasn’t difficult to understand the broad appeal of Charles’ proposal amongst early Microsoft programmers.  Indeed, to this day some developers at Microsoft adhere to Hungarian notation with near religious fervor.

     

    My personal experience is that Hungarian notation tends to obfuscate rather than to illuminate; that different programmers using Hungarian do not independently choose the same names for the same program text for the same reason that different programmers don’t usually choose to write algorithms in identical ways.  There are often many ways to implement the same algorithm using different structures and types.  To make matters worse, independent teams inevitably choose subtle style variations, further increasing confusion and inhibiting long term maintainability.  Over time even a single team’s notation will evolve such that each successive generation of code will look progressively different from legacy code.  I recently joined a team at Microsoft with a continuous product line that’s more than twenty years old.  This team’s codebase includes some legacy C components so old that they look entirely different from code written in recent years.  If they had instead chosen plain English words and phrases for symbol names, their old code would be just as readable as the new code (though still in C instead of C++).  And while I agree with Dr. Simonyi that naming a symbol with just the right words can be difficult, even frustrating at times, I think the effort pays off in more readable, maintainable code.

     

    In recent years I’ve had the joy of working with more and more programmers from Generation Y.  These brilliant “kids” cut their teeth on object oriented programming, have never had reason to use ancient editors like vi or Emacs, and have never programmed without the aid of basic semantic tools like IntelliSense.  To them, Hungarian notation is not just an anachronism; it’s a pedantic scheme that gets in the way of their efficiency and creativity.

     

    In twenty years of programming, I’ve found one thing to be universally true:  consistency, above all else, is crucial to writing readable, maintainable code.  Consistency between programmers on the same team as well as consistency for the same programmer from year to year.

     

    Last year I proposed a simple naming convention for my team’s C++ development.  It can be summarized quite succinctly.  “Use consistent, meaningful English names that reflect the object described or action being taken.  Name types, functions, properties and namespaces LikeThis, variables and parameters likeThis, private fields likeThis_ and C++ macros LIKE_THIS.”  Notice the intentional similarity to the very practical naming convention for CLR development.  However, C++ is different enough from C# to necessitate a few changes.

     

    The details of my C++ naming proposal are as follows:

     

    Casing Styles Defined

     

    UpperCamelCase : the first letter in the identifier and the first letter of each subsequent, concatenated word are capitalized.  You can use UpperCamelCase for identifiers of three or more characters.  No underscores are used.  For example: DeviceLock, Scene, TabScene

     

    camelCase : The first letter of an identifier is lowercase and the first letter of each subsequent concatenated word is capitalized.  No underscores are used. For example: deviceLock, scene, tabScene

     

    UPPER_CASE : All letters in the identifier are capitalized.  Concatenated words are separated by an underscore.  For example: DEVICE_LOCK, TAB_SCENE

     

    Type Names

    Type names in C++ include class, struct and interface identifiers, enum typenames, and typedefs.  In general, type names should be noun phrases, where the noun is the entity represented by the type.  For example, Button, Stack and File each have names that identify the entity represented by the type.  Choose names that identify the entity from the developer’s perspective; names should reflect usage scenarios.  Use these guidelines:

     

      • Use UpperCamelCase
      • Use nouns, noun phrases or occasionally adjective phrases.  Do not use verbs.
      • Consider ending the name of a derived class with the name of the base class.
      • Prefix interfaces with the letter I.  Do not prefix class names with the letter C, nor structs with the letter S, nor template class names with the letter T.
      • For a class that simply implements an interface, consider ending the class with the interface name, sans the I prefix.
      • Do not use abbreviations, except those that are commonly recognized (IO, Ctrl etc.).

    Template parameter names

    Template parameter names are a bit of a special case.  Choose descriptive names for template parameters, unless a single-letter name is completely self explanatory and a descriptive name would not add value (consider using the letter T in such cases).

     

      • Use UpperCamelCase
      • Prefix the parameter name with the letter T.  Although template parameter names are usually types, it’s usually important to differentiate them from non-parameterized names.  Someday our integrated development environments may provide a nice method of doing this without mangling the name; say, by displaying templated names in italics.
      • Consider indicating semantic constraints placed on a type parameter in the naming of the parameter.  For instance, a parameter constrained to the type ISignInMessageReceiver may be called TSignInMessageReceiver. 
      • Use nouns or noun phrases for object types and object instances and verbs for functor or function object parameters.
      • Do not use abbreviations except those that are commonly recognized.
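    A minimal sketch of how the type and template parameter rules read in practice.  Every name here (IMessageReceiver, LoggingMessageReceiver, Broadcast) is hypothetical and exists only to illustrate the convention.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Interface: an UpperCamelCase noun phrase, prefixed with I.
class IMessageReceiver
{
public:
    virtual ~IMessageReceiver() {}
    virtual void Receive(int messageId) = 0;
};

// A class that simply implements an interface ends with the interface
// name, sans the I prefix.  There is no C prefix on the class name.
class LoggingMessageReceiver : public IMessageReceiver
{
public:
    LoggingMessageReceiver() : receivedCount_(0) {}
    virtual void Receive(int /*messageId*/) { ++receivedCount_; }
    std::size_t GetReceivedCount() const { return receivedCount_; }
private:
    std::size_t receivedCount_;   // private field: camelCase plus trailing underscore
};

// Template parameter: UpperCamelCase with a T prefix.  The name
// TMessageReceiver hints at the semantic constraint: the argument is
// expected to behave like an IMessageReceiver.
template <class TMessageReceiver>
void Broadcast(TMessageReceiver& receiver, const std::vector<int>& messageIds)
{
    for (std::size_t i = 0; i < messageIds.size(); ++i)
        receiver.Receive(messageIds[i]);
}
```

    Used this way: Broadcast(receiver, messageIds), where receiver is a LoggingMessageReceiver.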

    Enumeration value names

    In C#, references to enumeration value names must be prefixed by the enumeration type name.  Unfortunately, C++ has no such requirement; only the value name is referenced.  To accommodate this in our naming convention, follow these rules:

     

      • Use camelCase
      • Declare enum types within a scope appropriate to their usage
      • Use nouns, noun phrases or occasionally adjective phrases.  Do not use verbs.
      • Do not use abbreviations except those that are commonly recognized.
      • Optional: many C++ developers prefer to prefix enumeration value names with the name of the enumeration type. 
        For instance, enum State { stateIdle, stateReading, stateWriting }. 
        The prefix should not be an abbreviation of the enum typename.
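    As a sketch of these rules, the State enum from the example above can be declared inside the class that uses it.  The Connection class and its members are hypothetical names introduced for illustration.

```cpp
#include <cassert>

class Connection
{
public:
    // The enum lives in the scope where it is used, and each value
    // carries the full (unabbreviated) type name as a camelCase prefix.
    enum State { stateIdle, stateReading, stateWriting };

    Connection() : state_(stateIdle) {}

    void BeginRead()  { state_ = stateReading; }
    void BeginWrite() { state_ = stateWriting; }
    State GetState() const { return state_; }

private:
    State state_;
};
```

    Outside the class the values are written Connection::stateIdle and so on, which recovers some of the qualification that C# enforces.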

    Preprocessor symbol names

    Preprocessor symbols (macros) do not exist in C#, so the CLR naming convention offers little guidance.  The C++ industry standard is to declare macros LIKE_THIS.

     

      • Use UPPER_CASE
      • #undef temporary macro names
      • Do not use abbreviations except those that are commonly recognized.
      • Macro names should be at least three characters
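    A short sketch of the macro rules, using hypothetical names (SQUARE_OF, MAX_DEVICE_COUNT, Tables::Squares).  The temporary macro is #undef'd as soon as it has served its purpose, so it cannot leak into code that follows.

```cpp
#include <cassert>

// A temporary helper macro: UPPER_CASE and at least three characters.
#define SQUARE_OF(x) ((x) * (x))

namespace Tables
{
    const int Squares[] = { SQUARE_OF(1), SQUARE_OF(2), SQUARE_OF(3) };
}

#undef SQUARE_OF

// From here on the temporary name is gone.
#ifdef SQUARE_OF
#error "SQUARE_OF should no longer be defined here"
#endif

// A long-lived macro keeps its definition; the name is spelled out in full.
#define MAX_DEVICE_COUNT 8
```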

     

    Method and function names

    Methods are actions upon an object and their names should employ verbs and verb phrases.  Do not select a name that describes how the method operates; in other words, do not use implementation details for your method names.

     

      • Use UpperCamelCase
      • Use verbs or verb phrases
      • Do not use implementation details in a method’s name
      • Consider prefixing event handling methods with “On”, such as “OnInitialize”

     

    Field names

    Although the C# naming convention proscribes exposing any fields with public or protected protection, and recommends UpperCamelCase for private fields, I have found this to be a bit clumsy to employ in C++.  Moreover, it's inconsistent with C#'s variable naming rules.  Instead, I propose the following for field names.

     

      • Use camelCase
      • Use descriptive names, typically nouns (though function object fields may be verbs)
      • Use plural names for collection fields rather than suffixing with the container type.  Ex: “names” instead of “nameList”
      • Some developers prefix private class field names with “m_” (as with MFC classes). I personally prefer to suffix private class field names with an underscore likeThis_, as do Alexandrescu and other prominent C++ developers and authors.  Perhaps someday most IDEs will allow us to visually identify member field names (say, with italics) to differentiate them from other names; until then this kind of name decoration has proven useful.
      • Do not decorate field names with type information (as in Hungarian notation). 

    C++ Property Names

    Did you know that Microsoft’s C++ has properties?  Well it does, though the syntax is a bit clumsy.  It uses a special __declspec directive. I usually define a set of macros that ameliorate the clunky syntax (I will write more on that in another blog posting).

     

      • Use UpperCamelCase
      • Use nouns, noun phrases or adjectives.
      • Prefix Boolean property names with Is, Can, Has etc. where it contributes to readability.
      • Prefer Boolean property names with affirmative phrases (CanSeek instead of CantSeek).
      • Properties imply simple value lookups or trivial computations so do not use a property when non-trivial computations are involved.  Instead use a method.
      • Prefer to name the "getter" and "setter" functions with GetPropertyName and SetPropertyName respectively (though this is not strictly necessary). 

    Parameters and auto/local variable names

     

      • Use camelCase
      • Use descriptive names which reflect how the variable will be used
      • Do not decorate names with type information (as in Hungarian notation)
      • Use nouns, noun phrases or an adjective, except for function objects
      • Use plural names for collection/container variables rather than suffixing with the container type.  Ex: “names” instead of “nameList”
      • Avoid declaring variables in the global scope; instead declare them as variables within an appropriate namespace and follow the naming convention for C++ Properties.
      • Do not declare a variable with the static keyword at global scope; this is a deprecated language feature.  Instead, use an anonymous namespace.
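    The last two rules can be sketched as follows.  The names instanceCount, Diagnostics, MaxInstanceCount and TryRegisterInstance are hypothetical.  (For the record, the deprecation of file-scope static comes from the C++03 standard; C++11 later removed it, but the anonymous namespace remains the more general tool since it also works for types.)

```cpp
#include <cassert>

// Instead of:  static int instanceCount = 0;
namespace
{
    // Internal linkage: visible only within this translation unit.
    int instanceCount = 0;
}

namespace Diagnostics
{
    // A namespace-scoped variable rather than a true global, named
    // like a property (UpperCamelCase) per the rule above.
    int MaxInstanceCount = 16;

    // Returns false once the limit has been reached.
    bool TryRegisterInstance()
    {
        if (instanceCount >= MaxInstanceCount)
            return false;
        ++instanceCount;
        return true;
    }
}
```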

     

    Namespaces

     

      • Use UpperCamelCase
      • Use nouns or noun phrases
      • Do not use generic type names that might conflict with class names (eg. Element, Node, Log, Message)
      • Consider using plural names where appropriate: eg. Strings instead of String
      • Do not use the same name for a namespace as a type within the namespace.
      • Do not place application specific namespaces within the namespace of a shared library namespace.

    Use of Acronyms

    Acronyms are generally proscribed, as they reduce readability especially to those programmers for whom English is a second language.  However, an acronym may be used if it is generally recognized by your programming community and if it doesn’t reduce readability.  Examples include DB, IO, Xml, Cpu, Gpu, Html, etc.

     

      • Capitalize both characters of two-character acronyms, except the first word of a camelCase identifier.  Example: DB, IO for UpperCamelCase and ioChannel for camelCase.
      • Capitalize only the first character of acronyms with three or more characters, except the first word in a camelCase identifier.

     

  • The trouble with long double

    Back in ‘04 I made a prediction that 80 bit floating point values would likely be supported in some future version of the VC++ compiler (just like we did in the 16bit version of the compiler!).  Alas, it’s now ’07 and much to my disappointment this will not come to pass; so much for my foresight.  There are a number of technical reasons for this, not the least of which is that implementing the feature requires more than just a change to the compiler.  It is my strong impression, however, that Microsoft would have solved these issues had customers more aggressively clamored for this particular feature. 

    The point is now rapidly becoming moot.  80-bit doubles seem so “last century”.  We are approaching the day when most new Windows systems will have FPUs that easily support the 128bit long double format.  It seems natural to expect, or rather for our customers to demand, that 128bit long double semantics be fully supported in some future version of C++.

    So lobby away people! http://msdn2.microsoft.com/en-us/visualc/aa336397.aspx  

    128bit long doubles would be a nice complement to the .NET 128 bit decimal floating point type.

  • Typesafe method of interfacing to DirectX shaders

    I thought this idea might be of interest to DirectX C++ programmers.

    Type safety is perhaps the most critical feature of higher-level programming languages, and yet application programming interfaces so often introduce non-typesafe constructs. This is frequently the case with low-level APIs. A great example of this is DirectX’s APIs for setting GPU registers for vertex and pixel shaders. They include methods like these:

    D3DVOID SetVertexShaderConstantF(
      UINT StartRegister,
      CONST float *pConstantData,
      DWORD Vector4fCount
    );

    D3DVOID SetPixelShaderConstantF(
      UINT StartRegister,
      CONST float *pConstantData,
      DWORD Vector4fCount
    );

    StartRegister specifies the base register number; pConstantData should point to the value(s) to be loaded into the registers, where each register is four floats; and Vector4fCount specifies the number of registers to which the API should write.

    This API is intentionally generic.  But it lacks most of the type safety C++ offers. Consider the following simple cases representing aberrant uses of the API:

    const float fValue = 1.5f;
    const D3DVECTOR4 v4Value(1.0f, 2.0f, 3.0f, 1.0f);

    // case 1
    g_piDevice->SetVertexShaderConstantF(0, &fValue, 1);
    // case 2
    g_piDevice->SetVertexShaderConstantF(2, (float*)&v4Value, 4);

    In case 1, the y, z and w components of vertex constant register 0 will be loaded with unintended values, because the API reads four floats starting at the address of a single float.  In case 2, the values in registers 3, 4 and 5 will be overwritten with garbage values since the Vector4fCount parameter is wrong.  In neither of these cases will the compiler provide an error or warning that something’s wrong.

    The easiest way to add some type safety is to provide a function that is overloaded on the value to be written. For the simple cases above, we could introduce the following:

    inline void SetVertexShaderConstantF(UINT RegisterID, float value)
    {
        // note: on xbox, use XMVECTOR
        D3DXVECTOR4 vTemp(value, 0, 0, 0);
        // pass the address of the four floats, not a D3DXVECTOR4*
        g_piDevice->SetVertexShaderConstantF(RegisterID, (const float*)&vTemp, 1);
    }
    inline void SetVertexShaderConstantF(UINT RegisterID, const D3DVECTOR4& value)
    {
        g_piDevice->SetVertexShaderConstantF(RegisterID, (const float*)&value, 1);
    }

    Some compilers, like the one on Xbox 360, are able to provide additional optimizations when parameter values to inline functions are literals. To ensure that this optimization is available when using our typesafe versions, we could add a template that passes the RegisterID as a literal instead of as a variable:

    template <UINT TRegisterID>
    inline void SetVertexShaderConstantF(float value)
    {
        D3DXVECTOR4 vTemp(value, 0, 0, 0);
        g_piDevice->SetVertexShaderConstantF(TRegisterID, (const float*)&vTemp, 1);
    }

    . . .

    // used this way
    SetVertexShaderConstantF<0>(value);

    If the register traits of the target GPU are known a priori, we can further refine this idea by introducing a compile time constraint on the template parameter TRegisterID. On Xbox 360, this value must be in the range 0…255. To constrain this at compile time on the Xbox 360 we can use the _STATIC_ASSERT macro:

    template <UINT TRegisterID>
    inline void SetVertexShaderConstantF(float value)
    {
        _STATIC_ASSERT(0<=TRegisterID && TRegisterID<=255);
        D3DXVECTOR4 vTemp(value, 0, 0, 0);
        g_piDevice->SetVertexShaderConstantF(TRegisterID, (const float*)&vTemp, 1);
    }

    An error will now be generated at compile time if the programmer uses this function with a register id that is out of range. (NOTE: If static assertions are not available in your environment, you can use or build something like the boost library’s BOOST_STATIC_ASSERT. )

    Aside from type safety, it would be nice if our interface provided a simple, typesafe way to declare all the registers needed for a particular shader. Let me show you the basic way I do this for vertex shaders:

    class CVertexShader
    {
    public:
        template <class TDataType, UINT TRegisterID>
        class CConstant
        {
        public:
            inline void operator=(const TDataType& value)
            {
                SetVertexShaderConstantF<TRegisterID>(value);
            }
        };
    };

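    Notice that CVertexShader::CConstant has no data members.  C++ still gives an empty class a size of at least one byte, so sizeof is 1 rather than zero, but the class carries no per-register state.  This is important because we don't want our strategy to impose any meaningful additional memory requirements.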

    Derived classes can then easily describe a typesafe program interface to a vertex shader. For example:

    class CSimpleVertexShader : public CVertexShader
    {
    public:
    	CConstant< XMMATRIX,0 > mWorld;
    	CConstant< XMMATRIX,4 > mView;
    	CConstant< XMMATRIX,8 > mProjection;
    	CConstant< XMVECTOR, 12 > vEyePositionW;
    };
    
    . . .
    // used this way
    CSimpleVertexShader Simple;
    . . .
    Simple.mWorld = mWorld;
    			

    Essentially, this provides a compile-time name binding of a particular vertex shader’s registers that is both type-safe and convenient to use.
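    For readers without a DirectX device handy, here is a self-contained sketch of the same binding idea.  The FakeDevice and Float4 types, the g_device global and the fogDensity register are hypothetical stand-ins for the real device, vector type and shader layout, and the C++11 static_assert keyword is assumed in place of the _STATIC_ASSERT macro.

```cpp
#include <cassert>
#include <vector>

typedef unsigned int UINT;

// Hypothetical stand-ins: one register is four floats, and the fake
// device simply records whatever is written to it.
struct Float4 { float v[4]; };

class FakeDevice
{
public:
    FakeDevice() : registers_(256) {}
    void SetVertexShaderConstantF(UINT startRegister, const float* data, UINT vector4fCount)
    {
        for (UINT i = 0; i < vector4fCount; ++i)
            for (UINT j = 0; j < 4; ++j)
                registers_[startRegister + i].v[j] = data[4 * i + j];
    }
    const Float4& GetRegister(UINT id) const { return registers_[id]; }
private:
    std::vector<Float4> registers_;
};

FakeDevice g_device;

// Typesafe setters: the register id is a compile-time constant and the
// value type selects the overload, so the count can never be wrong.
template <UINT TRegisterID>
inline void SetVertexShaderConstantF(float value)
{
    static_assert(TRegisterID <= 255, "register id out of range");
    const float temp[4] = { value, 0.0f, 0.0f, 0.0f };
    g_device.SetVertexShaderConstantF(TRegisterID, temp, 1);
}

template <UINT TRegisterID>
inline void SetVertexShaderConstantF(const Float4& value)
{
    static_assert(TRegisterID <= 255, "register id out of range");
    g_device.SetVertexShaderConstantF(TRegisterID, value.v, 1);
}

class CVertexShader
{
public:
    // No data members: assignment forwards straight to the device.
    template <class TDataType, UINT TRegisterID>
    class CConstant
    {
    public:
        void operator=(const TDataType& value)
        {
            SetVertexShaderConstantF<TRegisterID>(value);
        }
    };
};

class CSimpleVertexShader : public CVertexShader
{
public:
    CConstant<Float4, 12> vEyePositionW;
    CConstant<float, 16>  fogDensity;
};
```

    Assigning a Float4 to fogDensity, or declaring a CConstant with register 300, fails at compile time rather than silently corrupting neighbouring registers.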

    I also enhance class CVertexShader to add run-time binding to a particular instance of a loaded or compiled vertex shader. This looks something like the following (I’ve omitted the runtime assertions and state checking for brevity):

    class CVertexShader
    {
    protected:
        CInterfacePtr<IDirect3DVertexShader9> m_piVertexShader;
    public:
        IDirect3DVertexShader9* operator -> () { return m_piVertexShader; }
        operator IDirect3DVertexShader9* () { return m_piVertexShader; }

        HRESULT Set() { return g_piDevice->SetVertexShader( m_piVertexShader ); }

        HRESULT Load(const char* pszFilename);   
        HRESULT Load(const wchar_t* pszFilename);   
        HRESULT Compile(const char* pszCode);   
        HRESULT Compile(const wchar_t* pszCode);   
    };

    The implementation for the concomitant CPixelShader is nearly identical. For brevity's sake I've left out the details, but I’ll be happy to post the complete code if anyone requests it.

  • Stanley110 asks...

    Sorry I haven't had time to post in quite a while.  I’ve been developing the new Xbox Live Arcade for the upcoming Xbox 360, a project that has taken considerable time and effort.

     

    Stanley110 asks:

    A) Excel uses double precision. What benefit with respect to accuracy or precision of the calculated result is there with respect to single precision?

    B) Electronic calculators use single precision? Is this true?

    C) Aside from computer speed and things like that, is there any difference in the accuracy and precision of an aritmetic calculation when it is by double precision than by single precision.

     

    Short Answer to Question A: The precision benefits of double precision over single precision are exactly as the name suggests: at least double the precision of single precision (53 significand bits versus 24).  Accuracy is a different question altogether.  With carefully coded algorithms, single precision can yield very accurate results; however, most users (even most computer scientists) are not trained to devise such algorithms for all but the simplest cases.  Most of the time (though not always), if you perform the same computation in double and single precision, the double precision result will be more accurate.  I hate to make a blanket statement like that, so PLEASE note the qualifiers before flaming me with email :-)

     

    More info on floating point
    http://en.wikipedia.org/wiki/IEEE_floating-point_standard

    http://en.wikipedia.org/wiki/Talk:Computer_numbering_formats

     

    Accuracy and precision are not the same thing

    http://en.wikipedia.org/wiki/Accuracy

    http://en.wikipedia.org/wiki/Talk:Accuracy_and_precision

     

    (yes, I am a fan of Wikipedia; it's freaking brilliant!)

     

    Excel uses double precision mainly because it’s what’s available on most architectures.  Moreover, if care is taken to properly account for error, doubles are precise enough for many financial computations, especially the kind used by most Excel users.  However, even simple tax or interest computations can be perturbed by the use of double precision (so be careful and check your results!)
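    A small sketch of the effect, assuming an IEEE 754 platform: add a dime (0.1) to a running total ten million times in each precision and compare against the exact answer of 1,000,000.  The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Significand width in bits: 24 for IEEE single, 53 for IEEE double.
template <class TReal>
int SignificandBits()
{
    return std::numeric_limits<TReal>::digits;
}

// Adds `term` to a running total `count` times, entirely in type TReal.
template <class TReal>
TReal AccumulateRepeated(TReal term, int count)
{
    TReal total = 0;
    for (int i = 0; i < count; ++i)
        total += term;
    return total;
}

// Absolute error after adding 0.1 ten million times.
template <class TReal>
double RepeatedTenthError()
{
    const TReal total = AccumulateRepeated<TReal>(TReal(0.1), 10000000);
    return std::fabs(static_cast<double>(total) - 1000000.0);
}
```

    Neither type can represent 0.1 exactly, so both totals drift; the double-precision total simply drifts far less.  That is the sense in which the double result is usually, though not always, the more accurate one.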

     

    Short Answer to Question B: No; well, maybe some cheap calculators given away in a box of Cap’n Crunch, but useful calculators will have at least 12 decimal digits of precision (single precision has only about 7, since log10(2^24) is roughly 7.2).  Most inexpensive calculators use doubles or extended precision since the chips for it are fairly inexpensive.  Really nice calculators use extended double, quad-precision or even provide decimal floating-point with 28 or more decimal digits of precision.  Some even use rational number systems under certain circumstances to represent numbers like 1/10, 1/3 etc.  The built-in Windows calculator, for instance, provides 32 decimal digits of precision and uses rationals for certain computations.

     

    Gossip: I heard a rumor that Excel may (soon?) provide computations using .NET’s decimal type.  But I haven’t been able to confirm this.  So naturally I must spread the rumor.

     

    Short Answer to C:  I recommend the articles above.  Keep in mind that on many systems, double precision computations are just as fast or even faster than single precision computations. 

  • Visual Studio 2005 Beta

    Several people have sent technical support questions to me.  I'm not ignoring you; I am interested in your comments.  However, I don't have time to answer technical support questions in my blog. 

    I would refer readers to http://lab.msdn.microsoft.com/vs2005/default.aspx for more information about the VS2005 beta-1 and beta-2.

    Please post your TECHNICAL SUPPORT questions in the community newsgroups.  Employees in the VS division routinely answer questions in these newsgroups.  Developers in the greater C++ community also participate in answering questions. 

  • Single Vs. Double ...

    It’s almost Christmas as evidenced by how dead the office is; it’s pretty common for people at Microsoft to take off the week of Christmas, and sometimes the following week.  MS is fairly generous with holidays, giving us both Christmas Eve and Christmas Day.  This allows people to spend only 3 vacation days for a full week’s vacation.

     

    A few readers have asked if double-precision is so much better than single, why do we even support single-precision? 

     

    There are a number of reasons, but perhaps the most important are the following:

      • Singles are half the size of doubles, reducing memory & bandwidth requirements.
      • Singles are the same size as DWORDs and thus can be moved in an atomic operation using 32bit integer registers (on occasion this can be important when doing things like InterlockedExchange on floating-point numbers).
      • Some platforms support intrinsic, complicated operations in single-precision (e.g. SSE) but not in double precision.
      • Some hardware platforms do not natively support in-chip double precision computation; on such chips double precision must be emulated.
      • Single precision computations are faster than double precision on some platforms.  However, it is NOT safe to assume that singles are always at least as fast as doubles.  If double-precision is the native precision, the extra cost of narrowing results to single-precision MAY be quite high… much higher, in fact, than the cost associated with the memory overhead of doubles (cache misses etc).
      • Some 32-bit HW supports only integer operations; it’s faster to emulate single precision on these platforms than double precision.

     

    For many applications, it’s preferable to store input and final results in single precision, but to perform all the intermediate computations in the highest precision that is practical on the target architecture.
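    A minimal sketch of that advice, with a hypothetical function name: the inputs and the returned result are floats, but the running total lives in a double and is narrowed exactly once.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Inputs and the final result are stored in single precision; the
// intermediate total is kept in double precision.
float SumWithWideAccumulator(const std::vector<float>& values)
{
    double total = 0.0;
    for (std::size_t i = 0; i < values.size(); ++i)
        total += values[i];             // each float is widened before the add
    return static_cast<float>(total);   // one narrowing, at the very end
}
```

    For the inputs 16777216, 1 and 1 this returns 16777218.  A float accumulator that really rounds every addition to single precision would stay stuck at 16777216, because 16777217 is not representable as a float.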

  • Fused Multiply Add Question

    Sergey asks "Is there any chance that in MSVC++ 2005 Fused Multiply Add (FMA) function will be available as a part of runtime library on all supported platforms?"

    To my knowledge FMAs won't be supported on all platforms since not all platforms have FMA instructions (some platforms don't even have floating point units!).  However, I think we're moving in that direction.  I'll try to find out for you with respect to the x86 and x64 platforms.  In VC++, FMAs will be used by default under the fp:precise model when they're available on the architecture.  I believe that the fp:strict model precludes FMAs since they can potentially violate strict FPU status and exception semantics.  For instance, there are values for a*b+c where the fused operation returns a valid answer but the unfused operation overflows.
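    One such set of values can be sketched with the C++11 library function std::fma, which postdates this post and is not something VC++ 2005 provides; the other function names are hypothetical.  The product DBL_MAX*2 overflows when rounded on its own, yet the exact value of DBL_MAX*2 - DBL_MAX is just DBL_MAX.

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>

// a*b + c as two separately rounded operations.  The product goes through
// a volatile so the compiler cannot quietly contract the pair into an FMA.
double MultiplyThenAdd(double a, double b, double c)
{
    volatile double product = a * b;   // rounded (and possibly overflowed) here
    return product + c;
}

// a*b + c with a single rounding of the exact result.
double FusedMultiplyAdd(double a, double b, double c)
{
    return std::fma(a, b, c);
}

// True when the unfused form overflows to infinity while the fused form
// returns the finite, exact answer.
bool UnfusedOverflowsButFusedDoesNot()
{
    const double unfused = MultiplyThenAdd(DBL_MAX, 2.0, -DBL_MAX);
    const double fused = FusedMultiplyAdd(DBL_MAX, 2.0, -DBL_MAX);
    return std::isinf(unfused) && fused == DBL_MAX;
}
```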

    An interesting side point since I'm on the subject.  On ia64's FPU and many other modern FPU architectures, separate multiply and add instructions aren't provided.  Two floating point registers are reserved to hold the values 0 and 1 respectively.  Simple addition is accomplished by using the 1 valued register as one of the multiply arguments; similarly simple multiplication is accomplished by using the 0 valued register as the addition argument. 


     

  • Method for getting /Op like consistency in MS C++ 14.0

    (From Microsoft Visual C++ Floating-Point Optimization)

     

    Many C++ compilers offer a "consistency" floating-point model (through a /Op or /fltconsistency switch) that enables a developer to create programs compliant with strict floating-point semantics. When engaged, this model prevents the compiler from using most optimizations on floating-point computations while allowing those optimizations for non-floating-point code. The consistency model, however, has a dark side. In order to return predictable results on different FPU architectures, nearly all implementations of /Op round intermediate expressions to the user-specified precision; for example, consider the following expression:

      float a, b, c, d, e; 
      . . .
      a = b*c + d*e;

    In order to produce consistent and repeatable results under /Op, this expression gets evaluated as if it were implemented as follows:

      float x = b*c; 
      float y = d*e;
      a = x+y;

    The final result now suffers from single-precision rounding errors at each step in evaluating the expression. Although this interpretation doesn't strictly break any C++ semantics rules, it's almost never the best way to evaluate floating-point expressions. It is generally more desirable to compute the intermediate results in as high a precision as is practical. For instance, it would be better to compute the products and the sum of a=b*c+d*e in double precision and narrow only the final result to float.
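    The two evaluation strategies can be sketched side by side.  The function names are hypothetical, and the explicit widening casts are an assumption on my part: they make the higher precision visible in the source instead of relying on the compiler's register precision.

```cpp
#include <cassert>

// /Op-style evaluation: every intermediate is rounded to float.
float EvaluateNarrow(float b, float c, float d, float e)
{
    float x = b * c;
    float y = d * e;
    return x + y;
}

// Higher-precision evaluation: widen the operands, keep the
// intermediates in double, and narrow once at the end.
float EvaluateWide(float b, float c, float d, float e)
{
    double x = static_cast<double>(b) * c;
    double y = static_cast<double>(d) * e;
    double z = x + y;
    return static_cast<float>(z);
}
```

    With b = c = 4097 and d = e = 1 the exact answer is 16785410, which EvaluateWide returns.  On a platform that truly rounds each step to single precision, EvaluateNarrow loses the low bit twice and comes back with 16785408.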

     

    In short, the old /Op model trades away accuracy for consistency across platforms.  For nearly all numerical programs, accuracy is preferable. 

    This is precisely (no pun intended) the reason VC abandoned the consistency model altogether (and I fully expect other compiler makers to follow suit).

     

    There are however rare cases when consistency across platforms may be desired.  In obviating the /Op model, Microsoft C++ 14.0 (in VS8.0) no longer provides a simple command line switch to enable cross-platform floating-point consistency.  To get consistency across platforms, programmers will need to modify their source code.  In the case from the whitepaper (above):

      float a, b, c, d, e; 
      . . .
      a = b*c + d*e;

    the results of the expression b*c + d*e are dependent on the intermediate precision (i.e. the register precision) of the target platform.  To enforce consistency across all platforms, users will need to introduce explicit narrowing operations at each point in the computation (setting _controlfp won’t achieve the same results for reasons outlined in [1]):

      float a, b, c, d, e; 
      . . .
      a = float(b*c) + float(d*e);

    Of course this is rather inconvenient to say the least.  A more convenient method is to introduce a new “wrapper” class that will implicitly enforce consistency semantics.  Such a class will enable the code to be rewritten as

      cfloat a, b, c, d, e; 
      . . .
      a = b*c + d*e;

    which is clearly a simpler and more elegant solution (I named it “cfloat” for “consistent float”).  By overloading the arithmetic operators for the wrapper class cfloat, we can make it behave as if it were the built in floating-point type. 

     

    We begin by introducing a new type that wraps the floating point types (I’ll only show single precision float here, however the same method would apply to a wrapper class for double or long double).

     

     class cfloat
     {
     public:
        float value;

        cfloat() {}
        cfloat(const cfloat& v) : value(v.value) {}
        cfloat(float v) : value(v) {}
        . . .
     };

     Similarly for types double and long double

     

    Naturally, we’ll need the assignment operators:

     

     class cfloat
     {
        . . .

        cfloat& operator = (const cfloat& v)
        {
            value = v.value;   // copy the wrapped float, not the wrapper
            return *this;
        }

        cfloat& operator = (float v)
        {
            value = v;
            return *this;
        }

        Similarly for each operator +=, -=, *=, and /=
     };

     

    We’ll also want to introduce explicit constructors for narrowing values from double and long double precision (note that long double isn’t strictly necessary):

     

     class cfloat
     {
        . . .

        explicit cfloat(const cdouble& v) : value((float)v.value) {}

        explicit cfloat(double v) : value((float) v) {}

        Similarly for long double
     };

     

    Then, introduce overloads of each arithmetic operator:

     

     inline cfloat operator + (const cfloat& a, const cfloat& b)
     {
        return a.value + b.value;
     }

     inline cfloat operator + (float a, const cfloat& b)
     {
        return a + b.value;
     }

     inline cfloat operator + (const cfloat& a, float b)
     {
        return a.value + b;
     }

     Similarly for -, *, and /

     

    Strictly speaking the 2nd and 3rd variations of the operators aren’t necessary, but they do make debugging a bit easier.

     

    When the methods of the cfloat class are nicely inlined, the runtime performance of this class should be no worse than under the old /Op model. 
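    Putting the fragments above together, a minimal self-contained sketch of the wrapper might look like the following.  It is an illustration rather than a finished class: the helper names mul_add and cfloat_example are mine, the default constructor zero-initializes (the fragments above leave the value uninitialized), and only the cfloat-with-cfloat operator overloads are shown.  It also assumes the compiler narrows on the explicit float cast, as fp:precise does.

```cpp
#include <cassert>

// Hypothetical "consistent float" wrapper. Every result is stored in a
// float member, so it is narrowed to single precision after each operation.
class cfloat
{
public:
    float value;

    cfloat() : value(0.0f) {}
    cfloat(float v) : value(v) {}
    explicit cfloat(double v) : value(static_cast<float>(v)) {}

    cfloat& operator+=(const cfloat& v) { value = static_cast<float>(value + v.value); return *this; }
    cfloat& operator-=(const cfloat& v) { value = static_cast<float>(value - v.value); return *this; }
    cfloat& operator*=(const cfloat& v) { value = static_cast<float>(value * v.value); return *this; }
    cfloat& operator/=(const cfloat& v) { value = static_cast<float>(value / v.value); return *this; }
};

// The explicit float cast marks the narrowing point of each operation.
inline cfloat operator+(const cfloat& a, const cfloat& b) { return cfloat(static_cast<float>(a.value + b.value)); }
inline cfloat operator-(const cfloat& a, const cfloat& b) { return cfloat(static_cast<float>(a.value - b.value)); }
inline cfloat operator*(const cfloat& a, const cfloat& b) { return cfloat(static_cast<float>(a.value * b.value)); }
inline cfloat operator/(const cfloat& a, const cfloat& b) { return cfloat(static_cast<float>(a.value / b.value)); }

// The whitepaper expression, written with cfloat operands.
inline cfloat mul_add(cfloat b, cfloat c, cfloat d, cfloat e)
{
    return b*c + d*e;   // behaves like float(b*c) + float(d*e), narrowed again
}

// Usage: all operands and products here are exactly representable.
inline void cfloat_example()
{
    cfloat a = mul_add(cfloat(1.5f), cfloat(2.0f), cfloat(0.25f), cfloat(4.0f));
    assert(a.value == 4.0f);
}
```

    The values in cfloat_example were chosen so that every intermediate result is exact; the wrapper only matters for operands whose products do not fit in single precision.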

     

     

  • Method for retaining intermediate results in extended precision

    I thought I’d post a brief entry on how to retain intermediate results in extended precision.

     

    Consider the following direct summation algorithm:

                   

    double sum(double a[], int c)
    {
    1:    double result = 0.0;
    2:    for (int i=0; i<c; i++)
    3:        result = result + a[i];
    4:    return result;
    }

     

    Under fp:precise, the intermediate value of the addition operation (line 3) is held in register precision.  However, the assignment operation explicitly forces the compiler to narrow the intermediate result to the target precision of the left-hand-side of the assignment—in this case to “double”. 

     

    To retain the extra precision of the in-register result it would be convenient to declare result as:

                    long double result = 0.0;

    The summation would then be retained in long double precision and narrowed to double precision only when the function returns.  Unfortunately this doesn’t work in VC++ because long doubles are stored at the same precision as doubles.  Although this is not a violation of C++ typing rules[1], it is an inconvenience that many developers find rather annoying (myself included).  Until recently, very few Visual C++ customers requested extended precision long doubles (though that’s begun to change…  I promise I’ll write about this more in a later blog entry).
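    As a small sketch of the point above, the standard std::numeric_limits template can report whether long double actually carries more significand bits than double on the compiler at hand.  The function names below are hypothetical.  On a compiler that stores long double at double precision, sum_wide gains nothing over the plain double loop.

```cpp
#include <cassert>
#include <limits>

// Returns true when long double has more significand bits than double.
inline bool long_double_is_extended()
{
    return std::numeric_limits<long double>::digits >
           std::numeric_limits<double>::digits;
}

// Accumulates in long double and narrows to double only on return.
// The extra precision is real only if long_double_is_extended() is true.
inline double sum_wide(const double a[], int c)
{
    long double result = 0.0L;
    for (int i = 0; i < c; i++)
        result = result + a[i];
    return static_cast<double>(result);
}

// Usage: the inputs are exactly representable, so the sum is exact either way.
inline void sum_wide_example()
{
    const double a[] = { 0.5, 0.25, 0.125 };
    assert(sum_wide(a, 3) == 0.875);
}
```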

     

    Even though extended precision long doubles aren’t yet supported in VC++, there are some “tricks” we can use to retain intermediate results in extended precision using VC++ in Visual Studio 8.0.  Let’s look at how to do this for the summation algorithm above.

     

    For this algorithm what we really want is for the summation to be accumulated in a register; that is, we want the compiler to enregister the variable result.  It would be especially nice if we could state this intent explicitly in our source code by adding the register keyword to the declaration of result (see my earlier post).  That isn’t possible today; however, there is a roundabout way to achieve the same effect.  Under fp:fast semantics, the compiler is allowed to ignore the precision narrowing implication of the assignment operation on line 3; this relaxation of the narrowing rule permits the optimizer to enregister the variable result.   We can enable fp:fast semantics for a single function by wrapping it in float_control pragmas:

     

    #pragma float_control(except, off, push)
    #pragma float_control(precise, off)
    double sum(double a[], int c)
    {
          double result = 0.0;
          for (int i=0; i<c; i++)
                result = result + a[i];
          return result;
    }
    #pragma float_control(pop)

     

     With this change, the variable result is allowed to be retained in a register at the full register precision. 

     

    BEWARE: there are caveats to this solution, specific to each target architecture:

     

    Targeting x86

    When targeting x86 with VC++, the default FPU register precision is set to use 53bit significands (essentially a double).  To obtain a higher precision summation on x86, we need to set the register precision to use the full 64bit FPU significand provided by the x87 FPU.  This is achieved by calling the _controlfp function, as follows, at some point before calling the function sum. 

                    _controlfp(_PC_64, _MCW_PC);

    This instructs the x87 FPU to use extended precision semantics in-register. 

     

    This trick works with the summation algorithm and for many other tight-loop algorithms (e.g. Kahan’s summation algorithm I showed in my FP whitepaper).  Keep in mind that it will not work for every algorithm on x86 because any “spilled” registers will still be truncated to 53bit double precision (more info).  It’s also important to be aware of any unsafe side-effect optimizations that might take place in the body of the loop.  I recommend this trick ONLY for x86 developers who are willing and able to inspect the assembly listing to check for the correct behavior.  This trick subverts the intended semantics of fp:precise and should be used with considerable care.
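    Because the precision-control setting is global state for the thread, it may be worth restoring it once the summation is done.  Here is a minimal sketch of one way to do that, assuming a 32-bit x86 build with VC++.  The class x87_precision_guard and the function sum_extended are hypothetical names of mine; the only library call used is _controlfp from float.h.  On any other compiler or target the guard compiles to a no-op.

```cpp
#include <cassert>
#if defined(_MSC_VER) && defined(_M_IX86)
#include <float.h>   // _controlfp, _PC_64, _MCW_PC
#endif

// Hypothetical RAII helper: switches the x87 FPU to a 64-bit significand
// for the lifetime of the object, then restores the previous setting.
class x87_precision_guard
{
public:
    x87_precision_guard()
    {
#if defined(_MSC_VER) && defined(_M_IX86)
        saved_ = _controlfp(0, 0) & _MCW_PC;   // a zero mask only reads the control word
        _controlfp(_PC_64, _MCW_PC);           // full 64-bit significand in-register
#endif
    }
    ~x87_precision_guard()
    {
#if defined(_MSC_VER) && defined(_M_IX86)
        _controlfp(saved_, _MCW_PC);           // put the old precision bits back
#endif
    }
    x87_precision_guard(const x87_precision_guard&) = delete;
    x87_precision_guard& operator=(const x87_precision_guard&) = delete;

#if defined(_MSC_VER) && defined(_M_IX86)
private:
    unsigned int saved_;
#endif
};

// The direct summation from the text. For the trick to have any effect it
// must also be compiled under fp:fast (for example with the pragmas above).
inline double sum(const double a[], int c)
{
    double result = 0.0;
    for (int i = 0; i < c; i++)
        result = result + a[i];
    return result;
}

// Runs the summation with extended register precision enabled where supported.
inline double sum_extended(const double a[], int c)
{
    x87_precision_guard guard;
    return sum(a, c);
}
```

    The guard only changes the register precision; whether result actually stays in a register still has to be confirmed from the assembly listing, as noted above.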

     

    Targeting ia64

    We have three advantages when using this trick on ia64.  Firstly, ia64 intermediate results are always in extended precision (64bits of precision in an 80bit result).  This means that the call to _controlfp required on x86 is not required on ia64. The second advantage is that the VC++ compiler for ia64 preserves the full 64bits of precision when spilling FPU registers to memory.  This means we don’t have to worry about the issues involved with register spilling.  The third advantage is that the ia64 has many more floating-point registers than the x86 so there’s more opportunity for the compiler to enregister variables under fp:fast semantics.

     

    There are however some distinct disadvantages.  Assembly code on ia64 can be very difficult to read making it extremely hard to verify that the compiler produced the desired code.  The other disadvantage is that the compiler may attempt to perform unsafe optimizations that it wouldn’t under fp:precise.  The scalar reduction optimization in particular is an example of this:

     

    The scalar reduction optimization would effectively transform the sum function into

     

    double sum(double a[], int c)
    {
          double s0, s1, s2, s3;
          s0 = s1 = s2 = s3 = 0.0;
          int c4 = c & ~0x3;
          int i;
          for (i=0; i<c4; i+=4)
          {
                s0 = s0 + a[i];
                s1 = s1 + a[i+1];
                s2 = s2 + a[i+2];
                s3 = s3 + a[i+3];
          }
          for (; i<c; i++)
                s0+=a[i];
          return s0+s1+s2+s3;
    }

     

    This optimization essentially reorders the operands.  It’s important to note, however, that the extra precision afforded by enabling fp:fast semantics for this function may make up for the loss of strict accuracy caused by the reordering incurred with the scalar reduction optimization.  Moreover, many datasets and algorithms may not be sensitive to operand reordering, so the speed advantage of the scalar-reduced code may be compelling.
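    To make the effect of reordering concrete, here is a small sketch that runs the direct loop and a hand-written four-way reduction on a dataset I picked to be sensitive to operand order.  The function names are mine, and the code does by hand what the optimizer would do; it is not compiler output.  With strict IEEE double arithmetic (SSE2 code, for example) I would expect the direct sum to lose the small terms that are added while the running total is 1e16, whereas the reduced version cancels the two large values in the same partial sum first.

```cpp
#include <cassert>

// Direct left-to-right summation, as in the original sum function.
inline double sum_direct(const double a[], int c)
{
    double result = 0.0;
    for (int i = 0; i < c; i++)
        result = result + a[i];
    return result;
}

// Hand-written equivalent of the four-way scalar reduction shown above.
inline double sum_reduced4(const double a[], int c)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    int c4 = c & ~0x3;
    int i;
    for (i = 0; i < c4; i += 4)
    {
        s0 = s0 + a[i];
        s1 = s1 + a[i+1];
        s2 = s2 + a[i+2];
        s3 = s3 + a[i+3];
    }
    for (; i < c; i++)
        s0 += a[i];
    return s0 + s1 + s2 + s3;
}

// Usage: the exact sum of this dataset is 6.
inline void reordering_example()
{
    const double a[8] = { 1e16, 1.0, 1.0, 1.0, -1e16, 1.0, 1.0, 1.0 };
    // s0 sees 1e16 and -1e16 back to back, so the reduction is exact here.
    assert(sum_reduced4(a, 8) == 6.0);
    // The direct sum is not asserted: it depends on the register precision.
    (void)sum_direct(a, 8);
}
```

    In this particular case the reordered sum happens to be the more accurate one; with a different dataset it could just as easily go the other way.  The point is only that the two orderings are not interchangeable.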

     

    As with x86, ia64 developers need to be careful when mixing in fp:fast semantics. 

     

    Targeting amd64

     

    Unfortunately, this trick won’t work when targeting amd64 because the compiler will only use SSE registers for floating-point operations (which have a maximum 53-bit significand).

     

    ALTERNATIVE TO THE “TRICK”:

    This kind of a trick can work pretty well in many cases.  However, if you feel that it’s just too much work to validate the generated assembly code, or if you think it’s too risky, you can always use a compiler from another vendor for those particular functions needing extended precision long doubles; once compiled you can simply link the routines back into your VC++ application.  I much prefer VC++’s toolset over the competition and wouldn't care to abandon it wholesale over its lack of extended precision long doubles.    

     

     

     

     

    [1] C/C++ standards only require that long doubles be stored in at least the precision of double (ISO/IEC 9899:1999 §6.2.5.10)

     

  • Overloading the semantics of the register keyword (followup)

    John Albert asks two questions:

    What would "register float" mean?
    How would this work under fp:fast?

    The basic guideline is that "intermediate expressions will be computed in as high a precision as is practical". For instance, suppose for a particular architecture all double-precision computations have to be emulated (i.e. a chip that intrinsically supports only single precision computations). Under the guideline, it's not at all practical to emulate double or extended precision when expressions contain only single precision operands. Thus for those architectures, “register float” would map to a single precision format corresponding to the intrinsic floating-point format. Higher precision expressions would have to be emulated, so “register double” and “register long double” would map to the formats of the emulated expressions.

    Under fp:fast the type distinctiveness of “register float” would remain; however, sizeof(register float) under fp:fast would be platform dependent. If the platform supports only one precision level, then register float would map to that precision level. However, many platforms have multiple floating-point units (e.g. x87 and SSE2 on Intel’s chips). Because fp:fast allows for unsafe optimizations, the compiler can legally choose to perform some FP computations on one unit at higher precision and some on the other unit at lower precision. Under fp:fast I think it’s reasonable to map “register float” to the least precise format used for computing single-precision operations.
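    For comparison only: standard C99 and C++11 already expose a loosely similar notion through FLT_EVAL_METHOD and the float_t and double_t typedefs, which name the type an implementation uses when evaluating float and double expressions.  This is not the “register float” proposal itself, just the nearest existing facility I know of; the two helper functions below are hypothetical names.

```cpp
#include <cassert>
#include <cfloat>   // FLT_EVAL_METHOD
#include <cmath>    // std::float_t, std::double_t

// Reports how the implementation evaluates floating-point expressions:
// 0 = in the operand's own type, 1 = at least double, 2 = long double,
// negative = indeterminable.
inline int evaluation_method()
{
    return FLT_EVAL_METHOD;
}

// True when float expressions are evaluated in a format wider than float.
inline bool float_expressions_are_widened()
{
    return sizeof(std::float_t) > sizeof(float);
}

// Usage: the evaluation types are never narrower than the declared types.
inline void evaluation_type_example()
{
    assert(sizeof(std::float_t)  >= sizeof(float));
    assert(sizeof(std::double_t) >= sizeof(double));
}
```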

    It's an interesting question; thanks, John.

  • No inline assembly on AMD64

    Igor Abramov wrote:

    “Well, I was in desperate need for 80-bit long doubles. Now I coded this type as set of asm inlines and application prototype shipped in such form. But the production version will be done for amd64, in 64-bit mode for performance reasons, and VS 2005 will not have inline assembler. This means rewriting some code again. “

    Because VC++ in VS2005 doesn't allow for inline assembly when targeting amd64, you'll have to rewrite the inline assembly routines as pure assembly and then call them from C++.  However, inline assembly is still permitted when targeting the x86 architecture.

    I'm no amd64 expert by any means, but I'm told that using x87 80bit registers might not work because the current version of the OS for amd64 doesn't save the x87 FPU state when context switching; this is one of several reasons why the C++ compiler targets SSE instructions instead of x87 instructions.  I have heard, but don't quote me on this, that future drops of the amd64 Windows OS will save the x87 FPU state.

    Perhaps someone reading this post could shed a brighter light on the issue.

