Eliminating the virtual mechanism from the invocation of a virtual function is, in most cases, a trivial saving when measured against eliminating the call altogether, that is, when the call is expanded inline. An inline expansion not only saves the overhead of the function call, but also exposes a wider sequence of code to an aggressive peephole optimizer. The virtual mechanism itself is cheap because in C++ it is not an operation of the executing environment but simply [in the simple case, anyway] the indirect execution of the function address stored in the associated virtual table. That is, one goes through the pointer to the virtual table held within the polymorphic class object addressed by the pointer or reference in order to execute the function addressed at a fixed slot within the virtual table [got that?].
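To make the indirection concrete, here is a minimal hand-rolled sketch of what a compiler arranges behind the scenes. Every name in it (Shape, VTable, vptr, virtualArea, and so on) is hypothetical and chosen only for illustration; real implementations lay out the table and the pointer as they see fit.

```cpp
#include <cassert>

// Hypothetical names throughout; a real compiler generates the equivalent
// of VTable and vptr automatically.
struct Shape;

struct VTable {
    int (*area)(const Shape*);   // slot 0: address of the "virtual" function
};

struct Shape {
    const VTable* vptr;          // pointer to the class's virtual table
    int w, h;
};

int rectArea(const Shape* s)     { return s->w * s->h; }
int triangleArea(const Shape* s) { return s->w * s->h / 2; }

const VTable rectVTable     = { rectArea };
const VTable triangleVTable = { triangleArea };

// What a call such as p->area() amounts to: load the vptr from the object,
// load the function address from a fixed slot, and call through it.
int virtualArea(const Shape* p) { return p->vptr->area(p); }

// The same function reached without the virtual mechanism: a direct call
// that the compiler is free to expand inline.
inline int directRectArea(const Shape& s) { return rectArea(&s); }
```

The two extra loads in virtualArea are the whole cost of the virtual mechanism in this simple case; the larger loss is that the callee is not known at the call site and so cannot be expanded inline.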

 

In C++, a programmer can suppress a virtual call in two ways. The first is to use an object of the class directly, in which case the polymorphism of the object is eliminated except in the trivial case in which the subtype hierarchy is the same size as the class of the object being directly manipulated. [The analogy under .NET, although it is not supported, would be toggling a reference type into a value type for some small program extent, eliminating the overhead of the managed heap and the virtual mechanism of the interface.] Obviously, this is a very special use of the polymorphic object, and is as likely to be an error on the programmer’s part as to be his intention. However, the ability to design first-class value types (think of them as Abstract Data Types) and value type inheritance is something that I sorely miss under .NET, where complex value types are in my experience somewhat hobbled. The second and more prevalent mechanism is to invoke a class method through the fully qualified class scope operator. For example,

 

            void WidgetExtension::display() { Widget::display(); /* now our specialized display */ }
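A slightly fuller sketch of both mechanisms follows. It reuses the Widget and WidgetExtension names from the fragment above, but the class bodies, the string return type (used here only so the effect is observable), and the three helper functions are assumptions made for illustration.

```cpp
#include <cassert>
#include <string>

struct Widget {
    virtual ~Widget() = default;
    virtual std::string display() const { return "Widget"; }
};

struct WidgetExtension : Widget {
    std::string display() const override {
        // Qualified call: resolved at compile time to Widget::display,
        // so no virtual dispatch takes place and the call can be inlined.
        return Widget::display() + "+Extension";
    }
};

// Unqualified call through a reference: dispatched virtually.
inline std::string showVirtual(const Widget& w)  { return w.display(); }

// Qualified call through a reference: the virtual mechanism is suppressed.
inline std::string showBaseOnly(const Widget& w) { return w.Widget::display(); }

// Direct use of an object (a copy, sliced to Widget): statically resolved.
inline std::string showByValue(Widget w)         { return w.display(); }
```

Passing a WidgetExtension to showByValue slices it down to a Widget, which is exactly the "as likely to be an error as an intention" case described above.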

 

This pattern of localizing a type-dependent method within a call chain relies on the ability of the user to limit the virtual invocations to the initial instance, which can occur anywhere within the inheritance chain. The subsequent chain of base class calls is then expanded inline. Without explicit language support, the habit of programmers concerned with performance [I don’t have any hard data, so this is anecdotal] is to duplicate the base class code within the derived instance to achieve the same result. This of course tightly couples the implementation of the method to that of the base hierarchy, and a single change in the state members can cause the whole thing to derail. [The state of OO optimization is not currently far enough along to guarantee the elimination of these calls although that is, of course, feasible in theory.]

 

In order to maximize the effect of suppressing a virtual function, it is still necessary that the function be declared as inline. [Although the trend seems to be towards aggressive inlining of any method, it is not clear to me that that is yet guaranteed.] Declaring a virtual function as inline strikes some people, and some of these people speak and write publicly, as a gross mistake. They argue as follows: because the address of a virtual function [at least in current instances of all C++ implementations I know] is always placed within the virtual table of the associated class, declaring the function as inline forces the compiler to generate an out-of-line instance whose address can be taken. In order to avoid this, therefore, a virtual function, by its very nature, should never be declared as inline.

 

Sounds reasonable, but the objection holds only if declaring the function inline results in an explosion of generated out-of-line function definitions. We know there has to be one in either situation: it is explicit in the case of a non-inline virtual function, and implicit when the function is declared inline. The question, then, is: how many more need to be generated? If the answer is none, then the objection to specifying the inline attribute is moot. As it happens, in the general case [making elbow room for the obligatory pathological case], it is possible for the compiler to limit the generation of both the virtual table and the out-of-line definition of the inline virtual functions to one definition in the single shared file in which the virtual table is generated. [At least this is what we did within cfront back in the late 1980s and so I presume it is a generally solved problem. I believe it was Andy Koenig who first proposed the solution; although it could have easily been Bjarne himself.]
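As a small illustration of the arrangement, consider the sketch below. The Counter class and the two helper functions are hypothetical, and the remark about where the virtual table is emitted describes a common implementation strategy rather than anything the language guarantees.

```cpp
#include <cassert>

class Counter {
public:
    explicit Counter(int start) : value_(start) {}
    virtual ~Counter();                       // defined out of line below

    // Virtual and inline: a call on an object of known type can be
    // expanded inline; the vtable slot still needs one out-of-line copy.
    virtual int next() { return ++value_; }

protected:
    int value_;
};

// A single non-inline virtual definition. Many implementations use the
// file holding such a definition as the one place to emit the virtual
// table (an assumption about the implementation, not a language rule).
Counter::~Counter() {}

// Type known statically: no virtual dispatch is needed here.
inline int bumpDirect(Counter c) { return c.next(); }

// Type unknown: the call goes through the vtable's out-of-line instance.
inline int bumpVirtual(Counter& c) { return c.next(); }
```

Either way the program contains one addressable copy of Counter::next; declaring it inline simply lets the compiler expand it wherever the type is known.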

 

Object-Oriented programming is paradoxical in a way that raises passions beyond all reason. On the one hand, it promotes an elegance and simplicity [the two are not synonyms although some people do not make that distinction] that are unrivaled. For example, a generic compositing engine that was in use at DreamWorks [the code is proprietary and so I have dumbed it down out of all recognition] supported more than a dozen image formats with a generic routine such as the following (where move represents just one of many operations):

 

void GenericEngine::move()
{
     // Simplified loop; each engine call below is a virtual dispatch.
     for (int row = 0; row < clippedHeight; ++row) {
         for (int col = 0; col < clippedWidth; ++col) {
             source_engine->getNextPixel(&pixel);
             target_engine->setNextPixel(&pixel);
         }

         source_engine->stepToNextRow();
         target_engine->stepToNextRow();
     }
}

 

If you have ever looked at compositing code, this is actually sublime. The problem is that its performance was a dog. It took far too long to actually do the compositing. Fetching each pixel through a virtual function turns out to be non-viable. Consider: each frame of a film is composited at 2K resolution in an RGBA image format at 8 or 16 bits per channel. [I may have completely zoned out on this detail.] A frame on average is composed of, say, 8 levels. [I just picked that out of a hat, Mickey’s Magician’s hat.] A film runs at 24 frames per second, and cartoons run about 88 minutes.
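To get a feel for the scale, the figures above can be multiplied out. This is a back-of-the-envelope sketch only: the 2048 x 1556 frame size is one common reading of "2K" and is my assumption, as is counting two virtual calls per pixel (one get, one set) and ignoring the per-row calls.

```cpp
#include <cassert>
#include <cstdint>

// Assumed figures: the text gives "2K", 8 levels, 24 fps, and 88 minutes;
// the 2048 x 1556 frame size and the two calls per pixel are assumptions
// made only to get an order of magnitude.
constexpr std::uint64_t kWidth         = 2048;
constexpr std::uint64_t kHeight        = 1556;
constexpr std::uint64_t kLevels        = 8;
constexpr std::uint64_t kFramesPerSec  = 24;
constexpr std::uint64_t kMinutes       = 88;
constexpr std::uint64_t kCallsPerPixel = 2;

// Number of frames in the whole film.
constexpr std::uint64_t framesInFilm() { return kFramesPerSec * 60 * kMinutes; }

// Per-pixel virtual calls needed to composite every level of every frame once.
constexpr std::uint64_t virtualCallsPerFilm()
{
    return kWidth * kHeight * kLevels * kCallsPerPixel * framesInFilm();
}

static_assert(framesInFilm() == 126720, "24 fps * 60 s * 88 min");
```

Under these assumptions the total comes to several trillion virtual calls for a single pass over the film, none of which can be expanded inline.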

This is the kind of design that talks well but can’t survive in a production environment. It is just not usable. In the heat of production [we’re talking over $100 million], the solution was to hack in an isA type member, switch on that member, and invoke the RGBA methods explicitly. Wow. That will not get you a conference paper, but it will keep your job at the studio. [Listen, all the philosophy in the world about what is good programming means nothing if the project is outsourced to a third world country.]
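The production code is not available, so the following is only a guess at the shape of that hack. The isA member comes from the description above; Pixel, EngineKind, PixelEngine, RGBAEngine, and fetch are hypothetical names invented for the sketch.

```cpp
#include <cassert>
#include <cstdint>

struct Pixel { std::uint8_t r, g, b, a; };

enum EngineKind { kRGBA, kOther };        // hypothetical format tags

class PixelEngine {
public:
    explicit PixelEngine(EngineKind kind) : isA(kind) {}
    virtual ~PixelEngine() = default;
    virtual void getNextPixel(Pixel* p) = 0;
    const EngineKind isA;                 // the hacked-in type member
};

class RGBAEngine : public PixelEngine {
public:
    explicit RGBAEngine(const Pixel* data) : PixelEngine(kRGBA), cursor_(data) {}
    void getNextPixel(Pixel* p) override { *p = *cursor_++; }
private:
    const Pixel* cursor_;
};

// Switch on the tag and call the RGBA method by its qualified name, which
// suppresses the virtual dispatch for the common case.
inline void fetch(PixelEngine* engine, Pixel* out)
{
    switch (engine->isA) {
    case kRGBA:
        static_cast<RGBAEngine*>(engine)->RGBAEngine::getNextPixel(out);
        break;
    default:
        engine->getNextPixel(out);        // fall back to the virtual call
        break;
    }
}
```

The qualified call in the kRGBA branch is the same scope-operator trick described earlier; the price is a switch that has to be revisited every time an image format is added.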

When I got my chance at being the technical lead at something called ToonShooter, our engineered compositing engine employed a template Strategy that allowed us to put back the elegance but eliminate the polymorphism:

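template<typename Engine>
void Compositor<Engine>::move()
{
     // Same loop as before; Engine is now a concrete type, so each call
     // is bound at compile time and can be expanded inline.
     for (int row = 0; row < clippedHeight; ++row) {
         for (int col = 0; col < clippedWidth; ++col) {
             source_engine->getNextPixel(&pixel);
             target_engine->setNextPixel(&pixel);
         }

         source_engine->stepToNextRow();
         target_engine->stepToNextRow();
     }
}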
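For readers who want something they can compile, here is a self-contained sketch of the same idea. It is not the ToonShooter code: the RGBAEngine class, its in-memory pixel buffer, the at() accessor, and the Compositor constructor are all assumptions made so that the example stands on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Pixel { std::uint8_t r, g, b, a; };

// A concrete engine with no virtual functions (hypothetical stand-in for
// one image format). Any type with these member functions will do.
class RGBAEngine {
public:
    RGBAEngine(int width, int height)
        : width_(width), pixels_(static_cast<std::size_t>(width) * height) {}
    void getNextPixel(Pixel* p)       { *p = pixels_[row_ * width_ + col_++]; }
    void setNextPixel(const Pixel* p) { pixels_[row_ * width_ + col_++] = *p; }
    void stepToNextRow()              { ++row_; col_ = 0; }
    Pixel& at(int x, int y)           { return pixels_[static_cast<std::size_t>(y) * width_ + x]; }
private:
    std::size_t width_;
    std::size_t row_ = 0, col_ = 0;
    std::vector<Pixel> pixels_;
};

template <typename Engine>
class Compositor {
public:
    Compositor(Engine* source, Engine* target, int width, int height)
        : source_engine(source), target_engine(target),
          clippedWidth(width), clippedHeight(height) {}

    void move()
    {
        Pixel pixel;
        for (int row = 0; row < clippedHeight; ++row) {
            for (int col = 0; col < clippedWidth; ++col) {
                source_engine->getNextPixel(&pixel);   // bound at compile time
                target_engine->setNextPixel(&pixel);
            }
            source_engine->stepToNextRow();
            target_engine->stepToNextRow();
        }
    }
private:
    Engine* source_engine;
    Engine* target_engine;
    int clippedWidth, clippedHeight;
};
```

The body of move() reads the same as the polymorphic version; what changes is that Compositor<RGBAEngine> is instantiated once per engine type, so the per-pixel calls are ordinary member calls that the compiler can expand inline.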

 

Interestingly enough, the hardest part of implementing the solution was getting programmers to think about it. Inheritance has become an almost universal solution. Programmers seem to shy away from using templates: they feel they are too complicated and difficult to manage. There is a sense that templates result in code bloat, while OO programming not only produces a smaller code footprint but also is fundamentally a better programming style. As it happens, OO code in general tends to generate a greater amount of run-time code. This becomes especially crippling in applications in which lots of objects are created, destroyed, and copied.