A reader, Johan Ericsson, comments
re: Further Discussion of MI and general C++/CLI design issues
I appreciate your discussion of the multiple inheritance issue.
Is the performance problem of MI only inherent in virtual base classes?
I've used MI in a lot of my code, but I've never found the need for virtual base classes. I often use MI as a refinement of an interface.
Consider an interface class that will be implemented in a number of concrete classes. Most of those classes have the same implementation of many of the member functions of the interface. It has been convenient to provide a concrete implementation of the interface that can be used when the default behavior is desired. It's just a shame that I can't use the same techniques when writing .NET code. Instead, all members of an interface have to be explicitly implemented for each class that derives from the interface.
Oh well, I know that I'm complaining about something that is not the biggest deal.
I guess that with a common base class "Object", it is not possible to avoid the need for virtual inheritance in MI. Otherwise, I would just be really happy with non-virtual MI capability in the CLR.
Just to recapitulate. The overhead of virtual inheritance lies in the fact that the shared virtual base class sub-object is ephemeral in its position within each subsequently derived class, although this matters only if it contains state members. Access to those members cannot be statically fixed [that is, a constant offset added to the beginning address of the object] but must be calculated at run-time. This fact led to a funny reversal of judgment by Stroustrup during his implementation of multiple inheritance.
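To make that concrete, here is a minimal sketch that measures where the shared virtual base sits relative to a derived subobject. The class names Base, Left, Right, and Both and the two helper functions are hypothetical, chosen only for illustration. The exact distances depend on the compiler; what the sketch shows is that the distance from a Left subobject to Base is not a property of Left alone.

```cpp
#include <cassert>
#include <cstddef>

struct Base  { int state = 0; };              // virtual base with a state member
struct Left  : virtual Base { int l = 1; };
struct Right : virtual Base { int r = 2; };
struct Both  : Left, Right  { int d = 3; };   // contains exactly one Base

// Byte distance from a Left subobject to the Base it shares.
// The Left* to Base* conversion is the step that must be resolved at run time.
std::ptrdiff_t left_to_base(Left* p) {
    Base* b = p;
    return reinterpret_cast<char*>(b) - reinterpret_cast<char*>(p);
}

// The same measurement starting from a Right subobject.
std::ptrdiff_t right_to_base(Right* p) {
    Base* b = p;
    return reinterpret_cast<char*>(b) - reinterpret_cast<char*>(p);
}
```

Calling left_to_base on a stand-alone Left and on the Left part of a Both will typically give two different answers, which is why the compiler cannot bake a single constant into code that receives a Left*.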
The definition of a class object at global scope requires that its associated constructor be invoked prior to the first statement of main. This is actually a difficult if uninteresting problem [that is, it is an engineering problem rather than a problem within computer science] – or at least it was back in the early 1980s, particularly when one had to be portable across all flavors of Unix, and, potentially, run on both the Macintosh and DOS [Windows was just a twinkle in Alan Kay’s eyes at the time]. The implementation solution Stroustrup invented within the language was to generate an __sti function and an __std function [static initialization and static destruction] within each separately compiled program text file [think .C or .cpp or .cc] that contained global class objects.
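As a rough illustration of the idea, and not a reproduction of what cfront actually emitted, the following sketch pairs raw storage for one global object with a hand-written initialization function and a destruction function. The names Widget, sti_example_file, and std_example_file are hypothetical; the real generated names began with a double underscore, which user code should not use.

```cpp
#include <cassert>
#include <new>

int live_objects = 0;   // counts constructed-but-not-destroyed Widgets

struct Widget {
    Widget()  { ++live_objects; }
    ~Widget() { --live_objects; }
};

// The programmer writes "Widget w;" at global scope. Conceptually the
// translator can turn that into raw storage plus two functions.
alignas(Widget) unsigned char w_storage[sizeof(Widget)];
Widget* w_ptr = nullptr;

// Hypothetical stand-in for the per-file static initialization function.
void sti_example_file() {
    w_ptr = ::new (static_cast<void*>(w_storage)) Widget;
}

// Hypothetical stand-in for the per-file static destruction function.
void std_example_file() {
    w_ptr->~Widget();
    w_ptr = nullptr;
}
```

The remaining work is then purely logistical: something has to call every such initialization function before the user's code in main runs, and every destruction function after it finishes.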
[As a challenge to the reader, can you come up with a strategy for naming these functions such that one is guaranteed to generate unique names? Using the file name itself does not work, because in a complex project, files may have the same name but be positioned within independent directories. However, in the C++ language, there is one aspect of the language that is guaranteed to be unique within each program text file. That’s enough of a hint. [People don’t give Bjarne enough credit for just how smart he is because his genius is not abstract the way, say, Bertrand Meyer’s is; that is a prejudice that is pernicious and widespread in our culture and goes back to the Greeks.]]
In any case, the second problem, once the unique sti and std functions are littered across the .o binaries, is to (a) recognize their presence, (b) thread them together somehow, and (c) have the set of them executed in order prior to the beginning of main. We came up with three solutions. The first was an ascii solution for when we had to be portable across all Unix systems: we used nm, which dumps out the symbol table, and grep to find any names that began with __sti and __std [I told you this was not an active area of computer science – whenever I was bored, I would add an identifier which began as __sti_ha! and watch the system die an ugly death … ], and generated an array of pointers to these functions within a named pair of tables, which were then iterated across within a _main() function we inserted as the first statement within main. A colleague of mine back at Bell Labs named Robert Murray invented this `portable’ solution and called it munch, if memory serves. For the Vax and 3B20 processors [enough said], Rob provided an alternative solution which used the ld library to actually inspect the a.out’s symbol table in memory and thread the set of calls. Over time, folks out in the emergent C++ community sent us versions for other processors, such as Sun, Cray, etc. Finally, the underlying object format was extended to support initialization sections, and that is where the sti and std functions were placed.
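The table-driven scheme can be sketched as follows. Everything here is a simplified stand-in: the function names, the table names, the null terminator, and the choice to run destruction in reverse order are my assumptions for illustration, not a record of what munch generated.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

std::vector<std::string> init_log;   // records the order in which things ran

// Stand-ins for the per-file functions found by scanning the symbol table.
void sti_file_a() { init_log.push_back("init a"); }
void sti_file_b() { init_log.push_back("init b"); }
void std_file_a() { init_log.push_back("destroy a"); }
void std_file_b() { init_log.push_back("destroy b"); }

using InitFn = void (*)();

// The pair of tables a munch-like tool could emit as generated source.
InitFn ctor_table[] = { sti_file_a, sti_file_b, nullptr };
InitFn dtor_table[] = { std_file_a, std_file_b, nullptr };

// Plays the role of the call inserted at the top of main.
void run_static_inits() {
    for (InitFn* p = ctor_table; *p != nullptr; ++p) (*p)();
}

// Assumption: destruction walks the table backwards, mirroring construction.
void run_static_dtors() {
    std::size_t n = 0;
    while (dtor_table[n] != nullptr) ++n;
    while (n > 0) dtor_table[--n]();
}
```

Note that the order within each table is simply whatever order the symbols were found in, which is the root of the initialization-order problem across files.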
So, with that background, we can get back to Bjarne’s funny reversal, and with that back to virtual inheritance which I wanted to briefly review before I answer Johan. What should be clear is that the sti/std functions provided a general solution for what is called static initialization – that is, initialization before the beginning of main. A number of folks had agitated Bjarne for an extension to C++ that would lift the C language restriction of global initialization to constant expressions. That is, to permit things such as
// illegal in C and in C++ prior to Release 2.0
int *pi = new int( 1024 );
int j = *pi;
Up until a certain point, the response had always been, no, which caused grumbling in certain circles. Then, one day, the grumbling stopped. The rule was relaxed. I believe, although I never explicitly asked Bjarne, that this is because the assignment of a derived class pointer or reference to a virtual base class pointer requires a run-time evaluation, and so it became more problematic to resist the more general case.
The problem with the whole static initialization issue is that there is no support within the language to deterministically specify the order of initialization across files. This can cause difficult-to-trace segmentation faults, since they occur prior to the beginning of main. The main programming solution is something called a Schwarz counter, named for Jerry Schwarz, who invented it to guarantee the initialization of cout, cin, cerr, and clog in his Release 2.0 implementation of iostreams. Because of metadata and the CLR abstraction layer, this is not a problem with .NET programming.
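For readers who have not met the idiom, here is a single-file sketch of a Schwarz counter. The names StreamInit, stream_setups, and stream_teardowns are hypothetical, and the two counters stand in for the real work of constructing and destroying the stream objects. In the real idiom the static StreamInit object is declared in the header, so every file that includes it gets its own copy and the first one to be initialized does the setup.

```cpp
#include <cassert>

int stream_setups = 0;      // how many times the shared resource was set up
int stream_teardowns = 0;   // how many times it was torn down

class StreamInit {
public:
    // Only the first object constructed performs the setup.
    StreamInit()  { if (count_++ == 0) ++stream_setups; }
    // Only the last object destroyed performs the teardown.
    ~StreamInit() { if (--count_ == 0) ++stream_teardowns; }
private:
    static int count_;
};

// Zero-initialized before any constructor runs, so the count is always valid.
int StreamInit::count_ = 0;

// Normally this line lives in the header, giving one such object per file.
static StreamInit stream_init_for_this_file;
```

However many files include the header, and in whatever order their static objects are initialized, the setup runs exactly once and before any of those files can use the resource.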
OK. That’s preliminary to answering Johan’s question, which asks about the overheads intrinsic to multiple inheritance. The overheads all revolve around the second and subsequent base classes, and this is because of (a) the necessary `this’ pointer adjustment to address the beginning of that base class subobject, and (b) the need for an additional virtual function table and virtual pointer within the derived class object [the arithmetic back then was that there are n-1 additional virtual tables [and the associated pointers to those tables], where n is the number of base classes]. In single inheritance, there are no additional tables.
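The pointer adjustment is easy to observe. In this minimal sketch (the classes Shape, Printable, and Label and the helper function are hypothetical names), converting a Label* to a pointer to each of its two bases yields two different addresses, so at least one of the conversions has to add an offset.

```cpp
#include <cassert>

struct Shape     { virtual ~Shape() {}     int shape_data = 1; };
struct Printable { virtual ~Printable() {} int print_data = 2; };

// Two polymorphic bases: the object typically carries two virtual pointers.
struct Label : Shape, Printable { int label_data = 3; };

// True when the two base subobjects start at different addresses, which
// means a Label* cannot be used unchanged as both a Shape* and a Printable*.
bool bases_start_at_different_addresses(Label& obj) {
    void* first  = static_cast<Shape*>(&obj);
    void* second = static_cast<Printable*>(&obj);
    return first != second;
}

// Going back from the second base has to undo the same adjustment.
Label* back_to_label(Printable* p) {
    return static_cast<Label*>(p);
}
```

A virtual call made through a Printable* that lands in a Label member function needs that same correction applied to this before the function body runs.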
The this pointer adjustment is a pernicious overhead with regard to virtual function calls – the two general solutions are either a thunk [I believe this is the Visual C++ solution; it was at one point] or multiple entry points within a function in which the this-pointer adjustment takes place in one particular entry point [the quite fine IBM compiler on which Josee Lajoie worked chose this solution]. When people ask why Visual C++ did not provide the covariant return feature that was the first change voted on by the ANSI C++ committee, it is because of the difficulty of adjusting the this pointer under multiple inheritance on the returned pointer or reference to the class object. [It was finally implemented in Visual Studio 7.0, I believe. The named return value optimization was finally implemented in the recent Visual Studio 2003, along with impressive conformance work, particularly on templates, that makes Visual C++ now one of the best C++ compilers.] The multiple virtual tables significantly complicate the implementation of pointers to member functions -- some people might say they cripple them.
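Here is a small sketch of the covariant return case under multiple inheritance. The classes Named, Cloneable, and Doc and the helper clone_via_second_base are hypothetical. The override returns a Doc*, but a caller holding only a Cloneable reference expects a Cloneable*, so the returned pointer has to be converted on the way out.

```cpp
#include <cassert>

struct Named { virtual ~Named() {} int id = 0; };

struct Cloneable {
    virtual ~Cloneable() {}
    virtual Cloneable* clone() const = 0;
};

// Cloneable is the second base, so a Doc* and a Cloneable* into the same
// object generally hold different addresses.
struct Doc : Named, Cloneable {
    int pages = 0;
    // Covariant return: Doc* instead of Cloneable*.
    Doc* clone() const override { return new Doc(*this); }
};

// The caller sees only the base interface; the implementation must hand
// back a correctly adjusted Cloneable*, not the raw Doc* address.
Cloneable* clone_via_second_base(const Cloneable& c) {
    return c.clone();
}
```

The caller owns the returned object and deletes it through the Cloneable pointer, which is safe here because the destructor is virtual.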
If you are interested in actual numbers and a detailed analysis of the implementation issues surrounding the C++ Object Model, both are available in my Inside the C++ Object Model, which was published in 1994 and represents all I knew [for better or worse] about C++ implementations at that time. I began it because Bjarne graciously invited me to join his Foundation/Grail project to implement the Object Model phase of their compilation system, and I discovered that, apart from his Annotated Reference Manual, this was an undocumented area of the language; the book provided a way for me to systematize what I was learning. [I know the SGI C++ compiler team found it helpful.]
At the time, all of us were making up implementations as we went along, and there has been an evolution in the virtual technologies, so to speak. For example, Visual C++ uses a virtual base class table analogous to the virtual table [and had even patented that]. Bjarne inserted virtual base class pointers within the class object. This turned out to be a mistake since, as the hierarchy deepens, access becomes more costly, which is an effect of the implementation and not of the language feature itself. Some compilers that copied our implementation provide(d) a switch which allowed the user to choose between space and time in this regard – what they would do, when time was selected, was promote the virtual base class pointers to the derived class – but of course they could not elide the other instances and so you had a space hit of duplicated pointers. When I implemented the C++ object model in Bjarne’s Grail/Foundation compiler, he pointed out a superior implementation invented by Michael Tiemann [g++, Cygnus] which places the virtual base class offsets within the virtual table. Michael Ball [Oregon Software/Sun] refined that to have the table grow in two directions, one holding the virtual base class offsets, the other holding the pointers to the virtual functions. [I’m sure the technology has evolved some since then; that was one of the beneficial side-effects of the standardization – having all the compiler people in one room – but in 1994 when Peter Weinberger on becoming head of Area 11 of Bell Labs canceled the Foundation project, I chose Disney Animation over IBM Yorktown and moved away from the C++ compiler community.]
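The offset-in-the-table idea can be modeled by hand. This is a toy model built from plain structs, not what any compiler actually emits, and every name in it (Base, Table, LeftPart, LeftObject, BothObject, find_base) is hypothetical. The point is that the object carries one table pointer, and the table for each complete-object layout records how far away the shared base lives.

```cpp
#include <cassert>
#include <cstddef>

struct Base { int state; };

// Hand-rolled table holding only the offset to the virtual base.
struct Table { std::ptrdiff_t base_offset; };

// The non-virtual part of a class that has Base as a virtual base.
struct LeftPart { const Table* table; int l; };

// Two complete-object layouts that place Base at different distances.
struct LeftObject { LeftPart left; Base base; };
struct BothObject { LeftPart left; int extra[4]; Base base; };

// One table per layout, filled in with that layout's offset.
const Table left_object_table = {
    static_cast<std::ptrdiff_t>(offsetof(LeftObject, base)) };
const Table both_object_table = {
    static_cast<std::ptrdiff_t>(offsetof(BothObject, base)) };

// Run-time lookup: read the offset through the table, then add it.
Base* find_base(LeftPart* p) {
    char* start = reinterpret_cast<char*>(p);
    return reinterpret_cast<Base*>(start + p->table->base_offset);
}
```

Compared with storing a pointer to the base in every object, this costs one extra indirection through the table but adds no per-object space beyond the table pointer that is already there for virtual functions.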
There have been a number of attempts to standardize the binary implementation of the object model, from the triviality of internal name mangling to the virtual mechanisms, but none has ever succeeded, and that has been imo a serious miscalculation by the community since it prevents interoperability. One of the innovations of .NET is a common infrastructure across languages, particularly when they conform to the CLS.
Remarkably, this last comment brings me to a second question, this one from Daniel O’Connell:
re: A Question about Copy Constructors in C++/CLI
Hrmm, I can see the use and thanks for the explanation of purpose. I am still concerned about the usage across other languages; however, like many language-specific things, one will have to be careful when the intent is to provide a CLS compliant library. I brought the issue up primarily because of a recent newsgroup discussion. Throughout this discussion I argued against copy constructors in the framework as a matter of technical issue, in a fairly similar (though less efficient) manner as you have here.
I will hunt other blogs and such sources to see if I can find any other information, hopefully somewhere I can find out if the feature will require ClsCompliant(false) or any other specifics.
The public specification, unfortunately, does not have any details with regard to the behavior of the copy constructor [or at least none that I could find.] I have read certain internal documents on the implementation for review, but they are neither fresh in my mind nor internalized, and so I would only badly damage their meaning if I attempted to parrot them. The key blogs to attend to with regard to the new language design are those of Herb Sutter, who leads the design team and has worked magic, and Brandon Bray. Mark Hall and Jonathan Caves are the key compiler folks connected with the design and are ultimately responsible for the implementation. And Jeff Peil also contributed significantly to the design, but I don’t believe the latter three are publicly talking, although they all have much worth saying.