What's Different in the Revised Language Definition?

Implicit Boxing



OK, so we reversed ourselves. In politics, that would likely lose us an election. In language design, it means that we imposed a philosophical position in lieu of practical experience with the feature and, in practice, it was a mistake. As an analogy, in the original multiple inheritance language design, Stroustrup decided that a virtual base class sub-object could not be initialized within a derived class constructor, and therefore the language required that any class serving as a virtual base class must define a default constructor. It is that default constructor that would be invoked by any subsequent virtual derivation.


The problem with a virtual base class hierarchy is that responsibility for the initialization of the shared virtual sub-object shifts with each subsequent derivation. For example, if I define a base class for which initialization requires the allocation of a buffer, the user-specified size of that buffer might be passed as an argument to the constructor. If I then provide two subsequent virtual derivations, call them inputb and outputb, each provides a particular value to the base class constructor. Now, when I derive an in_out class from both inputb and outputb, neither of the values they supply for the shared virtual base class sub-object can sensibly be allowed to evaluate.


Therefore, in the original language design, Stroustrup disallowed the explicit initialization of a virtual base class within the member initialization list of the derived class constructor. While this solved the problem, in practice the inability to direct the initialization of the virtual base class proved impracticable. Keith Gorlen of the National Institutes of Health, who had implemented a freeware version of the Smalltalk collection library called nihcl, was a principal voice in convincing Bjarne that he had to come up with a more flexible language design.


A principle of Object-Oriented hierarchical design holds that a derived class should only concern itself with the non-private implementation of its immediate base classes. In order to support a flexible initialization design for virtual inheritance, Bjarne had to violate this principle. The most derived class in a hierarchy assumes responsibility for all virtual sub-object initialization regardless of how deep into the hierarchy it occurs. For example, inputb and outputb are both responsible for explicitly initializing their immediate virtual base class. When in_out derives from both inputb and outputb, in_out becomes responsible for the initialization of the once removed virtual base class, and the initialization made explicit within inputb and outputb is suppressed.
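The rule can be sketched in standard C++ as follows. This is a minimal illustration, not code from the original design discussion: the base class name buffer_base and the buffer sizes are hypothetical, while inputb, outputb, and in_out follow the names used above.

```cpp
#include <cstddef>

// Hypothetical shared base: its constructor needs a user-specified buffer size,
// and it deliberately has no default constructor.
class buffer_base {
public:
    explicit buffer_base(std::size_t size) : size_(size) {}
    std::size_t size() const { return size_; }
private:
    std::size_t size_;
};

class inputb : public virtual buffer_base {
public:
    // This initializer runs only when inputb is itself the most derived class.
    inputb() : buffer_base(512) {}
};

class outputb : public virtual buffer_base {
public:
    // Likewise, used only when outputb is the most derived class.
    outputb() : buffer_base(1024) {}
};

class in_out : public inputb, public outputb {
public:
    // The most derived class initializes the shared virtual sub-object;
    // the initializers written in inputb and outputb are suppressed.
    in_out() : buffer_base(4096), inputb(), outputb() {}
};
```

Because buffer_base has no default constructor, in_out would not compile without its own explicit buffer_base initializer, which is the burden the paragraph above describes.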


This provides the flexibility required by language developers, but at the cost of complicated semantics. This burden of complication is stripped away if we restrict a virtual base class to be without state and simply allow it to specify an interface. This is a recommended design idiom within C++. Within C++/CLI, it is raised to policy with the Interface type.
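A minimal sketch of the stateless idiom in standard C++, assuming hypothetical class names (stream_like, reader, writer, read_writer) chosen only for illustration:

```cpp
#include <string>

// Hypothetical interface: pure virtual functions only, no data members,
// so there is no state whose initialization has to be coordinated.
class stream_like {
public:
    virtual ~stream_like() = default;
    virtual std::string name() const = 0;
};

// Both intermediate classes derive virtually and pass no constructor arguments.
class reader : public virtual stream_like {};
class writer : public virtual stream_like {};

// The most derived class has nothing to initialize in the virtual base;
// it only supplies the implementation of the interface.
class read_writer : public reader, public writer {
public:
    std::string name() const override { return "read_writer"; }
};
```

With no data in the virtual base, no constructor in the hierarchy has to mention it at all.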


Here is a real code sample doing something very simple – and in this case, the explicit boxing is mostly a lexical tax without representation.


      // original language requires explicit __box operation
      int my1DIntArray __gc[] = { 1, 2, 3, 4, 5 };
      Object* myObjArray __gc[] = { __box(26), __box(27), __box(28), __box(29), __box(30) };

      Console::WriteLine( "{0}\t{1}\t{2}", __box(0),
              __box(my1DIntArray->GetLowerBound(0)),
              __box(my1DIntArray->GetUpperBound(0)) );


As you can see, there is a whole lot of boxing going on. Under T2, value type boxing is implicit [note that all T1 to T2 translations are output of the mcfront tool]:


      // revised language makes boxing implicit
      array<int>^ my1DIntArray = {1,2,3,4,5};
      array<Object^>^ myObjArray = {26,27,28,29,30};

      Console::WriteLine( "{0}\t{1}\t{2}", 0,
              my1DIntArray->GetLowerBound( 0 ),
              my1DIntArray->GetUpperBound( 0 ) );


Boxing is a peculiarity of the .NET unified type system. Value types directly contain their state, while reference types are an implicit duple: the named entity is a handle to an unnamed object allocated on the managed heap. Any initialization or assignment of a value type to an Object, for example, requires that the value type be placed within the managed heap (this is where the image of boxing it arises): first by allocating the associated memory, then by copying the value type's state, and then by returning the address of this anonymous Value/Reference hybrid. Thus, when one writes in C#


object o = 0; // C# implicit boxing


there is a great deal more going on than is made apparent by the simplicity of the code. The design of C# hides the complexity not only of what operations are taking place under the hood, but also of the abstraction of boxing itself. T1, on the other hand, concerned that this would lead to a false sense of efficiency, puts it in the user's face by requiring an explicit instruction,


Object *o = __box( 0 ); // T1 explicit boxing


as if in this case one had any choice, or that it particularly matters when one is invoking Console::WriteLine. In my opinion, forcing the user to make an explicit request in these cases is at best the equivalent of one's mother repeatedly demanding, as one is trying to leave the house, "Now you will be careful, won't you?" Or, if you like, the child in the back seat asking, five minutes out from the house, "Are we there yet?" In both cases, we are not questioning the sincerity behind the intent. And that is why boxing is implicit under T2:


Object ^o = 0; // T2 implicit boxing


There are side-effects to implicit boxing, of course. One is that the above initialization does not set the object to null, but rather sets it to address a boxed instance of the integer value zero. This requires the introduction of some entity that can represent a tracking handle to no object. Everyone's original choice, of course, is null, and, lucky for C#, they could start from scratch and introduce just such a keyword. Adding a paradigm to an existing language presents a few more constraints: think of the somewhat analogous problem of turning fins into legs, or introducing lungs, as marine life moved onto land. In any case, my original choice was refnull, which one can't champion with any real enthusiasm, and that has evolved over a year and a half into nullptr, as in:


Object ^o = nullptr; // T2 initialize tracking handle to refer to no object
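The difference between initializing with 0 and with nullptr can be modeled in standard C++. This is only an analogy under stated assumptions: the handle class below is hypothetical and is not how C++/CLI implements Object^; it merely mimics a handle that boxes an int implicitly and accepts nullptr to mean "no object".

```cpp
#include <cstddef>
#include <memory>

// Hypothetical model of a tracking handle to a boxed int (not the real Object^).
class handle {
public:
    // Initializing with nullptr yields a handle that refers to no object.
    handle(std::nullptr_t) {}

    // Initializing with an int boxes it: allocate storage, copy the value in.
    handle(int value) : box_(std::make_shared<int>(value)) {}

    bool is_null() const { return box_ == nullptr; }

    // Precondition: the handle is not null.
    int value() const { return *box_; }

private:
    std::shared_ptr<const int> box_;
};
```

In this model, `handle a = 0;` selects the int constructor and so refers to a boxed zero, while `handle b = nullptr;` refers to nothing, which mirrors the T2 behavior described above.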


[A T1 to T2 translation Heads-Up] As I mentioned in an earlier post, this presents something of a bother for those moving their code from T1 to T2, since all comparisons and assignment/initialization of 0 change semantics because of implicit boxing. mcfront, the translation tool, attempts to automagically do the right thing, but certain cases, such as calling an overloaded method, require a great deal of type analysis that goes beyond the original scope of the tool, which consists of a parse engine, an abstract syntax tree hierarchy (called an MCTree), and a tree-walker (called an Ent) to generate the T2 source-level code. It's just a question of having sufficient time to add the necessary type checking semantics, which are not an aspect of the parse engine component. If you are doing the transition by hand, it is something you need to watch out for. [End of Heads-Up]


Let me conclude by putting this in the context of our two earlier metaphors of (1) Kansas and Oz as representing native and managed Object Model behaviors, and (2) the two-faced Janus image as representing the design face of C++/CLI.


  1. T1 did not provide implicit boxing. Why? Simply put, we were thinking Kansas, not Oz. We were looking through the wrong Janus pair of eyes. And this resulted in an inelegance and sense of complexity for our users. T2 addresses this imbalance.

  2. On the other hand, T1 did provide direct access of the boxed value on the managed heap, since the alternative is not acceptable performance-wise. The lesson here is that without some Kansas thinking -- without using that set of Janus eyes -- system programming is not practicable.

For example, if one were to write a simple word-counting program that represents a map where the word is represented as a string key and the count as an integral value, then each increment of the count requires a downcast unboxing of the existing value and subsequent reboxing of the new value into a new heap object. Languages that have no performance characteristics are quick to ridicule performance concerns and usually stoop to the mockery of saying, "It hardly matters how fast a program is if it simply returns an incorrect value faster," suggesting that concerns with performance lead to bad programs and, even worse, bad programmers. Implicitly, these people are condemning C and C++, and using that condemnation to promote the sales of their languages. But here is an example of where performance and correctness are not adversaries, but rather partners in providing a street-savvy systems programming language for .NET.
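The cost described in the word-counting example can be sketched in standard C++. This is a model under stated assumptions, not managed code: Object, BoxedInt, Handle, box, and unbox are hypothetical stand-ins for the managed types and operations, and boxes_allocated exists only to make the allocations visible.

```cpp
#include <map>
#include <memory>
#include <string>
#include <typeinfo>
#include <vector>

// Hypothetical stand-ins for a managed Object and a boxed integer.
struct Object {
    virtual ~Object() = default;
};

struct BoxedInt : Object {
    explicit BoxedInt(int v) : value(v) {}
    const int value;  // the box is treated as immutable in this model
};

using Handle = std::shared_ptr<const Object>;

// Counts how many boxes the boxed version allocates.
static long boxes_allocated = 0;

// Boxing: allocate a heap object and copy the value into it.
Handle box(int v) {
    ++boxes_allocated;
    return std::make_shared<BoxedInt>(v);
}

// Unboxing: a checked downcast followed by a copy of the value out of the box.
int unbox(const Handle& h) {
    auto boxed = std::dynamic_pointer_cast<const BoxedInt>(h);
    if (!boxed) throw std::bad_cast();
    return boxed->value;
}

// Every increment unboxes the old count and reboxes the new one.
std::map<std::string, Handle> count_boxed(const std::vector<std::string>& words) {
    std::map<std::string, Handle> counts;
    for (const std::string& word : words) {
        auto it = counts.find(word);
        if (it == counts.end())
            counts.emplace(word, box(1));
        else
            it->second = box(unbox(it->second) + 1);
    }
    return counts;
}

// Direct access to the value: the increment happens in place.
std::map<std::string, int> count_direct(const std::vector<std::string>& words) {
    std::map<std::string, int> counts;
    for (const std::string& word : words)
        ++counts[word];
    return counts;
}
```

In this model the boxed version allocates one new heap object per word occurrence, while the direct version allocates none for the counts themselves.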


disclaimer: This posting is provided "AS IS" with no warranties, and confers no rights. The opinions expressed are those of the author.