Stan Lippman's BLog

C++/CLI

Implicit Boxing

What's Different in the Revised Language Definition?


Ok, so we reversed ourselves. In politics, that would likely lose us an election. In language design, it means that we imposed a philosophical position in lieu of practical experience with the feature and, in practice, it was a mistake. As an analogy, in the original multiple inheritance language design, Stroustrup decided that a virtual base class sub-object could not be initialized within a derived class constructor, and therefore the language required that any class serving as a virtual base class must define a default constructor. It is that default constructor that would be invoked by any subsequent virtual derivation.

 

The problem of a virtual base class hierarchy is that responsibility for the initialization of the shared virtual sub-object shifts with each subsequent derivation. For example, if I define a base class for which initialization requires the allocation of a buffer, the user-specified size of that buffer might be passed as an argument to the constructor. If I then provide two subsequent virtual derivations, call them inputb and outputb, each provides a particular value to the base class constructor. Now, when I derive an in_out class from both inputb and outputb, neither of the values they pass to the shared virtual base class sub-object can sensibly be allowed to take effect.

 

Therefore, in the original language design, Stroustrup disallowed the explicit initialization of a virtual base class within the member initialization list of the derived class constructor. While this solved the problem, in practice the inability to direct the initialization of the virtual base class proved impracticable. Keith Gorlen of the National Institutes of Health, who had implemented a freeware version of the Smalltalk collection library called nihcl, was a principal voice in convincing Bjarne that he had to come up with a more flexible language design.

 

A principle of Object-Oriented hierarchical design holds that a derived class should only concern itself with the non-private implementation of its immediate base classes. In order to support a flexible initialization design for virtual inheritance, Bjarne had to violate this principle. The most derived class in a hierarchy assumes responsibility for all virtual sub-object initialization regardless of how deep into the hierarchy it occurs. For example, inputb and outputb are both responsible for explicitly initializing their immediate virtual base class. When in_out derives from both inputb and outputb, in_out becomes responsible for the initialization of the once removed virtual base class, and the initialization made explicit within inputb and outputb is suppressed.
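The rule described above can be sketched in standard C++. This is a minimal illustration only: the base class name buffer_base, the demo function, and the buffer sizes are hypothetical, while inputb, outputb, and in_out follow the names used in the text.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical virtual base: its state must be initialized with a buffer size.
struct buffer_base {
    std::size_t size;
    explicit buffer_base(std::size_t n) : size(n) {}
};

struct inputb : virtual buffer_base {
    // Takes effect only when inputb itself is the most derived class.
    inputb() : buffer_base(512) {}
};

struct outputb : virtual buffer_base {
    // Takes effect only when outputb itself is the most derived class.
    outputb() : buffer_base(1024) {}
};

struct in_out : inputb, outputb {
    // The most derived class initializes the shared virtual sub-object;
    // the initializers written in inputb and outputb are suppressed.
    in_out() : buffer_base(2048) {}
};

void demo() {
    in_out io;
    assert(io.size == 2048);  // neither 512 nor 1024 is used
}
```

Because buffer_base has no default constructor, leaving the initializer out of in_out would be a compile-time error, which is the burden on the most derived class that the paragraph describes.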

 

This provides the flexibility required by language developers, but at the cost of complicated semantics. This burden of complication is stripped away if we restrict a virtual base class to be without state and simply allow it to specify an interface. This is a recommended design idiom within C++. Within C++/CLI, it is raised to policy with the Interface type.
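As a rough sketch of that idiom in standard C++ (not C++/CLI), a virtual base class with no data members needs no initialization from the most derived class. The names stream_interface, describe, and demo are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical stateless interface: no data members, so nothing to initialize.
struct stream_interface {
    virtual ~stream_interface() = default;
    virtual std::string name() const = 0;
};

struct inputb  : virtual stream_interface {};
struct outputb : virtual stream_interface {};

// in_out does not have to mention the virtual base in any constructor.
struct in_out : inputb, outputb {
    std::string name() const override { return "in_out"; }
};

std::string describe(const stream_interface& s) { return s.name(); }

void demo() {
    in_out io;
    assert(describe(io) == "in_out");
}
```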

 

Here is a real code sample doing something very simple – and in this case, the explicit boxing is mostly a lexical tax without representation.

    

    // original language requires explicit __box operation
    int my1DIntArray __gc[] = { 1, 2, 3, 4, 5 };
    Object* myObjArray __gc[] = { __box(26), __box(27), __box(28), __box(29), __box(30) };

    Console::WriteLine( "{0}\t{1}\t{2}", __box(0),
        __box(my1DIntArray->GetLowerBound(0)),
        __box(my1DIntArray->GetUpperBound(0)) );

 

As you can see, there is a whole lot of boxing going on. Under T2, value type boxing is implicit [note that all T1 to T2 translations are the output of the mcfront tool]:

 

    // revised language makes boxing implicit
    array<int>^ my1DIntArray = {1,2,3,4,5};
    array<Object^>^ myObjArray = {26,27,28,29,30};

    Console::WriteLine( "{0}\t{1}\t{2}", 0,
        my1DIntArray->GetLowerBound( 0 ),
        my1DIntArray->GetUpperBound( 0 ) );

     

Boxing is a peculiarity of the .NET unified type system. Value types directly contain their state, while reference types are an implicit duple: the named entity is a handle to an unnamed object allocated on the managed heap. Any initialization or assignment of a value type to an Object, for example, requires that the value type be placed within the managed heap (this is where the image of boxing it arises): first by allocating the associated memory, then by copying the value type's state, and then by returning the address of this anonymous Value/Reference hybrid. Thus, when one writes in C#

 

object o = 0; // C# implicit boxing

 

there is a great deal more going on than is made apparent by the simplicity of the code. The design of C# hides the complexity not only of what operations are taking place under the hood, but also of the abstraction of boxing itself. T1, on the other hand, concerned that this would lead to a false sense of efficiency, puts it in the user's face by requiring an explicit instruction,

 

Object *o = __box( 0 ); // T1 explicit boxing

 

as if in this case one had any choice, or that it particularly matters when one is invoking Console::WriteLine. In my opinion, forcing the user to make an explicit request in these cases is at best the equivalent of one's mother repeatedly demanding as one is trying to leave the house, "Now you will be careful, won't you?" Or, if you like, the child in the back seat asking, five minutes out from the house, "Are we there yet?" In both cases, we are not questioning the sincerity behind the intent. And that is why boxing is implicit under T2:

 

Object ^o = 0; // T2 implicit boxing
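The allocate, copy, and return-a-handle steps can be imitated in standard C++ as a loose analogy. This is not how the CLR implements boxing: the box helper is hypothetical, and std::shared_ptr merely stands in for a tracking handle.

```cpp
#include <cassert>
#include <memory>

// Hypothetical helper: "box" a value by allocating heap storage,
// copying the value's state into it, and returning a handle to the copy.
template <typename T>
std::shared_ptr<T> box(const T& value) {
    return std::make_shared<T>(value);
}

void demo() {
    int n = 0;
    std::shared_ptr<int> o = box(n);  // handle to a heap copy of 0
    n = 42;                           // the boxed copy is unaffected
    assert(o != nullptr);             // a boxed zero is not a null handle
    assert(*o == 0);
}
```

The last two checks mirror the point of the T2 initialization above: the handle refers to a real object holding zero, not to no object.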

 

There are side-effects to implicit boxing, of course. One of these is that the above initialization is not setting the object to null, but to address a boxed instance of the integer value zero. This requires the introduction of some entity that can represent a tracking handle to no object. Everyone's original choice, of course, is null, and lucky for C# they could start from scratch and introduce just such a keyword. Adding a paradigm to an existing language presents a few more constraints; think of the somewhat analogous problem of turning fins into legs, or introducing lungs, as marine life moved onto land. In any case, my original choice was refnull, which one can't champion with any real enthusiasm, and that has evolved over a year and a half into nullptr, as in:

 

Object ^o = nullptr; // T2 initialize tracking handle to refer to no object

 

[A T1 to T2 translation Heads Up] As I mentioned in an earlier post, this presents something of a bother for those moving their code from T1 to T2, since all comparisons and assignment/initialization of 0 change semantics because of implicit boxing. mcfront, the translation tool, attempts to automagically do the right thing, but certain cases, such as calling an overloaded method, require a great deal of type analysis that goes beyond the original scope of the tool, which consists of a parse engine, an abstract syntax tree hierarchy (called an MCTree), and a tree-walker (called an Ent) to generate the T2 source-level code. It's just a question of having sufficient time to add the necessary type checking semantics, which are not an aspect of the parse engine component. If you are doing the transition by hand, it is something you need to watch out for. [End of Heads Up]
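Why overloaded calls need type analysis can be hinted at with an analogy in ISO C++ (C++11 or later). It does not model C++/CLI overload resolution; the overload set named which is hypothetical. The point is only that a literal 0 and nullptr select different overloads, so a translator cannot rewrite 0 without knowing which meaning was intended.

```cpp
#include <cassert>
#include <string>

// Hypothetical overload set, for illustration only.
std::string which(int)         { return "int"; }
std::string which(const void*) { return "pointer"; }

void demo() {
    assert(which(0) == "int");            // the literal 0 is an exact match for int
    assert(which(nullptr) == "pointer");  // nullptr converts only to the pointer type
}
```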

 

Let me conclude by putting this in the context of our two earlier metaphors of (1) Kansas and Oz as representing native and managed Object Model behaviors, and (2) the two-faced Janus image as representing the design face of C++/CLI.

 

  1. T1 did not provide implicit boxing. Why? Simply put, we were thinking Kansas, not Oz. We were looking through the wrong Janus pair of eyes. And this resulted in an inelegance and sense of complexity for our users. T2 addresses this imbalance.

  2. On the other hand, T1 did provide direct access of the boxed value on the managed heap, since the alternative is not acceptable performance-wise. The lesson here is that without some Kansas thinking -- without using that set of Janus eyes -- system programming is not practicable.

For example, if one were to write a simple word-counting program that represents a map where the word is represented as a string key and the count as an integral value, then each increment of the count requires a downcast unboxing of the existing value and subsequent reboxing of the new value into a new heap object. Languages that have no performance characteristics are quick to ridicule performance concerns and usually stoop to the mockery of saying, "it hardly matters how fast a program is if it simply returns an incorrect value faster," suggesting that concerns with performance lead to bad programs and, even worse, bad programmers. Implicitly, these people are condemning C and C++, and using that condemnation to promote the sales of their languages. But here is an example of where performance and correctness are not adversaries, but rather partners in providing a street savvy systems programming language for .NET.
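The cost difference in the word-counting example can be imitated in standard C++. This is an analogy under stated assumptions rather than CLR behavior: boxed_int, count_boxed, count_direct, and demo are hypothetical names, and an immutable heap-allocated int stands in for a boxed value.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Stand-in for a boxed integer: an immutable int living on the heap.
using boxed_int = std::shared_ptr<const int>;

// Boxed counting: each increment unboxes the old value and allocates a new box.
// Returns how many heap boxes were created.
int count_boxed(const std::vector<std::string>& words,
                std::map<std::string, boxed_int>& counts) {
    int boxes = 0;
    for (const auto& w : words) {
        auto it = counts.find(w);
        int current = (it == counts.end()) ? 0 : *it->second;  // "unbox"
        counts[w] = std::make_shared<int>(current + 1);        // "rebox"
        ++boxes;
    }
    return boxes;
}

// Direct counting: the stored value is updated in place, with no new box per increment.
void count_direct(const std::vector<std::string>& words,
                  std::map<std::string, int>& counts) {
    for (const auto& w : words) ++counts[w];
}

void demo() {
    std::vector<std::string> words{"oz", "kansas", "oz"};
    std::map<std::string, boxed_int> boxed;
    std::map<std::string, int> direct;
    assert(count_boxed(words, boxed) == 3);  // one allocation per word seen
    count_direct(words, direct);
    assert(*boxed["oz"] == 2 && direct["oz"] == 2);
}
```

Both versions produce the same counts; the boxed one simply pays an allocation on every increment, which is the cost that direct access to the boxed value avoids.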

 

disclaimer: This posting is provided "AS IS" with no warranties, and confers no rights. The opinions expressed are those of the author.     

           

 

Published Wednesday, December 10, 2003 12:35 PM by slippman

Comments

 

igor f said:

Interesting stuff, thanks for the insights. The discussion of "nullptr" and boxing "0" speaks to an issue that I've been continually wondering about as I've been reading this series of blogs: what is the planned backwards-compatibility story of the next version of Managed C++? Specifically, will the new compiler continue to support (perhaps optionally) the original MC++ syntax (__gc, __value et al), or will all MC++ code need to be re-written or translated to the new syntax? Initially the prospect of translation didn't seem all that onerous, but if the new compiler does in fact support implicit boxing, the process will become more difficult and error-prone, as you point out. Unfortunately implicit boxing also would seem to complicate the backwards-compatibility scenario. I'd love to know how you anticipate this working out.
December 10, 2003 1:58 PM
 

Garrett Serack said:

Igor -> It appears that they are introducing this new C++/CLI Spec in addition to the existing MC++ one, in order to go a bit further with the language. There is a compiler flag (/Z:oldsyntax or something) that will let you continue to use MC++ style language. You can mix and match in a project, but a single file needs to be either one or the other. Garrett
December 10, 2003 3:00 PM
 

Andreas Häber said:

Just curious.. you wrote: "of the tool which consists of a parse engine, abstract syntax tree hierarchy (called an MCTree), and a tree-walker (called an Ent)". Does the name for the tree-walker (Ent) come from Tolkien's ents in Lord of the Rings? Cool way to name it :-) btw. I believe I read in a blog somewhere that the switch for using MC++ is /clr:oldsyntax. Using /clr you'll get "the new way".
December 10, 2003 6:04 PM
 

Andreas Häber said:

Regarding the switch for using MC++... Andy Rich wrote about that here: http://weblogs.asp.net/arich/posts/42068.aspx
December 10, 2003 6:11 PM
 

Srdjan said:

Interesting thing is that 'nullptr' was chosen as the keyword representing a null _managed reference_ [sic!] In my book, ptr is a pointer...
December 10, 2003 10:59 PM
 

Gil said:

A question on something I just saw in the code: array<int>^ my1DIntArray = {1,2,3,4,5}; What is the type of the expression {1, 2, 3, 4, 5}? If it is int[] then somebody must be doing an implicit conversion here. Surely not the CLR, so is it the C++ compiler? If the expression is array<int>^, then that is a big compatibility problem (so I assume that it isn't).
December 11, 2003 1:36 AM
 

AlisdairM said:

nullptr is also a proposal to ANSI/ISO for the next C++ standard, C++0x, as the literal value for null pointers. On the assumption that goes through, I am more than happy to see the same reserved word used in C++/CLI for effectively the same purpose. After all, we call them managed references but syntactically they are much closer to C++ pointers than C++ references.
December 11, 2003 12:09 PM
 

stan lippman said:

igor asks: "Specifically, will the new compiler continue to support (perhaps optionally) the original MC++ syntax (__gc, __value et al), or will all MC++ code need to be re-written or translated to the new syntax? ... I'd love to know how you anticipate this working out."

1. the compiler will continue to support the old syntax, with a flag.

2. i am currently developing a translation tool which parses the old syntax and, in intention at least, translates both the syntax and semantics to the new language, or else generates a warning when such a translation may not be implemented. for example, in the old language, one could pin a whole object, and then pass the address of one or more members into the native space. in the new language, pinning a whole object is not supported. the ideal translation would be to recognize each interior address and backpatch a pin declaration. at the moment, that is on the stack to do. the actual details of how the tool will be deployed are unclear at the moment. our goal is to provide a first class transition experience. if you have anything less, you should let us know loud and clear.
December 17, 2003 2:57 PM
 

stan lippman said:

Srdjan wrote: "Interesting thing is that 'nullptr' was chosen as the keyword representing a null _managed reference_ [sic!] In my book, ptr is a pointer..."

as alisdairM points out, nullptr is a proposal to ANSI/ISO, jointly authored by Bjarne Stroustrup and Herb Sutter, who is leading the C++/CLI language effort. We did not originally call it nullptr in C++/CLI, but adopted the name for the sake of unification; although we agree nullptr is not absolutely accurate, getting convergence with ISO C++ is worth the small misnomer.
December 17, 2003 3:02 PM
 

stan lippman said:

Gil asks: "A question on something I just saw in the code: array<int>^ my1DIntArray = {1,2,3,4,5}; What is the type of the expression {1, 2, 3, 4, 5}?"

i'll address the array revision in a subsequent blog, and address this then. it is a shorthand notation supported in the original language design.
December 17, 2003 3:06 PM
 
