In this post, I discuss the history and rationale of handles, perhaps the most noticeable addition to C++. I have heard many questions about them. Why does C++ need handles? Why are they named handles? Why did you use the hat to declare them? And much more. The design team spent quite a bit of time getting handles right.

First, it helps to know what handles are, so here is a short summary. C++ has the notion of declarators, which are ways to build up types by adding symbols. The two declarators in standard C++ are pointers (denoted with the asterisk) and references (denoted with the ampersand). To these, C++ adds handles (denoted with the caret). The caret symbol (^) is often referred to as hat, just as the asterisk is referred to as star.

A handle refers to an instance of an object that is garbage collected (note: this isn’t exactly true, and we’ll clear that up later). How does an object become garbage collected? There are two ways: (1) boxing can copy a value type and put it on the GC heap, or (2) the gcnew operator, which C++ introduces, creates a new instance of any type on the GC heap. For example:

      Button^ b = gcnew Button();

The new and gcnew operators are similar. Both allocate memory and invoke the constructor of the object. Whereas new allocates memory from a native heap and returns a pointer, gcnew allocates memory from the GC heap and returns a handle. A boxed value type is easy to recognize, as its type is simply a handle to a value type. For example:

      int m = 42;             // integer on the stack
      int* n = new int(42);   // integer on a native heap
      int^ o = gcnew int(42); // boxed integer on the GC heap

There are other ways to create boxed value types, but I will leave that for another time when I discuss the implementation of boxed value types.

Why is it important to distinguish whether an instance of an object is on the GC heap or a native heap? The GC algorithm implemented by the CLR is a generational compacting garbage collector. This means that the memory location of the object can change upon each garbage collection. This does not happen on the native heap. A pointer refers to an instance in memory that never moves. The garbage collector ensures that a handle always points to the right instance.
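To make the idea of moving objects concrete, here is a toy model in standard C++. It is only a sketch under loud assumptions: ToyHeap, its Handle type, and the table it keeps are all hypothetical names invented for illustration, and the indirection table is just one simple way to keep references valid across a compaction. It does not claim to show how the CLR does it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Marker for a table entry whose object has been released (illustrative).
constexpr std::size_t kDead = static_cast<std::size_t>(-1);

// Toy compacting heap of ints. A Handle is an index into table_, and
// table_ records where each object currently lives inside cells_.
class ToyHeap {
public:
    using Handle = std::size_t;

    Handle allocate(int value) {
        cells_.push_back(Cell{value, true});
        table_.push_back(cells_.size() - 1);
        return table_.size() - 1;
    }

    void release(Handle h) {
        assert(h < table_.size() && table_[h] != kDead);
        cells_[table_[h]].live = false;
    }

    int& get(Handle h) {
        assert(h < table_.size() && table_[h] != kDead);
        return cells_[table_[h]].value;
    }

    // Current storage position of the object; changes when compact() moves it.
    std::size_t location(Handle h) const { return table_[h]; }

    // Slide live cells toward the front, then fix up every table entry so
    // existing handles keep referring to the same objects.
    void compact() {
        std::vector<std::size_t> new_index(cells_.size(), kDead);
        std::size_t next = 0;
        for (std::size_t i = 0; i < cells_.size(); ++i) {
            if (cells_[i].live) {
                cells_[next] = cells_[i];
                new_index[i] = next++;
            }
        }
        cells_.resize(next);
        for (std::size_t& slot : table_) {
            if (slot != kDead) slot = new_index[slot];
        }
    }

private:
    struct Cell { int value; bool live; };
    std::vector<Cell> cells_;
    std::vector<std::size_t> table_;
};
```

In this model, allocating three values, releasing the first, and calling compact() changes what location() reports for the survivors, yet get() on the same handles still returns the same values. A raw index into cells_, which plays the role of a pointer here, would be stale after the same compaction.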

To access a member of an instance referred to by a handle, use the arrow (->) operator. For example:

      Object^ o = f();
      o->ToString();

At this point, a reader might think that handles are awfully similar to pointers, and wonder why introducing handles to the language was even necessary. To understand this, it helps to look back at the Managed Extensions syntax that shipped with Visual C++ 2002. Much of the language redesign drew on the experience of implementing that language, and on user feedback, to evolve the C++ language. Let’s look at some of the problems with what Managed Extensions tried to do.

Before we do that, here is a quick summary. In Managed Extensions, there were three kinds of pointers. The first was a native pointer (also known as a __nogc pointer), which has the traditional meaning of a pointer: it points to data in memory that will not move. The second was a whole object pointer (known as a __gc pointer), which pointed to an instance of a __gc class. The third was an interior pointer (also known as a __gc pointer), which could point anywhere, and in particular inside objects on the GC heap. Managed Extensions also had a feature known as defaulting rules, which allowed the compiler to choose the most logical meaning for a pointer. Consider this example:

      // System::String is a __gc class
      System::String * s;        // String __gc * (whole object pointer)
      System::String __nogc * q; // ERROR – ill-formed type
      System::String __gc * r;   // String __gc * (whole object pointer)
      
      // int is a __value class (Int32 is mostly the same as int)
      int * i;                   // int __nogc *
      int __nogc * j;            // int __nogc *
      int __gc * k;              // int __gc * (interior pointer)
      System::Int32 * l;         // int __gc * (interior pointer)
      System::Int32 __nogc * m;  // int __nogc *
      System::Int32 __gc * n;    // int __gc * (interior pointer)
      
      // std::string is a __nogc class
      std::string * v;           // string __nogc *
      std::string __nogc * w;    // string __nogc *
      std::string __gc * x;      // ERROR – ill-formed type

The defaulting rules were introduced to make it easier to write code. As seen above, a pointer to a __gc class can only be a __gc pointer, and a pointer to a __nogc class can only be a __nogc pointer. The only place where the defaulting rules introduced difficulty was with value classes. In every regard, int and System::Int32 are the same type, except that Int32 defaults to having __gc qualification and int defaults to __nogc qualification. Because __gc qualification can be added but not taken away (much like const and volatile), trying to pass a __gc pointer to a function expecting a __nogc pointer resulted in a compile-time error. This is where we first saw users struggling. There are two ways to resolve this compile-time error: (1) pin the __gc pointer and convert it to a __nogc pointer, or (2) change the function to accept a __gc pointer. Many people chose the latter option, and they did so by placing __gc everywhere in the code until the program compiled. In particular, when dealing with a sequence of pointers (such as a __gc pointer to a __nogc pointer), it became clear that most people did not understand how a pointer acquired __gc qualification in the first place.
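The comparison with const can be checked directly in standard C++. The sketch below is only an analogy for the direction of the conversion (a qualifier can be added implicitly but not dropped); it says nothing about pinning, and the function names are hypothetical.

```cpp
#include <cassert>
#include <type_traits>

// Adding a qualifier is an implicit conversion...
static_assert(std::is_convertible<int*, const int*>::value,
              "int* converts implicitly to const int*");
// ...but taking it away is not.
static_assert(!std::is_convertible<const int*, int*>::value,
              "const int* does not convert implicitly to int*");

// Hypothetical functions: one accepts the qualified pointer, one does not.
inline int read_qualified(const int* p) { assert(p != nullptr); return *p; }
inline int read_unqualified(int* p) { assert(p != nullptr); return *p; }

inline int demo() {
    int value = 42;
    int* plain = &value;
    const int* qualified = plain;       // OK: qualification is added
    // read_unqualified(qualified);     // error: would drop the qualifier
    return read_qualified(qualified) + read_unqualified(plain);
}
```

Changing read_unqualified to take a const int* would make the commented-out call compile, which mirrors the "add the qualifier everywhere until it builds" habit described above.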

One advantage of the defaulting rules was that function templates could be agnostic to __gc qualification. For the most part, this is very useful; recall, however, that the garbage collector can move memory pointed at by a __gc pointer. After each garbage collection, the value in a __gc pointer could be different. Code that compares two __gc pointers with less-than or greater-than could return different results before and after garbage collection. Such behavior is subtle, and can easily lead to fragile code.

Also, the defaulting rules required the code reviewer to know what kind of type was being pointed to. If the type was a __gc class or a __value class, the reviewer would need to look for unwanted pointer tricks such as conversion to int and back.

Of course, the most significant drawback of the Managed Extensions pointer qualification was the inability to overload operators on __gc classes. The Base Class Library defines a number of useful overloaded operators and C++ users clearly wanted to use this functionality with the natural operator syntax. Pointers, however, already have operators defined on them (such as equality, less-than, dereference, and arrow). While it is conceivable that overloading some operators on __gc pointers could be done, it was impossible to do so cleanly.
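Standard C++ has the same constraint, which a short sketch can illustrate. An overloaded operator must take at least one operand of class or enumeration type, so an overload whose operands are all raw pointers is rejected, while a distinct class type is free to define its own operators. Money and MoneyRef below are hypothetical names used only for illustration; MoneyRef is not how handles are implemented.

```cpp
#include <cassert>

struct Money { int cents; };

// Ill-formed in standard C++: both operands are pointers, so there is no
// class or enumeration operand for the overload to attach to.
// Money* operator+(Money* a, Money* b);

// Hypothetical wrapper type. Because MoneyRef is a class, operators can be
// overloaded on it without colliding with the built-in pointer operators.
struct MoneyRef {
    Money* target;
};

inline Money operator+(MoneyRef a, MoneyRef b) {
    assert(a.target != nullptr && b.target != nullptr);
    return Money{a.target->cents + b.target->cents};
}
```

With two Money objects x and y, the expression MoneyRef{&x} + MoneyRef{&y} calls the overload above, whereas &x + &y does not compile at all.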

With all that out of the way, the design team felt very strongly that a simpler design was needed to lower the intellectual burden of using the CLR GC heap. Handles solved the problem very nicely. They are closest to the whole object pointer from Managed Extensions. Because handles were freed from the need to stay compatible with pointers, they were designed to afford the programmer all the advantages of pointers while providing first-class support for CLR features. In fact, handles have opened new possibilities in the language. This is a sign of good language design.

First, handles have the ability to overload operators. For instance, it is possible to write the following:

      X^ operator+(X^ xl, X^ xr);
      X^ x1;
      X^ x2;
      ...
      X^ x3 = x1 + x2;  // calls operator+(X^, X^)

Making operators in the new language features work well with the CLR has mostly been a matter of relaxing rules and making operators more flexible. The operator overloading design in C++ was already quite solid. At a later time, I will talk about how operators have changed.

Another useful outcome of handles is that C++ can take advantage of conversion functions in the Base Class Library. If a class referred to by a handle contains a user-defined conversion, the compiler will now be able to find it.

During the design of handles, the design team worked towards offering the conveniences of pointers without the pitfalls. Using a pointer as a Boolean expression, for instance, is a useful way to guard a member access. For example:

      Y^ y = g();
      if (y) y->Execute();

This is actually a very tricky thing to get right. In C++, bool is an integral type that can convert to an int via a standard conversion, and these standard conversions happen all the time. Clearly, we wanted to avoid letting every handle convert to an integer, since that would make it difficult to diagnose improper arguments when a function has many overloads. Relying on a conversion function in an ultimate base class does not work either, as System::Object is a base class only for ref class and value class types (it is not a base for native class types). The design team solved this problem by introducing a conversion function to a special Boolean type as a special member function. C++ has a number of special member functions already. This is yet another subject that I will write about later.
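Class types in standard C++ face the same pitfall, and one way to see it is with an explicit conversion to bool (a C++11 feature). This is offered as an analogy only: it is not the special Boolean type described above, and Task and TaskRef are hypothetical names. The sketch shows a handle-like type that works in an if condition but refuses to become an integer.

```cpp
#include <cassert>
#include <type_traits>

struct Task {
    int runs = 0;
    void Execute() { ++runs; }
};

// Hypothetical stand-in for a handle-like type.
class TaskRef {
public:
    TaskRef() = default;
    explicit TaskRef(Task* t) : target_(t) {}

    // Explicit: usable in conditions (if, while, !), but never picked up by
    // implicit conversions such as passing a TaskRef where an int is expected.
    explicit operator bool() const { return target_ != nullptr; }

    Task* operator->() const {
        assert(target_ != nullptr);
        return target_;
    }

private:
    Task* target_ = nullptr;
};

static_assert(!std::is_convertible<TaskRef, int>::value,
              "a TaskRef must not silently become an int");
static_assert(!std::is_convertible<TaskRef, bool>::value,
              "nor silently become a bool outside a condition");
static_assert(std::is_constructible<bool, TaskRef>::value,
              "but an explicit test is still possible");

// Guarded member access, in the same shape as the example above.
inline bool run_if_set(TaskRef ref) {
    if (ref) {
        ref->Execute();
        return true;
    }
    return false;
}
```

Had the conversion been declared without explicit, the first static_assert would fail, because a TaskRef could then reach int through bool, which is exactly the overload-resolution hazard the paragraph describes.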

Handles do not have the same meaning as pointers. They do not have built-in less-than, greater-than, increment, or decrement operators. It is not possible to reinterpret_cast a handle to an int and then back to a handle. In every regard, handles are type safe. One of the design goals for the new language was to make writing verifiable code easier. That is, code should be verifiable the first time it is written. A program that makes use of pointers is nearly always unverifiable. The list of rules for writing verifiable code is short, and among the rules is to use handles instead of pointers.

One nice part of this new design is that defaulting rules are not necessary. In C++, a pointer always refers to memory that will not move. In large part, the defaulting rules are no longer necessary because of gcnew. In the past, new behaved differently on int and Int32. Now, new and gcnew behave exactly the same way for both int and Int32. In fact, int and Int32 are exactly the same in C++. Another useful outcome of gcnew is that the MFC debug macros do not conflict with it, which will make it easier to use CLR features in existing MFC programs.

Perhaps one of the most exciting prospects of handles and the gcnew operator is that it was possible for the design team to lift the restriction that native classes could not be garbage collected. There is a significant amount of machinery needed to make this work, preserve existing C++ semantics, and implement a robust solution. This is part of the feature set known as the "unified type system", about which I will have to spend much time writing. In short, the design team is making this work:

      std::vector<int>^ vec = gcnew std::vector<int>;

This particular feature (creating handles to native class types) unfortunately will not be part of the Whidbey feature set. As with most software engineering projects, we had to make cut-off decisions so we could deliver a solid compiler earlier.

If you’re interested in learning the manner in which the CLR implements handles, look for discussion of "object references" in the CLI standard and CLR documentation.

Stan Lippman deserves the credit for looking at a new declarator. When he first started working on revising the language, he was working on the notion of a rebindable reference. He first used the % symbol as the declarator. Jeff Peil, who had also come to the conclusion that a new declarator was needed, pointed out that % was not the best choice due to C++ digraphs. When the C++ lexer sees the character sequence %>, it replaces it with a closing curly brace, }. Because handles are designed to be widely used as template arguments, this digraph behavior of the C++ Standard was undesirable. Of the remaining unused symbols, the caret was the best. Nostalgic memories of Pascal pointers are shared amongst all of us on the design team. :-)
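The digraph behavior is easy to see in a small standard C++ sketch. The code below is deliberately written with <% and %> in place of braces. It assumes a compiler with digraphs enabled, and Pair, answer, and sum_pair are hypothetical names.

```cpp
#include <cassert>

// "<%" lexes as "{" and "%>" lexes as "}".
struct Pair <% int first; int second; %>;

inline int answer() <%
    return 42;
%>

inline int sum_pair(const Pair* p) <%
    assert(p != nullptr);
    return p->first + p->second;
%>
```

With % as the declarator, a template argument list such as std::vector<String%> would end in the characters %>, which the lexer would turn into } before the parser ever saw a closing angle bracket.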

A curious result of choosing the caret involves another Standard C++ feature: alternative tokens. Wherever ^ is used, the keyword xor may be used instead. For example, the following is legal:

      Button xor b = gcnew Button;
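The same substitution already works for the standard declarators, as this small sketch in standard C++ suggests. Alternative tokens are replaced purely lexically, so bitand can stand in for & in a reference declaration just as xor stands in for ^. The function names are hypothetical, and the code assumes a compiler with alternative tokens enabled.

```cpp
#include <cassert>

inline int flip_low_bit(int value) {
    return value xor 1;                       // same token as: value ^ 1
}

inline void set_to_zero(int bitand target) {  // same token as: int& target
    target = 0;
}

inline int demo() {
    int n = 5;
    set_to_zero(n);
    assert(n == 0);
    return flip_low_bit(6);
}
```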

As a note to Visual C++ users, both digraphs and alternative tokens are available only when compiling with the /Za switch. The /Za switch conflicts with the /clr switch. As standards conforming behavior on the CLR is still a goal for Visual C++, we do have a strategy for finishing standards conformance features and making them the default. At some point I can write about that too.

As for the unification feature, Herb Sutter is the one to credit for driving that work. Although many of the details were figured out by all five of us on the design team, he sold this to partners, managers, and most importantly C++ developers. I think he helped push the design team to figure out all the possibilities handles enabled.

I am leaving out discussion of how to get a handle from an lvalue (in the way that address-of, written with &, yields a pointer from an lvalue). I will discuss that after more discussion of the unified type system and deterministic finalization.

Mark Hall is the one to credit for the gcnew operator. Having done most of the design work for Managed Extensions, he was the most qualified to recognize ways to avoid the same issues.

Lastly, why are they named "handles"? At the beginning of the language design, they were called either managed pointers or whole object pointers. At times they were also called GC pointers and tracking pointers. The problem with all of these names is that handles are not pointers. Any adjective applied to "pointer" misses the point that handles are not a modification to the semantics of pointers, but an entirely different abstraction. We noticed that discussions tended to blur whether the context was native pointers or this new declarator. (In conversation and in writing, there is a tendency to drop adjectives as more context is built up.) That left the challenge of finding a better term. Brad Van Ee mentioned "handle" in a hallway conversation, and that is the term that stuck.

Throughout this and past writings, I have been promising to write about a number of other topics. If anyone is more interested to hear about one subject before another, give me feedback either via comments or via email. Hopefully, this has been interesting and I am happy to answer questions as they come up.