Herb Sutter's Blog

  • Why "gcnew T" instead of "new (gc) T"?

    On comp.lang.c++.moderated, Peter Lundblad wrote:

    • I also don't see the need for gcnew. Why not use placement new, i.e.:
      T^ h = new (CLI::gc) T(14);
      This would require an extension that allows placement new overloads to return something other than void* and a standard way to get to the raw storage where to construct the object. That extension, however, could be useful for other things as well, i.e. a new that returns a smart pointer so you don't have to have a public constructor on the smart pointer taking a raw pointer which is dangerous.

    There are several reasons why we didn't use a form of placement new.

    One reason is that we wanted to leave a door open in case in the future we wanted to allow placement and class-specific forms of gcnew. Having a parallel gcnew expression and operator best serves leaving that door open.

    Another reason is that existing libraries, including GC libraries, already use placement forms of new, and so many of the possible placement names are taken. In particular, new (gc) X; is already taken by the Boehm collector. Yes, I know you suggested CLI::gc instead of plain gc, but in practice I'm still concerned that enough people are liable to frequently write using namespace CLI; (actually stdcli) to make this problematic.

    Still another is one you cite: It's easier to teach that "the type of a new-expression (and operator new) is a *" today, and that "the type of a gcnew-expression is a ^".

    Finally, a minor reason is that gcnew is slightly less typing than new (gc) or new (cli), and moderately less typing than new (stdcli::gc).
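
    To make the placement-form points concrete, here is a minimal sketch in plain ISO C++ of what a tagged placement new looks like today. The names gc_tag, gc, tagged_allocations, and make_and_read are hypothetical and exist only for this illustration; this is not the Boehm collector's interface and it performs no garbage collection. It only shows that a placement operator new must return void*, and that the new-expression built on it still has type T*, which is why a handle-returning form would need a language extension.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <type_traits>

// Hypothetical tag type standing in for a collector's "gc" placement argument.
struct gc_tag {};
constexpr gc_tag gc{};

// Counts allocations routed through the tagged overload (illustration only).
static int tagged_allocations = 0;

// A placement form of operator new: ISO C++ requires it to return void*.
void* operator new(std::size_t size, gc_tag) {
    ++tagged_allocations;
    if (void* p = std::malloc(size)) return p;
    throw std::bad_alloc();
}

// Matching placement delete; called automatically only if the constructor throws.
void operator delete(void* p, gc_tag) noexcept { std::free(p); }

struct T {
    int v;
    explicit T(int x) : v(x) {}
};

// Whatever the placement argument is, the new-expression still has type T*.
static_assert(std::is_same<decltype(new (gc) T(14)), T*>::value,
              "a placement new-expression yields T*");

// Allocate through the tagged form, read the value, then clean up by hand
// (destroy explicitly and release with the allocator the overload used).
int make_and_read() {
    T* p = new (gc) T(14);
    assert(p != nullptr);
    int v = p->v;
    p->~T();
    std::free(p);
    return v;
}
```

    Note that once a program writes a using-directive for the namespace holding such a tag, the short spelling new (gc) T is claimed for the whole translation unit, which is the collision concern described above.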

  • C++/CLI candidate base document now available

    Today the C++/CLI candidate base document was posted, and it's freely available for download.

    This is the spec that Microsoft is contributing to the newly-formed ECMA TC39/TG5 standards committee for consideration for the C++/CLI standards process. It covers all the main proposed features, and it gives a pretty thorough look at the scope and shape of what's being contemplated. There are still places that need to be filled in, though, as well as some technical decisions that TG5 will need to make (in addition to any existing decisions that they may decide to review or change).

    Note that this is the last version of the document that will bear a Microsoft copyright, so we've taken this opportunity to make it publicly available while we still own it. If ECMA TC39/TG5 adopts this as their base document, it will henceforth be an ECMA document maintained by that ECMA group. That means it will be up to TG5 to decide what changes to make and when to make future drafts publicly available. (From my informal conversations, I wouldn't be surprised if interim drafts were published every three months or so, but that's just my personal best guess right now. We'll have to wait and see when the whole group feels the spec is in shape for TG5 to feel ready to distribute its own first updated snapshot.)

    Whew! It's been a long year, a long month, and a long week. Enjoy! And please let us know what you think of this. Comments are welcome, and those of us on the team who are blogging (see my Links) will be answering as many as we can get to while we spend our days continuing to work on the Whidbey product.

    I'll probably blog fairly lightly over the next two weeks. Next week is a short week of course, with the U.S. Thanksgiving holidays closing most offices on Thursday and Friday. The following week, on Dec 4-5, is the first ECMA TC39/TG5 meeting already, down in College Station, Texas -- it sure has come up fast. I'll have more to report after that.

  • Why R^ instead of cli::handle?

    Nicola Musatti asked the following excellent question:

    • The hat symbol and gcnew could be replaced with a template like syntax, e.g.

      cli::handle<R> r = cli::gcnew<R>();

    I agree that those are alternatives. Everyone, including me, first pushes hard for a library-only (or at least library-like) solution when they first start out on this problem. I think an argument can be made for it, and at one time I did so too.

    To me, the killer argument in favor of a new declarator with usage R^ instead of a library-like cli::handle<R> is its pervasiveness: It will be by far the most widely used part of all these extensions, as it's the common use case the vast majority of the time for CLI types (as objects, as parameters, etc.). This extremely wide use amplifies two particular negative consequences we'd like to avoid: First, the long spelling (here "handle") could in practice effectively become a reserved word just because people are liable to widely apply using to avoid being forced to write the qualification every time (this is worse if the name chosen is a common name likely to be used for other identifiers or even macros, and "handle" is a very common name). Second, and worse, the long spelling would also make the language several times more verbose in a very common case than even the Managed Extensions syntax was, and that in turn was already verbose compared to other CLI languages.

    Compare five alternatives side by side:

      cli::handle<R> r = cli::gcnew<R>(); // 1: above suggestion

      handle<R> r = gcnew<R>(); // 2: ditto, with "using"s

      R __gc* r = new R; // 3: original MC++ syntax

      R^ r = gcnew R; // 4: C++/CLI syntax

      R r = new R(); // 5: C#/Java syntax

    I think you could make a case for any one of these, depending on your tradeoffs. But I think a tradeoff that favors usability will favor the last few options.

    There are also other issues where having ^ and % declarators/operators that roughly correspond to * and & enables a more elegant type calculus. I (or someone on the team) will have to write those up someday, but consider at some future time when we have full mixed types too: When we can have a type that inherits from both native and CLR base classes/interfaces, we will want to be able to pass a pointer to such an object to existing ISO C++ APIs that take a Base1* and a handle to the same object to existing CLI APIs that take a Base2^. Both will be common operations and therefore both should be distinctly expressible with a terse syntax:

      class NativeBase { };

      // a mixed type
      ref class R
        : public NativeBase
        , public System::Windows::Forms::Form
      { };

      void NativeFunc( NativeBase* );
      void CLIFunc( Object^ );

      R r;                  // object on the stack
      NativeFunc( &r ); // "give me a *" is spelled "&" as usual
      CLIFunc( %r ); // "give me a ^" is spelled "%"

    In this way, % is to ^ pretty much just as & is to *. If R^ were instead spelled using a templatelike syntax, what would be the corresponding code to get at it?

    Finally, consider the agnostic template case:

      template<typename T>
      void f( T t ) {
        SomeBase* b = &t;      // I have to have a way of saying "I want a *" without knowing the type of T
        SomeInterface^ i = %t; // I have to have a way of saying "I want a ^" without knowing the type of T
      }

    I'll write more about the full pointer system in the future. For other design considerations about handles, I'll point again to Brandon's Behind the Design: Handles blog entry, and to my own entry earlier this week on why pointers aren't enough by themselves.

  • Why "ref class X", not just "class X : System::Object"?

    On comp.lang.c++.moderated, Andrew Browne wrote:

    • The goals of the C++/CLI proposal are good ones, I think, but I wonder if it would be possible to achieve them without (most of) the new keywords and semantics?

      For example instead of:

      ref class R {/*...*/};        // CLR reference type
      value class V {/*...*/};     //  CLR value type
      interface class I {/*...*/}; //  CLR interface type
      generic <typename T>
      ref class G {/*...*/};       // CLR generic
      // etc etc

      couldn't we have

      class R : public System::Object {/*...*/}; // CLR reference type
      class V : public System::ValueType {/*...*/}; //  CLR value type
      class I : public System::Object
      {/* pure virtuals only here*/ }; // CLR interface type
      template <typename T>
      class G : public System::Object {/*...*/}; // CLR generic
      // etc etc?

    That's one of the alternatives I attempted, and I wasn't the first. I think almost everyone starts here, and I held on for a while before I became convinced I had to let go because it wasn't leading to the right places. Let me share some of the problems and objections that crop up when you work your way down this path:

    1. (Minor) Verbose

    The above alternative is a lot of typing compared to any of the other alternatives (Managed C++ syntax, proposed C++/CLI syntax, and other CLI languages).

    There's a pretty easy solution for this one, using keyword shortcuts:

      class R : ref {/*...*/}; // CLR reference type
      class V : value {/*...*/}; //  CLR value type
      class I : interface
        {/* pure virtuals only here*/ }; // CLR interface type

    An inconvenience with this is that there could already be a class named ref, and so the syntax would have to be embroidered somehow to disambiguate it; this is unfortunate but surmountable. But, more importantly, this shorthand still doesn't address the other drawbacks, below, of this general approach.

    2. Forward declarations

    Consider:

      class X;

    Is this a ref class, value class, interface class, or native class? There are a few cases where this needs to be known from the forward declaration.

    3. Indirect: The header hunt

    Consider:

      class X : public Y { };

    Is this a ref class, value class, interface class, or native class? Under the alternative, the only way to know would be to inspect Y and all base classes until you can determine whether any of them directly or indirectly inherit from Object or ValueType (or not). There are shortcuts (e.g., it's simpler for value types because they're always sealed and so the inheritance has to be direct), but the hunt remains.

    That may not seem like a huge issue, except that the types really are behaviorally different in small but important ways; for example, in one case a virtual call in a ctor or dtor will be deep, in the other it will be shallow. What metadata will eventually be emitted, if any?
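
    The "shallow" half of that difference can be seen in plain ISO C++. The following is a minimal sketch with hypothetical names (Base, Derived, seen_during_construction, construction_is_shallow); it shows only the native behavior, where a virtual call made while the base-class constructor is running resolves to the base version. The "deep" behavior of ref classes needs a C++/CLI compiler and is not shown here.

```cpp
#include <cassert>
#include <string>

struct Base {
    std::string seen_during_construction;

    // Virtual call during construction: the object is still "only a Base" here,
    // so ISO C++ dispatches to Base::name(), not to any override.
    Base() { seen_during_construction = name(); }

    virtual std::string name() const { return "Base"; }
    virtual ~Base() = default;
};

struct Derived : Base {
    std::string name() const override { return "Derived"; }
};

// True when construction saw the base version while the finished object
// dispatches to the derived override.
bool construction_is_shallow() {
    Derived d;
    assert(d.name() == "Derived");   // normal virtual dispatch after construction
    return d.seen_during_construction == "Base";
}
```

    A reader who cannot tell from the declaration alone which category a class belongs to cannot tell which of the two dispatch rules a constructor like Base() will follow.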

    4. Closes doors

    Speaking specifically to the last part of the example:

    • template <typename T>
      class G : public System::Object {/*...*/}; // CLR generic

    Unfortunately, this conflates the ideas of the type category (ref/value/native) with the form of genericity (generic/template). It says that CLI types can only be genericized, and native types can only be templated, leaving no way to express the other two useful concepts:

    • a templated CLI type (C++/CLI syntax: template<class T> ref class R {};)
    • a generic native type (C++/CLI syntax: generic<class T> class N {};)

    Templated CLI types in particular are very useful and are supported in C++/CLI, which lets the template/generic choice and the class category choice vary independently.

    5. Other closed doors: Distinguishing mixed types (Future)

    In the future, C++/CLI is intended to eventually allow for full mixing and cross-inheritance of arbitrary types. Using the alternative inheritance-based syntax alone does not allow the programmer to distinguish between the following two distinct things that the proposed C++/CLI design lets the programmer express as follows:

      ref class Ref : public ANative { int x; };

      class Native : public ARef { int x; };

    This distinction can't be expressed using the proposed alternative above. Both types have System::Object as a base class, but one is a reference class that other CLI languages could use directly and where virtual calls during construction are deep, and one is a native class that other CLI languages can only use via a handle or reference to the ARef base class and where virtual calls during construction are shallow.

  • Q: Why keywords instead of __keywords? A: We already tried __keywords; they failed.

    Last week on comp.lang.c++.moderated, Nicola Musatti wondered why C++/CLI would use keywords that don't follow the __keyword naming convention for conforming extensions:

    • The standard already provides a way to avoid conflicts when introducing new keywords: prepend a double underscore.

    Right, and that's what Managed C++ used, for just that reason: to respect compatibility. Unfortunately, there was a lot of resistance and it is considered a failure.

    For one thing, programmers have complained loudly that all the underscores are not only ugly, but a real pain because they're much more common throughout the code than other extensions such as __declspec have been. In particular, __gc gets littered throughout the programmer's code.

    At least as importantly, the __keywords littered throughout the code can make the language feel second-class, particularly when people look at equivalent C++ and C# or VB source code side-by-side. This comparative ugliness has been a contributing, if not essential, factor in why some programmers have left C++ for other languages.

    Consider:

      //-------------------------------------------------------
      // C# code
      //
      class R {
        private int len;
        public int Length {
          get { return len; }
          set { len = value; }
        }
      };

      R r = new R();
      r.Length = 42;

      //-------------------------------------------------------
      // Managed C++ equivalent
      //
      __gc class R {
        int len;
      public:
        __property int get_Length() { return len; }
        __property void set_Length( int i ) { len = i; }
      };

      R __gc * r = new R;
      r->set_Length( 42 );

    Oddly, numerous programmers find the former more attractive. Particularly after the 2,000th time they type __gc.

    But now we can do better:

      //-------------------------------------------------------
      // C++/CLI equivalent
      //
      ref class R {
        int len;
      public:
        property int Length {
          int get() { return len; }
          void set( int i ) { len = i; }
        }
      };

      R^ r = gcnew R;
      r->Length = 42;

    I should note there's actually also a shorter form for this common case, to have the compiler automatically generate the property's getter, setter, and backing store. While I'm at it, I'll also put the R instance on the stack, which is also a new feature of the revised syntax:

      //-------------------------------------------------------
      // C++/CLI alternatives
      //
      ref class R {
      public:
        property int Length;
      };

      R r;
      r.Length = 42;

    C# is adding something similar as a property shorthand. But C# doesn't have stack-based semantics for reference types and is unlikely to ever have them, though using is a partial automation of the stack-based lifetime control that C++ programmers take for granted. I'll have more to say about using another time.
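
    As a reminder of what that stack-based lifetime control looks like, here is a minimal sketch in plain ISO C++. The Tracer class and run_scope function are hypothetical names used only for this illustration; the sketch shows ordinary native objects, not CLI reference types. Each object is cleaned up automatically at the closing brace, in reverse order of construction, with no explicit cleanup call in the body.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical resource that records when it is acquired and released.
class Tracer {
public:
    Tracer(std::vector<std::string>& log, std::string name)
        : log_(log), name_(std::move(name)) {
        log_.push_back("open " + name_);
    }
    ~Tracer() { log_.push_back("close " + name_); }

    Tracer(const Tracer&) = delete;
    Tracer& operator=(const Tracer&) = delete;

private:
    std::vector<std::string>& log_;
    std::string name_;
};

// Runs a small scope and returns the order in which events happened.
std::vector<std::string> run_scope() {
    std::vector<std::string> log;
    {
        Tracer a(log, "a");
        Tracer b(log, "b");
        log.push_back("work");
    }   // b is destroyed first, then a
    assert(log.size() == 5);
    return log;
}
```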

  • Q: Aren't C++ pointers alone enough to "handle" GC? A: No.

    A few days ago on news:comp.lang.c++.moderated, Nicola Musatti wrote:

    • As for GC, pure implementations exist.
      [that add no new extensions to ISO C++]

    Not for a pure definition of "pure," they don't. :-)

    To explain why C++ pointers are insufficient (unless their semantics were to be changed at least a little, which would mean breaking existing code), consider two counterexamples:

    1. Not for a compacting GC. Certainly a bald pointer can't point directly to an object that moves around in memory, because C++ pointers are required to be stable, to always have the same value while pointing to the same object. Changing the semantics of a pointer to make it track will break lots of code, starting with set<T*>, because such tracking pointers cannot be ordered (their values will after all be changed arbitrarily at unpredictable times by the GC). There are also other restrictions, but that's one of the most noticeable. [Aside: Such a tracking pointerlike abstraction is needed, and is provided in C++/CLI. It just can't be spelled * without fundamentally scuttling ISO C++ conformance, is all.]

    2. Not for a non-compacting GC, either. Here we can get a lot closer, but even Great Circle / Boehm style collectors impose restrictions that break some conforming C++ programs. In particular, they restrict, if only slightly, the operations that Standard C++ allows on pointers. Consider the following well-formed ISO C++ program with well-defined semantics:

      int* pi = new int(42); // line 1
      pi = (int*)((int)pi ^ 0xaaaaaaaa);

      // ... do other work ...

      pi = (int*)((int)pi ^ 0xaaaaaaaa);
      cout << *pi; // perfectly ok, prints "42", won't crash
      delete pi; // ok

    Add-on GCs can't see such disguised pointers, and are liable to reclaim the memory allocated in line 1 before its later use, resulting in an attempt to access freed memory. Boom.
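
    The example above casts through int, which assumes pointers and int have the same size. A minimal sketch of the same round trip that does not depend on that assumption uses std::uintptr_t instead. The names kMask, disguise, reveal, and round_trip are hypothetical and introduced only for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Mask used to disguise the pointer; any nonzero value works as long as the
// same mask is applied twice.
constexpr std::uintptr_t kMask =
    static_cast<std::uintptr_t>(0xaaaaaaaaaaaaaaaaULL);

// Hide a pointer as an integer that no longer looks like an address.
std::uintptr_t disguise(int* p) {
    return reinterpret_cast<std::uintptr_t>(p) ^ kMask;
}

// Recover the original pointer by applying the same XOR again.
int* reveal(std::uintptr_t hidden) {
    return reinterpret_cast<int*>(hidden ^ kMask);
}

// Allocate, disguise, (pretend to) do other work, reveal, read, and free.
int round_trip() {
    int* pi = new int(42);
    std::uintptr_t hidden = disguise(pi);
    assert(reveal(hidden) == pi);
    pi = nullptr;   // no ordinary pointer to the int remains anywhere

    // ... do other work ...

    pi = reveal(hidden);
    int value = *pi;
    delete pi;
    return value;
}
```

    Between the two XORs the only record of the allocation is the integer hidden, which is exactly the window in which a collector that scans for pointer-shaped values would consider the int unreachable.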

    This isn't perverse or theoretical, by the way. Consider "two-way pointers" as one example of a well-known implementation technique where two pointers are XOR'd together like this for a perfectly reasonable and legal use. In particular, a motivation behind two-way pointers is that you can have a more space-efficient doubly linked list if you store only one (not two) pointer's worth of storage in each node. But how can the list still be traversable in both directions? The idea is that each node stores, not a pointer to one other node, but a pointer to the previous node XOR'd with a pointer to the next node. To traverse the list in either direction, at each node you get a pointer to the next node by simply XORing the current node's two-way pointer value with the address of the last node you visited, which yields the address of the next node you want to visit. For more details, see:

      "Running Circles Round You, Logically"
      by Steve Dewhurst
      C/C++ Users Journal (20, 6), June 2002

    I don't think the article is available online, alas, but Steve's website has some source code demonstrating the technique.
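
    For readers without access to the article, the traversal described above can be sketched as follows. This is a minimal illustration written for this explanation, not Steve's code; the names Node, addr, ptr, link_all, and walk are hypothetical, and the nodes live in a std::vector that is not resized after linking so that their addresses stay fixed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Node {
    int value;
    std::uintptr_t link;   // address(prev) XOR address(next); a missing neighbor counts as 0
};

// Convert between node pointers and integers, mapping "no node" to 0 explicitly.
std::uintptr_t addr(const Node* p) {
    return p ? reinterpret_cast<std::uintptr_t>(p) : 0;
}
Node* ptr(std::uintptr_t a) {
    return a ? reinterpret_cast<Node*>(a) : nullptr;
}

// Link the nodes, in index order, into a two-way-pointer list.
void link_all(std::vector<Node>& nodes) {
    for (std::size_t i = 0; i < nodes.size(); ++i) {
        const Node* prev = (i == 0) ? nullptr : &nodes[i - 1];
        const Node* next = (i + 1 == nodes.size()) ? nullptr : &nodes[i + 1];
        nodes[i].link = addr(prev) ^ addr(next);
    }
}

// Walk from one end of the list to the other, collecting values.
// XORing a node's link with the address we came from yields the address
// of the neighbor on the other side.
std::vector<int> walk(Node* start) {
    std::vector<int> out;
    Node* prev = nullptr;
    Node* cur = start;
    while (cur != nullptr) {
        out.push_back(cur->value);
        Node* next = ptr(cur->link ^ addr(prev));
        prev = cur;
        cur = next;
    }
    return out;
}
```

    Starting walk at the first node visits the list forward and starting it at the last node visits it backward, with each node storing only one pointer's worth of link data.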

    This perfectly standards-conforming and useful technique won't work correctly with any GC implementation I know of that does not extend the language so that pointers can retain their full standard meaning.

    Steve's technique works perfectly fine and unbroken, however, under C++/CLI. It works because C++/CLI preserves exactly the full semantics of * pointers without any limitations. To do so, C++/CLI needed to add a new abstraction for GC semantics instead of pretending that raw pointers are by themselves a complete solution for safe use in a GC environment (they aren't, only because they were never designed to be).

    For more about the design motivations behind the ^ declarator (aka a "handle"), see also Brandon Bray's excellent blog entry Behind the Design: Handles posted earlier today.

  • Q: Could the CLI binding become required? A: No.

    A few days ago on news:comp.lang.c++.moderated, "Chris" asked:

    • Here is a paranoid question: Is there a possible future step, where compiling C++ on a Microsoft platform becomes impossible _without_ using the CLI binding?

    No. Doing that would mean throwing away all the ISO conformance work that Visual C++ just spent nearly the whole last release cycle adding to the product. VC++ is now 98%-ish conformant to C++03 (the 1998 ISO C++ standard + its first technical corrigendum) and VC++ will continue to work on the remaining 2%, plus track the coming C++0x additions as they are created by the ISO and ANSI committees.

    Of course, the CLI extensions will be needed where programs specifically take advantage of CLI (i.e., .NET) data types and features, such as the types in the .NET Framework libraries, and garbage collection and reflection. But programs that don't need those can ignore the extensions and compile just fine to either native binaries or to .NET IL. Note that last bit, because it seems to be not widely known: C++ code can still be compiled to IL and run in the .NET virtual machine (Common Language Runtime, or CLR) without using any extensions; the extensions are needed only for additionally using CLI data types and features like garbage collection.

    So there are three major scenarios:

    • Pure native: Compile existing programs to native binaries just like we've all been doing for years. No CLI features, no CLI extensions.
    • Normal C++ programs that happen to be compiled to IL instead of to x86: The code runs on the VM and is JITted and everything, but the program is still using all native data and not using any CLI data types, so no CLI extensions are needed here either.
    • C++ programs that explicitly start using some CLI data types or features: At those points in the code where those data types or features are used, and only at those points, the extensions will apply, and most of the time the only new syntax will be to write gcnew and ^ (instead of new and *).

    Unless you're actually authoring your own new CLI types, you're unlikely to directly use much more than gcnew and ^, plus maybe an occasional sprinkling of nullptr or %.

  • Help | About

    Welcome! My primary day job these days is that I'm an Architect on the Visual C++ team at Microsoft, currently responsible for leading the redesign of the C++ Managed Extensions for .NET (aka "Managed C++"). I also do a fair amount of other C++ writing and speaking (including right now busily writing two new books due out in the spring), and I chair the ISO C++ standards committee. You can find out more about me on my website.

    At first, I'll mostly use this blog to begin answering frequently asked questions about the language extensions redesign. The VC++ team has learned a lot about what worked and what didn't work with the current Managed Extensions for C++ (aka "Managed C++"). The redesign is an evolution of those extensions but it isn't being called "Managed C++" any more. The new syntax is about to undergo standardization in the ECMA and ISO worlds under the name "C++/CLI," a binding from C++ to the CLI, so I'll often refer to the extensions by that name. I get questions about this every day or two, and I'll primarily answer them here.

    In the meantime, you can find a general overview blurb about this work on my website's Microsoft page.
