Stan Lippman's Blog

C++/CLI

  • The String Literal Returns

    In the last entry, I celebrated what I felt was an elegant solution to the problem of the string literal in the context of overloaded function resolution. But it turns out there is another area in which the string literal proves problematic. Who would have thought such a foobar kind of entity could cause so much trouble? It's these kinds of ambushes that make the extension of a language so unpredictable.

     

    So, here's the problem.

     

                throw "fritz";

     

    What does the compiler do? It has to set everything up at compile-time; the throw is then actually handled by a run-time library that makes use of a limited form of type reflection to figure out the kind of object that has been thrown.

     

    There's no context, really. We don't know whether the person is going to catch x, y, or z, so we can't know what to throw in any particular instance. This is all set up at compile-time.

     

    Now, here is the rub:

     

        void g();   // declared before its use in f()

        void f()
        {
            try { g(); }
            catch ( String^ s ) {}
            catch ( const char* pcs ) {}
        }

        void g()
        {
            throw "fritz";
        }

     

    What gets caught? What an impossible question. One could argue that if we are compiling for the CLR, then String^ should match, and if we are compiling native code, then const char* should. But there's no reason to believe that just because we are compiling for the CLR, people are using String and not just const char*.

     

    So, a string literal is thrown as a type const char*, not as String^.  My first reaction was, gosh, that's terrible. Then, I thought, what am I thinking? It seems a very small matter. It's not even something I would run into since I don't throw anything but class objects. I just add it for completeness.
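
    The native half of this rule can be sketched in plain ISO C++. This is only an analogue, not C++/CLI: std::string stands in for String^ (an assumption made so that the sketch compiles with any standard compiler), and which_handler() and check_throw_literal() are hypothetical names. Handler matching does not apply user-defined conversions, so the thrown const char* never reaches the std::string handler.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: reports which handler caught the string literal.
std::string which_handler()
{
    std::string caught = "nothing";
    try {
        throw "fritz";                  // the thrown object has type const char*
    }
    catch ( const std::string& ) {      // skipped: no conversion is applied
        caught = "std::string";         // when matching handlers
    }
    catch ( const char* ) {             // exact match on the thrown type
        caught = "const char*";
    }
    return caught;
}

// Hypothetical self-check; call it from main() to exercise the sketch.
void check_throw_literal()
{
    assert( which_handler() == "const char*" );
}
```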

     

    I think if this bothers you, then you are bothered by our entire effort. Just as a sounding. That is, not everybody is happy with our work on C++/CLI. There are some folks out there that feel concern that we are somehow messing with their language, and we shouldn't be allowed to do that.

     

    We aren't doing it that way at all. We have really wanted to be good C++ citizens while being excellent Microsoft employees. I have never felt any pressure to compromise either while I have been here. We have had an extraordinary degree of freedom, not simply in our design, but in our being able to reach out and work with the general C++ community. This language is a coalition. I think we have all wanted to put the best face on C++ in what we regard as an otherwise hostile environment for C++. We think this is a win-win situation for everyone. If you don't like something, you should let us know. We're not a hundred thousand leagues removed from our users. If you want to use the language, you have every right to tell us what you think about it; how you find it; what you want. If you make a good case for it, I think it could be done. That's just my opinion. But I think we have an opportunity, together as a C++ community, to put forth C++ in an otherwise hostile environment that for some unbelievable reason thinks C# or Java comes anywhere near being as good a language. I don't understand it. It's not a question of knocking those languages. They took so much from C++ -- it's not hard, after all the hard stuff was worked out, to go back and clean it up. Sure, C++ is messy. It was real hard to do. It was getting done while we were using it. We didn't know where it was going. Bjarne was implementing it and using it, and it became the tool we all used to make our living. We wanted it to be the best possible language for programmers. At least I did. I don't have a philosophy. I don't make standards. I just program and write. And I do that best in C++. C# and Java mean nothing to me. Now I have my own language to use on .NET. That was my personal agenda in all this. I think you should check it out.

     

    It was a great adventure. For me, anyway, C++/CLI makes C++ interesting again. I'm not a fan of standards. I've never, in the end, been involved in one. It's making a language that interests me, not the clean-up and all that necessary stuff for real-world production. I don't have the head for that.

     

    So, I just wanted to make a personal statement. I have watched the team here for almost 3 years. I worked with them sometimes. I snarled at them sometimes. I was fed up with them once or twice. And yet they have produced something really wonderful in my opinion, and it is much better than I had dared imagine. I just love it. I can't wait to program with it – I'd like to drop everything and just start. This is exactly what I have been waiting for: to have a language to program .NET. We didn't have one before.

     

    C++/CLI is an entry visa into what I call 3-dimensional programming. The CLR gives you the run-time as an entity to query, to modify, to almost in a sense be alive in. Everything in a native program has to be done at compile-time. That is way too early for me to know exactly what needs to be done in many interesting environments. Can't I wait? I'm not so performance critical that I need the machine.

     

    My first analogy was going to be between traditional Disney animation and Pixar CGI, but I think a better analogy is between an instinct and a decision. That is, an instinct is the trigger of a built-in behavior that is insulated from and unaffected by the environment. A decision considers the environment (context) and memory (past behavior) in order to choose a best action. This is what is missing from native static programming. It really can't be smart enough for the kinds of problems we need to solve now, in my estimation. I can't prove that.

     

    I gave a talk at Disney Feature Animation a few months ago about C++/CLI, and a friend of mine, whom I love dearly, asked me, what do I need that for? He is in charge of plug-in 3D tools for Maya, a modeling toolkit for computer graphics. And he is right. But we need it, I believe, as a community going forward. And in that sense, we have opened that doorway for the C++ programmer. The worst case scenario was that the price of admission to this world was to have to renounce C++ -- I mean, mea culpa. We're such savages – we worry about performance, we break the type system, we make use of the underlying machine, we don't follow enough rules. We're loose cannons. Can't have loose cannons. I personally didn't want to accept that as being a true statement.

     

    But you cannot argue something like that. I can't say, hypothetically, C++ can be as good a language under .NET as Java or C#. Nobody would hear that – they would just see Lippman being defensive, or delusional. You have to demonstrate it. That's the only way to really convince people. That's the only way I can ever convince myself in fact. So, that is what C++/CLI means to me.

     

    I know Bjarne has his reservations and dislikes. Well, so do we all, if truth be told. But it's still just so wonderful overall. Who cares if it's not perfect? I don't – I haven't gotten anything perfect, trust me. I don't know if it is better than this or that or the next thing. But it's really pretty good and I'm going to use it. And we really did it as a team.

  • The Type of a String Literal Revisited ...

    In the course of these entries, I have twice addressed the issue of the type of a string literal under C++/CLI -- in particular when resolving an overloaded function call. The issue is illustrated in the following example,

    public ref class R {

    public:

      void foo( System::String^ ); // (1)

      void foo( std::string );     // (2)

      void foo( const char* );     // (3)

    };

     

    void bar( R^ r )

    {

      // which one?

      r->foo( "Pooh" );

    }

    In the original Managed Extensions for C++, the invocation of foo() within bar() resolved to (3), exactly the same as it does under ISO-C++. That is,

    void bar( R^ r )

    {

      // under Managed Extensions for C++, resolved to

      // void foo( const char* );

     

      r->foo( "Pooh" );

    }

    To briefly review: In ISO-C++, the type of "Pooh" is const char[5]. There is no exact match of "Pooh" to any of the three instances of foo(). However, the trivial conversion of const char[5] to const char* represents a best match, and this is why (3) is invoked. There was no built-in notion of a string literal having any relationship to System::String.
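
    For readers who want to check the ISO-C++ half of this on a native compiler, here is a minimal sketch. The class N, the strings its members return, and check_overload() are assumptions made for illustration; the String^ overload (1) is left out because it requires a CLI compiler.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// The literal itself is an array of five const char: 'P','o','o','h','\0'.
static_assert( sizeof( "Pooh" ) == 5, "four characters plus the terminator" );
static_assert( std::is_same<decltype( "Pooh" ), const char(&)[5]>::value,
               "a string literal is an lvalue of type const char[5]" );

// Hypothetical native counterpart of R, without overload (1).
struct N {
    const char* foo( std::string ) { return "(2) std::string"; }
    const char* foo( const char* ) { return "(3) const char*"; }
};

// Hypothetical self-check: the array-to-pointer conversion wins over
// the user-defined conversion to std::string.
void check_overload()
{
    N n;
    assert( std::string( n.foo( "Pooh" ) ) == "(3) const char*" );
}
```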

    And this was changed in the design of C++/CLI. Actually, it was changed twice, and that is the talking point of this entry – to explain why the initial change had to be further refined.

    The overall effect of the change is to extend dual citizenship to a string literal compiled for the CLI. The initial change is described in my earlier entry entitled String Literals are now a Trivial Conversion to String. Here is a brief review of the issue.

    The first question is, what is the exact type of "Pooh" within C++/CLI? One answer is, well, obviously, it is of type const char[5] – otherwise, it could not be compatible with ISO-C++. We can't change that.

    The initial solution, therefore, was to introduce a new trivial conversion, that of a string literal to a System::String, that is of equal precedence with the trivial conversion of a string literal to const char*. This provides a somewhat elegant symmetry, but in practice results in a flurry of ambiguous calls. For example, under this design, the invocation of foo() now fails,

    void bar( R^ r )

    {

      // under interim C++/CLI, flagged as ambiguous

      // the following two candidate functions are equally good …

      //       void foo( System::String^ );

      //       void foo( const char* );

     

      r->foo( "Pooh" );

    }

    To disambiguate the call, the user would have to provide an explicit cast,

    void bar( R^ r )

    {

      // ok: void foo( System::String^ );

      r->foo( safe_cast<String^>( "Pooh" ));

    }

    In practice, in nearly every case, the C++/CLI programmer wished to have the String instance invoked in preference to the C-style string instance. And so giving equal precedence to both conversions was both a step forward (in recognizing the special relationship of a string literal to System::String under the CLI) and two steps back (the presence of the const char* argument in effect neutralized that relationship (first step back) and required an explicit cast to resolve (second step back) ).

    So, we had to fix that. That is, under the CLI, we want a string literal to more closely be a kind of System::String than const char*. The question was, how could that be achieved without breaking ISO-C++ compatibility? How might you resolve that?

    The insight to resolve this is to realize that the dual citizenship of a string literal applies to its fundamental type, not to its set of trivial conversions. In effect, under C++/CLI, the underlying type of a string literal such as "Pooh" is both const char[5] (its native inheritance) and System::String (its managed underlying unified type). Under C++/CLI, the string literal is an exact match to System::String and the trivial conversion to const char* is not considered. That is, under the revised C++/CLI language specification, the ambiguity has been resolved in favor of System::String,

    void bar( R^ r )

    {

      // ok: under current C++/CLI,

      //     void foo( System::String^ );

     

      r->foo( "Pooh" );

    }

    This reflects a fundamental difference between ISO-C++ and C++/CLI in their type systems. In ISO-C++, types are independent except when explicitly part of the same class inheritance hierarchy. Thus, there is no implicit type relationship between a string literal and the std::string class type, even though they share a common abstraction domain.

    C++/CLI, on the other hand, supports a unified type system. Every type, including literal values, is implicitly a kind of Object. This is why we can call methods through a literal value or an object of the built-in types. The value 5 is of type Int32. The string literal is of type String. It just doesn't work to treat a string literal as either more like or equal to a C-style string.

    The integrated conversion hierarchy allows a working ISO-C++ program to continue to exhibit the same behavior when compiled for the CLI, while a new C++/CLI program exercising the CLI types reflects the new type priority of the string literal.

    Most readers and programmers have little patience with this level of detail, and often point to discussions of C++ type conversions as evidence of its complexity. However, I don't think that is quite fair. The existence of these rules is necessary if one is to reach an intuitive language behavior and guarantee a uniform behavior across implementations. It is because the language generally behaves in a type-intuitive way that programmers can in fact ignore these details.

    While the length of this discussion might seem disproportionate to the topic's importance, it strikes me as a canonical example of the extent to which we have had to work to integrate the CLI type system into the ISO-C++ semantic framework. This should also suggest certain good practices when a native class is being recast to a CLI class type. It is better, for example, to refashion the set of member functions accepting string literals rather than simply stirring in an additional String instance to our stew of overloaded functions, seasoning it to local taste.

     

  • A Primer on the Interior Pointer

    A value type is typically not subject to garbage collection except in two cases: (a) when it is the subject of a box operation, either explicit or implicit, such as,

    void f( int ix )

    {

        // explicit placement on the CLI heap

        int^ result = gcnew int( -1 );

     

        // exercise result …

     

        // implicit boxing of the integer value ix …

        Console::WriteLine( "{0} :: {1}", ix, result );

    };

    and (b) when it is contained within a reference type either as a member or as an element of a CLI array. For example,

    public enum class Color

         { white, red, orange, yellow, green, blue, indigo, violet };

     

    public value class Point

    {

        Color m_color;

        float m_x, m_y;

     

        // …

    };

     

    public ref class Rectangle

    {

        Point bottom_left;

        Point top_right;

     

        // …

    };

     

    void f( Point p )

    {

        Rectangle  ^hit = gcnew Rectangle( -2, -2, 2, 2 );

        array<int> ^fib = gcnew array<int>(8){ 1,1,2,3,5,8,13,21 };

       

        // …

    }

    In f(), we allocate two whole objects on the CLI heap, a Rectangle and an array of eight integer elements. The Rectangle object contains two interior value class Point members. These are located on the CLI heap as fixed offsets into the area allocated to the containing Rectangle. Within each Point are two floating point members and a Color member. These are also located on the CLI heap as fixed offsets into the two interior Point objects.

    If the Rectangle object is relocated during a sweep of the garbage collector, all its interior members, of course, are relocated as well. The same is true with a CLI array. When the array is relocated, the addresses of each of its elements change as well. We cannot safely assign the address of any interior member or array element to a non-tracking pointer or reference. This isn't just a pitfall that we must learn to recognize and sidestep. Rather, the language disallows it. Our attempt results in a compile-time error. For example,

    // error: cannot assign an interior member

    //        to a non-tracking pointer …

    Point *p_bl = &hit->bottom_left;

    We need a form of tracking entity to hold the address of an interior member. What are some of the requirements on this entity? Well, one common pattern we felt necessary to support is that of an iterator – in particular when applied to the elements of a CLI array. For example,

    void f( array<int> ^fib )

    {

        SomeTrackingPointerNotation begin = &fib[0];

        SomeTrackingPointerNotation end = &fib[ fib->Length ];

     

        for ( ; begin != end; ++begin )

              // …

    }

    Our SomeTrackingPointerNotation needs to support pointer arithmetic. That is, when we write ++begin, this does not increment the address by 1, but rather by the size of the element type. For example, since the array elements are of type int, each increment must add sizeof(int) to the current address value.
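
    The native behavior being asked for can be sketched with ordinary pointers. This is plain ISO C++ over a native array, not a CLI array; sum_native(), stride_in_bytes(), and check_pointer_arithmetic() are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>
#include <cstddef>

// Walks the half-open range [begin, end) the way an iterator would.
int sum_native( const int* begin, const int* end )
{
    int total = 0;
    for ( ; begin != end; ++begin )
        total += *begin;
    return total;
}

// Measures how far ++ moves an int pointer, in bytes.
std::size_t stride_in_bytes()
{
    int pair[2] = { 1, 1 };
    const int* first = &pair[0];
    const int* next  = first;
    ++next;                         // advances by one element, not one byte
    return static_cast<std::size_t>(
        reinterpret_cast<const char*>( next ) -
        reinterpret_cast<const char*>( first ) );
}

// Hypothetical self-check over the first eight Fibonacci numbers.
void check_pointer_arithmetic()
{
    int fib[8] = { 1, 1, 2, 3, 5, 8, 13, 21 };
    assert( sum_native( fib, fib + 8 ) == 54 );
    assert( stride_in_bytes() == sizeof( int ) );
}
```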

    Neither the tracking handle (^) nor its indirect cousin, the tracking reference (%), supports pointer arithmetic. Moreover, a tracking reference does not support pointer comparison, such as

    begin != end

    Like a native reference, a tracking reference, once initialized, serves as an alias to the underlying object to which it refers. The comparison is not of the two tracking addresses but of the values stored at those addresses. This is not the iterator semantics we need.
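
    The same distinction is easy to see with native references and pointers. A small ISO C++ sketch follows; the function names are hypothetical, and a tracking reference is only assumed to behave like the native reference shown here, as the paragraph above states.

```cpp
#include <cassert>

// Two distinct objects that happen to hold equal values.
bool references_compare_values()
{
    int a = 5, b = 5;
    int& ra = a;
    int& rb = b;
    return ra == rb;        // compares 5 with 5
}

bool pointers_compare_addresses()
{
    int a = 5, b = 5;
    int* pa = &a;
    int* pb = &b;
    return pa == pb;        // compares the two addresses
}

// Hypothetical self-check.
void check_comparisons()
{
    assert( references_compare_values() );      // equal values
    assert( !pointers_compare_addresses() );    // different objects
}
```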

    In a native design, we decide whether to go with a pointer or a reference declaration based on two primary factors:

    (1)                  if the object we wish to refer to is unavailable at the time of declaration, then we must declare a pointer and set it to null. A reference requires an initial object.  So does a tracking reference.

    (2)                  if we wish to refer to more than a single object during the lifetime of the declaration, then we must also declare a pointer. A reference cannot be reset to refer to a second or subsequent object. Neither can a tracking reference.

    What this suggests is that we need an analogous choice when the constraints of a tracking reference make it ill-suited to our design. A third, more flexible form of tracking entity, the interior pointer (interior_ptr<>), is given over to this role. It can refer to no object (but only by setting it to nullptr, not 0). And it can be reset to refer to a second or subsequent object. Moreover, it supports both pointer arithmetic and pointer comparison. For example,

    int sum( array<int> ^arr )

    {

        if ( ! arr )

             return 0;

     

        interior_ptr<int> begin = nullptr;

        interior_ptr<int> end = &arr[ arr->Length ];

        int sum = arr[0];

     

        for ( begin = &arr[1]; begin != end; ++begin )

              sum += *begin;

     

        return sum;

    }

    The declaration of an interior pointer is limited to local objects, including function parameters and return types. If we don't provide an initial value, the compiler automatically inserts code to set it to nullptr – so the explicit initialization in the above example is not strictly necessary. The type specified within the template brackets identifies the kind of object addressed; we do not indicate a pointer within the brackets unless we intend two or more levels of indirection. For example,

    public ref class Matrix sealed {

        float *m_mat;

     

    public:

        property interior_ptr<float*> Mat

        {

            interior_ptr<float*> get(){ return &m_mat; }

        }

     

        // …

    };

     

  • Translation Guide between Managed Extensions and the new C++/CLI binding Available

    C and C++ programmers are notorious for relying on pointer indirection, and it seems blog entries are not immune to this. A translation guide attempting to exhaustively detail the differences between the original Managed Extensions for C++ (released with Visual Studio.NET) and the revised C++ binding to the CLI scheduled for Visual Studio 2005 (and attempting to provide some motivation behind each change) has been posted on MSDN at the following URL:

     

    http://msdn.microsoft.com/visualc/default.aspx?pull=/library/en-us/dnvs05/html/TransGuide.asp

     

    Although considerable effort has been made to correct all errors within the text, I am sadly aware of how imperfect these pieces of mine nevertheless turn out. So, on the one hand, I believe this guide will prove valuable to those needing this information. On the other hand, I also believe there are (a) areas that could have been made clearer, and (b) details that … well, that are in error. If you do use this guide and find either (a) or (b) (or some (c), (d), or (e) not itemized), please let me know, either by a comment to this entry or by a private email.

     

    Speaking of which – comments, that is. Due to the open nature of the internet, the providers of this site have found it necessary to provide a form of budget firewall – that is, they have put in place a facility to moderate the comments. This basically means that I see each comment before it becomes public, and it requires a click of my mouse in order for it to be published. I have not – not as yet, anyway – declined to publish a comment; however, oddly enough, I can't prove that.

     

    In any case, those of you who have been reading this from the beginning will notice that patchwork pieces of the translation guide have been posted here in one form or another. Often, comments have helped me recognize shortcomings and recast material in a hopefully more comprehensible manner. For example, my treatment of deterministic finalization in the translation guide was considerably reworked to address some of the posted comments to the initial blog entry on that topic. As W. H. Auden said in a different context, a piece of writing is never finished, it is simply abandoned.

  • A Tour of STL.NET

    Part of the (reasonably pleasant) distractions from posting on this blog recently has been working up the first in a series of articles on STL.NET for our Visual C++ MSDN web site. The amount of work to get from an articulation of a topic to a formal publication of it is an amazingly labor-intensive 10% -- similar to the difference between prototyping a software solution and making it deployment ready. In any case, this relatively content-free entry is just to alert you to its going on-line at

     

    http://msdn.microsoft.com/visualc/?pull=/library/en-us/dnvs05/html/stl-netprimer.asp?frame=true

     

    If, for some reason, this doesn't show up in the post as a clickable link, you can just visit the Visual C++ portion of the MSDN site at

     

     http://msdn.microsoft.com/visualc

     

    and hopefully find a link to it there. In any case, the Visual C++ site, under the care and breeding of Brian Johnson and Ami Vora, has really been spiffed up with some very neat content and is worth a lookie-loo.

     

    For the article, David Clark, who did a wonderful job editing the piece, asked me to come up with a summary limited to 200. I misread that as 200 words, and wrote the following. I then discovered to my chagrin that it referred to 200 characters, including white space. I thought, well, white space is without content, so if I remove that, I get perhaps another 75 characters to play with, but that doesn't actually work … In any case, here is the summary that had to be sliced mercilessly:

     

    For the experienced programmer, the hardest part of moving to a new development platform such as .NET is often the absence of familiar tools through which she has honed her skills and on which she depends. For the experienced C++ programmer, one such essential toolkit is the Standard Template Library (STL), and its absence under .NET until now has been a significant disappointment. With Visual C++ 2005, we fix that by providing an STL.NET library. This article, the first in a series, provides a general overview of the STL program model using STL.NET – it discusses sequential and associative containers, the generic algorithms, and the iterator abstraction that binds the two, using plenty of program examples to illustrate each point. It begins by briefly considering the alternative container models available to the .NET programmer using C++ -- the existing System::Collections library, the new System::Collections::Generic library, and, of course, STL.NET. To provide for the widest readership, this article does not require familiarity with the STL library; however, it does presume some experience with the C++ programming language.

     

    This summary, when reduced to 200 characters, plus the white space I threw back in, ended as follows:

     

    With Visual C++ 2005, the Standard Template Library (STL) has been re-engineered to work under the .NET Framework. This article, the first in a series, provides a general tour of STL.NET.

     

    Talk about a poor relative. In any case, this article is culled from a text I am writing on the C++ binding to the CLI, so if you have any concerns or comments or a content wish-list, please drop me a line.

  • Why C++ Supports both Class and Typename for Type Parameters

    Recently, someone asked me why we support both class and typename within C++ to indicate a type parameter since the keywords do not hold any platform significance – for example, class is not meant to suggest a native type nor is typename meant to suggest a CLI type. Rather, both equivalently indicate that the name following represents a parameterized type placeholder that will be replaced by a user-specified actual type.

    The reason for the two keywords is historical. In the original template specification, Stroustrup reused the existing class keyword to specify a type parameter rather than introduce a new keyword that might of course break existing programs. It wasn't that a new keyword wasn't considered -- just that it wasn't considered necessary given its potential disruption. And up until the ISO-C++ standard, this was the only way to declare a type parameter.

    Reuse of an existing keyword seems always to sow confusion. What we found is that beginners wondered whether the use of class constrained or limited the type arguments a user could specify to class types rather than, say, a built-in or pointer type. So, there was some feeling that not having introduced a new keyword was a mistake.

    During standardization, certain constructs were discovered within a template definition that resolved to expressions although they were meant to indicate declarations. For example,

    template <class T>
    class Demonstration {
    public:
        void method() {
            T::A *aObj; // oops …
            // …
        }
    };

    While the statement containing aObj is intended by the programmer to be interpreted as the declaration of a pointer to a nested type A within the type parameter T, the language grammar interprets it as an arithmetic expression multiplying the static member A of type T with aObj and throwing away the result. Isn't that annoying! (This sort of dilemma is not possible within generics – there is no way to safely verify that any T contains an A so that the runtime can safely construct an instance of the generic type.)

    The committee decided that a new keyword was just the ticket to get the compiler off its unfortunate obsession with expressions. The new keyword was the self-describing typename. When applied to a statement, such as,

    typename T::A* aObj; // declare pointer to T’s A

    it instructs the compiler to treat the subsequent statement as a declaration. Since the keyword was on the payroll, heck, why not fix the confusion caused by the original decision to reuse the class keyword. Of course, given the extensive body of existing code and books and articles and talks and postings using the class keyword, they chose to also retain support for that use of the keyword as well. So that's why you have both.
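
    Both readings of T::A * aObj can be tried side by side in a short ISO C++ sketch. The types HasStatic and HasType and the three functions are hypothetical, introduced only to give each reading something to bind to. The sketch also uses class in one template header and typename in the other to show that the two spellings are interchangeable there.

```cpp
#include <cassert>

struct HasStatic { static const int A = 6; };       // A names a value
struct HasType   { struct A { int value = 7; }; };  // A names a type

// Without typename, the dependent name T::A is taken to be a value,
// so "T::A * aObj" is parsed as a multiplication.
template <class T>
int as_expression( int aObj )
{
    return T::A * aObj;
}

// With typename, the same token sequence declares a pointer.
template <typename T>
int as_declaration()
{
    typename T::A target;           // an object of the nested type
    typename T::A *aObj = &target;  // pointer to T's A
    return aObj->value;
}

// Hypothetical self-check.
void check_typename()
{
    assert( as_expression<HasStatic>( 7 ) == 42 );
    assert( as_declaration<HasType>() == 7 );
}
```

    Instantiating as_expression with HasType, or as_declaration with HasStatic, is rejected at compile time, which is the mismatch the keyword was introduced to make explicit.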

  • Why C++/CLI Supports both Templates for CLI Types and the CLI Generic Mechanism

    I've been recently puzzling out a strategy for presenting the two mechanisms supporting parameterized types available to the C++/CLI programmer: she can use either the template mechanism adapted for use with CLI types, or the CLI generic mechanism. This is not unique to the support of parameterized types, of course, but it seems a lightning rod for persnickety questions:

    (1) isn't the support for two mechanisms that are similar in intent but which differ in both gross and subtle semantic behavior confusing for a user?

    (2) doesn't the dual nature of these constructs increase the likelihood of programmer error?

    (3) isn't this a canonical illustration of the undisciplined nature of the C++ language design where everything but the kitchen sink seems to get thrown in? you guys just can't get your act together, can you?

    Let's see what kind of answers I can offer. [Disclaimer: of course, these are my thoughts, and do not represent either the corporate views or policies of Microsoft.]

    The C++ binding to the CLI which I refer to as C++/CLI represents an integration of two separate object models: the static object model of native C++ and the dynamic program model of the CLI. We've seen conflicts between these two models before – in particular between the native and CLI enum, the native and CLI array, and the native and CLI reference class.

    Under the CLI object model, individual languages are – not to put too fine a point on it – somewhat diminished – much as in a modern country the individual states, while sovereign, are constrained by the laws of the central authority. For example, the CLI defines the underlying type system within which a language operates, as well as the inheritance model. As we saw in an earlier blog, the CLI does not, for example, support private inheritance, value inheritance (that is, the inheritance of implementation but not of type), or multiple inheritance (MI). While a language can choose to support these aspects of inheritance, that support requires a mapping onto the existing CLI object model because there is no direct support.

    The Eiffel language under the CLI, for example, chose to provide an MI mapping because it felt that (a) MI is a valuable inheritance model, and (b) its users would be diminished without its support under the CLI – that is, it would give its users a more diminished programming experience under the CLI than on native platforms, and this would likely deter them from migrating to the CLI itself – or at least from migrating to the CLI while continuing to exercise their Eiffel expertise and culture.

    We did not feel the same imperative as Eiffel with regards to multiple inheritance. But we did feel that imperative towards deterministic finalization of reference types declared within a local block, and so we provided a mapping of a class destructor to an IDisposable::Dispose() method, which is the CLI pattern of reclaiming resources prior to garbage collection finalization. Similarly, we did feel the imperative towards automatic memberwise copy and initialization – as supported by a copy assignment operator and copy constructor – and so we provided a mapping. (But these mappings are constrained by the underlying CLI implementation. We could not map memberwise copy into a value class because we could not guarantee that it could be carried out in all circumstances – at least that is my understanding. I haven't myself verified that but taken it on faith.)
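
    The native experience that the destructor mapping is meant to preserve looks roughly like this in ISO C++. This is a sketch of the native behavior only, not of the Dispose() mapping itself; Resource, trace_block(), and check_deterministic_cleanup() are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records its own lifetime in a caller-supplied log.
class Resource {
    std::vector<std::string>& log_;
public:
    explicit Resource( std::vector<std::string>& log ) : log_( log )
    { log_.push_back( "acquire" ); }

    ~Resource() { log_.push_back( "release" ); }
};

std::vector<std::string> trace_block()
{
    std::vector<std::string> log;
    {
        Resource r( log );
        log.push_back( "use" );
    }                                   // r's destructor runs here
    log.push_back( "after block" );
    return log;
}

// Hypothetical self-check: release happens at the closing brace,
// before any code that follows the block.
void check_deterministic_cleanup()
{
    std::vector<std::string> log = trace_block();
    assert( log.size() == 4 );
    assert( log[2] == "release" );
    assert( log[3] == "after block" );
}
```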

    Again, we did this because without these mappings of essential aspects of the native C++ programming experience, we believe the C++ user would have a more diminished experience of programming under the CLI than on the native platform, and that this would deter them from migrating to the CLI – or at least from migrating to the CLI while continuing to exercise their C++ expertise and culture. The template mechanism is another of the essential aspects of modern C++ programming. We believe its absence would represent a significant hole in the quality of programmer life when using C++/CLI. Personally, that is my deep belief.

    So, with regard to parameterized types, it felt imperative that we provide some mechanism beyond what was offered by the System::Collections namespace. The first mechanism that naturally comes to mind to the C++ programmer, of course, is templates. But what about generics? Why couldn't C++/CLI use generics for containing the CLI types, and leave templates for the non-CLI types? Why map the template mechanism into C++/CLI to support the CLI reference class, value class, interface class, delegate, and function?

    The honest answer is that we were left with no choice. One of the talking points about generics in presentations, specifications, and hallway whispering is that generics, while borrowing from C++ templates, "does not suffer many of the complexities" of C++ templates.

    What are considered superfluous complexities of C++ templates and were therefore eliminated from the generic mechanism – partial template specialization, the ability to inherit from the type parameter, support for non-type and template type parameters, the ability to specialize either an entire template or selected members of it, and so on – are considered by professional C++ programmers and designers of the language as essential modern programming design patterns that are fundamental to existing production code and widely-used libraries, such as the STL, LOKI, and Boost.

    The real problem is that although the C++ community and language designers and implementors have deep experience with parameterized types, that experience was not tapped while the design of generics was underway.

    So, we did not have any choice but to provide support of templates for CLI types and to provide an STL.NET implementation. This is great stuff if you care about C++ and want to see it succeed under .NET. Except for performance issues, C++/CLI, in my biased opinion, is shaping up as the premier C++ experience available. Personally, I'm so keen on the new language that I'm planning to port my mscfront translator from native code to C++/CLI. I'll be reporting my progress on that in quite some detail in a series of blog entries once I get the C++/CLI text I'm working on in shape.

    So, that's why we have templates. Why did we also provide support for generics? Generics are deeply integrated into the CLI, and for that reason solve a number of problems left unsolved in C++ – in particular, the instantiation model. Because there is no concept of a runtime within native C++, there is no native concept of how a template is instantiated[1] – that is, when the binding of an actual parameter to the formal parameter occurs and to what extent. The work of the ISO committee in this area has not been stellar.

    Generics provide a constraint mechanism, something whose absence is keenly felt in the template mechanism.[2] A generic type is recognized by the CIL – the intermediate language; a template class is not, and so template classes are not cross-language and, it turns out, not cross-assembly as well. That is to say, every serious CLI language has to provide generic support. And that is what we do. Perhaps if we had been participants in the design, the outcome could have been different. But that, to repeat my aunt's favorite refrain, is water under the bridge.

    So, from my perspective, that is why we support both the template and generic solutions for parameterized types, and have tried to integrate them into an elegant symmetry.

     

     

     



    [1] Not surprisingly, the vocabulary for the different aspects of and actions on parameterized types is quite different between generics and templates – and I am not going down that path.

    [2] Bjarne's original Usenix template paper included a discussion of constraint syntax, which he then chose not to incorporate into the design. Its absence means that there is no way to know prior to actual instantiation whether a type is qualified to be used with a particular template class – other than examining the source code – which is not really practicable.

  • Identifying the Candidate Function Set

    Sorry it is taking me so long these days. I am in the throes of more formal writing – a book on our CLI binding for C++, and a series of articles for our Visual C++ MSDN website on STL.NET. And my translation tool is happily going through a formal test cycle – thank you Mitchell and Arjun – and so I've been fixing bugs and being a developer once again. So the blog gets bogged down.

    I thought I would follow up on the issue of overload function resolution. And so this entry discusses how a candidate function list is built up. (I realize this is somewhat esoteric, but. Well, I hope someone finds it worth a spin around the page.)

    The first question of course, is, what the heck is a candidate function list? Literally, a candidate function list is the set of functions sharing the same name visible at a call point. As we'll see, the complexity, when present, has to do with determining the actual set of visible functions.

    It's always good to begin with a simple example – hopefully, if we can master this one, our confidence will surge, and we can march on to the question of namespace, qualified type names in the signature of a function, and the using declaration in general. Here is our example,

    void f(); // (1)

    void f( String^ ); // (2)

    void f( const string& ); // (3)

    void f( String^, array<String^>^ ); // (4)

     

    int main( array<String^> ^args )

    {

        if ( args == nullptr )

             handle_invalid_command_line();

     

        for each ( String^ s in args )

            f( s );

    }

    There are four candidate functions to the call of f() within main(). By inspection, we can see that only two of them are viable – that is, only the (2) and (3) instances can match the actual invocation. (And (2) is the best viable function.)

    So that was simple. Things start getting somewhat more complicated if the type of a function argument is declared within a namespace. Let's first look at a fully qualified name. In this case, the functions within the namespace that have the same name as the called function are added to the set of candidate functions. For example:

    namespace CLITypes

    {    

    public ref class C {…};

    void takeC( C^ );

    }

     

    // …

     

    void f( CLITypes::C^ cobj )

    {        

        // ok: calls CLITypes::takeC( C^ )

        takeC( cobj );

    }

    There is no takeC() function declared within the global scope in which f() is defined. There is, as well, no using declaration opening up a namespace. So, on first glance, this appears to be an illegal invocation: the candidate set appears to be empty.

    However, because the argument is qualified to occur within the CLITypes namespace, the functions declared within that namespace are considered as well. That is, the full set of candidate functions under this circumstance represents the union of the functions visible at the point of call and the functions declared within the namespaces of the argument types.
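    For readers who want to try this on a native compiler, here is a minimal sketch of the same lookup rule in ISO C++. The namespace NativeTypes, the struct C, and the string return values are my own stand-ins for illustration (they are not part of the example above); the point is only that the unqualified call compiles although no takeC() is visible at global scope.

```cpp
#include <string>

namespace NativeTypes {          // hypothetical stand-in for CLITypes
    struct C {};

    // Found only through the namespace of the argument type.
    std::string takeC( const C& ) { return "NativeTypes::takeC"; }
}

// No takeC() at global scope, and no using declaration or directive.
std::string f( const NativeTypes::C& cobj )
{
    // Unqualified call: the namespace of the argument's type is searched too.
    return takeC( cobj );
}
```

    Calling f( NativeTypes::C{} ) should hand back the string from the namespace instance, since that is the only candidate.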

    There are three general cases in which functions with the same name do not overload – at least currently. (There is some activity within the ECMA committee that hasn't jelled as yet one way or another – at least as far as I'm aware.)

    1.      A derived class function that reuses the name of a base class virtual function overrides rather than overloads the base class instance. Except when the new slot modifier is applied, the derived class instance must conform to the signature of the base class function. It substitutes for the base class instance within the derived class virtual table.

    2.      A derived class function that reuses the name of a non-virtual base class function hides rather than overloads the base class instance. The signatures of the base and derived class instance are not considered. This is because the overload candidate set for a function does not extend across scope boundaries. (This is the guy currently under siege, I believe.)

    3.      A function declared within a local block hides rather than overloads all named instances of that function within the enclosing scopes for the extent of that block. This is the most esoteric of the three cases, so let me provide a quick example,

    String^ Marshall( int ); // (1)

    String^ g()

    {

        // these puppies hide global instance …

        String^ Marshall( double ); // (2)

        String^ Marshall( char* );  // (3)

     

        return Marshall( 1024 ); // resolves to (2)

    }

    In this example, the global instance of Marshall() is not visible within g(); the candidate functions are limited to the two declarations within g() itself. The char* instance is not a viable candidate function for an actual argument of 1024. This leaves us with instance (2), which matches the formal parameter of type double through a standard conversion, although the global instance, if considered, would represent an exact match.
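    Cases 2 and 3 behave the same way in native ISO C++, so they can be checked without a CLI compiler. The sketch below is an assumption-laden analogue rather than the code above: Base, Derived, show(), and the std::string return values are hypothetical names I introduce so that each function can report which instance was selected.

```cpp
#include <string>

// Case 2: a derived class member hides the base class member of the same name.
struct Base {
    std::string show( int ) { return "Base::show(int)"; }
};

struct Derived : Base {
    std::string show( double ) { return "Derived::show(double)"; }
};

std::string callThroughDerived()
{
    Derived d;
    return d.show( 1024 );   // Base::show(int) is not a candidate here
}

// Case 3: block-scope declarations hide the namespace-scope instances.
std::string Marshall( int )         { return "Marshall(int)"; }
std::string Marshall( double )      { return "Marshall(double)"; }
std::string Marshall( const char* ) { return "Marshall(const char*)"; }

std::string g()
{
    // Redeclaring two overloads locally hides every outer Marshall.
    std::string Marshall( double );
    std::string Marshall( const char* );

    return Marshall( 1024 );  // int converts to double; Marshall(int) is hidden
}
```

    In both functions the exact match on int exists in the program but is never considered, because the candidate set stops at the innermost scope in which the name is found.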

    The candidate functions also depend on the visibility of using declarations at the call point. This is because a using declaration introduces names from a namespace into the scope in which it appears. For example,

    namespace libs_R_us {

        int max( int, int );

        double max( double, double );

    }

     

    char max( char, char );

    void func()

    {

        // namespace functions not visible

        // the three calls resolve to global max( char, char )

     

        max( 4096, 8192 );

        max( 35.1, 35.9 );

        max( 'J', 'L' );

    }

    In this case, the only function visible is the function declared in global scope. It is therefore the only candidate function, and is the instance invoked by all three calls within func(). This results in a loss of precision in both arithmetic invocations. We have two choices for correcting this, both of which make use of a using declaration to make the namespace functions visible. The question is where we should place it.

    One possibility is to place the using declaration in global scope. For example,

    char max( char, char );

    using libs_R_us::max; // using declaration

    All three instances of max() are now visible within the global scope and are placed in the set of candidate functions. The three invocations are now each an exact match to a separate instance, as follows,

    void func()

    {

        max( 4096, 8192 ); // libs_R_us::max(int,int);

        max( 35.1, 35.9);  // libs_R_us::max(double,double);

        max( 'J', 'L' );   // ::max( char, char );

    }

    Alternatively, we might choose to place the using declaration within the local scope of func(). Why would we do that? Primarily to limit the extent of the changes in our program due to the larger set of candidate functions. By adding to the candidate function set at global scope within an existing program, we are potentially changing the function invoked at each call point that does not involve an exact match. This may be a more invasive change than what we are ready to support. The alternative declaration looks as follows,

    void func()

    {

        // local using declaration

        using libs_R_us::max;

     

        // same function calls as above

    }

    Surprisingly, we get a different set of candidate functions now. This is because using declarations nest.[1] With the using declaration in local scope, the global function is now hidden. The only visible functions at the call points are the two declared within the namespace, and so our character comparison resolves to the namespace instance max(int,int) through a promotion of the two character arguments.
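    The hiding effect of a local using declaration is plain ISO C++ behavior, so here is a small native sketch for anyone who wants to confirm it. It is an illustration under my own assumptions: the max() instances return a std::string naming themselves instead of a maximum, and globalOnly() and localUsing() are hypothetical wrappers standing in for the two versions of func().

```cpp
#include <string>

namespace libs_R_us {
    std::string max( int, int )       { return "libs_R_us::max(int,int)"; }
    std::string max( double, double ) { return "libs_R_us::max(double,double)"; }
}

std::string max( char, char ) { return "::max(char,char)"; }

// No using declaration: only the global instance is a candidate,
// so even floating point arguments are converted to char.
std::string globalOnly()
{
    return max( 35.1, 35.9 );
}

// Local using declaration: the global instance is hidden in this block,
// so the character arguments are promoted to int.
std::string localUsing()
{
    using libs_R_us::max;
    return max( 'J', 'L' );
}
```

    Neither call is rejected by the compiler, which is what makes the change in resolution easy to miss.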

    There are two possible solutions to getting the three functions into the candidate set. We originally chose a nested using declaration in order to localize the inclusion of the functions to just the call points within func(). One solution, of course, is to move it back to global scope. But this opens the entire assembly to potential change. Alternatively, we can add the global instance to our nested set of declarations,

    void func()

    {

        // now we have all three in the candidate set

        using libs_R_us::max;

        extern char max( char, char );

     

        // same function calls as above

    }

    The set of candidate functions is therefore the union of the functions visible at the point of the call – including the functions introduced by using declarations and using directives – and the functions declared in the namespaces associated with the types of the arguments. For example,

    namespace basicLib {

          void print( String^ );

          void print( Object^ );

    }

     

    namespace matrixLib {

        public ref class Matrix { /* ... */ };

        void print( Matrix^ );

    }

     

    void display()

    {

        using basicLib::print;

        matrixLib::Matrix ^mObj;

     

        print( mObj );      // matrixLib::print( Matrix^ )

        print( "literal" ); // basicLib::print( String^ )

        print( 1024 );      // basicLib::print( Object^ )

    }

    Which functions are the candidate functions for the call print(mObj)? The two basicLib functions introduced by the local using declaration are candidate functions because they are visible at the point of the call. Because the function call argument is of type matrixLib::Matrix, the print() function declared within the namespace matrixLib is also a candidate function.

    Once the candidate functions are identified, the next step – which begins to involve type checking – is to determine the viable functions within the candidate set. That topic is a candidate for a subsequent blog.



    [1] Using directives do not nest. That is, using namespace libs_R_us makes the namespace members visible as if they were declared outside the namespace at the location where the namespace is defined. Therefore, whether the using directive is nested or global makes no difference.

  • The Value of a Value Class

    A reader questions the nature of the value type when he writes,

     

    Sender: Slawomir Lisznianski
    =====================================
    1) Lack of support for SMFs makes value classes unnatural to use. An example in the C++/CLI spec at page 33 is incorrect, as it uses constructors with value classes. In fact, quite a few value class examples in the specification contradict with paragraph 21.4.1.

     

    SMF, for the uninitiated, means special member functions, and in this case refers to the constraint on a value class that it cannot declare a default constructor, copy constructor, copy assignment operator, or destructor.

     

    Mechanically, the reason these special member functions are not supported, I am told, is that there exist conditions at run-time in which it is not possible for the compiler to insert the appropriate invocations, and thus it is not possible to guarantee the semantics associated with these member functions. And so their support has been withdrawn completely. I suspect that the examples were written before the withdrawal of the default constructor, and the authors of the spec simply overlooked removing them.

     

    There are a number of negative responses one can have to this: Disbelief, disgust, savage anger are a few that come to mind.

     

    Another way of looking at this is to consider why and when these special member functions are not required.  We do not need a copy constructor nor a copy assignment operator when the aggregate type supports bitwise copy. Similarly, we do not need a destructor when the state of the aggregate type exhibits value semantics. Finally, if the runtime zeros out all state by default, then we do not require a default constructor. (In C++, primitive data types are not automatically zeroed out, and so most of our default constructor use – but granted, not all – is to take the object out of an uninitialized state.)

     

    That is, a value type in the philosophy of the CLI unified type system is a blittable entity with no internal plumbing, so to speak. That is all it naturally supports.
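    A rough native analogy may help here, with the caveat that it is only an analogy: ISO C++ has no value class, but it does let us ask the compiler whether a type can be copied bit for bit and needs no cleanup. Point, zeroed(), and bitwiseCopy() below are hypothetical names of my own, not anything from the C++/CLI specification.

```cpp
#include <cstring>
#include <type_traits>

// Hypothetical native stand-in for a simple value class: plain data, no
// user-declared constructors, assignment operator, or destructor.
struct Point {
    int x;
    int y;
};

// The compiler confirms that a bitwise copy is a faithful copy of a Point
// and that destroying one requires no work.
static_assert( std::is_trivially_copyable<Point>::value,
               "Point can be copied bit for bit" );
static_assert( std::is_trivially_destructible<Point>::value,
               "Point needs no cleanup" );

Point zeroed()
{
    Point p{};      // value-initialization zeroes both members
    return p;
}

Point bitwiseCopy( const Point& src )
{
    Point dst;
    std::memcpy( &dst, &src, sizeof dst );  // fine: Point is trivially copyable
    return dst;
}
```

    Add a pointer member that owns memory and the static_asserts still pass, but the bitwise copy now shares the resource, which is exactly the trouble described next.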

     

    You put a pointer in it, you got troubles – there are no special member functions to provide deep copy semantics or to free the resource addressed prior to the end of its lifetime. Let's not even consider attempting to declare complex member types. That's not what you do with value classes.

     

    I will claim that they are not unnatural. What is unnatural, but understandable presuming that you have a C++ background, are the sophisticated uses to which you think to put these rather unsophisticated types. When you think value class, think integer. Then things will begin to click for you.

     

    I will address your second question in a subsequent blog: So what's the rationale for trackable references ... ?

  • String Literal Conversion to String: Is It a Disaster?

    A reader asks,

     

    Sender: Jack

    re: String Literals are now a Trivial Conversion to String

    Won't it break most of existing libraries who will try to port to C++/CLI? One override for String^ will break a lot of user code and make calls for the overriden function with string literals look much uglier. Maybe it is better to make two types of string literals differ by, say literal prefix (i.e. old string literals would look like "this" and new ones like c"this")?

     

    As Adam Merz noted in his response to Jack's question of a literal modifier,

     

    That is exactly what Managed Extensions for C++ did with the S prefix, and what they are trying to avoid at this point (I would think)...

     

    Adam is correct. Here is how I described the change in an internal translation guide between the Managed Extensions for C++ and the revised C++/CLI language – this will give you the historical context for why we made the original change.

     

    In the original language design, a managed string literal was indicated by prefacing the string literal with an S. For example,

     

    String *ps1 = "hello";

    String *ps2 = S"goodbye";

     

    The performance overhead between the two initializations turns out to be non-trivial, as the following MSIL representation demonstrates as seen through ildasm:

     

    // String *ps1 = "hello";

    ldsflda    valuetype $ArrayType$0xd61117dd

         modopt([Microsoft.VisualC]Microsoft.VisualC.IsConstModifier)

         '?A0xbdde7aca.unnamed-global-0'

    newobj instance void [mscorlib]System.String::.ctor(int8*)

    stloc.0

     

    // String *ps2 = S"goodbye";

    ldstr      "goodbye"

    stloc.0

     

    That’s a pretty remarkable savings for just remembering [or learning] to prefix a literal string with an S. In the revised V2 language, the handling of string literals is made transparent, determined by the context of use. The S no longer needs to be specified.

     

    What about cases in which we need to explicitly direct the compiler to one interpretation or another, as in the case of an overloaded pair of functions?

     

    void f(const char*);

    void f(String^);

     

    f("ABC");           // by default calls f(const char*)

     

    In the revised language, an explicit cast is used rather than the prefix S. For example,

     

    f(( String^ )"ABC"); // ok: invoked f( String^ )    

     

    As you can see, the revised language originally sought merely to correct a failing in the original design – the surprising performance penalty for a user misstep in the declaration of a CLI string literal.

     

    This subsequent refinement during the standardization of the language under ECMA represents imo a rebalancing of the CLI and native type systems – an acknowledgement of the need for the design to be Janus-faced (an image I introduced in one of the three introductory blogs) – that is, to look equally on the needs of the CLI and native programmer.

     

    Which brings us back to Jack's original question: isn't this going to blow existing code out of the water? Or, to put it more crudely, isn't this a disaster?

     

    Well, I don't believe so, although I wasn't part of the decision-making process and since the change is currently undocumented as far as I am aware, I have not seen any analysis of the effect of the language change. So, let me give you my take on the effect, and give my reasoning as to why I don't see it as being quite as dire as does Jack.

     

    1. The trivial conversion of a string literal to String^ is added as equivalent in rank to the conversion to const char* rather than taking precedence over it. The determination of a best viable function, therefore, results in the introduction of an ambiguity rather than a silent change in the resolution (and therefore behavior) of the program. The user would then explicitly cast the argument to the intended type; admittedly, this could be burdensome, but it is not dangerous.

     

    Under C#, a change in the access level of a class member silently changes the resolution of a named reference in an existing program – which may be completely unknown to the person making the change. We are not talking here of that level of semantic rupture, but merely the difficulty of accommodating the mechanical importation of native code into the CLI program space.

     

    2. So, let us try to address the issue of burden, which is what I suspect this all reduces to. I don't believe it is really all that extensive because the C++ overload mechanism operates by scope rather than signature, and so the set of candidate functions is constrained – unlike a language like C#, for example, where a change like this would be, I suspect, more severe and difficult to manage. (Again, I am not part of the design process currently and so I haven't drilled down on this as deeply as I otherwise might.)

     

      1. Candidate functions are limited by scope. Therefore, the extent of the effect of adding an f(String^) to an existing set of f() is limited to the scope in which it is introduced.

     

    To be exhaustive, we would break this down into the possible scope scenarios – local function, independent class, class hierarchy, namespace, and global scope – and analyze the extent of impact within each. I have not done that, but I suspect that it is really only the global and namespace introductions that are potentially disruptive and burdensome.

     

    Class hierarchies are not burdensome because names do not overload across base/derived class boundaries in C++ since they maintain their own scope. Namespaces are potentially burdensome because of using declarations, and the global namespace is burdensome just because it is the global namespace.

     

      2. The introduction of an f(String^) within a class or class hierarchy is not truly burdensome in its potential introduction of an ambiguity. While it may cause a cascade of ambiguities when f(String^) is introduced at global scope, or within a heavily-used namespace, the real question then is a design question, imo. What is the purpose of introducing the f(String^) in a space in which const char* is still heavily used? Perhaps in migrating the existing native code base, we should exercise some design refactoring of the interface?  What is the benefit of supporting both const char* and String in a CLI environment? And so on.

     

    So, I think the spirit of the change is in the right direction. The solution isn't perfect, however, since this represents the only potentially truncating trivial conversion – this is what I mean when I say that imo it is not strictly ISO-C++ conforming: there are no other trivial conversions that suffer a loss of precision – those are all more costly conversions. Of course, on the other hand, String^ is not ISO-C++, but I am not being that literal here. The problem is that if I place a wide-character in the string literal while it exactly matches String^ in the abstract, in practice, it is first parsed as a const char*, and so the second byte is discarded.  (Thank you, Dave Waggoner, for pointing out that problem.)

     


  • String Literals are now a Trivial Conversion to String

     

    There was a recent internal thread on the resolution of the following set of overloaded member functions of a reference class R. It represents a change in the earlier definition of C++/CLI, and a difference in type behavior that I reported in an earlier blog, and so I believe it is worth discussing. Here is the code snippet,

     

    public ref class R {

    public:

      void foo( String^ );

      void foo( const char* );

    };

     

    void bar( R^ r )

    {

      // which one?

      r->foo( "abc" );

    }

     

    The question of the moment is, which instance of foo() is invoked? Since there is more than one instance, this requires the function overload resolution algorithm to be applied to the call. The presumption of this blog entry and the few that follow it is that the majority of readers are likely unsure how to formally think through the algorithm. What I'd ask you to do before reading on is decide what you believe is the correct program behavior and have an explanation clear in your mind.

     

    The formal resolution of an overloaded function call involves three steps.

     

    1. The collection of the candidate functions. The candidate functions are those methods within the scope that lexically match the name of the function being invoked. For example, since foo() is invoked through an instance of R, all functions named foo that are not members of R (or of its base class hierarchy) are not candidate functions. In our example, there are two candidate functions. These are the two member functions of R named foo. A call can fail during this phase if the candidate function set is empty.

    2. The selection of the set of viable functions from among the candidate functions. A viable function is one that can be invoked with the arguments specified in the call, given the number of arguments and their types. In our example, both candidate functions are also viable functions. A call can fail during this phase if the viable function set is empty.

    3. The selection of the function that represents the best match for the call. This is done by ranking the conversions applied to transform the arguments to the type of the viable function parameters. This is relatively straightforward with a single parameter function; it becomes somewhat more complex when there are multiple parameters. A call can fail during this phase if there is no best match, that is, if the conversions necessary to transform the type of the actual argument to the type of the formal parameter are equally good. The call is then flagged as ambiguous.

    In an earlier incarnation of the language, the resolution of this call invoked the const char* instance as the best match. In the present version of the language, the conversions necessary to match "abc" to const char* and to String^ are now equivalent – that is, equally good – and so the call is flagged as bad – that is, as ambiguous.
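
    The three phases can be sketched in portable ISO C++. This is only an analogy, under stated assumptions: R below is a plain struct rather than a ref class, the parameter types long and double stand in for const char* and String^, and the can_call_foo trait is a hypothetical helper that reports whether a call r.foo(arg) resolves to a single best match.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// A free function named foo. It is NOT a candidate for r.foo(...), because
// the call is made through an R object and lookup stops at R's scope.
std::string foo(int) { return "::foo(int)"; }

// Plain ISO C++ stand-in for the ref class R of the original example.
struct R {
    std::string foo(long)     { return "foo(long)"; }      // candidate, viable for one argument
    std::string foo(double)   { return "foo(double)"; }    // candidate, viable for one argument
    std::string foo(int, int) { return "foo(int, int)"; }  // candidate, not viable for one argument
};

// Hypothetical helper: true when r.foo(Arg) has exactly one best match.
// An ambiguous or non-viable call makes the expression ill-formed, which
// silently discards the partial specialization below.
template <typename Arg, typename = void>
struct can_call_foo : std::false_type {};

template <typename Arg>
struct can_call_foo<Arg,
    decltype(void(std::declval<R&>().foo(std::declval<Arg>())))>
    : std::true_type {};

// int -> long and int -> double are both standard conversions of equal rank,
// so there is no best match and the call would be flagged as ambiguous.
static_assert(!can_call_foo<int>::value, "r.foo(1) is ambiguous");

// long matches foo(long) exactly, so a best match exists.
static_assert(can_call_foo<long>::value, "r.foo(1L) resolves");
```

    Note that the presence of ::foo(int), which would be an exact match for an int argument, does not rescue the ambiguous call: it never enters the candidate set.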

     

    This leads us to two questions:

     

    1. What is the type of the actual argument, "abc"?
    2. What is the algorithm for determining when one type conversion is better than another?

    The type of the string literal "abc" is const char[4] – remember, there is an implicit null terminating character at the end of every string literal.

     

    The algorithm for determining when one type conversion is better than another involves placing the possible type conversions in a hierarchy. Here is my understanding of that hierarchy – all these conversions, of course, are implicit. Using an explicit cast notation overrides the hierarchy, similar to the way parentheses override the usual operator precedence of an expression.

    1.      An exact match is best. Surprisingly, for an argument to be an exact match, it does not need to exactly match the parameter type; it just needs to be close enough. This is the key to understanding what is going on in this example, and how the language has been changed.

    2.      A promotion is better than a conversion. For example, promoting a short int to an int is better than converting an int into a double.

    3.      A standard conversion is better than a boxing conversion. For example, converting an int into a double is better than boxing an int into an Object.

    4.      A boxing conversion is better than an implicit user-defined conversion. For example, boxing an int into an Object is better than applying a conversion operator of a SmallInt value class.

    5.      An implicit user-defined conversion is better than no conversion at all. An implicit user-defined conversion is the last exit before Error (with the caveat that the formal signature might contain a param array or ellipsis at that position).
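
    The ISO C++ tiers of this hierarchy can be observed directly. The following is a minimal sketch, assuming a hypothetical SmallInt struct with a converting constructor as the user-defined conversion; the boxing tier is specific to C++/CLI and has no portable counterpart, so it is left out. Each overload returns the name of the conversion it would require.

```cpp
#include <string>

// Hypothetical value class with an implicit converting constructor; it stands
// in for the "implicit user-defined conversion" tier.
struct SmallInt {
    int value;
    SmallInt(int v) : value(v) {}
};

// Exact match versus promotion, called with a short.
std::string f(short) { return "exact match"; }
std::string f(int)   { return "promotion"; }

// Promotion versus standard conversion, called with a short.
std::string g(int)    { return "promotion"; }
std::string g(double) { return "standard conversion"; }

// Standard conversion versus user-defined conversion, called with an int.
std::string h(double)   { return "standard conversion"; }
std::string h(SmallInt) { return "user-defined conversion"; }

// Each helper makes one call and reports which overload was selected.
std::string best_f() { short s = 1; return f(s); }
std::string best_g() { short s = 1; return g(s); }
std::string best_h() { int i = 1;   return h(i); }
```

    In each pair the overload requiring the higher-ranked conversion is chosen, so no call here is ambiguous.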

    So, what does it mean to say that an exact match isn't necessarily exactly a match? For example, const char[4] does not exactly match either const char* or String^, and yet the ambiguity of our example is between two conflicting exact matches!

     

    An exact match, as it happens, includes a number of trivial conversions. There are four trivial conversions under ISO-C++ that can be applied and still qualify as an exact match. Three are referred to as lvalue transformations. A fourth type is called a qualification conversion. The three lvalue transformations are treated as a better exact match than one requiring a qualification conversion.

     

    One form of the lvalue transformation is the native-array-to-pointer conversion. This is what is involved in matching a const char[4] to const char*. Therefore, the match of foo("abc") to foo(const char*) is an exact match. In the earlier incarnations of our C++/CLI language, this was the best match, in fact.
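
    A small ISO C++ sketch of the array-to-pointer case follows. It assumes std::string as a stand-in for a string class reached only through a user-defined conversion, which mirrors the earlier behavior in which const char* was the best match; it does not model the new String^ rule.

```cpp
#include <string>
#include <type_traits>

// "abc" is an lvalue of type const char[4]: three letters plus the
// implicit null terminator.
static_assert(std::is_same<decltype("abc"), const char (&)[4]>::value,
              "a string literal is an array of const char");
static_assert(sizeof("abc") == 4, "the terminator is counted");

// Array-to-pointer is an lvalue transformation, so this is an exact match.
std::string foo(const char*) { return "foo(const char*)"; }

// Reaching std::string needs a converting constructor, that is, a
// user-defined conversion, which ranks below an exact match.
std::string foo(const std::string&) { return "foo(std::string)"; }

// Reports which overload the call foo("abc") selects.
std::string which_foo() { return foo("abc"); }
```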

     

    For the compiler to flag the call as ambiguous, therefore, requires that the conversion of a const char[4] to a String^ also be an exact match through a trivial conversion, something that is not currently documented in the public language specification, so we can all be forgiven for being surprised at the current behavior.

     

    So this represents a fifth trivial conversion, one unique to C++/CLI. If you think about it, it makes good sense – in the same vein as having a CLI enum more nearly match Object than an arithmetic type. But it also represents the difficulties facing the experienced C++ programmer in crossing over from Kansas to Oz.

     

     

  • Factor, Don't Complicate

    A reader asks the following question,

     

    Sender: bv

     

    re: A Question about Copy Constructors in C++/CLI

     

    Ok, i have a question.

    What happens if: suppose you want to make a shallow copy. Ex:

     

    ClassX obj(somePtr);//internally obj.m_ptr = SomePtr;

     

    Now

     

    ClassX obj2(obj);

     

    What problems can occur in such cases?

    because now neither obj nor obj2 owns the member ptr. so none should destroy them.

    And i want that, only obj should be able to modify the ptr and not obj2. How will i do it?

     

    The purpose of a copy constructor (and copy assignment operator) is to allow users to override the default behavior. In the C language, the default behavior of copying one aggregate object with another is bitwise copy. In the original C++ implementation, this was also the behavior of copying one class object with another. However, the greater functionality of a class over that of a C struct required changing the default behavior to that of memberwise copy – that is, to recognize the integrity of member and base class sub-objects.

     

    The complexity within C++ of copying one class object with another falls into two general categories – at least into two general categories that I wish to address:

     

    1. When we use primitive members such as pointers that reflect shallow copy and fall outside the constructor-as-resource-acquisition pattern. In effect, we have to provide our own deep copy semantics.

     

    2. When we decide to implement complex behavioral patterns, such as, for example, copy on write, reference counting, and all sorts of neat abstract relationships that fall outside the default copy behavior.

     

    The default copy behavior of CLI reference types is shallow copy. That is, a reference type is a duple consisting of a named tracking handle and an object allocated on the CLI heap. The copying of one tracking handle to another results in both handles addressing the same heap object. In a garbage collection environment, the too-early destruction problem of ISO-C++ goes away.

     

     So, with that background, let’s go to the reader’s question.

     

    ClassX obj(somePtr);  //internally obj.m_ptr = SomePtr;

    ClassX obj2(obj);

     

    What problems can occur in such cases?

    Well, the first thing is, I will presume that this is an ISO-C++ class, not a C++/CLI class given the emphasis on pointers. So we are talking about a classic solution – that is, with memberwise default behavior. The minimal class we can extrapolate from this small code sample is,

     

    class X

    {

        T * m_ptr;

    public:

     

        // ClassX obj(somePtr); 

        X( T * somePtr ) : m_ptr( somePtr ){}

     

    };

     

    This is all that is required to support the examples provided by the writer. An initialization of one X object with another, such as

     

    X x1( myT );

    X x2( x1 );

     

    by default results in the bitwise copy of x2 with x1 without the explicit invocation of a synthesized copy constructor, with x2.m_ptr holding the same address as x1.m_ptr.
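
    The sharing can be made visible with a small sketch. It assumes T is int and adds two hypothetical members, get() and set(), that the reader's class does not show; they exist only so the effect of the default copy can be observed.

```cpp
// The minimal class from above with T fixed to int for illustration.
class X
{
    int * m_ptr;

public:
    X( int * somePtr ) : m_ptr( somePtr ) {}

    // Hypothetical accessors, added only to observe the shared pointer.
    int * get() const { return m_ptr; }
    void set( int v ) { *m_ptr = v; }
};

// Returns true when a default copy shares the pointee with its source:
// both pointers compare equal, and a write through x2 is seen through x1.
bool copies_share_pointee()
{
    int value = 1;
    X x1( &value );
    X x2( x1 );        // default memberwise copy, no user-written copy constructor
    x2.set( 42 );
    return x1.get() == x2.get() && *x1.get() == 42;
}
```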

     

    The reader then asks,

     

    What problems can occur in such cases?

    because now neither obj nor obj2 owns the member ptr. so none should destroy them.

    And i want that, only obj should be able to modify the ptr and not obj2. How will i do it?

     

    Well, because both objects manipulate the same object through their pointer member, any write may come as a surprise; moreover, if they are on separate threads, there is a need for locking on write operations.

     

    As the user notes, it is not a good idea for either x1 or x2 to destroy the object addressed without somehow synchronizing it with the other – again, this is exactly the problem that garbage collection solves, removing the burden on the user.

     

    To synchronize the destruction, one has to come up with some form of reference counting mechanism – the first discussion of that in C++ was James Coplien’s Advanced C++. Bjarne provides an example in his C++ Programming Language. I’m not going to go into the actual implementation.
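
    For readers who want to see the idea without a hand-written counter, here is a minimal sketch that assumes the C++11 standard library is available and uses std::shared_ptr as the counting mechanism. It is not the implementation from either book; T, its payload member, and owners() are hypothetical names introduced for the illustration.

```cpp
#include <memory>
#include <utility>

// Hypothetical pointee type.
struct T { int payload = 0; };

// X shares ownership of a T. The T is destroyed when the last X that
// refers to it goes away, so neither copy destroys it too early.
class X
{
    std::shared_ptr<T> m_ptr;

public:
    explicit X( std::shared_ptr<T> p ) : m_ptr( std::move( p ) ) {}

    // Number of X objects currently sharing the pointee.
    long owners() const { return m_ptr.use_count(); }
};

// Two live copies share one T, so the count is 2.
long owners_after_copy()
{
    X x1( std::make_shared<T>() );
    X x2( x1 );
    return x1.owners();
}

// Once the inner copy is destroyed, the count drops back to 1.
long owners_after_copy_dies()
{
    X x1( std::make_shared<T>() );
    {
        X x2( x1 );
    }
    return x1.owners();
}
```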

     

    The second question is, how should one restrain changes to the object that is pointed to by multiple instances of class X such that there is one ‘master’ and many … well, readers. The best way, in my opinion, to do this is to factor the design into two classes – allowing the readers read-only access rather than holding a pointer to it. Otherwise, you have a rather clunky design in which the object has to ask, am I allowed to modify this guy I point to? Does the writer, or master, still exist? [probably can’t answer that] – and there is no notification semantics that one could employ, etc.

     

    So, the answer to the second question is, come up with a design in which the characteristics of the one writer or many readers is built into the types; otherwise, the class is non-intuitive and will likely confuse users and be a cause of error and frustration.   
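
    One possible reading of that factoring, sketched under assumptions: Writer, Reader, T, and their members are hypothetical names, and the reader's access goes through a pointer-to-const that exposes no mutating operation. Lifetime management is deliberately left out; the sketch assumes the T outlives both objects.

```cpp
// Hypothetical pointee type.
struct T { int value; };

// Read-only view: the type itself offers no way to modify the T.
class Reader
{
    const T * m_ptr;

public:
    explicit Reader( const T * p ) : m_ptr( p ) {}
    int value() const { return m_ptr->value; }
};

// The single 'master': the only type through which the T can be modified.
class Writer
{
    T * m_ptr;

public:
    explicit Writer( T * p ) : m_ptr( p ) {}
    void set( int v ) { m_ptr->value = v; }

    // Hands out as many read-only views as needed.
    Reader reader() const { return Reader( m_ptr ); }
};

// A write through the Writer is visible through a Reader made earlier.
int reader_sees_write()
{
    T t = { 1 };
    Writer w( &t );
    Reader r = w.reader();
    w.set( 7 );
    return r.value();
}
```

    With this split, the question "am I allowed to modify this?" is answered by the compiler: Reader has no set() to call.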

  • Putting C++ and the CLI in Perspective

    A surprising thread that has emerged has to do with libraries. Here is a representative example,

     

    Sender: Dobes

    =====================================

     

    re: Towards a Natural Theory of Programming Languages

     

    I also think that the survival of a language is also more an issue of libraries than of language. 

     

    I prefer Java because it has gui,threading, sockets, class loading, etc. all built in to the standard library.  Since these are the primary features that differ between platforms, the majority of code can be written to depend only on these APIs, and in doing so be automatically cross platform.

     

    C++ could push its way back into usefulness if it, too, required that all C++ implementations provided a standard API for threading, I/O, loadable modules, and user interfaces.  I doubt they will - these are very difficult to implement in a language without garbage collection.  Many of the issues just cannot be addressed to satisfaction for all programs.

     

    Java and C++ exist in different pockets of the programming community. Telephony, Aerospace, Animation, for example, are all C++ domains. JPL, for example, is doing its 2009 onboard control system in C++ not Java because Java is not performant. So, to say that C++ ‘could push its way back into usefulness’ is not a quantifiable statement, nor is it currently true; it is simply an expression of how the writer personally feels.

     

    The environment in which ISO-C++ has a competitive disadvantage is in the dynamic programming paradigm which is imo currently exemplified by CLI and ok, Java. This is why we have been working so hard for the last two years to – ok – adapt ISO-C++ into C++/CLI which integrates the standard C++ Object model with the CLI Object model. We believe that we have established a competitive baseline language that is as good as Java in this domain. That is our hypothesis, in any case …

     

    The argument about libraries is both valid and moot. It is valid because I agree with it :-). It is moot because the whole elegance of the CLI is that it provides a language-independent class library that provides everything Dobes, above, lists. If you prefer to write in Standard ML of New Jersey, or in Python, or in, let the bits preserve you, Eiffel, you get the same class library and tools operating on a common intermediate language. We’re just beginning to do nifty things with this … not just in C++, but within the programming community at large and within Microsoft Research and in the academic community.

     

    While I have a preference for C++, obviously, its continued existence – as with any language – is finite. While we would like to extend its usefulness within the CLI – particularly since we see the CLI becoming the dominant environment – whether we succeed in that or not is the zest of our coming to work each day. You can’t know beforehand if we are going to fail or not … folks who think everything is already decided, or who try through rudeness to eliminate our efforts, are inconsequential. We know that there is a lot of work to be done, and I don’t have all the answers. No one here can guarantee success. But we’re privileged enough to have the opportunity. I’m quite grateful for that.

  • System::String -> std::string

    A reader asks,

     

    Sender: Erik Brendengen

     

    Is there an easy way to convert from String^ to std::string?

    Does String ^s = std::string( "bla" ).c_str(); work the other way?

     

    This is a FAQ, one that I myself bumped into when I had to pass a System::String retrieved from System::Windows::Forms::TextBox to a native program that expected a std::string.

     

    There are generally two conversion strategies, depending on which side of the managed/native fence you wish to do the conversion work. Here are two methods making use of the PtrToStringChars( String^ ) utility from the vcclr.h header file that is part of Visual C++. Note that these are written in the new C++/CLI syntax – and that these are adapted from an internal FAQ ...

     

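    #include <stdlib.h>
    #include <vcclr.h>
    #include <string>
    using namespace System;
     
    // Converts source into a newly allocated multibyte string.
    // The caller owns target and must release it with delete[].
    bool To_CharStar( String^ source, char*& target )
    {
        // Pin the string so the garbage collector cannot move it during the copy.
        pin_ptr<const wchar_t> wch = PtrToStringChars( source );
        int len = (( source->Length+1) * 2);
        target = new char[ len ];
        // wcstombs returns (size_t)-1 when a character cannot be converted.
        return wcstombs( target, wch, len ) != (size_t)-1;
    }
     
    // Converts source into a std::string; target is left unchanged on failure.
    bool To_string( String^ source, std::string &target )
    {
        pin_ptr<const wchar_t> wch = PtrToStringChars( source );
        int len = (( source->Length+1) * 2);
        char *ch = new char[ len ];
        bool result = wcstombs( ch, wch, len ) != (size_t)-1;
        if ( result )
            target = ch;
        delete[] ch;   // array form, to match new char[ len ]
        return result;
    }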
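
    The two functions above need a C++/CLI compiler. For readers who want to try the wcstombs step on its own, here is a portable sketch under stated assumptions: std::wstring stands in for String^, the function name narrow is hypothetical, and the conversion uses whatever C locale is currently in effect.

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Converts a wide string to a multibyte std::string using the current C locale.
// Returns false, leaving target unchanged, if a character cannot be converted.
bool narrow( const std::wstring &source, std::string &target )
{
    // Worst case: every wide character needs MB_CUR_MAX bytes, plus a terminator.
    std::vector<char> buffer( source.size() * MB_CUR_MAX + 1 );

    std::size_t written = std::wcstombs( buffer.data(), source.c_str(), buffer.size() );
    if ( written == static_cast<std::size_t>( -1 ) )
        return false;

    // written excludes the terminator, so copy exactly that many bytes.
    target.assign( buffer.data(), written );
    return true;
}
```

    Sizing the buffer from MB_CUR_MAX rather than a fixed factor of two is a design choice of this sketch, not of the original FAQ code.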

     

    As to the second question, does String^ s = ns.c_str() work, where ns is a std::string? Yes.


© 2009 Microsoft Corporation. All rights reserved. Terms of Use  |  Trademarks  |  Privacy Statement