A reader asks,
Sender: Jack
re: String Literals are now a Trivial Conversion to String
Won't it break most of existing libraries who will try to port to C++/CLI? One override for String^ will break a lot of user code and make calls for the overriden function with string literals look much uglier. Maybe it is better to make two types of string literals differ by, say literal prefix (i.e. old string literals would look like "this" and new ones like c"this")?
As Adam Merz noted in his response to Jack's question of a literal modifier,
That is exactly what Managed Extensions for C++ did with the S prefix, and what they are trying to avoid at this point (I would think)...
Adam is correct. Here is how I described the change in an internal translation guide between the Managed Extensions for C++ and the revised C++/CLI language – this will give you the historical context for why we made the original change.
In the original language design, a managed string literal was indicated by prefacing the string literal with an S. For example,
String *ps1 = "hello";
String *ps2 = S"goodbye";
The performance overhead between the two initializations turns out to be non-trivial, as the following MSIL representation demonstrates as seen through ildasm:
// String *ps1 = "hello";
ldsflda valuetype $ArrayType$0xd61117dd
newobj instance void [mscorlib]System.String::.ctor(int8*)
// String *ps2 = S"goodbye";
That’s a pretty remarkable savings for just remembering [or learning] to prefix a literal string with an S. In the revised V2 language, the handling of string literals is made transparent, determined by the context of use. The S no longer needs to be specified.
What about cases in which we need to explicitly direct the compiler to one interpretation or another, as in the case of an overloaded pair of functions?
void f(const char*);
void f(String^);
f("ABC"); // by default calls f(const char*)
In the revised language, an explicit cast is used rather than the prefix S. For example,
f(( String^ )"ABC"); // ok: invokes f( String^ )
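For readers without a C++/CLI compiler at hand, the same resolution pattern can be sketched in ISO C++. This is only an analogy under stated assumptions: std::string stands in for String^, and the helper names call_with_literal and call_with_cast are hypothetical, introduced purely for illustration. It shows that a bare literal picks the const char* overload, while an explicit conversion at the call site picks the other.

```cpp
#include <string>

// Hypothetical overload pair; std::string is an assumed stand-in for String^.
std::string f(const char*)        { return "f(const char*)"; }
std::string f(const std::string&) { return "f(std::string)"; }

// A bare literal reaches const char* through an array-to-pointer
// conversion, which ranks as an exact match, so that overload wins.
std::string call_with_literal() { return f("ABC"); }

// An explicit conversion at the call site selects the other overload.
std::string call_with_cast() { return f(std::string("ABC")); }
```

Calling call_with_literal() yields the const char* overload and call_with_cast() the std::string one, which mirrors the f("ABC") and f((String^)"ABC") calls above.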
As you can see, the revised language originally sought merely to correct a failing in the original design – the surprising performance penalty for a user misstep in the declaration of a CLI string literal.
This subsequent refinement during the standardization of the language under ECMA represents imo a rebalancing of the CLI and native type systems -- an acknowledgement of the need for the design to be Janus-faced (an image I introduced in one of the three introductory blogs) – that is, to look equally on the needs of the CLI and native programmer.
Which brings us back to Jack's original question: isn't this going to blow existing code out of the water? Or, to put it more crudely, isn't this a disaster?
Well, I don't believe so, although I wasn't part of the decision-making process, and since the change is currently undocumented as far as I am aware, I have not seen any analysis of its effect. So, let me give you my take on the effect, and my reasoning as to why I don't see it as being quite as dire as Jack does.
Under C#, a change in the access level of a class member silently changes the resolution of a named reference in an existing program – which may be completely unknown to the person making the change. We are not talking here of that level of semantic rupture, but merely the difficulty of accommodating the mechanical importation of native code into the CLI program space.
To be exhaustive, we would break this down into the possible scope scenarios – local function, independent class, class hierarchy, namespace, and global scope – and analyze the extent of impact within each. I have not done that, but I suspect that it is really only the global and namespace introductions that are potentially disruptive and burdensome.
Class hierarchies are not burdensome because names do not overload across base/derived class boundaries in C++ since they maintain their own scope. Namespaces are potentially burdensome because of using declarations, and the global namespace is burdensome just because it is the global namespace.
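The scoping point can be sketched in ISO C++, again with std::string as an assumed stand-in for String^. The type names Base, Hiding, and Exposing and the two helper functions are hypothetical. The sketch shows that a derived-class f hides the base-class f rather than overloading with it, and that a using declaration is what brings both into the same overload set.

```cpp
#include <string>

struct Base {
    std::string f(const char*) { return "Base::f(const char*)"; }
};

// Hiding::f hides Base::f: name lookup stops at the first scope that
// declares f, so the two never compete in overload resolution.
struct Hiding : Base {
    std::string f(const std::string&) { return "Hiding::f(std::string)"; }
};

// A using declaration brings Base::f into the derived scope, so both
// functions now take part in overload resolution.
struct Exposing : Base {
    using Base::f;
    std::string f(const std::string&) { return "Exposing::f(std::string)"; }
};

std::string call_hiding()   { Hiding h;   return h.f("ABC"); }
std::string call_exposing() { Exposing e; return e.f("ABC"); }
```

In this sketch call_hiding() reaches only the derived-class function, while call_exposing() resolves to Base::f(const char*) because the using declaration has merged the two scopes. That merging is the kind of interaction that makes namespace-level introductions more of a concern than class hierarchies.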
So, I think the spirit of the change is in the right direction. The solution isn't perfect, however, since this represents the only potentially truncating trivial conversion. This is what I mean when I say that imo it is not strictly ISO-C++ conforming: no other trivial conversion suffers a loss of precision; those are all more costly conversions. Of course, on the other hand, String^ is not ISO-C++, but I am not being that literal here. The problem is that if I place a wide character in the string literal, then while it exactly matches String^ in the abstract, in practice it is first parsed as a const char*, and so the second byte is discarded. (Thank you, Dave Waggoner, for pointing out that problem.)
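To make the loss concrete, here is a small ISO C++ sketch of what discarding a byte does to a wide character. It does not reproduce the compiler's actual handling. It assumes 16-bit code units, assumes that the byte dropped is the high-order one, and uses a hypothetical helper named narrow_by_truncation.

```cpp
#include <string>

// Hypothetical helper: keep only the low-order byte of each 16-bit
// code unit, modelling the assumed "second byte is discarded" loss.
std::string narrow_by_truncation(const std::u16string& wide) {
    std::string out;
    for (char16_t unit : wide) {
        // Masking with 0xFF drops the high-order byte of the code unit.
        out.push_back(static_cast<char>(static_cast<unsigned char>(unit & 0xFF)));
    }
    return out;
}
```

Under these assumptions, narrow_by_truncation(u"\u0141xy") returns "Axy": the character U+0141 loses its high byte 0x01 and collapses to 0x41, the letter A, with no diagnostic along the way.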