C++/CLI
-
A reader asks the following question,
Sender: Richard Hsu
Stan,
Would you suggest, for those of us who are yet to begin learning MC++, to wait till C++/CLI ships, because we'll not only be learning what will be obsolete syntax, but also a more difficult one.
So, would you recommend, till the release of C++/CLI compiler, it may be a better idea to use Interop in C# than go for MC++ as a temporary measure.
Let me preface these remarks by saying that this response represents my personal opinion and in no way represents official Microsoft policy.
I could not in good conscience recommend learning the Managed Extensions to C++ as a first introduction to the CLI object model and dynamic programming in general. The best books right now of course all use C# as the illustrative language – we sort of gave them no choice, imo.
However, that said, there is NO viable interop story for native C++ other than the managed extensions – at least until we ship the C++/CLI compiler. My understanding is that the first beta is due to be released Real Soon Now (RSN).
So my answer to you is admittedly a fudge: I would go with the new language syntax and the intermediate beta releases if that is a viable solution for your environment.
Here is a draft of something I am writing that might be helpful (or not) …
For the third edition of my C++ Primer, I created a relatively large text query application that heavily exercises the STL container classes for parsing a text file and setting up an internal representation. For example, the include file looks as follows,
#include <algorithm>
#include <string>
#include <vector>
#include <utility>
#include <map>
#include <set>
using namespace std;
The data representation looks as follows:
typedef pair<short,short> location;
typedef vector<location> loc;
typedef vector<string> text;
typedef pair<text*,loc*> text_loc;
class TextQuery {
public:
// …
private:
vector<string> *lines_of_text;
text_loc *text_locations;
map<string,loc*> *word_map;
Query *query;
static string filt_elems;
vector<int> line_cnt;
};
Query is the abstract base of a class hierarchy supporting a query language. For example, here is how a query session might run:
Enter a query-please separate each item by a space.
Terminate query (or session) with a dot( . ).
==> fiery && ( bird || shyly )
fiery ( 1 ) lines match
bird ( 1 ) lines match
shyly ( 1 ) lines match
( bird || shyly ) ( 2 ) lines match
fiery && ( bird || shyly ) ( 1 ) lines match
Requested query: fiery && ( bird || shyly )
( 3 ) like a fiery bird in flight. A beautiful fiery bird, he tells her,
The native invocation of the text query system looks like this:
int main()
{
TextQuery tq;
tq.build_up_text();
tq.query_text();
}
I want to expose the TextQuery interface to a managed application without having to touch the darn thing, let alone consider reimplementing it. After all, it works. Honestly, I'm not sure if I'm ready to go to managed code if it means giving up on my extensive investment in native code.
The simplest strategy for exposing a native interface is to wrap a reference class around the native type, populating it with stub methods that invoke the associated native methods. Once performance data is collected, we may wish to cache or queue processing requests before crossing the boundary between native and managed, or even selectively port critical portions of the original application. The key thing is to get it up and running without globs of expended time and headache. Here is the ten-minute text query wrap:
#include "TextQuery.h"
public ref class TextQueryCLI
{
// pointer to the native type …
TextQuery *pquery;
public:
TextQueryCLI() : pquery( new TextQuery()){}
~TextQueryCLI(){ delete pquery; }
void query_text() { pquery->query_text(); }
void build_up_text() { pquery->build_up_text();}
// …
};
Under the current language definition, the only way to declare a native type as a member of a CLI class is to declare it as a pointer. We allocate it on the native heap through the new expression within the wrapper object's constructor. We delete it within the destructor. build_up_text() and query_text() function as stubs dispatching the call to the native TextQuery object. Here is our revised main() function:
#include "TextQueryCLI.h"
int main(void)
{
TextQueryCLI tqc;
tqc.build_up_text();
tqc.query_text();
// destructor automatically invoked here …
}
We would not want to embed an actual native class object within a reference class – that is, to have it reside on the CLI heap. The problem is that the garbage collector is likely to relocate the object during heap compaction, and relocation provides no facility to invoke an associated copy constructor or destructor.
A likely future extension to C++/CLI will relieve the programmer from the manual management of the native class. While it would still reside on the native heap, we would declare and use it as if it were an object. The compiler would transparently manage its allocation and deletion.
Wrapping a class hierarchy is slightly more subtle. For example, consider the following native hierarchy,
class Query{ … };
class BinaryQuery : public Query { … };
class AndQuery : public BinaryQuery{ … };
We'd like to introduce a reference class hierarchy that provides a wrapper to the native class hierarchy. Our first thought might be to include a pointer to the associated native type within each derived reference class. For example,
public ref class AndQueryCLI : public BinaryQueryCLI
{
// not the recommended strategy
AndQuery *paq;
// …
};
but this quickly fouls up. When our users do the following,
{
// oops: we lose our access to paq …
QueryCLI ^q = gcnew AndQueryCLI;
};
we lose direct access to the native AndQuery pointer.
A second issue has to do with the constructors associated with our wrapping classes. In a class hierarchy, a constructor invocation either represents the whole object being created, or a base class sub-object being initialized prior to the execution of the body of the whole object's constructor. When we write,
QueryCLI ^q = gcnew AndQueryCLI;
AndQueryCLI represents the whole object, and the invocations, in turn, of the QueryCLI and BinaryQueryCLI constructors represent the initialization of its two base class sub-objects.
If we place the native object pointer in the most derived class, its constructor becomes responsible for the initialization of the pointer – remember this involves native heap allocation that requires deletion within the destructor. This makes subsequent derivation extremely problematic – that is, when its invocation represents a sub-object rather than whole object initialization.
The better strategy is to store the native object pointer within the abstract base class – in our example, that would be QueryCLI, which becomes responsible for its initialization. For example,
public ref class QueryCLI abstract
{
Query *pquery;
public:
QueryCLI( Query *pq ) : pquery( pq ){}
// …
};
// …
public ref class AndQueryCLI : public BinaryQueryCLI {
public:
AndQueryCLI( AndQuery *paq )
: BinaryQueryCLI( paq )
{ /* perform class specific initialization */ }
// …
};
Notice the introduction of a context-sensitive abstract keyword. Marking the class itself as abstract is considerably easier to understand than inferring it from the presence of a pure virtual function within the class definition.
Our abstract QueryCLI class will need to provide the interface for the entire class hierarchy it anchors. Some will argue that this hybrid implementation in which the abstract root of our hierarchy mixes both interface and implementation is bad design.
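The ownership arrangement described above can be sketched in plain ISO C++, which has no ref classes. This is only an analogue under stated assumptions: QueryWrap, BinaryQueryWrap and AndQueryWrap are hypothetical names standing in for the CLI wrappers, the native class bodies are invented because the originals are elided, and std::unique_ptr stands in for the manual new/delete pairing.

```cpp
#include <memory>
#include <string>

// Stand-in native hierarchy (hypothetical bodies; the originals are elided).
class Query {
public:
    virtual ~Query() = default;
    virtual std::string name() const = 0;
};
class BinaryQuery : public Query {};
class AndQuery : public BinaryQuery {
public:
    std::string name() const override { return "AndQuery"; }
};

// Wrapper hierarchy: only the root stores and owns the native pointer.
class QueryWrap {
    std::unique_ptr<Query> pquery;   // single owner, so it is deleted exactly once
protected:
    explicit QueryWrap(Query* pq) : pquery(pq) {}
public:
    virtual ~QueryWrap() = default;
    // The root exposes the interface for the whole hierarchy.
    std::string name() const { return pquery->name(); }
};

class BinaryQueryWrap : public QueryWrap {
protected:
    explicit BinaryQueryWrap(BinaryQuery* pbq) : QueryWrap(pbq) {}
};

class AndQueryWrap : public BinaryQueryWrap {
public:
    // The derived wrapper only forwards the pointer; it never owns it.
    explicit AndQueryWrap(AndQuery* paq) : BinaryQueryWrap(paq) {}
};
```

Because the pointer lives in the root, a caller holding only a QueryWrap still reaches the native object, and further derivation adds no new cleanup duties.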
Because the wrapping pattern requires allocation of the native object on the unmanaged heap, a value class is unsuitable to contain the wrapped object. This is because the CLI object model does not support the association of a destructor with a value class. This leaves us with no way to automate the deletion of the allocated memory.
|
-
A reader writes
Sender: Brian Braatz
=====================================
I posted the following to the boost email group and received the following response. I was wondering if you had any comments or thoughts on this. I am personally concerned that MS will eventually ditch C++ or take away my ability to use ALL the features of the C++ language (not now, but I feel it is coming.. someday I will have C# or VB.net as my only "choice" on the ms platforms)
There is no question that the original Managed Extensions for C++ was a severe body blow to the credibility of a viable future for C++ on the Microsoft platform. In fact, when I originally interviewed here, I said pretty much the same thing as Brian to the folks here, and it felt, anyway, that I spent much of the first half year here battling people’s perceptions that the managed extensions, while not perfect, were pretty much ok. In some ways, the original release infuriated me; and that made me a less effective force for change than I might have been. It was clear to me from the outset that we had to either radically reengage our C++.NET vision or else pack up our tent and become a historical curiosity to an increasingly dynamic programming environment – something I would characterize as a tragically hopeful monster. Eventually, the whole department became mobilized; although it wasn’t until Herb Sutter took a leading role that we turned the corner, in my opinion. And I believe we have. The problem now is overturning nearly two years of being the .NET whipping boy and getting people to take us seriously – well, not us but the language, which is now called C++/CLI, and is both an ongoing ECMA standard and in liaison with the new ISO-C++ committee work. I personally guarantee that anyone who feels passionate about C++ will be both delighted by and engaged with the C++/CLI language that will be shipped with Visual Studio 2005, and should be available as a first peek in an upcoming Beta program. (I’ll engage more on the details in subsequent blogs – really :-)
|
-
Without thinking too hard I had held the following precept: std::string is to ISO C++ as System::String^ is to C++/CLI. Or, to give the idea a political tint, I thought of the two string abstractions as separate but equal library class types. Therefore, I presumed the following pair of overloaded methods was inherently ambiguous,
void display( String^ );
void display( const string& );
and a call such as the following
display( "which one?" );
would be flagged as ambiguous and I would be required to indicate which instance I intended through an explicit cast. But this turns out to be way off the mark, and misses a key implication of the unified type system of the CLI.
The type of a string literal within C++/CLI, by default, is the same as in ISO-C++; that is, it is a const char*. However, if a string literal occurs in a context in which a String^ Unicode string literal is required, it is treated as such implicitly (otherwise, an explicit conversion is necessary). For example,
string str = "a native string literal";
String ^ps = "a managed string literal";
void print( String^ text );
print( "ok, recognized as a managed string literal" );
However, if we use the string literal in a context-free setting, the literal must be explicitly cast to a String^, as in the following instances,
( "hi" )-> PadLeft( 8 ); // error …
(( String^ )"Index" )->PadLeft( 8 ); // correct …
So, getting back to our original question, how is the string literal handled when we invoked our overloaded display() method?
display( "which one?" );
Under C++/CLI, a string literal goes through a hierarchical sequence of standard conversions, as follows (slightly simplified),
1. if there is no const char* parameter to match exactly, the const is discarded, and it best matches a char* parameter.
2. if there is no char* parameter, it next best matches a String^ parameter. This is what happens in our case.
3. if there is no String^ parameter, it next best matches one of the String^ base classes, either an interface, or Object^.
4. if there is no parameter of one of the String^ base classes, it next best matches a bool parameter!
Only if there is no match of a string literal through a standard conversion are user-defined conversions considered, such as is required to turn a string literal into a std::string. In this sense, a string literal is more nearly a String^ than a string.
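The native end of this ranking can be observed in plain ISO C++, where String^ does not exist. The following is a minimal sketch rather than anything from the text query application: pick and pick_native are hypothetical names. It shows a string literal preferring any standard conversion, even pointer-to-bool, over the user-defined conversion to std::string.

```cpp
#include <string>

// Hypothetical overload sets; each overload reports which one was chosen.
inline const char* pick(bool)               { return "bool"; }
inline const char* pick(const std::string&) { return "std::string"; }

inline const char* pick_native(const char*)        { return "const char*"; }
inline const char* pick_native(const std::string&) { return "std::string"; }

// pick("which one?")        selects pick(bool): array-to-pointer followed by
//                           pointer-to-bool is a standard conversion sequence,
//                           which outranks the std::string constructor.
// pick_native("which one?") selects pick_native(const char*): an exact match.
```

Passing an actual std::string object, as in pick(std::string("x")), is the only way to reach the std::string overload of pick here.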
This reflects a fundamental difference between ISO-C++ and C++/CLI. In ISO-C++, types are independent except when explicitly part of the same class inheritance hierarchy. Thus, there is no implicit type relationship between a string literal and the std::string class type, even though they share a common abstraction domain.
C++/CLI, on the other hand, supports a unified type system. Every type, including a literal value, is implicitly a kind of Object^. This is why we can call methods through a literal value or an object of the built-in types. The value 5 is of type Int32. It is derived from System::ValueType, and ValueType is derived from Object. A string literal bridges both the native and managed worlds. Its initial conversions are those of the native world. If the native promotions are not appropriate, it traverses the managed set of standard conversions. The string literal is represented as a String^ within the managed world. A String^ is derived from Object^. In addition, it implements a number of System interfaces. Its declaration looks as follows,
public ref class String sealed :
IComparable, ICloneable, IConvertible, IEnumerable {}
So, much to my initial surprise, the invocation of display() is not ambiguous – not in the least. Rather, the display(String^) is the best match. Again, this is because the creation of a std::string object requires a user-defined conversion of the string literal through the one-argument string constructor taking a const char* parameter.
Here are two further pairs of overloaded methods. If you do not understand why the particular instance is invoked, re-examine the list of four standard conversions presented above.
void display( const char *text );
void display( String^ text );
// best match: display( const char* ) …
display( "ok, invoke const char* not String^" );
void process( Object^ o );
void process( const string& s );
// best match: process( Object^ ) …
process( "ok, invoke Object^ not string" );
If you are an experienced native programmer, this probably feels somewhat unbalanced. Within C++/CLI, the traversal distance between “a string literal” and the std::string class is considerably wider than within the native language.
|
-
A reader comments
Sender: Joe Pacheco
=====================================
re: Integrating Dispose and Finalize into the Language
What about destructors for value classes? Was it considered when you did the language design?
To become truly usable they are still missing couple of things, copy constructors being one of them. It's possible that it has some implementation difficulties but if we abstract from them for a moment, the question is if such value classes with copy constructors and destructors are of any value.
Possible applications could be containers for native objects or managed equivalent of smart pointers. I was also thinking about something similar to gcroot<> implemented with value classes that doesn't need to "root" the managed object.
I’ve forwarded the question to Brandon Bray who is one of the language design leaders of the emerging C++/CLI ECMA standard, and of Microsoft’s C++/CLR implementation of that standard in Visual Studio 2005 (there, I think I’ve mastered that nomenclature). He can detail the design thinking that led to the original inclusion of a destructor and default constructor for a value type and the implementation obstacle(s) that subsequently led to the removal of these special member functions (delicately referred to as SMFs). The fundamental obstacle is the inability of the compiler to guarantee access to the object at creation and destruction points to invoke the appropriate method.
Both the question and the inevitable detailing of frustrated design intentions end up portraying the CLR as an obstacle to the realization of an at least more perfect C++/CLI language. That is certainly one perspective. I thought I would offer a counter perspective, and … well, then duck.
The fundamental nature of a value type is that it is blitable – that is, we can completely reproduce it with integrity by copying a fixed-size, contiguous segment of memory. All we need to implement this is the address of its first byte, and its total size in bytes. For a value type to be blitable, there are two necessary characteristics. One, it needs to maintain all its state information within itself (yes, this implies that pointers can screw up blitability). Two, it must not contain auxiliary state that, if corrupted, leaves it undefined – for example, a virtual function table pointer. The canonical value type is an integer. This is the litmus test of every type system.
An entity is not blitable under the following conditions (because this is a blog and not a text, I invoke the right of not being exhaustive):
- It contains a copy constructor or copy assignment operator. Copying now is a semantic rather than physical operation.
- It contains either a member object or base class subobject that is not blitable. This requires memberwise rather than bitwise copy.
- It contains an auxiliary state member that supports implementation and which can be compromised by physical copying. The canonical example is the assignment of a derived class object (not pointer or reference) to a base class object. Except in the trivial case of an identical active set of virtual functions between the base and derived class, the bitwise copying of the derived object to the base class object corrupts the base class object’s internal virtual table pointer. Therefore, a form of memberwise copy is required.
- It contains one or more pointer or reference members that exhibit shallow copy, address a dynamic memory address, and the type itself defines a destructor that reclaims that memory. This is an implementation issue at the user level – a milder form of 3 – but one which has been a pitfall of C++.
That is, the ideal value type is pure state – think of a point with two or more coordinate values. It does not need a copy constructor or copy assignment operator because state is blitable. It does not need a destructor because the extent of state is its lifetime, and it is independent of all other state in other entities of this or any other type. A default constructor is not necessary because there is only state, not infrastructure to set up, and at best an object with a default initialization is in a safe but meaningless state – that is, it can be recognized as requiring initial values. So, an optimal value type – a pure value type – is one without these four SMFs. This is what the CLR provides.
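On the native side, ISO C++ has a roughly comparable notion in std::is_trivially_copyable. Treating that trait as a proxy for blitability is an assumption of this sketch, not a statement about how the CLR classifies value types, and Point, Named and Shape are hypothetical types chosen to mirror the conditions listed above.

```cpp
#include <cstring>
#include <string>
#include <type_traits>

// Hypothetical types used only for illustration.
struct Point { double x, y; };                                   // pure state
struct Named { std::string name; };                              // member needs memberwise copy
struct Shape { virtual ~Shape() = default; double area = 0.0; }; // carries a vtable pointer

static_assert(std::is_trivially_copyable<Point>::value,  "Point is pure state");
static_assert(!std::is_trivially_copyable<Named>::value, "std::string member blocks bitwise copy");
static_assert(!std::is_trivially_copyable<Shape>::value, "virtual members block bitwise copy");

// A bitwise copy needs only an address and a size, and is well defined
// here because Point is trivially copyable.
inline Point blit(const Point& src) {
    Point dst;
    std::memcpy(&dst, &src, sizeof(Point));
    return dst;
}
```

The static_asserts fail at compile time if any of the three classifications is wrong, so the block documents itself as it builds.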
Yes, there are conceptual extensions that we would like to provide. The obvious one, which you mention explicitly and which I discuss in an earlier blog, is to wrap pointers to native types. To do this without massive memory leaks, one needs a destructor. Unfortunately, the compiler is unable to guarantee that it can in all cases intervene before object destruction in order to invoke the method, and therefore it is not really possible to support a destructor for a value type. Is this terrible? Well, it is a real disappointment in the case of wrapping a native pointer in which we create massive numbers of instances since wrapping it in a reference type doubles the number of heap allocations (one for the reference, and one for the native).
How likely a scenario is this? Personally, I’m not sure. I spent a few days more than a year ago walking through a number of graded scenarios using a vector and matrix pair of classes that I had implemented in my days as a graphics programmer. I compiled and timed a program exercising these folks natively, then simply compiled and timed it using IJW (It Just Works). Then I wrapped the folks first in a value type – it leaked – and then a sealed reference, then reimplemented the folks in turn as value and reference types. (Of course, reference types in the original managed extensions don’t support operator overloading so that was kind of useless.) I also wrapped a nonsense query application I had written for the third edition of my C++ Primer. My sense from this exercise is that those things which would gain in performance by a value wrapper are those things that in practice are best ported directly into the CLI. But I have no practical data. Back in the late 1980s and early 1990s when the wrapper pattern became widespread in the C++ community, wrapping proved primarily of benefit over complex systems, such as sockets or X windows – or to shield an application from a specific API such as a database in case one had to for whatever reason replace it.
The CLI separates reference and value types in a way that is unnatural to the C++ programmer; that is, it constrains what each type is permitted in the body of its definition. C++ does not separate value and reference semantics in the definition of its types but, rather, in the declaration of its objects. This allows us a continuum rather than the ... oh, discrete quantum types of the CLI. This permits a great deal more flexibility, but at the cost of sometimes spectacular complexity in the understanding of our programs. The exemplar of this is probably best captured in the difference between the C++ template facility and the CLI generic mechanism.
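The point that ISO C++ chooses value or reference semantics at the declaration of an object, not in the definition of its type, can be shown with a few lines. This is an illustrative sketch: Vec2 and semantics_follow_declaration are hypothetical names.

```cpp
// Hypothetical type; a single definition serves both semantics.
struct Vec2 { int x; int y; };

// Returns true when each declaration behaves as its form promises.
inline bool semantics_follow_declaration() {
    Vec2  original{1, 2};
    Vec2  copy  = original;    // value semantics: an independent copy
    Vec2& alias = original;    // reference semantics: the same object
    Vec2* ptr   = &original;   // reference semantics through a pointer

    alias.x = 10;
    ptr->y  = 20;

    return original.x == 10 && original.y == 20   // changes seen through alias and ptr
        && copy.x == 1 && copy.y == 2;            // the copy is unaffected
}
```

Under the CLI, by contrast, that choice would be fixed once by declaring the type itself as a value class or a ref class.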
|
-
A reader comments
re: Changes in Destructor Semantics in Support of Deterministic Finalization
I agree with Johan.
I use a pattern where all classes that need a finalizer must implement the Dispose pattern, and all classes with a Dispose method also have a finalizer; if you have one you must have the other. All clients using the class must call the Dispose method.
In the debug build if a finalizer ever gets called it throws an exception - it means that the client never honored the contract. In the release build we log it and continue.
About being safe during finalization...there are more issues than just which subsystems are still valid (e.g. it may not be safe to call Console.WriteLine during system shutdown), there are also issues related to thread safety. I believe it's possible for a finalizer to run at the same time that the Dispose method is called.
BTW: I'm looking at this from a perspective of pure C#, not managed C++.
This is all fine and good, but it is not language design. Rather, it is an intelligent response to an absence of support within your selected language. That is, it is a policy, and the primary difficulties of any policy are: (a) how do you communicate that policy to a heterogeneous community of programmers? (b) how do you integrate that policy with the policies imposed by other libraries that your library may be combined with and which you cannot anticipate? and (c) how do you enforce your policy over time and large scale?
What our language has done is integrate the notions of Dispose (~R()) and Finalize (!R()) into the class mechanism itself: we have provided both a vocabulary and automated support that fits intuitively as an extension of the native constructor/destructor ‘resource acquisition is initialization’ design of ISO-C++. I think this is both truly elegant and superior to anything provided by other .NET languages.
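For readers less familiar with the native constructor/destructor design being extended here, the following is a minimal ISO C++ sketch of it. Handle, open_handles and use_handle are hypothetical names, and the counter merely stands in for a real resource such as a lock or a database connection.

```cpp
#include <stdexcept>

// Hypothetical counter standing in for a lock or connection pool.
static int open_handles = 0;

class Handle {
public:
    Handle()  { ++open_handles; }   // acquire in the constructor
    ~Handle() { --open_handles; }   // release in the destructor
    Handle(const Handle&) = delete;
    Handle& operator=(const Handle&) = delete;
};

// Returns the number of open handles observed while the local is alive.
// The destructor runs on every exit path, including the exceptional one.
inline int use_handle(bool fail) {
    Handle h;
    if (fail) throw std::runtime_error("work failed");
    return open_handles;
}
```

No caller has to remember a cleanup call, which is the property a policy such as "all clients must call Dispose" tries to recover by convention.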
|
-
In the original language design, a class destructor was permitted within a reference class but not within a value class. This has not changed in the revised V2 language design. However, the semantics of the class destructor have changed considerably. The what and why of that change (and how it impacts the translation of existing V1 code) is the topic of this section. This is probably the most complicated section of the text, so we'll try to go slowly.
Before an object is deleted by the garbage collector, an associated Finalize() method, if present, is invoked. We refer to this as finalization. The timing of just when or whether a Finalize() method is invoked is undefined. This is what is meant when we say that garbage collection exhibits non-deterministic finalization.
When an object maintains a critical resource, perhaps a database connection or a lock, the freeing of this resource cannot depend on the invocation of a finalizer. The canonical solution is to implement the Dispose() method of the System::IDisposable interface. The problem with Dispose() is that it requires an explicit invocation by the user. In the revised language design, the class destructor is used to automate invocation of the Dispose() method.
In the original language, the destructor of a reference class is implemented through the following two steps:
1. The user supplied destructor is renamed internally to Finalize(). If the class has a base class (remember, under the CLR Object Model, only single inheritance is supported), the compiler injects a call of its finalizer following execution of the user-supplied code. For example, given the following trivial hierarchy taken from the V1 language specification,
__gc class A {
public:
~A() { Console::WriteLine(S"in ~A"); }
};
__gc class B : public A {
public:
~B() { Console::WriteLine(S"in ~B"); }
};
both destructors are renamed Finalize(). B's Finalize() has an invocation of A's Finalize() method added following the invocation of WriteLine(). This is what the garbage collector will by default invoke during finalization. Here is what that might look like:
// internal transformation of destructor under V1
__gc class A {
public:
void Finalize() { Console::WriteLine(S"in ~A"); }
};
__gc class B : public A {
public:
void Finalize() {
Console::WriteLine(S"in ~B");
A::Finalize();
}
};
2. In the second step, the compiler synthesizes a virtual destructor. This destructor is what our V1 user programs invoke either directly or through an application of the delete expression. It is never invoked by the garbage collector.
What is placed within this synthesized destructor? Two statements. One is a call to GC::SuppressFinalize() to make sure there are no further invocations of Finalize(). The second is the actual invocation of Finalize(). This, recall, represents the user-supplied destructor for that class. Here is what that might look like:
__gc class A {
public:
virtual ~A()
{
System::GC::SuppressFinalize(this);
A::Finalize();
}
};
__gc class B : public A {
public:
virtual ~B()
{
System::GC::SuppressFinalize(this);
B::Finalize();
}
};
While this implementation allows the user to explicitly invoke the class Finalize() method now rather than whenever, it does not really tie in with the Dispose() method solution. This is changed in the revised language design.
In the revised language design, the destructor is renamed internally to the Dispose() method and the reference class is automatically extended to implement the IDisposable interface. That is, under V2, our pair of classes is transformed as follows:
// internal transformation of destructor under V2
__gc class A : IDisposable {
public:
void Dispose() {
System::GC::SuppressFinalize(this);
Console::WriteLine( "in ~A");
}
};
__gc class B : public A {
public:
void Dispose() {
System::GC::SuppressFinalize(this);
Console::WriteLine( "in ~B");
A::Dispose();
}
};
When a destructor is invoked explicitly under V2, or when delete is applied to a tracking handle, the underlying Dispose() method is invoked automatically. But what if neither happens? Then the garbage collector has no access to the destructor because it is no longer transformed into Finalize().
To accommodate this possibility, the revised language design provides a revised syntax using the bang (!) to support the explicit definition of a finalizer. For example, one might write
public ref class R {
public:
!R() { Console::WriteLine( "I am the R::finalizer()!" ); }
};
The !R() method is renamed internally as Finalize(). It is this method, if present, that is invoked by the garbage collector during finalization if the destructor has not been previously invoked. Here is what the transformation might look like:
// internal transformation under V2
public ref class R {
public:
void Finalize()
{ Console::WriteLine( "I am the R::finalizer()!" ); }
};
This has a number of gnarly consequences for V1 code. Essentially, an explicit Dispose() method should be transformed into the class destructor, and a destructor, if present, should be transformed into a (!) finalizer. The translation tool currently doesn't do this, but it is on the worklist.
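The interplay between the destructor, the suppress-finalize flag and the finalizer can be modelled in plain ISO C++. This is a toy simulation under loud assumptions: there is no garbage collector here, Resource, dispose, finalize and collect are hypothetical names rather than CLR APIs, and collect only imitates what the collector would do when it reclaims the object.

```cpp
#include <string>
#include <vector>

// Plain C++ model of the V2 contract; none of these names are CLR APIs.
class Resource {
    bool suppress_finalize = false;
    std::vector<std::string>& log;
public:
    explicit Resource(std::vector<std::string>& l) : log(l) {}

    // Models ~R(): deterministic cleanup that also opts out of finalization.
    void dispose() {
        suppress_finalize = true;
        log.push_back("dispose");
    }

    // Models !R(): last-chance cleanup.
    void finalize() { log.push_back("finalize"); }

    // Models reclamation: the finalizer runs only if dispose never did.
    void collect() { if (!suppress_finalize) finalize(); }
};
```

An object that was disposed logs only "dispose" when collected, while one that was forgotten logs "finalize", which is the fallback the bang syntax exists to express.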
|
-
Comments from Stan Lippman's Blog: Jon Flanders
re: Value Type Redux
I think it is always important to point out when having the “value types are boxed whenever they are treated as object” discussion, that if the value type overrides ToString() (or the method in question) boxing does not need to occur.
In the original Managed Extensions for C++ there is no support for implicit boxing. In the case of the ToString() example, this means that the invocation of an inherited and overridden virtual function through an object of a value type is different in the two cases. In the former case, the user must explicitly box the object or else the invocation is flagged at compile-time as an error. So, in the Managed Extensions for C++, Jon’s point is moot because the distinction is built into the language.
This had the pedagogical effect within the original Managed Extensions for C++ of teaching the programmer the underlying complexity of the unified type system, and providing a lexical incentive for the introduction of an overriding instance of the virtual function. The majority of users of the value type, however, have no authoring ability with regard to the type definition and so found the lexical incentive a disincentive for using the language. In the revised language, currently under ECMA standardization as C++/CLI, implicit boxing is supported, and the general user is left blissfully unaware of the potential overhead of the call.
Which is why it is important in C# and VB.NET to call ToString() explicitly on value types, because most of them *do* override ToString(), and doing it explicitly avoids the need to box. The compilers (C#/VB.NET) add the box instruction if you just write Console.WriteLine(v);, where typing Console.WriteLine(v.ToString()); just ends up as a virtual method call. This is true even when the value type overrides ToString().
C# and VB.NET have by choice no type vocabulary for speaking about the boxed value types on the managed heap. Console.WriteLine(), in this case, is just a special case of a larger issue -- the initialization or assignment of an Object^ with a value type. The unified type system requires that the value type be boxed in order to transform it into a handle/object duple that underlies the representation of a reference type.
While it is correct to state that invoking ToString() for those value types that have an overriding definition avoids boxing, that is a very special case for which a string representation makes sense. Were we using a Hashtable to count word occurrence, the invocation of ToString() would not be appropriate, and the user would have to live with the multiple boxings associated with the reading and writing of the boxed value types. However, the user might well never be aware of what is going on.
In the Managed Extensions for C++, the type vocabulary for speaking of a boxed value type is __box V*. This has been simplified in C++/CLI to V^ [this is discussed in more detail in an earlier blog entry]. This permits a direct handle on the representation in the managed heap, and does not require multiple boxing operations back and forth when we repeatedly read and write a boxed value.
|
-
I’d like to thank Yves Dolce for responding to Wesner Moise who writes:
Sender: Wesner Moise
re: Value Type Representation Between the Original and Revised C++
I am not sure that I understand you when you say v.ToString() results in an error, because v needs to be boxed in order to access an inherited method. V does not need to be boxed; indeed, valuetypes are not boxed in VB.NET or C#, when ToString() is called.
Yves dug a bit and wrote the following:
Sender: Yves Dolce
re: Value Type Representation Between the Original and Revised C++
I just tried with C#:
struct Complex
{
public Complex( double r, double i )
{
this.r = r ;
this.i = i ;
}
private double r, i ;
}
...
Complex c = new Complex( 1, 2 ) ;
c.ToString() ;
And the IL shows c IS boxed:
IL_000d: ldloca.s c
IL_000f: ldc.r8 1.
IL_0018: ldc.r8 2.
IL_0021: call instance void NeedToBox.Complex::.ctor(float64, float64)
IL_0026: ldloc.0
IL_0027: box NeedToBox.Complex
IL_002c: callvirt instance string [mscorlib]System.ValueType::ToString()
In fact, if memory serves, this particular problem with value types and the contextual need to box them or not was at one time a problem with generics in C#, but I don’t recall the issue. As Yves further documents, in the original C++ with Managed Extensions release,
Sender: Yves Dolce
re: Value Type Representation Between the Original and Revised C++
Wesner,
Try calling ToString() on an instance of:
__value class Complex
{
public:
Complex( double r, double i ) : m_r(r), m_i(i) {}
public:
//virtual String * ToString() { return String::Format( S"{0} + {1} i", m_r.ToString(), m_i.ToString() ) ; }
private:
double m_r,
m_i ;
} ;
It will fail with:
error C3610: 'Complex': value type must be 'boxed' before method 'ToString' can be called
As I wrote in my original blog entry,
The primary motive behind this design was pedagogical: it wished to make the underlying mechanism visible to the programmer so that she would understand the `cost’ of not providing an instance within her value type. Were V to contain an instance of ToString, the implicit boxing would not be necessary.
In thing2 [yes, referring to the two languages in this way is annoying, isn’t it?], the implicit boxing is carried out transparently:
v.ToString(); // thing2
but at the cost of possibly encouraging the class designer to introduce an instance of ToString within V. The reason the implicit boxing is preferred is because while there is usually one class designer, there are an unlimited number of users, none of whom would have the freedom to modify V to eliminate the possibly onerous explicit box.
The other reason not to require the explicit boxing is that it requires the programmer to understand what is going on under the hood, and the C# and VB.NET design philosophy is not to burden the user with that level of complexity: not just the boxing that is required, but the use of a virtual table, and the need to locate the pointer for that table for value classes that do not provide an explicit instance, etc. That is a lot of baggage to ask the user to carry for a seemingly simple ToString() invocation!
This kind of dialogue is actually quite refreshing, and something that book-writing does not give rise to. Once again, let me thank Yves!
|
-
For the work I’ve been engaged in currently in machine translation of the original language design [thing1] to the revised design of the language [thing2], I have been variously making stabs at understanding the possible usages of a managed Value type [V] and pointer modifications of that type [V*, __box V*]. Artur Laksberg and Mahesh Hariharan have both provided much helpful feedback.
Here is the canonical trivial value type used in the thing1 language spec:
__value struct V { int i; };
__gc struct R { V vr; };
In V1, we can have four syntactic variants of a value type [where forms 2 and 3 are the same semantically]:
(1) V v = { 0 };
(2) V *pv = 0;
(3) V __gc *pvgc = 0; // Form (2) is an implicit form of (3)
(4) __box V* pvbx = 0; // must be local
Form (1) is the canonical value object, and it is reasonably well understood, except when someone attempts to invoke an inherited virtual method such as ToString(). For example,
v.ToString(); // error!
In order to invoke this method, the compiler must have access to the associated virtual table of the base class. Because value types are in-state storage without an associated vptr, this requires that v be boxed. In thing1, implicit boxing is not supported but must be explicitly specified by the programmer, as in
__box( v )->ToString(); // thing1: note the arrow
The primary motive behind this design was pedagogical: it wished to make the underlying mechanism visible to the programmer so that she would understand the `cost’ of not providing an instance within her value type. Were V to contain an instance of ToString, the implicit boxing would not be necessary.
In thing2 [yes, referring to the two languages in this way is annoying, isn’t it?], the implicit boxing is carried out transparently:
v.ToString(); // thing2
but at the cost of possibly encouraging the class designer to introduce an instance of ToString within V. The reason the implicit boxing is preferred is because while there is usually one class designer, there are an unlimited number of users, none of whom would have the freedom to modify V to eliminate the possibly onerous explicit box.
Another difference with a value type between thing1 and thing2 is the removal of support for a default constructor. [It has been explained to me that this is because there are instances in which the CLR can create an instance of the value type without invoking the associated default constructor. That is, the thing1 addition of support of a default constructor within a value type cannot be guaranteed. Given that absence of guarantee, it was felt to be better to drop the support altogether rather than have it be non-deterministic in its application.]
This is not as bad as it might seem because each object of a value type is zeroed out automatically, so that the members of a local instance are not undefined. This also meant that in thing1 a default constructor that simply zeroed out its members was redundant. The problem is that a non-trivial default constructor in a thing1 program has no mechanical mapping to thing2. The code within the constructor will need to be migrated into a named init function that would then be explicitly invoked by the user.
The declaration of a value type object within thing2 is otherwise unchanged. [Which means there is still no support for a destructor within a value type. When you couple that with the continued requirement that non-POD native classes be pointer members within the value type, this makes the use of a value type for wrapping non-POD native classes virtually useless.]
Forms (2) and (3) can address nearly anything in this world or the next [that is, anything managed or native]. So, for example, all the following are permitted in thing1:
R* r;
pv = &v; // address a value type on the stack
pv = __nogc new V; // address a value type on native heap
pv = pvgc; // we are not sure what this addresses
pv = pvbx; // address a boxed value type on managed heap
pv = &r->vr; // an interior pointer to value type within a
// reference type on the managed heap
So, a V* can address a location within an activation record [and therefore can be dangling] or global data segment, within the native heap [and therefore can be undefined], within the managed heap [and therefore will be tracked if it should be relocated by the gc], and within the interior of a reference type object on the managed heap [again, requires tracking].
Forms (2) and (3) map into interior_ptr<V>, although the revised language supports both interior_ptr<V> and V*. The primary behavior difference is that the interior_ptr is a tracking pointer; that is, if the object addressed is on the managed heap and that object is relocated by the gc, the interior_ptr is updated with its new address. A V* is restricted to only address non-managed heap memory. It would be an error to attempt to assign a V* the address, for example, of &r->vr, or the address of pvbx [that is, __box V*]. An interior_ptr requires a nullptr to indicate a pointer to no object; a V* would require a 0. For example,
V *pv = 0; // may not address within managed heap
interior_ptr<V> pvgc = nullptr;
Form (4) is a tracking handle. It addresses the whole object that has been boxed within the managed heap [remember that boxing copies the value type into a reference type of the value]. It is translated in the revised language into a V^:
V^ pvbx = nullptr; // __box V* pvbx = 0;
The following declarations in the original language design all map to interior_ptrs in the revised language design, because the types involved are value types within the System namespace:
Int32 *pi; -> interior_ptr<Int32> pi;
Boolean *pb; -> interior_ptr<Boolean> pb;
E *pe; -> interior_ptr<E> pe; // Enumeration
The built-in types are not considered managed types, although they do serve as aliases to the types within the System namespace. Thus the following mappings hold true between thing1 and thing2:
int * pi; -> int * pi;
int __gc * pi; -> interior_ptr< int > pi;
So, when translating a V* in your existing thing1 program, the most conservative strategy is to always turn it to an interior_ptr<V>. This is how it was treated under the original language. In the revised language, the programmer has the option of restricting a value type to non-managed heap addresses by specifying V* rather than interior_ptr<V>. If, on translating your program, you can do a transitive closure of all its uses and be sure that no assigned address is within the managed heap, then leaving it as V* is fine. All V __gc * should, of course, go to interior_ptr<V>.
|
-
Program efficiency at the programmer level is something of a complicated issue – in part because it is contextual. That is, it is hard to say something that holds true in all cases. [This is why having a good profiler is essential.] For example, the same implementation may be adequate, if embarrassing (should anyone actually look), under some circumstances and the cause of a crippling bottleneck under others. Consider the following constructor definition:
foobar::foobar( const bar b, const foo f )
{
m_b = b;
m_f = f;
}
This is a terribly naïve implementation:
· The class parameters b and f are passed by value rather than by reference. This results in unnecessary temporaries being created and destructed.
· The member class objects m_b and m_f are assigned to rather than initialized, resulting in their unnecessary default construction.
· The constructor itself is a good candidate for inlining.
· (And yes, the naming conventions leave something to be desired :-)
An absolutely better implementation is the following:
inline foobar::
foobar( const bar &b, const foo &f )
: m_b( b ), m_f( f )
{}
but … well, does it matter in terms of the project? For argument’s sake, let’s say a summer intern wrote this. The bottom line: It runs correctly and there is no security breach. Let’s fantasize and say the project is under severe schedule pressure. So, on the one hand, we just feel an immense relief that the foobar component is functional and checked in. (Hey, it demos!)
If bar and foo are large, complex classes with costly copy semantics, and if there are lots of foobar objects generated in our application, then this is likely to be a significant performance gain. On the other hand, if foobar class objects are only rarely created, then the improvements to the code are not significant, and the code is no big deal. That is, the code is better but, overall, nothing has really changed. At what point do we stop making code better?
So, my point is: making rules in itself isn’t enough. Knowing where to rigorously apply those rules matters as well. As conventional wisdom has it: profile!
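To make the difference between the two constructors concrete, here is a minimal sketch that simply counts the operations each one triggers. Everything in it is an assumption made for illustration: Counted is a hypothetical stand-in for both bar and foo, and NaiveFoobar, BetterFoobar, and operations_for_one_object are names invented here, not part of any project code.

```cpp
#include <cassert>

// Hypothetical instrumented type standing in for bar and foo.
// It does nothing except count which special member functions run.
struct Counted {
    static int default_ctors;
    static int copy_ctors;
    static int assignments;

    Counted() { ++default_ctors; }
    Counted(const Counted&) { ++copy_ctors; }
    Counted& operator=(const Counted&) { ++assignments; return *this; }

    static void reset() { default_ctors = copy_ctors = assignments = 0; }
    static int total() { return default_ctors + copy_ctors + assignments; }
};
int Counted::default_ctors = 0;
int Counted::copy_ctors = 0;
int Counted::assignments = 0;

// Mirrors the naive constructor: by-value parameters, assignment in the body.
class NaiveFoobar {
public:
    NaiveFoobar(const Counted b, const Counted f) { m_b = b; m_f = f; }
private:
    Counted m_b, m_f;
};

// Mirrors the improved constructor: by-reference parameters,
// members set up in the member initialization list.
class BetterFoobar {
public:
    BetterFoobar(const Counted& b, const Counted& f) : m_b(b), m_f(f) {}
private:
    Counted m_b, m_f;
};

// Returns how many Counted operations it takes to construct one T.
template <typename T>
int operations_for_one_object() {
    Counted b, f;
    Counted::reset();                 // ignore the cost of creating b and f
    assert(Counted::total() == 0);
    T object(b, f);
    (void)object;                     // silence unused-variable warnings
    return Counted::total();
}
```

With these definitions, operations_for_one_object<NaiveFoobar>() reports six operations (two copies into the parameters, two default constructions, two assignments), while operations_for_one_object<BetterFoobar>() reports two copy constructions. Whether that difference matters in a given application is still a question for the profiler.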
Sometimes, there is no real software solution, and anything coded at the application level is mostly just trying not to get in the way. For example, I was technical lead on the ToonShooter project at DreamWorks Feature Animation. It intended to replace a very old Objective-C Macintosh-based [at least 4 years old!] pencil-test playback system for animators with a spanking brand new C++ implementation under Linux. The goal was two-fold: (1) to deliver an actual system either for Spirit [already in production] or, if that proved unfeasible, for Sinbad, at that point in artistic development, and (2) to componentize the class hierarchies for subsequent reuse in other applications – playback, compositing, file-format, math and geometry, etc.
At the heart of the system is a playback engine that must display the images at a minimum rate of 24 frames per second. If it can’t do that, then there is no system. The actual display of the images was done using an OpenGL command, glDrawPixels. Under SGI, it just hummed. Our initial test under Linux with a promised high-performance X Server and OpenGL implementation was a mind-boggling 2 seconds per frame. Turns out the OpenGL implementation was not making use of the underlying hardware, but doing hardware emulation in software. That’s like trying to play point guard without being able to dribble with your left hand.
This was before we even wrote a line of code. Although our project was to be implemented in C++, its success ultimately rested on finding direct hardware support of image blitting [that was not glamorous enough to get proper attention back then]. Until that was solved, our application couldn’t actually be deployed – at least not under Linux. Solving the display issue was a management issue. The point was, it blindsided everyone; we had jumped before confirming that the pool contained water.
[Having access to the Linux OS source code was a mixed bag. When Andrew discovered a flaw in the clock in his work on synchronization, no one seemed ready to delve in and try their hand at fixing it. On the other hand, Andrew studied the sound drivers and found a number of bugs, so the source code in this case did benefit from his superior expertise, and we were able to quickly overcome a faulty implementation. We originally hoped for a digital image solution for bringing the artists’ drawing to the computer, but there was no implementation of a sufficiently fast protocol [that is probably the wrong word [this was my first task as a technical lead in which many facets of the project were beyond my technical competence, and I owe Ed Leonard a great deal of gratitude for his overcoming my reluctance to take on such a position]] and we could not get any commitment as to when what I think is called FireWire would come to the OS. Since no one in-house was in a position to do the implementation itself, we ended up with an analog video solution.]
[HP rode in on a white horse, as it happened, upending various other vendors challenged to come in and provide a solution [unfortunately, Microsoft was not an active player back then]. It felt truly heroic at the time, and HP have justly flourished within that domain space since. We delivered a componentized system in time for it to be successfully used by Spirit, saving them millions of dollars in animators’ time. It was probably my most satisfying experience as a developer. Unfortunately, the movie did not do well at the box office, and the company has since stopped doing traditional 2D animation so I guess the ToonShooter’s shelf life is over.]
Compositing proved a fertile ground for optimization, but a somewhat humbling one, as it turned out. In ToonShooter, once the pencil drawings are captured and the sound file imported, the animator plays back the scene, checking image flow and correct lip-synching. Levels are turned on and off as the different elements are added to or removed from viewing. Compositing is done on the fly as it delivers frames to the playback engine.
Even when we got the playback at speed, we had to ensure that the compositing was able to deliver the image. Obviously, in part the success of this depends on the number of levels requiring compositing. For example, one level worked without a hitch [:-)]. Actually, we supported – I think we tried up to 3 levels without dropping frames in the prototype – hey, that’s as far as we pushed it! [We only had to demo it at that point!]. But we knew this was something critical we needed to get a handle on before it became an issue. [Some animators we talked to in effects were talking of 30-50 levels – I mean, we’re talking glints of sunshine the size of a thumbnail, or the march of wavy lines across water, but still. It would have fallen on my watch if we couldn’t handle it.]
For a grey-scale rough animation pencil drawing captured at 8 bits, compositing can be reduced to choosing the minimum of two values. That is, for each pixel, which is an unsigned char, retain the lower value (black is 0, white is 255). However, for a 640x480 image, that’s still roughly 300K * (n-1) pixel comparisons to composite n levels. 16 levels would probably be stretching the just-in-time delivery of frames. We kept poking at the code to see how we might further tweak it.
What I had forgotten is that sometimes the answer to optimizing code is finding a way to actually not execute it.
Part of the problem was my unfamiliarity with the domain. When our current head of technology looked at the code, he had an obvious answer: Rather than doing the arithmetic, do an index into a table of the 256 values. At first I didn’t get it, because I didn’t realize there is a fixed set of constant values that each pixel could represent. Once I got it, the solution was obvious. (It always is. I’m just not used to being the one with my mouth hanging open.) In fact, it’s a canonical solution anecdotally related in Jon Bentley’s excellent text, Writing Efficient Programs, published by Prentice-Hall.
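As a rough illustration of the table idea, here is a minimal sketch in which the per-pixel comparison becomes a single two-dimensional lookup. The names Pixel, MinTable, and composite are assumptions made for this example and are not the ToonShooter code. The table is also filled at run time here purely for brevity, which is not how a production build would want to do it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef unsigned char Pixel;   // 8-bit grey: black is 0, white is 255

// Hypothetical table type: lut[a][b] holds the darker of pixel values a and b.
// Filled at run time only to keep the sketch short.
struct MinTable {
    Pixel lut[256][256];
    MinTable() {
        for (int a = 0; a < 256; ++a)
            for (int b = 0; b < 256; ++b)
                lut[a][b] = static_cast<Pixel>(a < b ? a : b);
    }
};

// Composite two levels of equal size: one table lookup per pixel.
std::vector<Pixel> composite(const std::vector<Pixel>& upper,
                             const std::vector<Pixel>& lower,
                             const MinTable& table) {
    assert(upper.size() == lower.size());
    std::vector<Pixel> result(upper.size());
    for (std::size_t i = 0; i < upper.size(); ++i)
        result[i] = table.lut[upper[i]][lower[i]];
    return result;
}
```

Compositing n levels would then be n-1 calls to composite, each one folding the next level into the running result.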
There is actually a trick to doing this in C++. What you don’t want to do is build up the table in a constructor – even an inline constructor. Why? Well, a constructor is a run-time entity. The look-up table has to be independent of any class object, and so it is made a class static member. That is, under any kind of constructor scheme, the table gets `statically initialized’ prior to the beginning of main(). So even if you hard-coded each value, setting up the table is a run-time activity. It is significantly more efficient to code a program to write out the table as a built-in array of constant expressions into a program text file and then compile that as part of your build.
// Fill the min table
const int size = 256;
int min_lut[size][size];
for ( int i = 0; i < size; i++ ) {
    for ( int j = i; j < size; j++ )
        min_lut[i][j] = min_lut[j][i] = i;
}
// Write out the tables
ofstream text_file( "Compositor_LUT.cc" );
// the trailing newline keeps the #include off the comment line
text_file << "// This file is created by …\n";
text_file << "#include <Compositor.h>\n\n";
// this is a static member of Compositor ...
text_file << "unsigned char Compositor::minLUT[256][256] = {\n";
for ( int i = 0; i < size; i++ ) {
    text_file << "{ " ;
    for ( int j = 0; j < size; j++ ) {
        text_file << min_lut[i][j] << ", " ;
// etc.
So. We can speed up/optimize image compositing up to a point. After that point, it levels out. Once we reach that point, the only workable solution is to avoid doing any compositing at all – or at least reduce it to a subset of the required compositing. [Actually, an even better solution is to move the operations to hardware with cool and nifty video cards, etc.]
The first step in achieving this is to cache composited images. Before we invoke the compositor, we check that the resulting image is not already available in the cache. Failing an exact match, we look for a best match – perhaps 2 or 3 of the 4 required levels have already been composited, and so we only composite the missing level(s). To achieve this in a reasonable fashion, we need to be able to efficiently tag the different composited images (by frame and by active levels).
For this caching strategy to be effective, of course, the mechanism must be (considerably) cheaper than the process of compositing itself. The bitset and map classes make image tagging and retrieval reasonably simple. [For some reason, the gcc Linux compiler at that time didn’t support bitsets – apparently there was some contention about whether they warranted inclusion in the ISO standard under development. Back then, you could not use Linux for serious development without using gcc so I ended up having to incorporate the free SGI implementation of bitsets. ]
When an image is about to be composited, a bit is turned on for each level that is active. That value can then be turned into an unsigned long, which can be used as a unique key in a map:
map<unsigned long, Image*>;
int main()
{
bitset< 16 > mbit;
CManager pcomp;
Image *pim = reinterpret_cast<Image*>( 10000 ); // placeholder address (0x2710) for the demo
unsigned long search_key;
for ( int frame = 1; frame < 12; ++frame ){
mbit.reset();
for ( int ix = 0; ix < 16; ++ix ) {
if ( ix % 2 ) {
mbit.set( ix );
unsigned long key = mbit.to_ulong();
pcomp.add_to_cache( frame, key, pim );
}
}
}
// …
}
CManager::add_to_cache() frame: 11 key: 2 addr: 0x2710
CManager::add_to_cache() frame: 11 key: 10 addr: 0x2710
CManager::add_to_cache() frame: 11 key: 42 addr: 0x2710
CManager::add_to_cache() frame: 11 key: 170 addr: 0x2710
CManager::add_to_cache() frame: 11 key: 682 addr: 0x2710
CManager::add_to_cache() frame: 11 key: 2730 addr: 0x2710
CManager::add_to_cache() frame: 11 key: 10922 addr: 0x2710
CManager::add_to_cache() frame: 11 key: 43690 addr: 0x2710
frame # 11
0000000000000010 :: 0x2710
0000000000001010 :: 0x2710
0000000000101010 :: 0x2710
0000000010101010 :: 0x2710
0000001010101010 :: 0x2710
0000101010101010 :: 0x2710
0010101010101010 :: 0x2710
1010101010101010 :: 0x2710
I never actually deployed this. I had simply been asked to come up with a possible strategy for speeding up compositing on a software level just in case [I was moving on after the initial deployment of the system]. As it happened, moving the compositing to hardware was the optimal solution.
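For what it is worth, the tagging and best-match lookup described above can be sketched in standard C++ along the following lines. This is a minimal sketch under stated assumptions, not the CManager code: Image is a stand-in type, and CompositeCache, LevelMask, add, find_exact, and find_best are names invented for this example. "Best match" is taken here to mean the cached image for the same frame that covers the most requested levels without including any level that was not requested.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>

struct Image { int id; };              // hypothetical stand-in for the real image class

typedef std::bitset<16> LevelMask;     // one bit per active level

class CompositeCache {
    typedef std::pair<int, unsigned long> Key;   // (frame, level mask as a number)
    typedef std::map<Key, Image*> Map;
    Map cache_;

public:
    void add(int frame, const LevelMask& levels, Image* image) {
        assert(image != nullptr);
        cache_[Key(frame, levels.to_ulong())] = image;
    }

    // Exact match: same frame, same set of active levels.
    Image* find_exact(int frame, const LevelMask& levels) const {
        Map::const_iterator it = cache_.find(Key(frame, levels.to_ulong()));
        return it == cache_.end() ? nullptr : it->second;
    }

    // Best match: the cached image for this frame covering the most wanted
    // levels and no unwanted ones. On return, missing holds the levels that
    // still have to be composited on top of it.
    Image* find_best(int frame, const LevelMask& wanted, LevelMask& missing) const {
        Image* best = nullptr;
        std::size_t best_count = 0;
        missing = wanted;
        for (Map::const_iterator it = cache_.lower_bound(Key(frame, 0UL));
             it != cache_.end() && it->first.first == frame; ++it) {
            const LevelMask have(it->first.second);
            if ((have & ~wanted).any())
                continue;              // holds a level we did not ask for
            if (best == nullptr || have.count() > best_count) {
                best = it->second;
                best_count = have.count();
                missing = wanted & ~have;
            }
        }
        return best;
    }
};
```

Because the map is ordered by frame first, the scan in find_best only touches the entries for one frame. Whether that is cheap enough relative to compositing itself is exactly the kind of thing that would need measuring.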
The two most effective optimization techniques, I suspect, are procrastination and cheating. For example, Martin Davis, in his text The Universal Computer, mentions a Pascal [!] program to compute Leibniz’s series for pi/4. Running on a 486 33MHz PC, summing 1 million terms took 50 seconds, 10 million took 8 minutes. Two years later, the same program was run on a Pentium 200 MHz machine, and “the times were reduced to 4 seconds and 40 seconds respectively.”
A compelling example of cheating is the original Doom game engine. Rather than calculate the hallway lighting for various hallways in real time, they simply pre-generated bitmaps. Level of detail is always a widely used cheat: things that are beyond a certain level of nearness do not need to be rendered with the same level of detail. My favorite example is detecting when one object intersects with another [I can’t think of the formal description of it off hand]. In a game like Mario Kart, there is none – the key element of the game is speed, and the audience [mostly children] find the occlusion [if that is the right word] of a truck or tree with Mario’s kart quite amusing, so why bother eating up cycles preventing it? During the making of the Hunchback of Notre Dame, Kiran Joshi developed crowd software that represented different individuals in cycles of movement in order to fill up the streets of Paris. Edge detection was necessary to the degree that having the hand of one entity piercing the face or body of a neighbor would spoil the illusion of reality necessary for the cartoon. This was managed by simple bounding boxes that marked a coarse rectangle separating one character from another. It required considerably more work than did Mario Kart, but was not a significant amount of work overall. Not so in the case of the lovely Fantasia Segment, Toy Soldier, in which Dave Tonnesen’s work with simulating the physics of the movement of cloth was coupled with the separately generated movement of the ballerina. The required verisimilitude was labor intensive, but anything short of that would have been insufficient. Without knowing up front what the performance characteristics of a piece of software are required to be, it is very difficult to quantify that you have met them, and to know where to allocate precious programming resources.
|
-
The declaration of a managed array object in the original language design was a slightly non-intuitive extension of the standard array declaration in which a __gc keyword was placed between the name of the array object and its possibly comma-filled dimension, as in the following pair of examples,
using namespace System;
void PrintValues( Object* myArr __gc[]);
void PrintValues( int myArr __gc[,,]);
This has been simplified in the revised language design, in which we use a template-like declaration to mirror the STL vector declaration. The first parameter indicates the element type. The second parameter specifies the array dimension [defaults to 1, of course]. The array object itself is a reference type and so must be given a hat. If the element type is also a reference type, then that, too, must be so marked. For example, the above example, when expressed in C++/CLI, looks as follows:
using namespace System;
void PrintValues( array<Object^>^myArr );
void PrintValues( array<int, 3>^myArr );
Because a reference type is a tracking handle rather than an object, it is possible to specify it as the return type of a function. The syntax for doing this was criticized in the original language design again as being somewhat non-intuitive. For example,
using namespace System;
Int32 f() [];
int GetArray() __gc[];
int GetArray() __gc[]
{
int a1 __gc[];
return a1;
}
This has also benefited from the revised declarative language syntax. The example, above, when translated into C++/CLI, looks as follows:
using namespace stdcli::language;
using namespace System;
array<Int32>^ f();
array<int>^ GetArray();
array<int>^ GetArray()
{
array<int>^a1;
return a1;
}
The actual definition of the array type is contained within the stdcli::language namespace, although the final identifier for this namespace, and whether it will be autogenerated by the compiler [this was auto-generated by a translation tool] or require user specification is currently under discussion in the ECMA standardization process.
Of course, the definition of GetArray is a dismal failure since it is returning a null tracking handle! One of the pitfalls of using the dynamic programming paradigm is adjusting to the CLR reference type semantics which, from an ISO C++ point of view, are topsy-turvy. The correct implementation is to return the object by gcnew, and not to fret over who owns the handle – the CLR garbage collector owns it. [The reason the revised language provides a deterministic finalization mechanism through a steroid destructor mechanism is not for the automation of memory management – Give GC a Chance, as the late John Lennon once wrote, if memory serves me – but to handle the automatic reclamation of other system resources that are not recognized by the underlying CLR.]
The explicit initialization of a managed array supported within the original language is maintained in the revised syntax. For example, the following two initialized declarations,
int myIntArray __gc[] = { 1, 2, 3, 4, 5 };
Object* myObjArray __gc[] =
{ __box(26), __box(27), __box(28), __box(29), __box(30)};
look as follows in C++/CLI:
using namespace System;
array<int>^myIntArray = {1,2,3,4,5};
array<Object^>^myObjArray = {26,27,28,29,30};
The allocation of an array object on the heap, in the original language design, looks as follows:
using namespace System;
Object* myArray[] = new Object*[2];
String* myMat[,] = new String*[4,4];
In C++/CLI, the new expression is replaced with gcnew, and the dimension sizes are passed as parameters to the gcnew expression, as follows:
using namespace System;
array<Object^>^myArray = gcnew array<Object^>(2);
array<String^,2>^myMat = gcnew array<String^,2>(4,4);
I believe it is permitted in the new language to provide an explicit initialization list following the gcnew expression [but I am not going to definitively confirm that – this is, after all, a blog, and new language features generally fall outside my charter – see Herb Sutter or Brandon Bray’s blogs for details].
In the original language design, there was no explicit support for the param array that C# supports. Instead, one flags an ordinary array with an attribute, as follows:
void Trace1( String* format, [ParamArray]Object* args[] );
void Trace2( String* format, Object* args[] );
While these both look the same, the ParamArray attribute tags this for C# or other .NET languages as an array taking a variable number of elements with each invocation. In C++/CLI, the design for directly supporting this looked as follows:
void Trace1( String^ format, ... array<Object^>^ args );
void Trace2( String^ format, array<Object^>^ args );
in which the ellipsis […] preceding the array declaration tags it as a param array. Unfortunately, this has recently failed to make the list of features to be implemented for the inaugural release of C++/CLI. However, I can’t bear to [at least as yet] remove it from the translation tool.
|
-
A reader, Yves Dolce, writes:
Related to your last blog entry: The Astonishing S"Literal" String Type
I was surprised that sometimes, boxing is conceptually implicit...
That any Object method can be called directly without any explicit boxing.
And of course, explicit boxing is valid but seems less efficient as a real object ends up being created on the managed heap instead of just taking the address of the double (ldloca.s) ...
Console::WriteLine( S"2 cubed is {0}", result.ToString() ) ;
Console::WriteLine( S"2 cubed is {0}", __box(result) ) ;
gives
IL_0008: ldstr "2 cubed is {0}"
IL_000d: ldloca.s result
IL_000f: call instance string [mscorlib]System.Double::ToString()
IL_0014: call void [mscorlib]System.Console::WriteLine(string, object)
IL_002a: ldstr "2 cubed is {0}"
IL_002f: ldloc.0
IL_0030: box [mscorlib]System.Double
IL_0035: call void [mscorlib]System.Console::WriteLine(string, object)
Yes, the need to box occurs only when the source of the assignment to an Object^ is a value type. A Value type, recall, maintains its state within each associated object [what in C++ is created when we write T t]. A Reference type is a duple in which the named instance is a handle holding either null or the address of an unnamed object allocated on the managed heap. When we initialize the Object^ second parameter of Console::WriteLine with a Value type, there is a hiccup in the unified type system because there is no way to represent the Value type directly within the handle of a Reference type. The solution is to create a shadowed Reference to the Value type on the managed heap and pass in the address of that object. By invoking the ToString() method associated with the Value type, the need for boxing is elided because a String^ is a sealed Reference type.
I personally never bother to write ival.ToString() rather than ival when passing a value type to Console::WriteLine because generally speaking the overhead associated with a write operation is going to overwhelm the cost of the boxing itself, and any print operation is not going to be in a hot spot within an application. In general, why bother optimizing something for which the performance gain is negligible – particularly if it obfuscates the code, however slightly? [I know, one wants a steel discipline …]
In any case, this brings me to a second usage of the __box keyword that can provide performance gain by providing a direct handle to the boxed value type allocated within the managed heap. For example,
int main()
{
double result = 3.14159;
__box double * br = __box( result );
result = 2.7;
*br = 2.17;
Object * o = br;
Console::WriteLine( S"result :: {0}", result.ToString() ) ;
Console::WriteLine( S"result :: {0}", __box(result) ) ;
Console::WriteLine( S"result :: {0}", br );
}
Passing the boxed value type directly to Console::WriteLine eliminates both the boxing and the need to invoke ToString. [Of course, there is the earlier boxing of result to initialize br, so of course we don’t really gain anything unless we really put br to work.]
IL_0052: ldstr "result :: {0}"
IL_0057: ldloc.0
IL_0058: call void [mscorlib]System.Console::WriteLine(string, object)
In the revised language, the support for boxed value types is, in my opinion, considerably more elegant and integrated within the type system. Here is a translation of the function main() into C++/CLI [it’s machine generated; that’s why the spacing is different]:
using namespace stdcli::language;
int main()
{
double result = 3.14159;
double^ br = result;
result = 2.7;
*br = 2.17;
Object^ o = br;
Console::WriteLine( S"result :: {0}", result.ToString() );
Console::WriteLine( S"result :: {0}", result );
Console::WriteLine( S"result :: {0}", br );
}
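One point the example makes implicitly is that the box holds its own copy of the value: assigning to result after the box is created does not change what br refers to, and vice versa. Here is a rough ISO C++ analogy, not a translation. It assumes a heap-allocated double behind a std::shared_ptr as a stand-in for the boxed value, and the helper names box and box_demo are invented for the illustration.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for boxing: copy the value into a heap object
// and hand back a handle to that copy.
std::shared_ptr<double> box(double value)
{
    return std::make_shared<double>(value);
}

// Mirrors the sequence of assignments in main() above and returns
// the final { result, *br } pair.
std::pair<double, double> box_demo()
{
    double result = 3.14159;
    std::shared_ptr<double> br = box(result); // the box holds its own copy
    result = 2.7;                             // changes only the local
    *br = 2.17;                               // changes only the boxed copy
    assert(result != *br);                    // the two are independent
    return std::make_pair(result, *br);
}
```

Calling box_demo() yields the pair 2.7 and 2.17, which is why the first two WriteLine calls and the third report different values.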
I’ll talk about boxed value types some more in a future entry, or answer any questions. …
|
-
Indranil Banerjee writes
re: Virtual Function Behavior Between ISO C++ and C++/CLI Revisited
Thanks for the explanation. Rereading your earlier post and examples, it is quite clear that runtime resolution of virtuals is the same in both CLR and ISO C++.
However, now that I've understood the difference in virtual function resolution during object construction, I think I will be even more hesitant to port ISO C++ code to C++/CLI. This is one more difference between the languages to be wary of. Is there anywhere on MSDN where a list of such differences will be maintained?
If I were in charge of a software project in which the issue of porting ISO C++ code to C++/CLI came up [that is, publishing the type information within the CLR metadata for consumption across assemblies and languages], I would probably do the port incrementally, as follows:
- I would provide a C++/CLI wrapper to the ISO C++ code, and would basically do the simplest thing possible: provide a one-to-one mapping of the public interface, in which each managed method is a simple stub to the equivalent native method. This would not be performant, but it would get me up and over the initial barrier. I would not do this manually, either, but invest my time in a tool that would auto-generate the mapping.
- Once this was available, I would begin to cache and synthesize methods such that crossings of the managed/unmanaged barrier would be minimized. I would keep the original mapping so that the additional work is non-invasive, and then I would incrementally move certain critical pieces of the managed part from the initial mapping to the synthesized/cached interface. This would permit performance measurement and tuning. I imagine this as being analogous to minimizing the communication over a database connection, doing as much as possible in memory, etc.
- Finally, once I had a handle on the hot spots within the C++/CLI wrapper, I would begin judiciously porting those portions of it that actually seem critical, and on moving this or that aspect of an application, be sure to profile the performance and be clear what has been gained and lost, etc.
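The first two steps can be sketched in plain ISO C++. This is only an illustration of the shape of the wrapper, under some loud assumptions: NativeBuffer, BufferWrapper, and the two helper functions are invented names, and the crossings counter merely simulates a managed/unmanaged transition rather than measuring a real one.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical native class standing in for the existing ISO C++ code.
class NativeBuffer {
public:
    void append(int v) { data_.push_back(v); }
    long sum() const
    {
        long total = 0;
        for (std::size_t i = 0; i < data_.size(); ++i)
            total += data_[i];
        return total;
    }
private:
    std::vector<int> data_;
};

// Stand-in for the wrapper. The counter simulates crossings of the
// managed/unmanaged boundary; it does not measure a real transition.
class BufferWrapper {
public:
    BufferWrapper() : crossings_(0) {}

    // Step 1: one-to-one stubs, one crossing per call.
    void append(int v) { ++crossings_; native_.append(v); }
    long sum()         { ++crossings_; return native_.sum(); }

    // Step 2: a synthesized method that batches the work into one crossing.
    void append_all(const std::vector<int>& values)
    {
        ++crossings_;
        for (std::size_t i = 0; i < values.size(); ++i)
            native_.append(values[i]);
    }

    int crossings() const { return crossings_; }
private:
    NativeBuffer native_;
    int crossings_;
};

// Crossings needed to append the values 1..n through the step 1 stubs.
int crossings_one_by_one(int n)
{
    BufferWrapper w;
    for (int i = 1; i <= n; ++i)
        w.append(i);
    return w.crossings();
}

// Crossings needed to append the same values through the batched method.
int crossings_batched(int n)
{
    std::vector<int> values;
    for (int i = 1; i <= n; ++i)
        values.push_back(i);
    BufferWrapper w;
    w.append_all(values);
    assert(w.crossings() == 1);
    return w.crossings();
}
```

Keeping both append and append_all in the same wrapper reflects the non-invasive approach described above: callers migrate to the batched method only where profiling says it matters.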
This in part mirrors the early days of C++ usage with an existing C body of code, particularly when back in those days the performance overhead of C++ was a real concern to C programmers and their management.
As to your question of whether there is anywhere on MSDN where such a list of differences will be maintained, the answer has to be the infamous yes and no. Yes, there should be a list maintained, ideally with examples that are not too trivial, and suggestions where possible for `best practices’. No, that will probably not be on MSDN but on a Visual C++/CLI web site that we are putting into place even as I write. And no, I don’t believe a list currently exists.
Finally, I believe virtual function calls within a constructor represent the 20% of the 80-20 pattern of usage, and so are not a real show-stopper. People who leverage this sort of hierarchical initialization and destruction tend to be sophisticated users, I would imagine, and they would need to develop an alternative pattern under the Common Language Runtime Object Model.
Speaking of the 80-20 model, garbage collection resolves the 80% usage model of reclaiming heap memory automatically. What it fails to address is the 20% usage model of reclaiming other program resources such as a mutex or perhaps database connection. [This is the infamous non-deterministic finalization problem.] In the revised language, C++/CLI provides a deterministic finalization model through a destructor extension to the CLR Object Model. While this is a valuable mechanism for that 20%, I worry that C++ programmers, having internalized manually solving memory management through constructor/destructor pairs, will misuse this mechanism to perpetuate their non-managed habits and never give GC a real chance. This is one of the reasons I am not as much a champion of reproducing the native program within C++/CLI as others on the team.
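For readers who have not leaned on the idiom, this is what deterministic release of a non-memory resource looks like in ISO C++. It is a minimal sketch: Connection is a hypothetical stand-in for something like a database connection, and the static counter exists only so the effect can be observed.

```cpp
#include <cassert>

// Hypothetical non-memory resource, e.g. a database connection.
// The counter tracks how many are currently held.
struct Connection {
    static int open_count;
    Connection()  { ++open_count; }  // acquire
    ~Connection() { --open_count; }  // release, at a known point
private:
    Connection(const Connection&);            // non-copyable
    Connection& operator=(const Connection&); // non-copyable
};
int Connection::open_count = 0;

// Reports how many connections are open while one is alive in scope.
int connections_open_inside_scope()
{
    Connection c;
    assert(Connection::open_count >= 1);
    return Connection::open_count;
}   // c's destructor runs here, not at some later collection
```

The resource is released at the closing brace, independent of when or whether any memory is reclaimed. That is the 20% case; using the same pair purely to manage memory is the habit cautioned against above.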
Johan Ericsson writes:
re: The Astonishing S”Literal” String Type
Not using the S is much better! I wonder, can we do the same for standard C++, i.e. get rid of the L. It could often be inferred from the context.
I.e., the L shouldn't really be necessary in the following statement:
const wchar_t * str = L"hi";
Why not?
The S was introduced into the original language design by analogy with the L [or so I presume]. Independent of the performance hit, the design is not bad. Another presumption I am going to make is that the original motivation of the L in C was to distinguish the size difference between literals, and of course that doesn’t transfer to the S type. The difference between a native and managed literal string is that the managed literal string is immutable. There is, of course, the reverse astonishment under the CLR of string manipulation generating potentially large numbers of string temporaries if we attempt to rewrite various characters of the string, such as turning “mississippi” character by character into “MISSISSIPPI” rather than using StringBuilder.
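The size distinction carried by the L can be checked directly in ISO C++. The sketch below only shows that the prefix selects a different element type, and therefore a different storage size, for the same two characters; the helper function names are invented for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// The prefix changes the element type of the literal, not just a flag.
static_assert(std::is_same<decltype("hi"), const char (&)[3]>::value,
              "a narrow literal is an array of const char");
static_assert(std::is_same<decltype(L"hi"), const wchar_t (&)[3]>::value,
              "a wide literal is an array of const wchar_t");

// Storage for "hi": two characters plus the terminating null.
std::size_t narrow_bytes() { return sizeof("hi"); }

// Same three elements, each sizeof(wchar_t) bytes wide.
std::size_t wide_bytes()
{
    assert(sizeof(L"hi") >= sizeof("hi"));
    return sizeof(L"hi");
}
```

Since sizeof(wchar_t) is implementation-defined, wide_bytes() is 3 * sizeof(wchar_t) rather than a fixed number.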
So, why not change the need to have the L as well as the S? The short answer is because wide characters are not a part of C++/CLI but are a part of ISO C/C++, and that is a boundary that we should not cross [in my judgment]. If we introduce a set of differences in the shared common language, then (a) we run the risk of alienating those people involved in the standardization of that common language, and (b) we make it extremely difficult to identify a common subset that can be moved between the two languages. [I’m giving you this not to be persuasive but to give a sense of the arguments. You may not find this presentation convincing, but the main points should hold in a debate.]
Again, in my judgment [and Herb Sutter could address this more fully and with more authority], it is imperative that we view C++/CLI as an integration of the dynamic programming paradigm onto the existing ISO-C++ language rather than see it as an opportunity to create P [as in BCPL], a new language separate from C/C++. So, that’s why – not because you might not be correct in your assessment, but it is not within the vision of our charter.
|
-
One of the astonishing infelicities of the original language design was the unflagged overhead of the seemingly trivial failing of not placing an S in front of a string literal targeted to a managed reference object. For example, given the following two System::String declarations,
String *ps1 = "hello";
String *ps2 = S"goodbye";
here is the MSIL representation of the two declarations as seen through ildasm. Notice the astonishing performance difference.
// String *ps1 = "hello";
ldsflda valuetype $ArrayType$0xd61117dd
modopt([Microsoft.VisualC]Microsoft.VisualC.IsConstModifier)
'?A0xbdde7aca.unnamed-global-0'
newobj instance void [mscorlib]System.String::.ctor(int8*)
stloc.0
// String *ps2 = S"goodbye";
ldstr "goodbye"
stloc.0
That’s a pretty remarkable savings for just remembering [or learning] to prefix a literal string with an S; or, to look at it another way, that’s a durn stern penalty for not doing so. [In addition, if S”goodbye” occurs 5 times, they are collapsed into a single shared instance.] And ignorance is not a mitigating defense! Using the default Visual Studio settings for a project, this compiles without any warning, as the following illustrates:
nettest - 0 error(s), 0 warning(s)
What’s perhaps equally remarkable is that in another common corner of the language, implicit value type boxing was explicitly not supported because it was felt that it would result in a false sense of security for the programmer who would not realize its run-time overhead. For example,
int ival;
Object *po = ival; // error
Object *po = __box( ival ); // ok
Of course, these two design corners are not really at all the same – in fact, they seem to illustrate opposite design philosophies. In the one case, a trivial detail that is context sensitive silently causes a truly astonishing inflation of the run-time program. In the other case, there is no underlying gain or loss in the behavior of the program by having the explicit __box operator – only in the behavior of the programmer. It is a pedagogical design intended to teach the programmer about the nature of the CLR’s unified type system.
The solution in both cases is to make the behavior transparent. A reference type assigned with or initialized to a value type results in a boxing operation. This is as fundamental to the unified type system of the CLR as the copy constructor and copy assignment operator are to native C++. Ignore them at your peril. If you assign a literal string in a context where an S should be, the S is implicitly present.
What about cases in which we need to explicitly direct the compiler to one interpretation or another, as in the case of an overloaded pair of functions?
void f(char*);
void f(String^);
f("ABC"); // calls f(char*)
The decision of the language design team is to drop the S and rather require the user to explicitly cast the literal string, as in
f(( String^ )"ABC");
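The same shape of resolution shows up in ISO C++ if we substitute std::string for String^. This is an analogy under that assumption, not C++/CLI itself: the literal binds to the pointer overload through an array-to-pointer conversion, and an explicit conversion is needed to select the other overload. The return values are tags added for the illustration.

```cpp
#include <cassert>
#include <string>

// Each overload returns a tag naming itself so the selection is visible.
const char* f(const char*)        { return "f(const char*)"; }
const char* f(const std::string&) { return "f(const std::string&)"; }

// A bare literal prefers the pointer overload.
std::string chosen_for_literal()
{
    return f("ABC");
}

// An explicit conversion directs the call to the string overload.
std::string chosen_for_explicit_conversion()
{
    std::string tag = f(std::string("ABC"));
    assert(tag != chosen_for_literal());
    return tag;
}
```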
|
-
I seem to have rather badly explained the `virtual function invocation within a constructor issue,’ if one is to judge by the following question posted as a follow-up to my blog entry, for which I apologize:
Sender: Indranil Banerjee
=====================================
re: Making a Virtual Table Context-Sensitive
Do you mean that virtual calls are not context sensitive at all with Managed C++? How about C++/CLI?
If not, the feature will be sorely missed. One of my favourite GoF design patterns, Template Method, directly makes use of this feature, where a non-virtual base class method calls a bunch of virtual methods.
I've written plenty of native C++ and Java that works like this. I'd hate for this to break in .NET. I'll check my colleague's copy of C# Design Patterns to see how Template Method is handled there.
Conveying something never seems to be as clear cut as I delude myself into believing. Before I actually attempt to clarify Indranil’s question, let me provide some context.
An object model has two faces, one that is presented to the user of a programming language, and one that implements that model on the target platform. In the original implementation of C++ by Stroustrup, the target platform was the C language which of course provides no direct support for (a) type encapsulation [a struct does not maintain scope or permission sections], (b) interface specification [C is a data abstraction language in which function and state are separate], or (c) inheritance. The following simple class hierarchy, which does not support polymorphism [it has been variously called implementation or value inheritance]
class Point2d {
public:
Point2d( float x = 0.f, float y = 0.f );
float x() const { return m_x; }
void x( float new_x ) { m_x = new_x; }
// …
private:
float m_x;
float m_y;
};
class Point3d : public Point2d {
public:
Point3d( float x = 0.f, float y = 0.f, float z = 0.f );
// …
private:
float m_z;
};
has no direct mapping onto the C language, and could be translated in any number of ways. For example, one could choose a very simple object model in which each member of the class is assigned a slot. For Point2d, slot 0 is assigned to the constructor, slot 1 to the read function of the x coordinate member, slot 2 to the write function of the x coordinate member, slot n-2 to the static member ms_cnt, slot n-1 to the state member m_x, and slot n to the state member m_y. Internally, the slots representing methods would hold the addresses while the slots representing state members could hold the actual values. This would be relatively simple to implement and would be most appropriate as a proof of concept rather than as a production model.
An alternative model might keep the state members within the class object, but factor out the methods in a method table, maintaining a slot within the object addressing that table. One could imagine variations to that – add a member table as well. With each additional level of indirection, one gains a further flexibility in terms of either substituting or extending the method or state table of the class.
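A sketch of that method-table variant, written in the C-like subset of C++ for Point2d. The names Point2dMethods, Point2dObject, and make_point2d are invented for the illustration; nothing here is claimed to be what any actual translator generated.

```cpp
#include <cassert>

struct Point2dObject;

// Hypothetical method table: one slot per method of the class.
struct Point2dMethods {
    float (*get_x)(const Point2dObject*);
    void  (*set_x)(Point2dObject*, float);
};

// State stays within the object; one extra slot addresses the table.
struct Point2dObject {
    const Point2dMethods* methods;
    float m_x;
    float m_y;
};

float point2d_get_x(const Point2dObject* self)        { return self->m_x; }
void  point2d_set_x(Point2dObject* self, float new_x) { self->m_x = new_x; }

// A single table shared by every Point2dObject.
const Point2dMethods point2d_methods = { point2d_get_x, point2d_set_x };

// Plays the role of the constructor.
Point2dObject make_point2d(float x = 0.f, float y = 0.f)
{
    Point2dObject p = { &point2d_methods, x, y };
    assert(p.methods == &point2d_methods);
    return p;
}
```

Every call goes through p.methods->get_x(&p), so swapping in a different table substitutes or extends the behavior, at the price of one indirection per call and one extra pointer per object.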
The actual C++ object model chose to maintain the space and time efficiency of the C target platform: the implicit this pointer and an internal name-mangling identify a class method from an independent function; otherwise, there is no binding between a class object and the methods of that class. The non-static state members are stored by value within each class object, and so on.
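By contrast, the model described in the paragraph above can be sketched as follows: the state is a plain struct, and each method becomes an ordinary function that takes the object's address as an explicit first argument. The function names stand in for mangled names and are purely hypothetical, since real mangling schemes are implementation-specific.

```cpp
#include <cassert>

// Non-static state members stored by value, exactly as in a C struct.
struct Point2d_state {
    float m_x;
    float m_y;
};

// No per-object bookkeeping. This holds on mainstream compilers,
// although the standard does not strictly promise an absence of padding.
static_assert(sizeof(Point2d_state) == 2 * sizeof(float),
              "object is just its two floats");

// float Point2d::x() const, with the implicit this made explicit.
float Point2d_x_get(const Point2d_state* self)
{
    return self->m_x;
}

// void Point2d::x( float new_x )
void Point2d_x_set(Point2d_state* self, float new_x)
{
    self->m_x = new_x;
    assert(Point2d_x_get(self) == new_x);
}
```

A call such as pt.x( 3.f ) then amounts to Point2d_x_set( &pt, 3.f ): a direct function call with no table lookup, which is how the space and time efficiency of C is preserved.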
In certain cases, such as pointers to class members or the virtual mechanisms of inheritance or runtime method invocation, there is no one-to-one mapping with the C target constructs, and so full-blown auxiliary abstractions are necessary, and these bring with them a space and time cost that hopefully is offset by the additional functionality. [The ideal in C++ has been that a programmer should not pay the cost of a facility unless the facility is used.]
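The space half of that cost is easy to observe. In the sketch below, PlainPoint and VirtualPoint are invented names, and the size difference is what mainstream implementations produce [typically one hidden table pointer per object] rather than something the standard guarantees.

```cpp
#include <cassert>
#include <cstddef>

// No virtual functions: the object is just its state.
struct PlainPoint {
    float x;
    float y;
    float sum() const { return x + y; }
};

// Same state, but the virtual mechanism is now in use.
struct VirtualPoint {
    float x;
    float y;
    virtual float sum() const { return x + y; }
    virtual ~VirtualPoint() {}
};

// Per-object bytes added by using the facility on this implementation.
std::size_t virtual_overhead_bytes()
{
    assert(sizeof(VirtualPoint) >= sizeof(PlainPoint));
    return sizeof(VirtualPoint) - sizeof(PlainPoint);
}
```

PlainPoint pays nothing for a mechanism it does not use, which is the ideal the bracketed remark refers to.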
This is a distinction worth emphasizing because too often people confuse the CLR platform with a constraint on the Object Model possible for a .NET language, and this is not true – or at least not true in general. For example, the CLR does not support value inheritance, and so internally a .NET language that chose to support value inheritance would have to translate that into a form that could be represented within the CLR. The same is true for multiple inheritance. It is not true to say that a .NET language cannot support multiple inheritance; only that the support of multiple inheritance requires that the programmer face of the object model be translated into a form able to be represented by the underlying CLR platform. Eiffel.NET, for example, has done just that.
On the other hand, there are some hard constraints that a language cannot reasonably get around. One such constraint is the different resolution algorithm of a virtual function invoked within a constructor and destructor. [One could imagine synthesizing numbers of special sub-object constructors for a class to simulate the ISO C++ behavior, but this would put it at considerable semantic odds with the rest of the .NET object behavior and could result in serious runtime faults.] In these cases, the C++ programmer has to concede that she is working in a different semantic model that requires learning new habits. C++ did that to the C programmer with the elimination of tentative global definitions.
The different resolution algorithm between ISO C++ and the CLR Object Models is uncompromising and there was no amelioration provided within either the original or revised C++/CLI language design. The two-part blog first described what the difference is, illustrating it with a simple example, and then peeked under the covers to show the work necessary in ISO C++ to provide its resolution semantics. This is a singular case because it has to do with object identity; that is, when is an object of a class an actual instance of that type? In the ISO C++ Object Model, the object is not an actual instance of that type until the execution of the explicit code of its constructor.
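The ISO C++ side of that difference fits in a few lines. Base, Derived, and the seen member are invented for this sketch, which records which override the base constructor actually reaches.

```cpp
#include <cassert>
#include <string>

struct Base {
    std::string seen;   // records what the constructor resolved to

    // While this body runs, the object is a Base, so the virtual call
    // resolves to Base::name() even when a Derived is being built.
    Base() { seen = name(); }

    virtual std::string name() const { return "Base"; }
    virtual ~Base() {}
};

struct Derived : Base {
    std::string name() const override { return "Derived"; }
};

// Which override did the Base constructor reach while building a Derived?
std::string name_seen_during_construction()
{
    Derived d;
    assert(d.name() == "Derived"); // after construction: most-derived override
    return d.seen;
}
```

In ISO C++ the function returns "Base". Under the CLR Object Model the same call from the base constructor resolves to the most-derived override, which may then run before the derived constructor body has executed.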
Apart from that, the virtual mechanism works the same. The order of constructors [and destructors] is the same. The run-time resolution of a method based on the actual type of the object referred to at each call point remains the same. The Template Method of the Gang of Four [GOF] Patterns book still works the same. While there be dragons, this is not one of their places of inhabitation.
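For completeness, here is the Template Method shape Indranil describes, in ISO C++. Report and SalesReport are invented names for this sketch. The point is only that the virtual calls are made from an ordinary member function on a fully constructed object, which is the case that behaves the same in both object models.

```cpp
#include <cassert>
#include <string>

class Report {
public:
    virtual ~Report() {}

    // The template method: non-virtual, fixes the order of the steps.
    std::string render() const
    {
        return header() + body() + footer();
    }

protected:
    // The steps: resolved at run time against the actual object type.
    virtual std::string header() const { return "["; }
    virtual std::string body() const = 0;
    virtual std::string footer() const { return "]"; }
};

class SalesReport : public Report {
protected:
    std::string body() const override { return "sales"; }
};

// Invokes the template method through a base class reference.
std::string render_via_base(const Report& r)
{
    std::string text = r.render();
    assert(!text.empty());
    return text;
}
```

The trouble spot discussed earlier would arise only if render() were invoked from the Report constructor itself.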