Managed C++ (MC++) code generation is a cool accomplishment. I think it's further testimony to the CLR's cross-language charter (and IL's flexibility) that we can support C++. (Not to mention our support for functional and dynamic languages like F#, SML.Net, and IronPython, which also compile to IL.) As the most basic example, I find it interesting to compare the IL generated for a simple program that allocates a buffer and does some manipulation when written in MC++ and in C#.
#include "stdafx.h"
using namespace System;

int main()
{
    Console::WriteLine(L"Multiplication table");
    int size = 10;
    int * table = new int [size];
    for (int i = 0; i < size; i++)
    {
        table[i] = i * i; // unsafe pointer math!
    }
    delete [] table; // explicit free memory
    return 0;
}
The thing I find causes a double-take for most people is how MC++ can have explicit memory management (e.g., "new" and "delete") and pointer arithmetic when it runs on the CLR, which uses garbage collection (instead of explicit memory management) and safe references (instead of raw pointers). The C# and MC++ programs do the same thing (initialize a table of squares). Here's a side-by-side comparison of the IL (with source inline) between the C# and MC++. I've made some annotations and shaded the rows corresponding to interesting differences.
MC++:
//000017: return 0;
IL_0043: ldc.i4.0
IL_0044: stloc.s V_4
//000018: }
IL_0046: ldloc.s V_4
IL_0048: ret
} // end of method 'Global Functions'::main

C#:
//000017: }
//000018: }
IL_002e: ret
} // end of method Program::Main
For trivia points, I'll point out some minor cosmetic differences:
- They generate the for-loop differently. MC++ has the comparison first; C# has the comparison at the end. You can verify for yourself that they result in the same behavior.
- C# places a sequence point on the opening '{', whereas MC++ doesn't. Practically, this means that when you first step into 'main', in C# you stop at '{', whereas in MC++ you stop at the first line after '{'.
- MC++ needs to distinguish between managed and unmanaged strings. (Update: MC++ 2005 will use context to decide; see section 5.1 in the translation guide.) Unmanaged strings are just a pointer (RVA) to the raw string data (as they are in C++). In MC++, managed strings get prefixed with 'S' (like S"Hello"). Managed strings are loaded with the 'ldstr' opcode. Since all strings in C# are managed, C# doesn't need a special string prefix.

There are also some differences that help MC++ interoperate better with native code (like using the Cdecl calling convention on Main and that .vtentry thing).

For now, the interesting differences are around allocating 'table', doing the assignment "table[i] = i * i", and freeing the memory. I've shaded the rows corresponding to these actions.

The declaration of the 'table' local:
It turns out the languages compile the variable 'table' differently:
- MC++ compiles it to an "int32 *", which the runtime views as a raw pointer. This is essentially opaque data to the CLR and may as well be a pointer-sized int. (It corresponds to a CorElementType of ELEMENT_TYPE_PTR.)
- C# compiles it to "int32[]", which the runtime views as a managed array. This is a managed reference owned by the CLR, and the array lives on the GC heap. (It corresponds to a CorElementType of ELEMENT_TYPE_SZARRAY.)

One significant consequence of this is debuggability. The native debugger can't do much when inspecting an int32* because it's just a raw pointer. It doesn't know it's actually pointing to an array, and it sure doesn't know how big that array is, so it couldn't display the bounds even if it wanted to. In contrast, an int32[] is far more descriptive to the debugger because managed arrays have rich bounds information (see ICorDebugArrayValue in CorDebug.idl). Thus in the C# case, you can expand 'table' and see the full contents.

Allocating the buffer:
C# uses the "newarr" IL opcode to initialize 'table'. Since 'table' is a managed array, and 'newarr' allocates a managed array, that should all make sense.
In contrast, MC++ calls a method 'new' which will eventually pinvoke out to msvcr80*.dll!operator new(unsigned int), and that will return a raw pointer to the allocated memory, just as you'd expect in unmanaged C++. You can step into the call in VS and see for yourself that you land in msvcr80!new. It's the same story with the call to delete. This is all consistent with MC++'s 'table' being a raw pointer.

Making assignments:
In the C# case, we've got a strongly typed array, and stelem.* is an IL opcode for explicitly assigning into an array. This opcode has the semantics to let the JIT do array bounds checks. In the MC++ case, we're dealing with raw pointers. The stind.* IL opcode allows direct memory assignment. You can see the raw pointer manipulation for the pointer arithmetic, just as we'd have in unmanaged C++. It's still IL, but it's not verifiable. That means the code loses a lot of protection, such as the bounds checks from the stelem opcode. C# has unsafe code regions, which let it do the same pointer operations that MC++ is doing here.
In conclusion:
While this is perhaps a laughably basic example (especially in retrospect), these qualities of MC++ codegen are a preview of the sort of techniques MC++ uses to compile more complex constructs (like multiple inheritance, which is not supported in C#).