This is a question that actually covers a lot of ground in the VM, so I’ll be discussing this in parts over my next few entries. I highly recommend reading the information located under this node in the online MSDN documentation:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpguide/html/cpconadvancedcominterop.asp
I also highly recommend Adam Nathan’s “.NET and COM: The Complete Interoperability Guide”. (ISBN# 067232170X)
We’ll start by looking at consuming COM objects from managed code in an early-bound fashion.
Early-bound means that we have knowledge about the object in question, so we tell the CLR about our COM object at compile-time. Late-bound means that we do everything programmatically at run-time and don’t need information about the object up front. Late binding is slower, because much of the work that the early-bound case handles at compile-time must instead be performed at run-time, so avoid it in perf-critical situations if you can help it.
Since the CLR’s type system is completely dependent on metadata (contents of the assembly including types, methods, fields, attributes, offsets, etc.), we’ll need to provide a metadata ‘view’ of our COM object so the compiler knows what to bind to. Fortunately, most COM objects already provide type information in type libraries, so we simply import that existing type library into the CLR’s metadata format via the SDK tool TlbImp or VS’s “Add Reference” dialog. Alternatively, one might decide to author the metadata from scratch, though this is a tedious and error-prone process. Either way, the resulting assembly is called an Interop Assembly (IA).
Due to the lack of information for certain constructs in type libraries, there are cases where the information must be added by hand to the IA after it has been built. A case I run into fairly often is the IDL attribute size_is, which defines the size of an array in terms of another parameter:
HRESULT Proc1(
    [in] short m,
    [in, size_is(m)] short a[]); // If m = 10, a[10]
Once compiled by the MIDL compiler into a type library, this looks like:
HRESULT Proc1(
    [in] short m,
    [in] short a[]);
When TlbImp is run over the type library to create an IA, it doesn’t know about the size_is attribute as it is not present in the type library. Therefore, this information would need to be added to the IA by hand. I’ll detail these cases further if I see some interest on the subject.
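To make the consequence concrete, here is a minimal Python model using ctypes. It is only a sketch and not how the CLR marshaler is implemented: copy_shorts is a hypothetical helper, and the one-element fallback is my assumption standing in for "no length information is available".

```python
import ctypes


def copy_shorts(ptr, count):
    """Copy `count` 16-bit values starting at `ptr` into a Python list."""
    return [ptr[i] for i in range(count)]


# Caller-side buffer: ten shorts, handed across the boundary as a bare pointer.
source = (ctypes.c_short * 10)(*range(10))
ptr = ctypes.cast(source, ctypes.POINTER(ctypes.c_short))

# With size_is(m), the element count comes from the sibling parameter m.
m = 10
with_size_is = copy_shorts(ptr, m)

# Without it, the pointer carries no length. This model ASSUMES the marshaler
# then treats the argument as a single element.
without_size_is = copy_shorts(ptr, 1)

print(len(with_size_is), len(without_size_is))
```

The callee still expects m elements in both cases; the difference is only in how much data the marshaling layer knows to copy.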
The IA contains managed definitions representing native types, so now managed languages can treat these types as any other managed types. The definitions themselves have special attributes that tell the CLR that these actually represent COM objects and should be treated specially at run-time, but this is all transparent at compile time.
As an example, importing the standard stdole32.tlb gives the following metadata (dumped via ILDasm) for IEnumVARIANT:
.class interface public abstract auto ansi import stdole.IEnumVARIANT
{
  .custom instance void [mscorlib]System.Runtime.InteropServices.InterfaceTypeAttribute::.ctor(int16) =
    ( 01 00 01 00 00 00 )
  .custom instance void [mscorlib]System.Runtime.InteropServices.GuidAttribute::.ctor(string) =
    ( 01 00 24 30 30 30 32 30 34 30 34 2D 30 30 30 30 2D 30 30 30 30 2D 43 30 30 30 2D 30 30 30
      30 30 30 30 30 30 30 34 36 00 00 ) // ..$00020404-0000-0000-C000-000000000046..
} // end of class stdole.IEnumVARIANT
The important items to note are ComImport (shown as “import” above) and the two custom attributes on the interface, InterfaceType and Guid. ComImport tells the CLR that this type is actually defined in native code and was imported for use with a Runtime Callable Wrapper (RCW, described below). InterfaceType indicates what kind of interface this represents: IUnknown-derived (early-bound), IDispatch-only (late-bound) or dual (both). Guid is the native GUID (the IID) of this interface.
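The hex blobs in the ILDasm dump can be read by hand. The following Python sketch decodes the two blobs shown above, assuming the standard custom-attribute layout: a two-byte prolog of 0x0001, the constructor arguments, then a two-byte count of named arguments. decode_attribute_blob is a hypothetical helper that handles only the two argument kinds needed here, and it reads the string length as a single byte, which is only valid for strings shorter than 128 bytes.

```python
import struct

# Values of System.Runtime.InteropServices.ComInterfaceType.
COM_INTERFACE_TYPE = {
    0: "InterfaceIsDual",
    1: "InterfaceIsIUnknown",
    2: "InterfaceIsIDispatch",
}


def decode_attribute_blob(blob, arg_kind):
    """Decode a custom-attribute blob that has exactly one fixed argument.

    arg_kind is "int16" or "string". Returns (value, named_argument_count).
    """
    (prolog,) = struct.unpack_from("<H", blob, 0)
    if prolog != 0x0001:
        raise ValueError("unexpected prolog")
    offset = 2
    if arg_kind == "int16":
        (value,) = struct.unpack_from("<h", blob, offset)
        offset += 2
    elif arg_kind == "string":
        length = blob[offset]  # one-byte length, fine for short strings
        offset += 1
        value = blob[offset:offset + length].decode("utf-8")
        offset += length
    else:
        raise ValueError("unsupported argument kind")
    (named_count,) = struct.unpack_from("<H", blob, offset)
    return value, named_count


interface_type_blob = bytes.fromhex("01 00 01 00 00 00")
guid_blob = bytes.fromhex(
    "01 00 24 30 30 30 32 30 34 30 34 2D 30 30 30 30 2D 30 30 30 30 2D "
    "43 30 30 30 2D 30 30 30 30 30 30 30 30 30 30 34 36 00 00"
)

kind, _ = decode_attribute_blob(interface_type_blob, "int16")
guid, _ = decode_attribute_blob(guid_blob, "string")
print(COM_INTERFACE_TYPE[kind])  # InterfaceIsIUnknown
print(guid)                      # 00020404-0000-0000-C000-000000000046
```

So the int16 value 1 marks IEnumVARIANT as an IUnknown-derived (early-bound) interface, and the string is the interface's IID, with 0x24 being its length of 36 characters.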
I’ll use the following IL snippet as an example in the next section:
newobj instance void [COMSERVERLib]COMSERVERLib.FooClass::.ctor()
stloc.1
ldloc.1
stloc.2
ldloc.2
callvirt instance string [COMSERVERLib]COMSERVERLib.IFoo::GetFoo()
stloc.3
ldloc.2
castclass [SomeOtherIA]SomeOtherIA.IBar
At run-time, the COM object is typically instantiated by calling newobj on the type contained in the IA (this corresponds to C#’s new operator and similarly named operators in other languages). Instead of going down the normal creation path for objects, however, the CLR recognizes the attributes on the type and instantiates the COM object. Once the COM object has been created, the CLR creates a place-holder object using the metadata in the IA (so that reflection has access to the list of static interfaces implemented by the object and described in the IA), and a backing construct called a Runtime Callable Wrapper (RCW). The RCW and the place-holder object are linked together, and the caller receives the place-holder object, which appears to be a standard managed object.
newobj instance void [COMSERVERLib]COMSERVERLib.FooClass::.ctor()
This IL snippet shows creation of a COM object “FooClass” defined in IA “COMSERVERLib.dll”.
The RCW is the interesting structure here. It holds the pointer to the COM object, a reference on the COM object (it AddRefs once each time the object enters the CLR), a cache of specific interface pointers on the object, marshaling characteristics of the object, and other auxiliary information. It is interesting to note that the CLR maintains a one-to-one relationship between unique COM object pointers and RCWs in a given AppDomain.
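The one-to-one relationship can be pictured as a per-AppDomain lookup table keyed by the COM object's pointer. The Python sketch below is a toy model, not the CLR's actual data structure: Rcw, AppDomain, get_rcw and the pointer value 0x1000 are all hypothetical names and values chosen for illustration.

```python
class Rcw:
    """Toy stand-in for a Runtime Callable Wrapper."""

    def __init__(self, com_pointer):
        self.com_pointer = com_pointer
        self.enter_count = 0       # bumped each time the pointer enters
        self.interface_cache = {}  # interface GUID -> interface pointer


class AppDomain:
    """Toy AppDomain that owns at most one Rcw per COM object pointer."""

    def __init__(self, name):
        self.name = name
        self._rcws = {}

    def get_rcw(self, com_pointer):
        # Reuse the existing wrapper if this pointer has been seen before.
        rcw = self._rcws.get(com_pointer)
        if rcw is None:
            rcw = Rcw(com_pointer)
            self._rcws[com_pointer] = rcw
        rcw.enter_count += 1
        return rcw


domain_a = AppDomain("A")
domain_b = AppDomain("B")

first = domain_a.get_rcw(0x1000)
second = domain_a.get_rcw(0x1000)  # same pointer, same domain
other = domain_b.get_rcw(0x1000)   # same pointer, different domain

print(first is second, first.enter_count, other is first)
```

Within one domain the same pointer always maps back to the same wrapper, while a second domain gets its own wrapper for that pointer.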
Whenever a call/callvirt is made on the place-holder object, the call instruction is intercepted by the RCW and a call is made on the underlying COM object. Depending on how complex the arguments of the signature are, the 32-bit CLR will either create a compiled x86 stub to handle marshaling of the types or will delegate to a slower interpreted path that can handle the more complex types that the marshaler knows about. The 64-bit CLR employs a different mechanism called IL Stubs; here we create an IL stream to handle the marshaling and let the JIT compile it appropriately.
callvirt instance string [COMSERVERLib]COMSERVERLib.IFoo::GetFoo()
This IL snippet shows a call on the COM object instantiated above. While it appears identical to a callvirt to a managed interface, the CLR will intercept and make the call on the underlying COM object.
Casting on RCW-based objects is handled differently from normal managed objects. For interface casting, the CLR first checks the metadata for a ‘static’ interface match; that is, it checks whether the interface being cast to is already listed in the metadata from the IA. If this fails, we fall back on the COM notion of casting: calling QueryInterface. We determine the GUID of the interface being cast to and then call QI on the underlying pUnk with that GUID. If it succeeds, we cache the result and the cast is successful. This is interesting because we could have no interfaces listed in the metadata (which means typeof(MyComClass).GetInterfaces() returns no interfaces), but still have successful casts.
castclass [SomeOtherIA]SomeOtherIA.IBar
This IL snippet shows a cast on the RCW-based object instantiated above to an interface defined in a different assembly. If the COMSERVERLib IA lists SomeOtherIA.IBar in FooClass’s list of implemented interfaces, then this cast will look at the associated metadata and succeed immediately. If it does not list IBar, then the underlying COM object will be QI’d for the IBar interface.
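The two-step cast logic described above can be sketched in Python. This is a toy model under stated assumptions, not CLR code: FakeComObject, RcwObject, cast_to and the placeholder GUID strings are hypothetical, and the model returns False where the CLR would fail the cast.

```python
class FakeComObject:
    """Hypothetical COM object that answers QueryInterface for some GUIDs."""

    def __init__(self, supported_guids):
        self.supported = set(supported_guids)
        self.qi_calls = 0

    def query_interface(self, guid):
        self.qi_calls += 1
        return ("interface-pointer", guid) if guid in self.supported else None


class RcwObject:
    """Toy RCW-backed object with static metadata and a QI result cache."""

    def __init__(self, com_object, static_interfaces):
        self.com_object = com_object
        self.static_interfaces = set(static_interfaces)  # listed in the IA
        self.qi_cache = {}

    def cast_to(self, guid):
        # Step 1: static match against the interfaces listed in the metadata.
        if guid in self.static_interfaces:
            return True
        # Step 2: fall back to QueryInterface, caching successful results.
        if guid in self.qi_cache:
            return True
        pointer = self.com_object.query_interface(guid)
        if pointer is None:
            return False
        self.qi_cache[guid] = pointer
        return True


# Placeholder identifiers, not real IIDs.
IFOO, IBAR, IBAZ = "guid-of-IFoo", "guid-of-IBar", "guid-of-IBaz"

com_object = FakeComObject({IFOO, IBAR})
foo = RcwObject(com_object, static_interfaces={IFOO})

print(foo.cast_to(IFOO), com_object.qi_calls)  # True 0
print(foo.cast_to(IBAR), com_object.qi_calls)  # True 1
print(foo.cast_to(IBAR), com_object.qi_calls)  # True 1
print(foo.cast_to(IBAZ), com_object.qi_calls)  # False 2
```

The cast to IFoo never touches the COM object, the first cast to IBar costs one QI, and the repeat cast to IBar is served from the cache.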