Larry Osterman's WebLog

Confessions of an Old Fogey

So what exactly IS COM anyway?

A couple of days ago, David Candy asked (in a comment on a previous COM-related post) what exactly COM is.

Mike Dimmick gave an excellent answer to the question, and I'd like to riff on his answer a bit.

COM is just one of three associated technologies: RPC, COM and OLE (really OLE Automation).

Taken in turn:

RPC, or Remote Procedure Call, is actually the first of the "Cairo" features to debut in Windows (what, you didn't know that there were parts of Cairo already in Windows?  Yup, actually, almost all of what was called "Cairo" is currently in Windows).

RPC provides a set of services to enable inter-process and inter-machine procedure calls.  The RPC technology is actually an implementation of the DCE RPC specification (the DCE APIs are renamed to be more Windows-like), and is on-the-wire interoperable with third-party DCE implementations.  RPC deals with two types of entities, clients and servers.  The client makes requests, and the server responds to those requests.  You tell RPC about the semantics of the procedures you're calling with an IDL file (IDL stands for "Interface Definition Language" - it defines the interface between client and server).  IDL files are turned into C files by MIDL, the "Microsoft IDL compiler".

When RPC needs to make a call from one process to another, it "marshals" the parameters to the function call.  Marshalling is essentially the process of flattening the data structures (using the information in the IDL file), copying the data to the destination and then unpacking the flattened data into a format that the receiver can use.
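To make the flatten/copy/unpack steps concrete, here's a minimal sketch in portable C++. Everything in it is made up for illustration: the `Request` struct, the `Marshal` and `Unmarshal` functions, and the byte layout. Real RPC stubs are generated by MIDL from the IDL file and use their own wire format, so treat this as the idea only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical call arguments. The string owns heap memory, so the struct
// cannot simply be copied byte-for-byte into another process.
struct Request {
    std::uint32_t id;
    std::string   name;
};

// Flatten: write every field into one contiguous buffer.
// Assumed layout: id (4 bytes, little-endian), name length (4 bytes), name bytes.
std::vector<std::uint8_t> Marshal(const Request& r) {
    std::vector<std::uint8_t> buf;
    auto put32 = [&buf](std::uint32_t v) {
        for (int shift = 0; shift < 32; shift += 8)
            buf.push_back(static_cast<std::uint8_t>(v >> shift));
    };
    put32(r.id);
    put32(static_cast<std::uint32_t>(r.name.size()));
    buf.insert(buf.end(), r.name.begin(), r.name.end());
    return buf;
}

// Unpack: rebuild an equivalent structure on the receiving side.
// No bounds checking here; a real unmarshaller must validate its input.
Request Unmarshal(const std::vector<std::uint8_t>& buf) {
    std::size_t pos = 0;
    auto get32 = [&buf, &pos]() {
        std::uint32_t v = 0;
        for (int shift = 0; shift < 32; shift += 8)
            v |= static_cast<std::uint32_t>(buf[pos++]) << shift;
        return v;
    };
    Request r;
    r.id = get32();
    const std::uint32_t len = get32();
    r.name.assign(buf.begin() + pos, buf.begin() + pos + len);
    return r;
}
```

The "copy the data to the destination" step is whatever moves the byte vector between the two sides (a pipe, a socket, shared memory); the receiver never sees the sender's pointers, only the flattened bytes.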

RPC provides an extraordinarily rich set of services - it's essentially trivial to write an application that says "I want to talk to someone on my local network segment who's providing this service, but I don't care who they are - find out who's offering this service and let me talk to them" and RPC will do the hard work.

The next technology, COM, is built on RPC.  COM stands for "Component Object Model".  COM is many, many things - it's a design pattern, it's a mechanism to hide implementation of functionality, it's an inter-process communication mechanism, it's the kitchen sink.

At its heart, COM's all about a design pattern that's based around "Interfaces".  Just as RPC defines an interface as the contract between a client and a server, COM defines an interface as a contract between a client of a set of functionality and the implementor of that functionality.  All COM interfaces are built around a single "base" interface called IUnknown, which provides reference count semantics, and the ability to query to see if a particular object implements a specific interface.  In addition, COM provides a standardized activation pattern (CoCreateInstance) that allows the implementation of the object to be isolated from the client of the object.
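Here's a rough sketch of the IUnknown pattern in portable C++, so it compiles without the Windows headers. The names `IUnknownLike`, `IGreeter`, `Greeter` and `g_liveObjects` are invented for this example, and interfaces are identified by strings and results by `bool`, where real COM uses GUIDs and HRESULTs. The reference count is not thread-safe here.

```cpp
#include <cassert>
#include <cstring>

using InterfaceId = const char*;   // assumption: real COM uses 128-bit interface IDs

// Stand-in for IUnknown: lifetime control plus interface discovery.
struct IUnknownLike {
    virtual bool QueryInterface(InterfaceId iid, void** out) = 0;
    virtual unsigned long AddRef() = 0;
    virtual unsigned long Release() = 0;
protected:
    ~IUnknownLike() = default;     // clients never delete; they call Release
};

// A hypothetical interface built on the base interface.
struct IGreeter : IUnknownLike {
    virtual const char* Greet() = 0;
};

static int g_liveObjects = 0;      // only here so the sketch can observe destruction

class Greeter final : public IGreeter {
    unsigned long refs_ = 1;       // the creator holds the first reference
    Greeter() { ++g_liveObjects; }
    ~Greeter() { --g_liveObjects; }
public:
    // Stand-in for activation: the client only ever sees the interface pointer.
    static IGreeter* Create() { return new Greeter(); }

    bool QueryInterface(InterfaceId iid, void** out) override {
        if (std::strcmp(iid, "IUnknownLike") == 0 || std::strcmp(iid, "IGreeter") == 0) {
            *out = static_cast<IGreeter*>(this);
            AddRef();              // every pointer handed out carries a reference
            return true;
        }
        *out = nullptr;
        return false;
    }
    unsigned long AddRef() override { return ++refs_; }
    unsigned long Release() override {
        const unsigned long remaining = --refs_;
        if (remaining == 0) delete this;   // the object destroys itself
        return remaining;
    }
    const char* Greet() override { return "hello"; }
};
```

The client holds nothing but interface pointers: it asks for capabilities with QueryInterface and gives up each pointer with Release, and the object goes away when the last reference does.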

Because the implementation of the COM object is hidden from the client of the object, and the implementation may exist in another process (or on another machine in the case of DCOM), COM also defines its interfaces in an IDL file.  When the MIDL compiler is compiling an IDL file for COM, it emits some additional information including C++ class definitions (and C surrogates for those definitions).  It will also optionally emit a typelib for the interfaces.

The typelib is essentially a partially compiled version of the information in the IDL - it contains enough information to allow someone to know how to marshal the data.  For instance, you can take the information in a typelib and generate enough information to allow managed code to interoperate with the COM object - the typelib file contains enough information for the CLR to know how to convert the unmanaged data into its managed equivalent (and vice versa).

The third technology is OLE Automation (Object Linking and Embedding Automation).  OLE Automation is an extension of COM that allows COM to be used by languages that aren't C/C++.  Essentially OLE Automation is built around the IDispatch interface. IDispatch can be thought of as "varargs.h-on-steroids" - it provides an abstraction for the process of passing parameters to and from functions, thus allowing an application to accept method semantics that are radically different from the semantics provided by the language (for instance, VB allows parameters to functions to be absent, which is not allowed for C functions - IDispatch allows a VB client to call into an object implemented in C).
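The late-bound idea can be sketched in portable C++ like this. The `Value`, `IDispatchLike` and `Echo` types and the `Repeat` method are all hypothetical; the real IDispatch resolves names to DISPIDs with GetIDsOfNames and then passes VARIANT arguments to Invoke, which this sketch collapses into a single call by name.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Stand-in for VARIANT: a value that carries its own type tag.
struct Value {
    enum Kind { Empty, Int, Text };
    Kind kind;
    int number;
    std::string text;
    Value() : kind(Empty), number(0) {}
    Value(int n) : kind(Int), number(n) {}
    Value(const char* s) : kind(Text), number(0), text(s) {}
    Value(const std::string& s) : kind(Text), number(0), text(s) {}
};

// Stand-in for IDispatch: one entry point, with the method chosen at run time.
struct IDispatchLike {
    virtual Value Invoke(const std::string& method, const std::vector<Value>& args) = 0;
    virtual ~IDispatchLike() = default;
};

// Hypothetical object exposing Repeat(text [, count]).
class Echo : public IDispatchLike {
public:
    Value Invoke(const std::string& method, const std::vector<Value>& args) override {
        if (method != "Repeat")
            throw std::invalid_argument("unknown method: " + method);
        if (args.empty() || args.size() > 2 || args[0].kind != Value::Text)
            throw std::invalid_argument("expected Repeat(text [, count])");
        int count = 1;                       // the callee supplies the default
        if (args.size() == 2) {
            if (args[1].kind != Value::Int)
                throw std::invalid_argument("count must be an integer");
            count = args[1].number;
        }
        std::string out;
        for (int i = 0; i < count; ++i) out += args[0].text;
        return Value(out);
    }
};
```

Because the argument list is just data, the caller can leave the optional parameter off entirely, which an ordinary C function signature has no way to express.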

Anyway, that's a REALLY brief discussion; there are MANY, MANY books written about this subject.  Mike referenced Dale Rogerson's "Inside COM"; I've not read that one, but he says it's good :)



  • Maybe this (classic?) diagram will help you to better understand how marshalling works:
    +------+                  +-----------------+
    |Client|                  |Remote COM object|
    +------+                  +-----------------+
       | |                            /-\
       \-/                            | |
    +-----------+             +------------+
    |Client stub|             |Remote proxy|
    +-----------+             +------------+
       | |                            /-\
                                      | |

    That's the basic call-flow when invoking a COM object out of process, as explained by Mo above. So basically the stub/proxy pair handles the out-of-proc complexity so the client doesn't have to think about it. Quite the same is done in CORBA.

    Sorry for my bad ASCII-drawing skills ;)
  • Jikes.. the diagram didn't look very good in the edit-box here, but it certainly ended up appearing worse after the conversion to HTML :/.

    [1] is a much better figure, which is figure 9 in [2], which is written by Kraig Brockschmidt, who designed/created a lot of COM/OLE AFAIK (correct me if I'm wrong Larry!). You'll see a lot of other figures in that article explaining things discussed here as well.

  • COM, IMO, is about several things:

    - Memory allocation discipline (CoTaskMemAlloc et al)
    - Object activation protocol (CoCreateInstance et al)
    - Object lifetime control (IUnknown::AddRef and Release)
    - Object interaction protocol (IUnknown::QueryInterface and interfaces)

    The rest of the stuff kind of follows from these basics. The memory allocation protocol arguably is only there to enable marshaling but nonetheless establishing a standard for lifetime management of non-objects is pretty important.

    Interfaces, being long-lived binary API contracts, are pretty darned important and useful. I'm constantly amazed at how people seem to have forgotten the problems with making non-virtual constructs part of the long-term contract for an object.

    You can debate the relative goodness of reference counting vs. garbage collection. In my book, determinism of lifetime beats faster allocations hands down. But then maybe I'm becoming a dinosaur. The stupid thing was forcing a virtual function call for every modification of the refcount...

    The activation is arguably the most important part of the definition at a systems level. The fact that the metadata to determine how and where to activate an object is separate from the calling code is probably the greatest genius of COM.

    It's unfortunate that a lot of issues came to light during/after development but hey that's the reality of product development.

    Object-based marshaling, which is very cool, is really a MS-only innovation over DCE RPC that as Larry mentions was done as part of Cairo. (I'm not sure that's true; when joining MSFT in '94 the incipient release of DCOM was heralded as the great enabler of truly distributed systems and Cairo was still incubating furiously at the time; it's just surprising for a technology that's incubating to spin off an important subpiece and actually release it... it would be like if Avalon or WinFS shipped before LH. But that's also where Nile a/k/a OLE/DB came from... ah for the old days when something as simple as the next set of APIs were going to solve everyone's problems...)

    In simple terms, COM = (OLE/2 - all the document/in-place-activation stuff).
  • Cairo was a set of technologies announced at the first PDC back in 1991 by Jim Allchin.

    As mentioned in this article:

    there were essentially 5 pieces to Cairo:
    1) DCE RPC
    2) X.500 Directory
    3) X.400 Messaging
    4) Content Indexing/Object filesystem.

    The NT networking team picked up the RPC component for NT 3.1, the directory was delivered in Win2000, the X.400 messaging system was delivered in Exchange, and the content indexing was delivered in Index Server.

    The only significant technology announced at the PDC that's NOT been delivered was the indexable filesystem.
  • One thing that has always confused me with COM is the STA model, which is implemented using a hidden window to provide single-threaded access to the COM object. Can anyone throw some light on what actually goes on underneath this particular model? The COM runtime has a lot of quirks built into it hidden from the programmer, and because of this it is sometimes possible to shoot yourself in the foot if you don't properly understand the apartment concepts and use multi-threading in your program.
  • As I wrote here:

    Threading models for COM exist to protect COM components that weren't designed for multi-threading access. For example, since VB doesn't have any concept of threads, it's highly likely it'll mess up royally if a COM component authored in VB is called from multiple threads.

    So the apartment model exists to ensure that those COM components don't break royally when dropped into a multithreaded application.

    The hidden window is used for marshalling - just like RPC marshals parameters across a process boundary, COM marshals parameters across a thread boundary to ensure that only one thread calls into the COM component.
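The cross-thread marshalling described in the comment above can be sketched with standard C++ threads. This is only an analogy: the `Apartment` class and its members are hypothetical, and a plain `std::queue` stands in for the hidden window's message queue that COM actually uses.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical single-threaded apartment: one owner thread drains a queue of
// calls, so the object it protects is only ever touched from that thread.
class Apartment {
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> calls_;
    bool stop_ = false;
    std::thread owner_;            // declared last so it starts after the rest

    void Pump() {                  // plays the role of the message loop
        for (;;) {
            std::function<void()> call;
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return stop_ || !calls_.empty(); });
                if (calls_.empty()) return;     // stop requested, queue drained
                call = std::move(calls_.front());
                calls_.pop();
            }
            call();                // always runs on the owner thread
        }
    }

public:
    Apartment() : owner_([this] { Pump(); }) {}
    ~Apartment() {
        { std::lock_guard<std::mutex> lock(m_); stop_ = true; }
        cv_.notify_one();
        owner_.join();
    }
    std::thread::id OwnerId() const { return owner_.get_id(); }

    // Queue a call for the owner thread and block until it has returned.
    int Call(std::function<int()> fn) {
        auto task = std::make_shared<std::packaged_task<int()>>(std::move(fn));
        std::future<int> done = task->get_future();
        {
            std::lock_guard<std::mutex> lock(m_);
            calls_.push([task] { (*task)(); });
        }
        cv_.notify_one();
        return done.get();
    }
};
```

Every caller waits its turn behind whatever the owner thread is doing, so if one queued call blocks there, every other caller stalls along with it.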
  • This is why it's important to never:

    1. block execution (e.g. synchronous I/O)
    2. drop window messages

    on a STA thread.

    The whole STA/MTA thing is rather unfortunate. It works as designed but not as most people expect. STAs were designed to make object authors' lives easier by giving them a simple concurrency model but you actually have to be a much better programmer to do things right on a STA thread.

    But maybe this is indicative of the fact that building highly responsive UI is much harder than most people budget for anyways.

  • I suspect this is one of the reasons why the BeOS folks designed all their stuff such that every window would have a separate dedicated UI thread; though on BeOS, threads are cheap.
  • I should clarify; BeOS doesn't (as far as I know) use COM, but the problem of UI-blocking is one that's plagued programmers on a whole host of different platforms for years :)