Mixing deterministic and non-deterministic cleanup

Hi, my name is Alan Chan.  I’m a software design engineer on the Visual C++ Libraries team.  As the name suggests, my team owns some of the C/C++ libraries, such as ATL, MFC, the CRT, the STL, and OpenMP.  Today, I’m going to talk about a very interesting COM interop bug that I worked on recently. 

 

Basic scenario: We have a mixed MFC and .NET application.  When the user presses a button in the application, it pops up an MFC dialog that hosts a managed control.  This managed control implements IQuickActivate and is activated through COleControlSite::QuickActivate().  This function creates all the appropriate sinks (in this case IAdviseSinkEx and IPropertyNotifySink) and passes them to the managed control through IQuickActivate::QuickActivate(). 

 

When the user dismisses the dialog, the destructor of COleControlSite is called.  The destructor disconnects all the sink COM objects that were passed in through QuickActivate(), deactivates the managed control, and deletes the sink COM objects. 

 

This works on the whole.  However, if the user opens and closes the dialog repeatedly, the application will crash with an access violation (AV).  If you read the title, you can probably guess what’s happening!  It turns out that the garbage collector (GC) is calling IUnknown::Release() on the IAdviseSinkEx and IPropertyNotifySink COM objects after they have been deleted.

 

As most of you know already, whenever a COM object is passed from native code to managed code, the .NET Framework automatically creates a Runtime Callable Wrapper (RCW) for it.  The RCW wraps the object and controls its lifetime on the managed side.  In this case, when the destructor of COleControlSite calls Unadvise() to disconnect the sinks, the managed control does, in fact, disconnect the sinks and drop all references to the RCWs of the sink objects.  However, IUnknown::Release() is not called at this point; it is called when the RCW is finalized.  Dropping all references to the RCWs only makes them eligible for garbage collection, and the GC can collect them at any time.  When that happens is driven by managed memory pressure, not by COM reference counts.  If these RCWs are collected and finalized after the actual sink COM objects have been deleted, Release() is called on freed memory, which causes the AV.
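To make the failure concrete, here is a small C++ simulation of the sequence described above. It is only a model under stated assumptions: NativeSink, SimulatedGC, DropRcwNonDeterministically and RunDialogOnce are hypothetical names, no real COM or CLR API is involved, and instead of freeing the sink we set a "destroyed" flag so that the late Release() can be observed safely rather than causing a real AV.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-in for an MFC sink object (e.g. the IPropertyNotifySink sink).
struct NativeSink {
    int comRefs = 1;                    // reference held by the container (COleControlSite)
    bool destroyed = false;             // set instead of freeing memory, so a late call is observable
    bool releasedAfterDestroy = false;  // records the call that would be an AV in real code

    void AddRef() { assert(!destroyed); ++comRefs; }
    void Release() {
        if (destroyed) {                // real code: touching freed memory -> access violation
            releasedAfterDestroy = true;
            return;
        }
        assert(comRefs > 0);
        --comRefs;
    }
    // MFC-style teardown: the object goes away regardless of outstanding COM references.
    void ForceDestroy() { destroyed = true; }
};

// Stand-in for the GC: finalizers run only when Collect() is called,
// not at the moment the last managed reference is dropped.
struct SimulatedGC {
    std::vector<std::function<void()>> pendingFinalizers;
    void Collect() {
        for (auto& finalize : pendingFinalizers) finalize();
        pendingFinalizers.clear();
    }
};

// The managed control drops its RCW on Unadvise(); Release() is deferred to finalization.
void DropRcwNonDeterministically(NativeSink& sink, SimulatedGC& gc) {
    gc.pendingFinalizers.push_back([&sink] { sink.Release(); });
}

// One open/close cycle of the dialog. Returns true if Release() reached a destroyed sink.
bool RunDialogOnce() {
    NativeSink sink;
    SimulatedGC gc;
    sink.AddRef();                          // RCW created when the sink is passed via QuickActivate()
    DropRcwNonDeterministically(sink, gc);  // Unadvise(): RCW dropped, but no Release() yet
    sink.ForceDestroy();                    // ~COleControlSite deletes the sink
    gc.Collect();                           // later, the GC finalizes the RCW
    return sink.releasedAfterDestroy;
}
```

In the real application the GC may or may not run before the sink is deleted, which is why the crash only shows up after opening and closing the dialog repeatedly; the simulation simply forces the bad ordering.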

 

So how do we fix this?  We need some way to clean up these RCWs deterministically.  Unfortunately, RCWs don’t implement the IDisposable interface; otherwise we could call IDisposable::Dispose() for cleanup.  Fortunately, the framework provides a function, Marshal.ReleaseComObject(), for deterministic cleanup.  Each RCW keeps its own reference count, which is completely separate from the reference count in the COM object.  The RCW’s count is incremented every time the same COM interface pointer is marshaled into managed code (for example, when it is passed in as a parameter or returned from a call).  When ReleaseComObject() is called, the RCW decrements its reference count and returns the new value.  As soon as the count reaches zero, the RCW immediately releases all the COM references it is holding, without waiting for the finalizer.
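The two separate reference counts can be modeled roughly as follows. This is a simplified sketch, not the CLR's implementation: ComObject and Rcw are hypothetical types, the RCW is assumed to hold a single COM reference no matter how many times the pointer is mapped to it, and std::logic_error stands in for the exception the runtime throws when an RCW is used after its count has reached zero.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical native COM object: only its own AddRef/Release count matters here.
struct ComObject {
    int comRefs = 0;
    void AddRef() { ++comRefs; }
    void Release() { assert(comRefs > 0); --comRefs; }
};

// Hypothetical RCW model with its own count, separate from ComObject::comRefs.
class Rcw {
public:
    explicit Rcw(ComObject& obj) : obj_(&obj) {}

    // Called each time the same COM interface pointer is marshaled into managed code.
    void MapInterfacePointer() {
        if (obj_ == nullptr) throw std::logic_error("COM object separated from its RCW");
        if (rcwRefs_ == 0) obj_->AddRef();  // assumption: one COM reference per RCW
        ++rcwRefs_;
    }

    // Analogue of Marshal.ReleaseComObject(): decrement the RCW count and return it.
    // At zero, the COM reference is released right away instead of in a finalizer.
    int ReleaseComObject() {
        if (obj_ == nullptr || rcwRefs_ == 0)
            throw std::logic_error("COM object separated from its RCW");
        if (--rcwRefs_ == 0) {
            obj_->Release();
            obj_ = nullptr;  // any further use of this RCW is an error
        }
        return rcwRefs_;
    }

private:
    ComObject* obj_;
    int rcwRefs_ = 0;
};
```

Calling ReleaseComObject() in a loop until it returns zero, as discussed in the comments below, corresponds to while (rcw.ReleaseComObject() > 0) {} in this model. (.NET 2.0 and later also provide Marshal.FinalReleaseComObject(), which drops an RCW's count to zero in one call.)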

 

Therefore, just as we call IUnknown::Release() in native code, we should always call ReleaseComObject() on managed COM references before they go out of scope.  As long as no other managed code is still using the same RCW, this guarantees that IUnknown::Release() is called deterministically. 
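Applied to the dialog scenario, the fixed teardown order can be sketched the same way. Again this is a hypothetical simulation (NativeSink, ManagedControl and TearDown are illustrative names, not MFC or .NET APIs): the managed control's Unadvise() releases its reference immediately, standing in for a ReleaseComObject() call, so by the time the container destroys the sink only the container's own reference is left and nothing reaches the sink afterward.

```cpp
#include <cassert>

// Hypothetical sink object owned by the container.
struct NativeSink {
    int comRefs = 1;                    // the container's own reference
    bool destroyed = false;
    bool releasedAfterDestroy = false;  // would be an AV in real code

    void AddRef() { assert(!destroyed); ++comRefs; }
    void Release() {
        if (destroyed) { releasedAfterDestroy = true; return; }
        assert(comRefs > 0);
        --comRefs;
    }
};

// Managed-control side, holding the sink through an RCW-like handle.
struct ManagedControl {
    NativeSink* sink = nullptr;

    void QuickActivate(NativeSink& s) {  // sink handed over, RCW takes a reference
        sink = &s;
        s.AddRef();
    }
    // Unadvise(): drop the reference AND release it now (the ReleaseComObject() analogue).
    void Unadvise() {
        if (sink != nullptr) {
            sink->Release();
            sink = nullptr;
        }
    }
};

// Container side, mirroring ~COleControlSite: disconnect, then delete the sink.
// Returns the number of references left on the sink just before it is destroyed.
int TearDown(ManagedControl& control, NativeSink& sink) {
    control.Unadvise();             // COM reference released here, not in a finalizer
    int remaining = sink.comRefs;   // only the container's own reference should remain
    sink.destroyed = true;          // MFC deletes the sink
    return remaining;
}
```

Because the release happens inside Unadvise(), the order no longer depends on when the GC runs.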

 

I hope this will save you time when debugging COM interop bugs. 

 

Thanks,

Alan Chan

Visual C++ Libraries Team.


 

 

  • Ah... the toxic sludge of Microsoft interop.  What "new technologies" will those clever executives come up with next?  I'm guessing technologies that will invariably require yet more kinds of kludge-filled interop, where complexity spirals and we invest more and more of our time achieving the simplest of tasks.  Only time will tell.  I really do like many Microsoft products... it's a shame when good technology goes bad.  
  • Give it a rest. They can't just break backward compatibility. That would kill their customer base. Or would it...
  • Hi Alan,

    I have used ReleaseComObject in a web application where .NET didn't release the memory owned by a native COM object soon enough (to turn non-deterministic cleanup into deterministic cleanup).

    Maybe I am missing something, but how does calling ReleaseComObject help in your case?
    :-( Can you explain that further?


    Thanks
  • Hi,

    Thank you for your comment.  Calling ReleaseComObject() doesn't guarantee that the RCW will release the COM object.  As I said in the blog, each RCW has a reference count of how many managed references are pointing to it.  So if other managed references are still using that RCW, it will not call Release() on the COM object.  You can easily confirm whether this is the case by checking the return value of ReleaseComObject().  If the return value is non-zero, there are other managed references pointing to the RCW.

    If you really want the RCW to call Release() at a particular point in the code, you can call ReleaseComObject() in a loop until the return value reaches zero.  This should ensure that the RCW calls Release().  However, be warned: if you do that, any other managed references that later try to use that RCW will get an exception.

    Hope this helps.

    Thanks,
    Alan
  • Hi Alan,

    Thanks for the reply. I recall reading an MSDN article about the looping. But I am curious: how does ReleaseComObject resolve your IQuickActivate problem?

    In my web app, the RCW still had a reference count of 1 and the GC did NOT 'feel' there was a need to collect it, so it never called Release() and the memory held by the native COM object stayed allocated. So we had to call ReleaseComObject manually to force the release.

    Thanks
  • Hi,

    Thanks for the reply.  I'm glad to hear that it worked out for you.  

    In my case, however, the managed control that implements the IQuickActivate interface did not call ReleaseComObject().  It was relying on the GC to clean up instead.  Therefore, the .NET Framework still held a COM reference to the sinks even though we had explicitly disconnected them.  And since the sink COM objects are deleted (ignoring their COM reference counts) when the dialog is destroyed, due to MFC lifetime restrictions, the GC might call Release() after the objects have been deleted.  Essentially, the GC is calling back into an object that has already been deleted (i.e. accessing memory that has been freed).  

    By adding ReleaseComObject() calls in the managed control, we make the RCWs call Release() deterministically when we disconnect the sinks.  This guarantees that the .NET Framework has released all its COM references before the COM objects are deleted.

    Thanks,
    Alan
  • Hi,

    Another issue with non-deterministic finalization and COM Interop is that, depending on the threading model and how the COM object is actually implemented, releasing it from a thread other than the one that created it (such as the GC finalizer thread) may crash.
    I always wondered why IDisposable was not implemented on the strongly typed RCWs generated from a type library. Do you know the reason for this design choice?

    Regards,

    Ianier
  • So, in this case, the real problem is that MFC is supplying a COM object that doesn't really obey the COM rules for lifetime and is being forcibly deleted despite a lingering COM reference from the RCW.  Forcing the RCW to release the reference is a work-around for a shortcoming in MFC, not for a shortcoming of the CLR.
  • Hi Carl,

    It's not so simple :-). Keep in mind that COM has no clean way to deal with circular references, so there are situations where an object just deletes other objects that it "owns".

    In general, COM implementations of the Observer pattern go this way. In those cases, the order in which clients should release the objects is usually specified and documented. This is certainly not nice, but it's a limitation of COM we have to deal with.

    In addition, taking into account the frequent issues with thread-bound allocation/deallocation of COM objects, I think that making the RCW implement IDisposable would make our life easier.

    Regards,

    Ianier
  • Hi,
    I'm glad I found this article! I have exactly this problem: an MFC 8 app using .NET UserControls in a dialog (and in a CWinFormsView) that crashes randomly. I think this goes some way towards explaining what is happening.
    The trouble is, I can't quite figure out where I should be calling ReleaseComObject()! Should it be on the MFC side? The managed control is held in a CWinFormsControl<>.
    Cheers,
    Aled.
  • Hi Ianier -

    RCW implement IDisposable - yes, that'd be a great help.

    In the particular case you described, though, MFC is clearly at fault.  You had manually broken the reference cycle already by unadvising the sinks, but that still doesn't give MFC license to delete a COM object while it's still referenced.  Rather, the appropriate thing for MFC to do would be to create a heap-based object (as COM objects should always be) that strictly follows COM lifetime rules.  After the reference loop is broken, that object should return failure from all of its interface methods, but should continue to exist until the COM rules say it should go away.
  • Hi Carl,

    You're right. Unfortunately, many Microsoft COM APIs don't obey the COM rules strictly. Another example comes to mind: DirectSound devices own DirectSound buffers in the same way described here.

    I do admit, though, that sometimes this is done for practical reasons (e.g. overhead of checking the object state in every method of a real-time API).

    Regards,

    Ianier
  • Hi Carl, Ianier,

    You are both right.  I do agree that MFC could do a better job of cleaning up the object.  Ideally, as Carl suggested, when the dialog goes away we should "zombify" the sink COM objects and keep them around until the .NET Framework actually calls Release() on them.  Unfortunately, for app-compatibility reasons, we cannot make that change now.

    Moreover, keep in mind that this code was designed and written many years ago (probably around 8 to 10 years ago).  At that time, there wasn't any managed code.  If the other side is native code with deterministic cleanup, this code works very well, and it has for many years.  The developer at that point probably chose this design over zombifying the objects for simplicity and efficiency.  

    Thanks,
    Alan
  • Hi Aled,

    Just curious, what managed control are you using?

    To be safe, I just want to make sure that the problem you are seeing is the same problem I described in the article.  First of all, are you (or the MFC code underneath) calling QuickActivate()?  Is there an AV coming from the .NET Framework?  You can check this by setting your VS2005 IDE to break on first-chance access violations: Debug->Exceptions->Win32 Exceptions->check "c0000005 Access Violation".  This way, the debugger will stop execution when the AV occurs.  When it does, can you check whether the top of the call stack is in the .NET Framework (I believe mscoree.dll)?

    To answer your question, ReleaseComObject() should be called on the managed side.  ReleaseComObject() is a managed function; it cannot be called from native code.  And MFC doesn't know whether the other side is native or managed.  If the other side is native, it wouldn't make any sense to call ReleaseComObject().  Therefore, it must be called from the managed side.

    Thanks,
    Alan
  • Hi Alan,

    The managed control is one of my own creation - it is a UserControl in a class library. In my test app it has only a textbox and a button on it. The control is being created using IQuickActivate in MFC (via DDX_ManagedControl).

    The stack trace shows that the AV is occurring in mscorwks.dll!SafeReleaseHelper(), which was called via ole32.dll!CRemoteUnknown::DoCallback(). This is 100% repeatable when I force a GC collect.

    Interestingly, if I force MFC to bypass the IQuickActivate method (using the debugger), the AV doesn't seem to occur anymore.

    The trouble I have with calling ReleaseComObject() is knowing what to pass to it. Passing just the managed object reference throws an exception saying that the object doesn't derive from __ComObject.

    Cheers,
    aled.
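For comparison, the "zombie" design that Carl suggests in the comments, and that Alan agrees would be ideal, might look roughly like this in C++. This is a hypothetical sketch, not MFC code (MFC does not work this way, for the app-compatibility reasons given above): ZombieSink is an illustrative class, HResult, kOk and kUnexpected are simplified stand-ins for HRESULT, S_OK and E_UNEXPECTED, and liveCount exists only to make the object's lifetime visible.

```cpp
#include <cassert>

// Simplified stand-ins for HRESULT values (not the <windows.h> definitions).
using HResult = long;
constexpr HResult kOk = 0;
constexpr HResult kUnexpected = -1;  // stand-in for E_UNEXPECTED

// Heap-based sink that strictly follows COM lifetime rules.
class ZombieSink {
public:
    // Created on the heap with one reference owned by the caller (the container).
    static ZombieSink* Create() { return new ZombieSink(); }

    unsigned long AddRef() { return ++refs_; }
    unsigned long Release() {
        assert(refs_ > 0);
        unsigned long remaining = --refs_;
        if (remaining == 0) delete this;  // only the last Release() frees the object
        return remaining;
    }

    // Called by the container when the dialog closes, instead of deleting the object.
    void Disconnect() { connected_ = false; }

    // Stands in for a sink method such as IPropertyNotifySink::OnChanged:
    // after Disconnect() it fails, but the object's memory stays valid.
    HResult OnChanged() const { return connected_ ? kOk : kUnexpected; }

    static int liveCount;  // demonstration only: number of live instances

private:
    ZombieSink() { ++liveCount; }
    ~ZombieSink() { --liveCount; }

    unsigned long refs_ = 1;
    bool connected_ = true;
};

int ZombieSink::liveCount = 0;
```

In this design, Disconnect() replaces the delete in the container's teardown; a late Release() from the GC's finalizer then drops the last reference and frees the object safely instead of touching freed memory.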