Introduction

WPF offers a modern approach for building Windows applications, but it builds directly on Win32 – the traditional UI infrastructure in Windows. Because Win32 was developed in an era when CPU/GPU horsepower was much more limited than it is today, it utilizes a number of optimizations for rendering. These include using the “reverse painter’s algorithm”, tracking invalid regions for minimal updates, copying pixels around the screen to avoid unnecessary repainting, extensive clipping to avoid overdraw, limited support for alpha blending, and other such restrictions. Most of these optimizations were built for GDI, the standard rendering technology in Windows for many years. DirectX is an entirely different rendering technology, originally intended for PC gaming on Windows, but recently finding more of a presence in desktop applications (consider IE9, for example). DirectX provides access to the power of modern GPUs, and WPF embraced this technology – rather than GDI – in order to build the kind of rich composition framework we envisioned. As a consequence, WPF developed its own transformation, clipping, and composition implementation, which is completely different from what Win32 implements.

In Win32, a “window” is the basic unit of user-interface composition.  Programmatically, a window is referenced by its handle, or HWND.  Windows come in two major flavors: top-level windows and child windows.  Top-level windows are what you normally interact with; they float on the desktop and usually have the standard minimize/maximize/restore/close buttons, a title bar, etc.  This is the primary scenario for WPF, where it is the exclusive content of the client area of a top-level window, which is what you get when you create a WPF Window object.  On the other hand, child windows are a unit of composition for user-interface components within a top-level window.  Child windows can be incredibly useful; they can run on different threads (or even processes), they can have their own state, they can respond to special messages, etc.  A number of technologies build on child windows, such as ActiveX, so incorporating them into your WPF application is a common requirement.  To properly include a child window in a WPF element tree you must derive from HwndHost and implement the logic to construct and destroy the child window, as well as hook up keyboard processing and handle any special messages. While WPF does not use child windows for most of its user-interface elements because we have our own element tree, there are some exceptions; the WebBrowser control is probably the most widely known example of a WPF control that is actually a hosted child window.  WPF also includes support for hosting Windows Forms components, which are implemented with child windows.

You can even embed WPF content within a child window by directly creating an HwndSource instance. This is a powerful technique that we will be exploring more in this article, and it is often used in scenarios where WPF is used to develop a plugin for another application.  A note of caution: child windows have a number of exotic settings that WPF does not support well; such as tinkering with child/sibling clips or composited painting.

The kind of rich user-interface presentation that is enabled by WPF is possible in large part because of the way we uniformly compose the element tree.  Composition is what allows WPF to render elements in interesting ways: clipped against opacity masks, distorted via transforms, blended with effects like transparency, processed through pixel shaders, redirected through brushes, projected onto 3D objects, etc.  But due to the limitations of how child windows were designed to work, composing them is notoriously difficult.  Indeed, when WPF hosts a child window, we skip the composition altogether and simply position it over its layout slot and then let it do its own thing.  As you might imagine, without composition many problems can be observed: missing content, incorrect clipping, lack of transparency, z-order issues, etc. We refer to these issues as “airspace” problems – a term coined by Nick Kramer.

In some instances, you may also encounter strange rendering artifacts on the screen; particularly when moving a child window around.  These rendering artifacts are often due to conflicts between GDI and DirectX, along with some limitations or bugs in the DWM.

This article discusses in some detail all of these issues, and explores some ways of mitigating them.

Airspace Problems

The most common complaint related to airspace problems we hear about concerns clipping. To be visible on screen, a WPF element tree is always associated with some window; typically a top-level window but it could also be a child window. WPF composes the element tree and renders to that window. Child windows always render on top of, and are always clipped to the bounds of, their parent window. Recall that when hosting a child window within an element tree, all WPF will do is position the child window over its layout slot. It is important to realize that WPF is not rendering the child window. The child window is rendering itself – or, more accurately, the child window is painting to a GDI device context associated with its window without coordinating through WPF. So even though an HwndHost appears to be an element that can be interleaved with other elements in the element tree, the hosted child window is actually placed on top of anything WPF renders. This results in predictable problems: the child window is not clipped by containers like scroll-viewers, and nothing can render over it, not even adorners.

In fact, there is no good way for WPF to render over a child window. Some people have experimented with relaxing the clipping behavior of Win32, but this causes much worse problems. It is better to play nicely with Win32 clipping rules, which is why WPF enforces WS_CLIPSIBLINGS and WS_CLIPCHILDREN.

Since a child window is painting itself, and is not rendered by WPF, it should be clear that WPF doesn’t have access to the contents of the window. For example, if you use a VisualBrush to render an element tree, the contents of any hosted child window will simply be missing. Another example is that you can’t apply a ShaderEffect to a hosted child window because the contents of the window are not actually available to WPF for processing.

Another common symptom people notice is that even simple things like transparency don’t work either. This is because the Win32 painting model doesn’t generally support transparency, so the child window that is placed on top of the WPF content has no way to paint itself with partially transparent pixels. Using styles like WS_EX_TRANSPARENT doesn’t work, for reasons that will be discussed later.

A final nuance is that a hosted child window is created when the HwndHost element detects that it is plugged into an HwndSource. If an HwndHost element is removed from one element tree and then added to another, the hosted child window will be reparented to be within the destination HwndSource parent window. As a consequence, the z-order is reset to the bottom. In addition, while setting the ZIndex property will cause WPF panels to render their children in a different order, the HwndHost will not adjust the corresponding z-order of its hosted child window. Not that it would matter relative to the WPF content it is floating over, but the z-order gets out of sync even relative to other hosted windows.

Render Thread Considerations

WPF uses a somewhat novel technique of splitting the user-interface responsibilities between a traditional user-interface thread and a dedicated render thread.  The user interface thread (or threads, as there can be more than one) is responsible for processing user input and updating the element tree.  The render thread is responsible for rasterizing the vector graphics described by the element tree and updating the display. This design even allows the render thread to be in a separate process, even a process on another machine.  This was the infrastructure that allowed WPF to support high-fidelity zooming on Vista that preserved the sharp vector graphics, and the efficient performance over remote-desktop connections that really improved the experience compared to bitmap-based remoting.  Unfortunately, this design required the various OS remoting components to be updated in sync with the WPF releases, and this eventually proved untenable and the features were disabled in WPF 3.5 SP1.

Another goal of this design was to allow a degree of independence for rendering tasks so that they wouldn’t be blocked by stalls on the user-interface thread.  The obvious scenario is “independent animations” that the user interface could start and then forget about, and they would animate on the render thread, free from any glitches or stuttering due to processing on the user interface thread.  This would be ideal for transitions, which typically involve a lot of work on the user interface thread to load in new content.  Unfortunately, WPF never got to complete the feature work required for independent animations.  In fact, the only real benefit from the separate render thread is a certain amount of parallelism and that the MediaElement can render frames independent of the user interface thread once playback is started.  More recent incarnations of XAML platforms have better support for independent animations, such as Silverlight 5.

Unfortunately, this design has its own problems too.  The render data needs to be duplicated for both the user-interface and render threads, updates have to be sent and received, and object lifetimes have to be carefully coordinated.  This increases both the CPU and memory costs associated with heavy graphics content.  WPF has implemented a number of optimizations for efficient partial updates, but there is still significant room for improvement.  Another problem is that there are some situations where the user-interface thread is expected to synchronously complete a rendering operation.  Two prominent examples are PrintWindow and bottom-to-top painting.  PrintWindow redirects the device context for the window and sends a WM_PAINT to the window; the window is expected to paint its contents to the redirected device context; and when the window proc returns from handling WM_PAINT, the window is restored to its original state.  PrintWindow is unreliable when used on a window with WPF content because the user-interface thread does not actually paint, so the content is missing.  Bottom-to-top painting is configured by specifying the WS_EX_TRANSPARENT style on a child window.  When this style is present, Win32 will paint the siblings beneath the window first, and then paint the window.  This allows the child window to selectively let that content show through, though it can be tricky to implement well. But because WPF doesn’t paint on the user-interface thread, the contents of our window are not reliably available for any sibling over us to use.  There is also the unusual WS_EX_COMPOSITED style that doesn’t work with WPF content either.

Finally, there are times when the user-interface thread needs to copy bits from the screen and this is problematic with WPF.  An example is when moving a child window without the SWP_NOCOPYBITS flag (note the double negative; this means moving a window with copying bits). To avoid unnecessarily repainting a window when it moves, Win32 will copy the existing content of the window from its old location to its new location, and only repaint what is left behind. But since the old contents that GDI is trying to copy might be updated at any time by the separate render thread via DirectX, the necessary coordination is missing, and sometimes garbage is copied. Another example is when a top-level window over a WPF window has the CS_SAVEBITS style.  With this style, Win32 will copy the bits before it displays the window and then put the bits back when the window is hidden.  Menus and message boxes are the typical windows with this style. As Raymond Chen discusses, Win32 tries to detect when something makes the saved bits invalid and will fall back to just repainting the background.  But WPF’s separate rendering thread using DirectX can go undetected, so Win32 may put back the old contents.

The splitting of user-interface processing and rendering across separate threads is unusual in Win32 applications, and doubtless the source of many subtle bugs. If integrating child windows into your WPF application is a primary concern, I strongly recommend disabling WPF hardware acceleration. By disabling hardware acceleration, WPF will render using GDI. This eliminates many issues because GDI can coordinate correctly even from multiple threads. If disabling hardware acceleration is not an option, we will review some mitigations that work around most of the rendering artifacts I know of. Unfortunately these mitigations are not always trivial.

DWM Considerations

Originally, painting operations for a window were clipped to the visible region of that window and then allowed to update the video memory of the screen directly. This was very efficient, but rules out such effects as transparency and can lead to lagging visual artifacts as updates to the screen were gated by the responsiveness of applications.

Windows 2000/XP introduced layered windows via the WS_EX_LAYERED style. For layered windows, the operating system retains a bitmap of the contents of the window and can compose it on the screen as needed, without waiting for the application to respond, and can apply effects like transparency.  The contents of this bitmap can either be provided by the application directly, or can be filled by “redirecting” the traditional painting operations for the window. SetLayeredWindowAttributes controls the appearance of windows with redirected content, but it is restricted to simple effects like constant transparency or a color key.  Applications that can provide their own bitmap, rather than using redirection, can use UpdateLayeredWindow instead.  This API can accept a bitmap with a per-pixel alpha channel, so this is the API that WPF uses when you set the Window.AllowsTransparency property to true. Note that this kind of layered window only displays the contents of the bitmap that the application explicitly provides; nothing drawn via GDI or DirectX to a device context for this window or any child windows is shown. This is the root cause of the problem where the WPF WebBrowser control cannot be displayed in a layered window.

Redirected layered windows provide “output redirection”, where the operating system redirects the output of rendering operations normally destined for the screen into a bitmap, which allows it the flexibility to compose the bitmap in various ways.  However, if the composition is non-trivial, the operating system must also provide some kind of “input redirection” where messages and APIs related to the input coming into an application account for the composition effects.  It is also important that APIs related to location, size, and coordinate transforms account for the composition.  So far, the operating system doesn’t compose layered windows in a way that would require input redirection.

Windows Vista introduced the Desktop Window Manager which redirects all top-level windows (that aren’t already layered) and then composes the contents of all of the windows onto the desktop.  DWM is capable of composing the window contents in more interesting ways than XP was able to; the Flip3D feature is probably the most dramatic - but note that it disables input to the windows and so it does not need to worry about input redirection. The DWM can also scale a window to accommodate higher DPI settings; but this is a simple scale transform and required only modest updates to input processing coordinate transform APIs. The redirection services of the DWM remained limited to top-level windows.

Difficulties are encountered on Vista if both GDI and DirectX are rendering to the same window. Vista handles this scenario in the DWM by maintaining separate surfaces for the GDI and DirectX content. This mode is called “Vista Blt”, and the DWM chooses which to display on the screen based on a heuristic as to which rendered last. The DWM will present from one or the other, or it may even display some content from both. Windows 7 introduced a better mode called “Device Bitmaps”, which allows both GDI and DirectX to be captured into the same backing surface. Unfortunately, Device Bitmaps are not compatible with all APIs and will be disabled if those APIs are called (the primary example is DwmGetCompositionTimingInfo), and the mode has its own problems.

Windows 8 introduces Direct Composition, which extends the redirection and composition services of the DWM to child windows.  This is a major development, and finally enables child windows to be layered.  Unfortunately, Direct Composition still uses the DWM as an external compositor, rather than being intimately integrated with the application.  In other words, you configure what you want and the DWM composes it for you.  This precludes certain kinds of composition effects, such as texturing a child window around a 3D model within your application or processing it through a pixel shader.  It is also worth mentioning that even though Direct Composition allows for composition effects that tear away from the input and coordinate APIs, there is still no support for input redirection.

In the past you could disable the DWM if it didn’t support a particular scenario very well; though that was rarely a good idea.  However, starting with Windows 8, the DWM can no longer be disabled. 

Redirection

As perhaps hinted at above, the ideal solution to these problems would be to let application frameworks like WPF compose the contents of child windows themselves.  This kind of “internal” compositor would enable far richer composition effects.  However, “input redirection” must also be provided for any such composition system to be used for more than passive “transitions”.

When developing WPF 4.5, we decided to take on the challenge of implementing a complete redirection solution so that WPF applications could compose the contents of child windows as naturally as any other kind of content.  We explored many techniques, but finally settled on using some form of API interception to shim the problematic GDI, DirectX, and Win32 APIs.  We settled on Detours, because it is a very robust and complete interception library developed here at Microsoft.  It is important to note that this is an in-process library, nothing we did had impact outside of the WPF process that opted into this new feature.

Output redirection for DirectX was pretty straightforward.  DirectX is a great API in this regard because all rendering operations effectively occur off-screen until a final swap-chain (or device) “present” is invoked.  Our basic technique was to intercept this present call and implement it by copying the contents to another video surface, which we would then display in WPF via a D3DImage.  We did, however, have to support Direct3D 9, 10, 11, and potentially future versions too.  This was only tenable because the DirectX team has had the foresight to refactor modern Direct3D APIs on top of DXGI and this is where the important present API exists.  But even with this refactoring, our code has to create intermediate surfaces and such which require a specific Direct3D version.  Again, the DirectX team came through with the ability of DirectX 11 to emulate the device state of previous versions.  The remaining complexity was in handling all possible surface formats and usage restrictions.  We also had a hard time handling D3DSWAPEFFECT_FLIPEX, and eventually just stripped that flag when creating swap chains.

Output redirection for GDI was more involved.  GDI has a large API surface area, and GDI does not have the equivalent of “present”.  Instead, GDI rendering APIs take effect whenever the driver decides to flush the command stream.  We ended up intercepting 200+ GDI functions, and replaying them to an offscreen DC.  The vast majority of these interception shims were mechanically generated by a tool based on whether they read state, set state, or actually rendered to the device context.  The hardest challenge was dealing with clipping.  We needed to mirror some (but not all) of the clipping state from the real device context onto our redirected device context.  Some of the clipping state is set deep inside the operating system, so detecting when the clips were changed was beyond the reach of our interception library.  Other clipping state needed to be excluded, such as when a child window is partially outside the bounds of its parent, but composed such that those regions are visible.  In such cases we have to somehow get WM_PAINT and related messages to still be generated.  Clipping too much is clearly a problem, but so is clipping too little.  We eventually developed our own extensive clipping model for redirected windows that tried to emulate what Win32 did internally, but this was a continuous source of bugs.

Input redirection posed another difficult challenge.  Any input device that depends on the location of elements to direct the input stream needs to take the composition effects into account.  WPF has already been plumbed for this, of course: since our elements have always been fully composed, our input events are routed appropriately.  But Win32 stubbornly considers windows to be rectangles that are rigidly aligned with the horizontal and vertical axes. This restriction is baked deep into the kernel where the window tree data structures are kept.  The kernel takes raw input from devices like the mouse and performs an initial hit test to decide where the input should go.  Since our solution was in-proc only, we have to retain the restriction that our top-level window obeys these rules.  (Note that with layered windows, WPF can render any shape, and this allows us the apparent freedom to do things like rotate our top-level window.  To see this in action, rotate a ComboBox and then drop down the list.)

The kernel will deliver “pointer” input to the thread queue that owns the window (including child windows) it thinks is under the mouse.   The thread that owns the queue will eventually pick up that input and perform the hit-test processing again and then deliver the appropriate messages, update state, etc. We have to play by those rules, but still find some way to appropriately handle composited child windows. We explored two basic approaches: prevent the kernel from finding any composited child windows, or make sure the kernel finds the correct composited window.

Child windows can be hidden from the kernel’s input processing in a variety of ways, but specifying the WS_EX_TRANSPARENT style or disabling the window is the easiest.   The kernel will find the top-level window and deliver the input to that queue.  From there we could perform our own hit testing to find the appropriate child window, and generate our own window messages to send it.  The hardest challenge with this technique is sending all of the correct messages, with the correct payload, in the correct order, at the correct times, and somehow updating all of the correct global state.  By the time you are done, you would have effectively reverse engineered a significant portion of Win32!  Further, since child windows can be running in other threads and processes, we would have to synchronize messages and state across these boundaries.  We soon decided that this would be too difficult.

Our other option was to ensure that the kernel finds the child window that we know is the right one.  This technique is to eagerly perform a hit-test ourselves, and then move the appropriate window under the mouse position so that the kernel will find it.  All of the messages and state are then handled normally by Win32, including across thread and process boundaries.  For example, say that we hit-test the mouse position through the element tree - taking into account all composition effects - and determine that the lower-right corner of a child window is what is found.  We would then align the entire window hierarchy so that the appropriate child window’s lower-right corner is what the kernel will find when it hit-tests the mouse coordinate.  We needed to do this alignment before the kernel does, so we chose a mouse hook.  We started with a low-level mouse hook, because those are pretty easy and they are invoked early enough.  But low-level mouse hooks will block the kernel until the application responds.  So if your app is busy, mouse processing for everyone is stalled.  A common example for developers is debugging: while your application is stopped in the debugger, the mouse would stop working!  That is unacceptable, so we changed to using a regular mouse hook.   This kind of mouse hook is invoked before the thread further handles the mouse input that the kernel gave it.  It turns out that this is sufficient, because Win32 will correctly adapt to changes in the window tree between when the kernel posted the input to the queue and when the queue was processed by the thread.  All that matters is that the kernel finds some window belonging to our thread, so that it delivers the input to our queue for further processing.  Now an unresponsive application will only stall input processing for itself.  There are lots of details to sort out, but this technique worked pretty well.  
The major restriction of this technique is that we cannot support any kind of multi-touch, because a window cannot be in two places at the same time.  We decided to accept this limitation for composing child windows, and hope that new multi-touch controls would be written in pure WPF.
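
The alignment step described above comes down to simple coordinate arithmetic. The following is only an illustrative sketch, not the code we wrote: the Point type and the aligned_window_origin function are hypothetical names, and the sketch ignores non-client areas and the rest of the window hierarchy. Given the mouse position in screen coordinates and the point inside the child window that the composed hit-test found, it computes where the window's origin has to be placed so that the kernel's own rectangular hit-test lands on that same point.

```python
from typing import NamedTuple


class Point(NamedTuple):
    x: int
    y: int


def aligned_window_origin(mouse_screen: Point, hit_in_window: Point) -> Point:
    """Screen position for the window's top-left corner such that
    hit_in_window (in window coordinates) sits exactly under the mouse."""
    return Point(mouse_screen.x - hit_in_window.x,
                 mouse_screen.y - hit_in_window.y)


# Example: the composed hit-test found the lower-right pixel (199, 99)
# of a 200x100 child window while the mouse is at (640, 480) on screen.
mouse = Point(640, 480)
origin = aligned_window_origin(mouse, Point(199, 99))
print(origin)
```

Because the result is a single origin for a single window, the sketch also makes the multi-touch restriction visible: two simultaneous contact points would generally demand two different origins.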

Layered windows (recall that WPF uses the type of layered window where we provide the bitmap of the contents ourselves) remained a problem.  Even though we could redirect the output from child windows within, the OS refused to deliver paint messages to them, so they would simply never bother to paint.  Content like videos would work fine because they typically initiate their own painting instead of waiting for a WM_PAINT message.  We explored a couple of options, including sending our own fake paint messages, or hosting those child windows in their own invisible top-level system-redirected layered windows, but composing their contents in our window.  We will revisit this technique later in this article.

With this system in place, we were able to successfully integrate child windows - without any airspace limitations - into scenarios that were simply impossible before:

  • A 3D cube with standard WebBrowser controls on each face, much like Chris Cavanagh’s YouCube.
  • Simple transition effects, like spinning and fading
  • Standard stuff like putting ActiveX controls within a ScrollViewer and using an AdornerLayer to adorn them.
  • A carousel control with a mix of WinForms, WPF, ActiveX, and straight-up Win32 controls on it; with a mirror effect and much smoother animation.
  • An MDI solution that could host legacy controls in each floating window, with transparency, scaling, and visual brushes.
  • A RemoteDesktop ActiveX client running in a layered window.
  • Using other video playback controls rather than our troubled MediaElement.
  • And many, many more examples.

I hesitate to overstate this, but it was quite possibly the most exciting work I have ever done during my 15 years at Microsoft. I was very proud of the accomplishments of my engineering team in solving one of the most vexing problems that afflicted the otherwise fantastic WPF platform.

You can imagine my heartbreak when, after an extensive review, we decided we could not actually ship this feature. Our concern was that we had to hack too deeply into the system, and in ways that were too difficult to explain - let alone maintain. Even though we required that developers explicitly turn on this feature for each HwndHost, we felt the kinds of problems they would encounter would be baffling to them and training our support engineers to handle the escalations would be very difficult. Even towards the end of our development, we were struggling with a long bug tail and performance concerns.

This is exactly the kind of deep system integration that needs to be done by the Win32 platform team, officially sanctioned and supported. With Win8, we are beginning to see some incremental improvements in this space, as noted before in the DirectComposition API. Unfortunately, it is still not possible to build the same kind of rich composited experience we had developed.

The rest of this article will focus on how external developers like you can build some of this functionality without hacking as deeply as we did. I will leverage the lessons we learned during our development to provide reasonable real-world workarounds to the most common problems. I will build an example MDI framework as a real-world application of the mitigations we will discuss. Please note that this is an example, it is not intended to be a feature-complete solution you can just drop into a project. But it should serve to showcase the problems and workarounds nicely.

Subclassing HWNDs

Before we dig into mitigating the airspace issues, we should first discuss intercepting window messages. WPF offers several hooks for intercepting the window messages received by the windows we integrate with. HwndSource.AddHook and HwndHost.MessageHook are examples. However, it is sometimes the case that you need to intercept messages sent to other windows. Win32 programmers have long used a technique called subclassing to replace the HWND’s window proc with their own, thereby allowing them access to the messages dispatched to the window. This is a very powerful technique, but surprisingly tricky to do from managed code. Since the new window procedure is usually implemented in managed code, but the function pointer is stored in unmanaged memory, you must carefully control the lifetime of the delegate to ensure that the GC doesn’t decide to collect the delegate while a reference is still being held in unmanaged memory. This requirement has led to a popular myth that claims you actually need to pin your delegate in order to also prevent the GC from relocating the delegate. It turns out that this is not actually required, as Chris Brumme - an authority on these matters - explained in his blog:

Along the same lines, managed Delegates can be marshaled to unmanaged code, where they are exposed as unmanaged function pointers. Calls on those pointers will perform an unmanaged to managed transition; a change in calling convention; entry into the correct AppDomain; and any necessary argument marshaling. Clearly the unmanaged function pointer must refer to a fixed address. It would be a disaster if the GC were relocating that! This leads many applications to create a pinning handle for the delegate. This is completely unnecessary. The unmanaged function pointer actually refers to a native code stub that we dynamically generate to perform the transition & marshaling. This stub exists in fixed memory outside of the GC heap.

However, the application is responsible for somehow extending the lifetime of the delegate until no more calls will occur from unmanaged code. The lifetime of the native code stub is directly related to the lifetime of the delegate. Once the delegate is collected, subsequent calls via the unmanaged function pointer will crash or otherwise corrupt the process. In our recent release, we added a Customer Debug Probe which allows you to cleanly detect this - all too common - bug in your code. If you haven’t started using Customer Debug Probes during development, please take a look!

Keeping your managed delegate alive as long as native code references the function pointer is actually hard to do in practice because things like AppDomain.Unload can tear down your managed objects quite abruptly. Even more complicated when dealing with subclassing is that someone else could subclass the window after you. Since subclassing just replaces the window procedure with another function pointer, and it is up to that implementation to call the previous one, the pattern does not support unsubclassing from the middle of a “chain” of window procs. A final wrinkle is that finalizers and events like AppDomain.Unloaded run on a separate thread, and this is almost always at odds with the thread-affinity model of Win32 (and WPF).
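
To see why classic subclassing cannot be undone from the middle of a chain, here is a small model of the pattern. It is a toy, not Win32 code: the Window class, the subclass function, and the string "messages" are all hypothetical stand-ins for an HWND, for swapping the window procedure, and for real window messages.

```python
class Window:
    """Toy stand-in for an HWND that stores a single window procedure."""

    def __init__(self):
        self.wndproc = lambda msg: ["original"]


def subclass(window, name):
    """Classic subclassing: install a new proc that calls the previous one.

    Returns the previous proc, which is what a subclasser keeps so that it
    can 'unsubclass' later by putting that proc back.
    """
    previous = window.wndproc

    def proc(msg):
        return [name] + previous(msg)

    window.wndproc = proc
    return previous


w = Window()
saved_by_a = subclass(w, "A")   # A subclasses first
saved_by_b = subclass(w, "B")   # B subclasses after A
chain_before = w.wndproc("WM_PAINT")

# A now tries to remove itself by restoring the proc it saved.
# B's proc is silently dropped as well, because the window only
# stores one pointer and B was installed on top of A.
w.wndproc = saved_by_a
chain_after = w.wndproc("WM_PAINT")

print(chain_before)
print(chain_after)
```

In this model the only safe removal order is last-in, first-out, and no subclasser can know whether someone else has been installed after it.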

Microsoft improved the situation in ComCtl32 v6, with the addition of “safe subclassing” APIs like SetWindowSubclass. Disappointingly, these new APIs remain relatively unknown. In the code accompanying this article, I have provided a managed class called HwndHook that uses these APIs to allow you to arbitrarily hook the window proc of any window in your process and owned by your thread. This will come in handy mitigating some of the issues we will discover in our MDI solution.

Mitigating Rendering Artifacts

As mentioned above, when changing the position of a child window, Win32 has an optimization to copy the contents from the old location to the new location, and just repaint the old location. When WPF is rendering with DirectX from our separate render thread, reading the screen contents from GDI on the UI thread is unreliable. The preferred solution is to disable WPF hardware acceleration, which causes WPF to use GDI, and GDI can coordinate appropriately across threads; but that may not be desirable for other reasons.

Another way to avoid this rendering artifact is to disable the problematic optimization that copies the pixels. The key is the WM_WINDOWPOSCHANGING message. This message is sent to a window before it is actually moved or sized, giving it an opportunity to adjust the various parameters; one of those parameters is a set of flags, to which the SWP_NOCOPYBITS flag can be added.

HwndHost already responds to this message in order to coerce the HWND to have the size and position determined by layout; but it does not currently set this flag. It is easy enough to do this for HwndHost instances you create, by overriding the WndProc virtual. But what about existing HwndHost-derived classes like WebBrowser? Since HwndHost exposes the MessageHook event, it is relatively straightforward to write an extension method that adds a handler to that event, looks for the WM_WINDOWPOSCHANGING message, and sets the SWP_NOCOPYBITS flag. You could also use the HwndHook class introduced above to accomplish the same thing.
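The flag manipulation itself is small. The following Python sketch models it with ctypes rather than reproducing the managed sample code: WINDOWPOS is a stand-in declared to mirror the Win32 structure, message_hook is a hypothetical name, and the numeric constants are the values from winuser.h. In a real WPF application the same logic would sit in a handler attached to HwndHost.MessageHook.

```python
import ctypes

# Values from the Win32 headers (winuser.h).
WM_WINDOWPOSCHANGING = 0x0046
SWP_NOCOPYBITS = 0x0100


class WINDOWPOS(ctypes.Structure):
    """Stand-in for the Win32 WINDOWPOS structure that lParam points to."""
    _fields_ = [
        ("hwnd", ctypes.c_void_p),
        ("hwndInsertAfter", ctypes.c_void_p),
        ("x", ctypes.c_int),
        ("y", ctypes.c_int),
        ("cx", ctypes.c_int),
        ("cy", ctypes.c_int),
        ("flags", ctypes.c_uint),
    ]


def message_hook(msg, lparam):
    """Hypothetical hook: add SWP_NOCOPYBITS to every pending move or size.

    `lparam` is the address of a WINDOWPOS, as Win32 passes it. The flags are
    modified in place. The function returns False ("not handled") so that the
    host's own processing of the message still runs afterwards.
    """
    if msg == WM_WINDOWPOSCHANGING:
        pos = WINDOWPOS.from_address(lparam)
        pos.flags |= SWP_NOCOPYBITS
    return False
```

Leaving the message unhandled is a deliberate choice in this sketch: HwndHost still needs to see WM_WINDOWPOSCHANGING to coerce the size and position chosen by layout.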

When you include the SWP_NOCOPYBITS flag, Win32 will move the window and then send it paint messages to render the contents in the new location. Painting in Win32 is split into two passes: painting the background, and then painting everything else. The background painting can be done automatically by specifying an appropriate brush when creating the window. Window procedures can also respond to the WM_ERASEBKGND message and do their own painting. Background painting is usually done by filling the window with a solid color. The WM_PAINT message is sent after the WM_ERASEBKGND message, and this is where most of the “look” of a window is painted. This two-pass painting scheme can cause flicker on the screen: since GDI can flush its commands to the screen at any time, you can see the window being cleared and then repainted.

This flicker is what is referred to in KB article 969728. That article recommends clearing the SWP_NOCOPYBITS flag to avoid this flickering for Windows Forms controls. This is, of course, the opposite of the advice I am giving! It is true that clearing this flag will reduce flicker, but it will increase the risk of rendering artifacts if you then move around the hosted windows.

The WebBrowser control, however, is an interesting case in which we can do something about the flicker. When IE8 is installed, it seems to clear the background in response to the WM_ERASEBKGND message. However, it also seems to completely paint the entire contents of the window in response to the WM_PAINT message. The flicker can be avoided by simply discarding the WM_ERASEBKGND message, and letting WM_PAINT handle everything. It turns out that the window that actually does the painting is buried several levels inside the window that HwndHost is hosting, so some fiddly code is required to wait until the window tree is created and to dig down to the appropriate descendant window. Once we have found the window that does the actual painting, we can use the HwndHook class to intercept the WM_ERASEBKGND message and discard it. This technique is shown in the sample code as an attached property to WebBrowser, and it is very specific to IE.
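As a rough illustration of what “discarding” the message means, the hook below (a hypothetical Python model, not the sample’s HwndHook code) marks WM_ERASEBKGND as handled and returns a nonzero result, which tells Win32 that the background has already been erased. The (handled, result) return shape and the name suppress_erase_hook are assumptions made for this sketch; the message constants are the values from winuser.h.

```python
WM_ERASEBKGND = 0x0014  # value from winuser.h
WM_PAINT = 0x000F       # value from winuser.h


def suppress_erase_hook(msg):
    """Model of a window-proc hook on the window that does the painting.

    Returns (handled, result). WM_ERASEBKGND is swallowed and a nonzero
    result reports the background as already erased, so nothing is cleared
    before WM_PAINT repaints the whole window. Every other message is left
    for the original window procedure.
    """
    if msg == WM_ERASEBKGND:
        return True, 1
    return False, 0
```

This only makes sense for a window that repaints its entire client area in WM_PAINT, which is what IE8 appears to do; applied to a window that relies on the erase pass, it would leave stale pixels behind.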

Note that IE9 has changed the painting behavior such that it no longer flickers, which suggests it is not clearing the background in response to WM_ERASEBKND anymore. Note also that IE9 now supports hardware accelerated rendering, which changes things substantially. This hardware accelerated rendering is controlled by the FEATURE_GPU_RENDERING key, and is disabled by default. Unfortunately, the only way to enable this setting is by a registry setting, which can be done in the sample application.  I have also included an option for disabling script errors, which are really annoying when using the WebBrowser control.

Remember - none of this is needed if you disable hardware acceleration for WPF.

Another interesting discovery was that moving an HwndHost around can cause the old contents to remain on the screen for longer than usual. This can lead to “trails” of old content smearing around the window if you are interactively moving the child window around - say by dragging around an MDI element with a child window inside of it. The reason for this is that when the window moves, Win32 invalidates the region where the window used to be. Invalid regions are updated when the thread gets done processing most other things, but input processing is higher priority, so dragging around the window just keeps accumulating the dirty regions for later. When the WM_PAINT message is eventually sent to our window proc, we just queue a request over to the render thread to update, which is another source of latency. An effective mitigation for this is to paint some content in the HwndHost.OnRender method. Now when the element is moved, WPF knows it needs to immediately update both the old and new positions - without waiting for Win32 to send us a WM_PAINT message.

Mitigating Clipping Issues

The fact that HwndHost elements do not overlap or clip like other visuals is probably the most noticeable airspace issue people encounter. In Win32, overlap is almost always handled by clipping (there are exceptions, such as WS_EX_TRANSPARENT). In WPF, by contrast, overlap is almost always handled by overdraw. The Win32 way can be more efficient (though clipping itself can be expensive for complex geometries), while the WPF way allows for effects like transparency. Win32 generally enforces clipping (with a few hints from window styles) while WPF generally makes clipping something you have to explicitly request. To avoid this common class of airspace issues, we need to make WPF and Win32 play by the same overlap/clipping rules.

If we had a redirection model, we could make the Win32 content play by WPF’s rules by using WPF to compose its bitmap with the rest of the scene. Without a redirection model we need to make WPF play by Win32’s rules, which is what we will investigate here.

Win32 clipping occurs in the following situations:

  • Painting operations are clipped to the window being painted.
    This is always done for Win32 windows, but WPF actually rarely clips rendering instructions. This is because, in part, the “shape” of an element is not so easy to determine - and is often defined by what the element actually renders. So how can you know what to clip the rendering to? However, all elements have a Clip property that can be used to enforce clipping.
  • Painting operations are clipped to avoid children.
    This means that parent windows cannot (generally - see WS_CLIPCHILDREN) paint on top of their children. This is also the case in WPF - with the exception that rendering operations for a parent are actually rendered behind the children, rather than clipped. In WPF, AdornerLayers offer the functionality of rendering over children.
  • Painting operations are clipped to avoid overlapping siblings.
    This means that windows cannot (generally - see WS_CLIPSIBLINGS) paint on top of their siblings that are higher in z-order. This is also the case in WPF - with the exception that rendering operations for an element are actually rendered behind their siblings according to their z-order.
  • Painting operations are clipped to the visible regions of ancestors.
    This means that child windows cannot paint outside of their parent windows. Surprisingly, this is not a restriction generally enforced by WPF. Again, this is partly because the “shape” of an element is determined by what it renders - and what it renders is often the result of a subtree of simpler elements. However, a panel can set the ClipToBounds property to cause WPF to clip the children to its layout. Drawing outside of the bounds of ancestors is an important technique for various effects, such as transitions. There are several panels that will clip their children by default; ScrollViewer is an example where clipping is very important to the core functionality of the control, and airspace problems are particularly noticeable.

The “shape” of a window is either a simple rectangle, or defined by a region explicitly set on a window. This shape is used for clipping, as well as for input hit-testing. For example, regions are how top-level windows are given rounded corners (though this region is supplied by the system, not by application code). Note that layered windows - especially the type where the application provides a bitmap with an alpha channel - are slightly different in that the OS will ignore completely transparent pixels when hit-testing. This is not true for any other type of window.

One could imagine calculating enough information from WPF’s layout and rendering data to determine an appropriate shape for the visible part of a hosted child window, and setting a region to reflect that shape. However, much of what appears to be clipping in WPF due to overlapping content is actually just overdraw. WPF implements the painters algorithm (remember Win32/GDI normally implements the reverse painters algorithm), and is happy to overdraw content. This would have to be detected and accounted for to be a general solution. Another concern would be performance. WPF can animate the contents of the visual tree, and the effective shape of the visible region of the HwndHost could be constantly changing. I have not pursued this approach because it is complicated to extract the necessary information from WPF in a performant way. It might be an interesting exercise for the interested reader.

Child windows are always clipped to their parents. This could be used to add clipping behavior to a container, such as a ScrollViewer, that needs to clip its contents to the viewport. ScrollViewer could be configured to use a child window as the viewport, and all of the content (including other child windows) would be placed within this intermediate child window and Win32 would naturally enforce the clipping.

Child windows are also clipped to their siblings, assuming the appropriate window styles are set. If we have two elements that are supposed to overlap each other, we could contain each one within its own intermediate child window, and then the elements would clip properly against each other. Of course, the shape would be either a simple rectangle or an explicit region, but Win32 would use clipping to handle the overlap. This would address the problems if there was a hosted child window within the subtrees, but it disables overdraw so effects like transparency are lost.

The key to this “intermediate child window” technique is to create a WPF control that will host a child window, and that will in turn host more WPF content within the child window. The contents of such a control would be clipped to the bounds of the child window, and it would clip properly when overlapped with other such controls. WPF actually provides the two major pieces of required functionality in the HwndHost and HwndSource classes; all we basically have to do is plug them together. HwndHost hosts a child window within an element tree, and HwndSource hosts an element tree within a window. Usually HwndSource is used to host a WPF element tree within a top-level window, but it works just as well for a child window.

The trick is keeping all of the normal WPF features working, so that the inner element tree appears to be fully connected to the outer element tree. We can’t hook together the visual tree, because the elements are not visually connected. However, WPF is designed to support most features between elements connected through the so-called “logical” tree. Once we hook up the logical tree, events route correctly and properties inherit as expected.

Input events are initiated from their corresponding window messages, such as WM_MOUSEMOVE or WM_KEYDOWN, which Win32 dispatches to particular windows based on who the mouse is over, or who has keyboard focus. Introducing a child window will cause some of these messages to be dispatched directly to the child window, rather than going to the top-level window. The HwndSource that we are using in the child window has all of the logic to respond to these messages and drive the WPF input system, and since the disjoint element trees are logically connected, the input events are routed through the tree as expected.

Keyboard input is handled differently from mouse input. In well-behaved applications, WPF reacts to keyboard events when the corresponding message is processed through the ComponentDispatcher. This is supposed to happen before a message is dispatched to any particular window; but that depends on the cooperation of the application’s message pump. Top-level WPF windows will respond to keyboard events from the ComponentDispatcher when focus is anywhere within them, and convert the messages into calls through IKeyboardInputSink. HwndHost has a stub implementation of this interface, but it is the responsibility of the class that derives from HwndHost to provide the full implementation. Conveniently, HwndSource also implements this interface, so once again all we have to do is plug the two together. There is one interesting wrinkle, though. The IKeyboardInputSink interface on HwndHost is called in response to keyboard events routing to it, and HwndSource will raise new keyboard events when its IKeyboardInputSink is called. The problem is that once we logically connect the inner element tree to the outer element tree, the keyboard event routes are unified. This means that the keyboard event raised by the inner HwndSource will actually route through the outer HwndHost, which will then call into IKeyboardInputSink again, creating an infinite loop. We work around this strange situation by temporarily severing the logical tree during the calls to IKeyboardInputSink.

In the sample code accompanying this article, this “intermediate child window” technique is implemented in a class called HwndSourceHost.

Mitigating Z-Order Issues

Above we showed that the problem of overlapping and clipping WPF content along with Win32 content could be mitigated by introducing intermediate child windows. However, the z-order of these windows still needs to be kept in sync with WPF’s sense of z-order.

When a WPF panel allows its children to overlap, the “z-order” of the children becomes important, as this determines which children are in front of others. WPF determines the z-order of visuals it is rendering by the order they are returned from GetVisualChild. Calls to GetVisualChild are often just passed through to a control’s underlying UIElementCollection, so adding and removing elements from the UIElementCollection can obviously impact the z-order of the elements.

When an HwndHost is first connected to an element tree that is rooted beneath an HwndSource, the HwndHost is asked to provide the hosted window. If the HwndHost is removed from the tree, the existing window is not destroyed, it is simply reparented under a non-visible window; and when the HwndHost is added to another element tree, that window is simply reparented under the new HwndSource. When reparenting a child window, it is placed on top of the existing children, which may not correctly reflect the order of the WPF elements in the collection.

WPF panels also support a more efficient mechanism of using an indirection index to map the index of an element in the collection to the appropriate visual index. The Panel.ZIndex property is the public exposure of this indirect indexing feature. Simply setting the ZIndex property will cause the panel to adjust the order it returns the children from GetVisualChild. It is probably possible to monitor changes to this property, but it seems incomplete since panels are free to adjust the order they return elements from GetVisualChild however they please.

Since it does not seem practical to automatically detect changes that might influence the z-order of the elements in a panel, we rely on the application knowing when the z-order of possibly overlapping HwndSourceHost elements has changed. The application can then call the static method HwndSourceHost.UpdateZOrder, passing in the panel in which the HwndSourceHost elements are being overlapped. This static method simply enumerates the children by calling GetVisualChild and corrects the Win32 z-order to match. The order of elements returned by GetVisualChild is necessarily the proper z-order, so this is correct, if a slight burden on the application.
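One plausible way to do that correction is sketched below in Python. It is not the sample’s UpdateZOrder implementation: the function signature, the is_hosted_window predicate, the bring_to_top callback (standing in for a SetWindowPos call with HWND_TOP), and the FakeSiblings class used to model Win32’s sibling list are all assumptions made for illustration.

```python
def update_z_order(visual_children, is_hosted_window, bring_to_top):
    """Make the Win32 z-order of hosted windows match the visual order.

    visual_children: children in GetVisualChild order (index 0 is drawn
        first, so it is the furthest back).
    is_hosted_window: predicate selecting the children that wrap an HWND.
    bring_to_top: callback standing in for a SetWindowPos(..., HWND_TOP, ...)
        call on the child's window.
    """
    for child in visual_children:
        if is_hosted_window(child):
            # Each call puts this window above everything placed so far, so
            # the last hosted child in visual order ends up on top.
            bring_to_top(child)


class FakeSiblings:
    """Tiny in-memory model of a parent's child windows, topmost first."""

    def __init__(self, windows):
        self.order = list(windows)

    def bring_to_top(self, window):
        self.order.remove(window)
        self.order.insert(0, window)
```

Walking the children from back to front and raising each one in turn needs no knowledge of the current Win32 order, at the cost of touching every hosted window on each call.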

Redirection Revisited

The mitigations presented so far have been pretty straightforward, but still limit the fidelity of composing hosted Win32 windows within WPF element trees. In fact, we have further constrained the regular WPF portions of the tree to align with the clipping rules of Win32. Effects that require overdraw, such as transparency, are no longer possible. Effects that require rasterizing through an intermediate surface, such as a visual brush or shader effects, are also broken. To provide for those features, we need to use some form of redirection instead.

In this section, we turn our attention to implementing a simple form of redirection. We will handle output redirection for GDI - and DirectX on Windows 7 and above. We will also handle input redirection for the mouse; recall that keyboard input does not require redirection at all since it is not dependent on the rendered appearance of the elements. We will not be using any form of API interception; instead we will be using the fully supported features of redirected layered windows. The penalty we will pay is primarily in performance; so make sure you investigate the impact on the performance of your scenario on your target hardware.

The basic idea of this technique is to place each child window we want to redirect within a separate top-level redirected layered window whose opacity (alpha) is set to 0. This is a standard top-level window with the WS_EX_LAYERED style, whose transparency has been configured with a call to SetLayeredWindowAttributes. It is important to note that this is an officially supported type of window; we are not coercing the system into some unusual state. This top-level window is fully transparent, so it is not visible on the screen. Since the transparency of this window could change at any time, Win32 continues to redirect and retain the contents of this top-level window even when it is completely transparent. That is the most important feature we are using - that the window is painting, and the contents are captured into a bitmap, even though the window is not visible.

To display the hosted window in WPF, we copy the content from the invisible top-level window into a bitmap, and display that bitmap as a regular element in the WPF element tree. This makes complete composition possible - overdraw, clips, transforms, shader effects, etc. Even though the top-level window is invisible, Win32 allows access to the content by either calling PrintWindow or just blitting from the GDI device context for the top-level window. Ideally we would copy the contents into a bitmap only when the contents actually change. Unfortunately, I have not found a technique to reliably detect this for both GDI and DirectX content. Instead, we have to resort to updating the bitmap on a timer. As you can imagine, this results in unnecessary work and contributes to the performance penalty of this technique.

Copying the contents into a bitmap works great for GDI content, but DirectX content is more problematic. In Windows XP, redirected windows did not redirect DirectX content at all. Starting with Windows Vista, the DWM (when it is enabled, of course) assumes responsibility for managing the surfaces backing top-level windows. Vista uses the “VistaBlt” mode, in which GDI and DirectX are redirected into separate surfaces, and then combined together to be presented. It is not until Windows 7, with the introduction of device bitmaps, that GDI and DirectX content are redirected into the same surface. GDI methods like PrintWindow, or reading from the GDI device context directly, access the surface that contains the GDI content. Only with device bitmaps (Windows 7 and above) will our technique work for DirectX content. This is important because if you are actually hosting some WPF content within your redirected child window, WPF will render with DirectX unless you disable hardware acceleration. Of course, DirectX content can come from technologies other than WPF, such as XNA, Direct2D, Windows Media Player, etc. Also recall that DirectX surfaces presented with D3DSWAPEFFECT_FLIPEX are handled specially by the DWM; they are not actually combined into the redirected surface but rather are composed on top when the DWM presents the window. These surfaces are not accessible to our redirection technique, so you will have to either configure the component you are hosting to disable this mode or fall back to the clipping techniques discussed previously. One common component that uses D3DSWAPEFFECT_FLIPEX is the Windows Media Player, and I am not aware of any way to disable it. It is very important that you consider all of the requirements you need to satisfy carefully when using this technique, and test thoroughly to avoid being caught by surprise by some limitation.

For mouse input redirection we use the basic technique of aligning the appropriate point of the child window under the mouse by repositioning the corresponding top-level layered window that contains it. We perform a hit-test in WPF to the image of the hosted window that we are rendering. The image has the same dimensions as the hosted window, and so the offset into the image gives us the appropriate position within the hosted child window that needs to be under the mouse. A little bit of simple math and coordinate transforms can calculate where the containing top-level layered window needs to be. We then move the window, z-order it to the top, and change the transparency to 1 out of 256. This tiny fraction of visibility is enough that Win32 will pass it mouse input, but transparent enough that it is not detectable to the human eye (as far as I can tell). Note that moving around a layered window, and even changing its transparency, is fairly inexpensive since the application does not need to repaint the contents.
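The “simple math” can be sketched as follows. This Python model is an assumption-laden illustration rather than the RedirectedHwndHost code: the Point type and both function names are hypothetical, all coordinates are taken to be in screen pixels, and DPI scaling and WPF transforms are ignored. The alpha value of 1 is my reading of “1 out of 256” on SetLayeredWindowAttributes’ 0 to 255 scale.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    x: int
    y: int


def layered_window_origin(mouse_screen, offset_in_image, child_offset_in_window):
    """Screen position to give the top-level layered window.

    mouse_screen: mouse position in screen pixels.
    offset_in_image: where the WPF hit-test landed inside the image of the
        hosted window, in pixels of that window.
    child_offset_in_window: position of the hosted child window relative to
        the top-left corner of the layered window that contains it.
    """
    return Point(
        mouse_screen.x - offset_in_image.x - child_offset_in_window.x,
        mouse_screen.y - offset_in_image.y - child_offset_in_window.y,
    )


def layered_window_alpha(mouse_is_over_image):
    """Alpha to apply: barely visible (1) while the mouse is over the image, else 0."""
    return 1 if mouse_is_over_image else 0
```

For example, with the mouse at (500, 400), a hit-test offset of (30, 20) into the image, and the child sitting at (8, 31) inside its layered window, the layered window would be moved to (462, 349) so that the matching pixel of the child lies under the cursor.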

Because we are positioning the windows so that mouse input will get to the right destination, it might be appropriate to use a mouse hook. However, that complicates things if you want to host child windows from other processes (such as a WebBrowser containing an XBAP). For simplicity, and because we are already using a timer for updating the bitmap, we also use a timer to synchronize the position and transparency of the top-level layered windows. This is not perfect, but it works reasonably well. I will leave the implementation of a mouse hook to the interested reader.

In the sample code accompanying this article, this technique is implemented in the RedirectedHwndHost class.

MDI Demo

Many people have noted that WPF does not support the Multiple Document Interface (MDI) style of user interfaces out-of-the-box. The reason for this is the age-old rationalization that we believed our time was better spent working on other features. At the time, MDI user interfaces were being deemphasized in favor of other options like tabbed interfaces. MSDN even has this discouraging language on its section on Multiple Document Interfaces:

Many new and intermediate users find it difficult to learn to use MDI applications. Therefore, you should consider other models for your user interface. However, you can use MDI for applications which do not easily fit into an existing model.

We posted our position on the matter way back in 2005. However, MDI continues to be a very popular user interface model in certain types of applications. Since WPF is a very powerful platform, it is fairly straightforward to roll your own MDI solution. Indeed, there are already many MDI solutions for WPF provided by third parties. Some are very simple technology walkthroughs, while others are professional-quality offerings. A quick Bing search shows some of what is available (this is not an exhaustive list!):

Many of these solutions suffer from the airspace issues discussed in this article, so I thought it would be instructive to show how to incorporate the various mitigation techniques in an MDI application. In the demo application on http://microsoftdwayneneed.codeplex.com, I have demonstrated building a proprietary MDI-ish solution primarily because I do not want to endorse any particular third-party offering. While the guidance we are providing is designed to assist all developers in mitigating the airspace issues in WPF applications, it should be of particular use to developers of MDI solutions, and to enterprise developers who are willing to roll their own solutions.

The demo application lets you interactively adjust many settings to turn on/off the various techniques discussed in this article.  However, it is fair to say that there are still some significant problems, especially in the redirection technique.  One particularly vexing problem is that some sequence causes menus to no longer respond to the mouse.  I would normally hold off publishing this whitepaper until I had resolved the issue.  However, I think the bulk of the content of this post will be of interest (maybe even use) to some of you, and I don’t want to hold up publishing indefinitely.  It would be great if sufficiently interested and motivated readers would take the information I have presented and run with it.