Hi, my name is Eliyas Yakub. After working on the Windows Driver Framework product for about 6 years as an SDE and development lead, I recently took up the development lead responsibility for bus connectivity technologies such as USB and 1394. This is my new team blog, where my team members from all disciplines (Dev, QA, PM) and I will try to provide useful insight into the technologies we own, the working of the core drivers for these technologies, common driver and hardware issues we encounter, and general client driver development tips and tricks.

In this article, I want to discuss why drivers fail to unload. Failure to unload a driver leads to a poor user experience, because the system needs to be rebooted either to update the driver or to make the device functional when the user plugs it in again. At the bottom of this article, I have pointers to other articles that describe how to avoid system reboots. Here, I want to provide insight into how the OS unloads a driver and what prevents a driver from unloading.

Developers generally assume that if the DriverUnload routine is called, the driver-unload process is complete and the driver should have unloaded. This is not true. Invocation of the DriverUnload routine and unloading of the driver image from memory are two separate processes governed by different factors in the system. In the first step, the system makes an attempt to unload the driver by calling the registered DriverObject->DriverUnload routine. When the DriverUnload routine returns, the system drops the reference it holds on the driverobject; if that was the last reference, the driver image is unloaded from memory.

So, for a driver image to unload, two things need to happen:

  1. Some entity in the system (service control manager (SC), I/O or PnP-manager subsystem or another driver in kernel-mode) needs to first make an attempt to unload the driver. When that happens and if there are no open handles to any of the deviceobjects created by the driver, the system calls the registered DriverUnload routine.
  2. When all the deviceobjects are deleted and the reference count of the driverobject drops to zero, the memory-manager then unloads the image from memory.
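
To make the two steps concrete, here is a toy user-mode model of the bookkeeping, written as a sketch under stated assumptions: every name in it (toy_driver, toy_attempt_unload, legacy_unload and so on) is hypothetical, and the assumption that each deviceobject holds one reference on its driverobject is a simplification for illustration. It is not how the I/O or memory manager is actually implemented.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the two-step unload. All names are hypothetical; this only
 * mimics the bookkeeping, it is not how the kernel is written. */
struct toy_driver {
    int  ref_count;      /* references on the driver object */
    int  device_objects; /* device objects created by the driver */
    int  open_handles;   /* open handles to any of those device objects */
    bool unload_called;  /* has DriverUnload run? */
    bool image_loaded;   /* is the driver image still in memory? */
};

typedef void (*toy_unload_fn)(struct toy_driver *d);

static void toy_driver_load(struct toy_driver *d)
{
    d->ref_count = 1;          /* the reference the system holds */
    d->device_objects = 0;
    d->open_handles = 0;
    d->unload_called = false;
    d->image_loaded = true;
}

static void toy_reference(struct toy_driver *d) { d->ref_count++; }

/* Step 2: the image leaves memory only when the last reference goes away. */
static void toy_dereference(struct toy_driver *d)
{
    assert(d->ref_count > 0);
    if (--d->ref_count == 0)
        d->image_loaded = false;
}

/* Assumption of this model: every device object references its driver object. */
static void toy_create_device(struct toy_driver *d)
{
    d->device_objects++;
    toy_reference(d);
}

static void toy_delete_device(struct toy_driver *d)
{
    assert(d->device_objects > 0);
    d->device_objects--;
    toy_dereference(d);
}

/* Step 1: an unload attempt calls DriverUnload only if no handles are open,
 * then drops the system's reference. Returns whether DriverUnload ran. */
static bool toy_attempt_unload(struct toy_driver *d, toy_unload_fn unload)
{
    if (d->unload_called || d->open_handles > 0)
        return false;
    unload(d);
    d->unload_called = true;
    toy_dereference(d);
    return true;
}

/* A legacy driver's DriverUnload deletes whatever device objects remain. */
static void legacy_unload(struct toy_driver *d)
{
    while (d->device_objects > 0)
        toy_delete_device(d);
}
```

With one extra (leaked) toy_reference() call, toy_attempt_unload() still returns true and legacy_unload() deletes the deviceobject, yet image_loaded stays true until that leaked reference is released. That is the "unload routine was called but the driver is still in memory" symptom described above.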

So deleting deviceobjects is key to getting driver unload to succeed. To understand the mistakes that prevent driver unload, let me categorize device drivers into 3 types, based on when and how their deviceobjects are created: legacy driver, pure pnp driver, and hybrid driver:

  • In the case of a legacy driver, the deviceobjects are created either in the DriverEntry routine or in an ioctl handler. They are deleted either in an ioctl handler or, finally, in the DriverUnload routine. This is how original NT drivers were written before the invention of PnP.
  • In the case of a pure pnp driver, the driver creates a deviceobject in its AddDevice routine for every new instance of a physical device, and deletes the object before it completes the IRP_MN_REMOVE_DEVICE Irp (aka remove request).
  • In the hybrid case, the driver creates a deviceobject for every instance of the physical device, but it also creates one or more deviceobjects on the side. We call these control or sideband deviceobjects. They are used for sideband communication from an application directly to the driver, bypassing all the other drivers layered above it in the stack. An example of a hybrid driver is the pnpdtest filter driver (available in the WDK and WLK), used to test pnp drivers: the test app sends ioctl commands directly to the filter layered in the stack to drive the test.

Let us understand how these 3 types of drivers unload:

  • If the driver is a legacy driver, a client of the driver (typically a user-mode process calling SC) must explicitly make an attempt to unload the driver. If there are no open handles to any deviceobjects created by the driver, the system calls the driver's DriverUnload routine to get the driver to delete all its deviceobjects. After the unload routine returns, the system drops the reference it has taken on the driverobject. If there are no more references on the driverobject, the driver image gets unloaded from memory.
  • If the driver is a pnp driver, then every time a device is removed (for example, by unplugging it or disabling it in Device Manager), and if there are no open handles to the device, the pnp manager sends a remove request to the stack. When that request completes, the pnp manager checks whether there are still deviceobjects dangling from the driverobject. If there aren't any, it assumes all the devices are removed and makes an attempt to unload the driver. The driverobject could still have deviceobjects if there are other instances of the device serviced by this driver, or if the deviceobject that was deleted has extra, probably leaked, references on it. Unlike a legacy driver, a pnp driver's unload routine doesn't have any deviceobjects to delete. After the unload routine returns, if there are no more references on the driverobject, the driver image gets unloaded from memory.
  • Since a hybrid driver is also a pnp driver, when the pnp stack is removed as described above, the pnp manager looks to see if there are any dangling deviceobjects. If the control deviceobjects aren't deleted before the last remove request is completed, there will be one or more dangling deviceobjects. As a result, the pnp manager skips the attempt to unload the driver, which leaves a wedged (stuck) driver image in memory. This driver can never be unloaded again, not even from user mode using SC, because SC doesn't call unload if the driver is a pnp driver. It determines whether a driver is pnp by checking whether an AddDevice routine is registered. This is one of the most common driver unload issues.
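
The hybrid trap comes down to two checks, sketched below as a toy user-mode model. The struct and function names are hypothetical, the model tracks only deviceobject counts, and it ignores open handles for brevity; it illustrates the decisions described above rather than the actual pnp manager or SC code.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the pnp manager's decision after a remove request, and of
 * SC's refusal to unload a pnp driver. Names are hypothetical. */
struct toy_pnp_driver {
    bool has_add_device;  /* AddDevice registered: treated as a pnp driver */
    int  pnp_devices;     /* one device object per physical device */
    int  control_devices; /* sideband/control device objects */
    bool unload_called;   /* has DriverUnload been called? */
};

/* A remove request for one pnp device has completed; the driver deleted
 * that device's object before completing it. */
static void toy_remove_completed(struct toy_pnp_driver *d)
{
    assert(d->pnp_devices > 0);
    d->pnp_devices--;
    /* Any device object still hanging off the driver object (another
     * instance or a forgotten control object) means no unload attempt. */
    if (d->pnp_devices + d->control_devices == 0)
        d->unload_called = true;
}

/* Explicit unload request from user mode through SC (open handles ignored). */
static bool toy_sc_unload(struct toy_pnp_driver *d)
{
    if (d->has_add_device)
        return false;          /* pnp driver: SC does not call DriverUnload */
    d->unload_called = true;   /* legacy driver: DriverUnload is called */
    return true;
}
```

For a hybrid driver that still owns its control object, toy_remove_completed() leaves unload_called false, and toy_sc_unload() refuses because AddDevice is registered, so nothing ever calls DriverUnload.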

Quite a number of you write such hybrid drivers, especially in stacks where the upper drivers don't allow unknown ioctls to pass down (e.g., the keyboard class driver). The Windows 7 WDK contains a filter sample (in both WDM and KMDF versions) as part of the toaster package that shows how to handle this situation safely. Since KMDF makes it easy to write a bus driver, we suggest an even simpler solution for sideband communication: enumerate a child device for every pnp device your driver attaches to. This gives you a one-to-one sideband communication relationship with each pnp device. This technique is also illustrated in another WDK sample (kbfiltr).
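
If you keep a single shared control object, one way to avoid the wedged state is to tie its lifetime to the pnp instances: create it together with the first instance in AddDevice, and delete it while handling the last remove request, before completing that request. Below is a toy model of that bookkeeping with hypothetical names; it is not code taken from the toaster sample, and a real driver would also have to serialize the counter updates with a lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of an instance-counted control object. Names are hypothetical.
 * A real driver must serialize these updates (e.g., with a mutex); that is
 * omitted in this single-threaded model. */
struct toy_filter_driver {
    int  instance_count;  /* pnp device objects created in AddDevice */
    bool control_exists;  /* sideband control device object present */
};

static void toy_add_device(struct toy_filter_driver *d)
{
    /* ...create and attach the per-instance filter device object... */
    if (++d->instance_count == 1)
        d->control_exists = true;   /* first instance: create control object */
}

static void toy_remove_device(struct toy_filter_driver *d)
{
    assert(d->instance_count > 0);
    if (--d->instance_count == 0)
        d->control_exists = false;  /* last instance: delete control object */
    /* ...delete the per-instance device object, then complete the remove IRP... */
}

/* What the pnp manager would find after the remove request has completed. */
static int toy_dangling_device_objects(const struct toy_filter_driver *d)
{
    return d->instance_count + (d->control_exists ? 1 : 0);
}
```

After the last toy_remove_device() call, toy_dangling_device_objects() returns 0, which is the condition the pnp manager needs before it makes the unload attempt.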

Applications, services and other kernel-mode clients that fail to close their handles to a device while it is being removed are another common source of driver unload problems. Whenever a client opens a handle to a pnp device, it should register for pnp notification and close the handle when the system notifies it that the device is being removed. Here is where to find examples of how to do this from an application, a service and a device driver:

  1. Notify sample (src\general\toaster\exe\notify) demonstrates how to write a pnp-aware application.
  2. Toastmon sample, found under the toaster\wdm and toaster\kmdf folders, shows how to write a pnp-aware driver.
  3. I didn't find a sample showing how to do this in a service in either the WDK or the SDK. There is a brief description of this in the book "Programming the Microsoft Windows Driver Model" by Walter Oney.
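
The reason registration matters is that the remove can go ahead only after every client has closed its handle. The sketch below is a toy model of that interaction with hypothetical names; a real application would use the notification mechanism the Notify sample demonstrates, not these functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of pnp-aware vs. naive handle holders. Names are hypothetical. */
#define TOY_MAX_CLIENTS 4

struct toy_client {
    bool handle_open;
    bool registered;   /* registered for pnp notification on this handle */
};

struct toy_device {
    struct toy_client *clients[TOY_MAX_CLIENTS];
    size_t client_count;
};

/* What a pnp-aware client does when told the device is going away. */
static void toy_on_query_remove(struct toy_client *c)
{
    c->handle_open = false;
}

/* Notify registered clients, then report whether the remove can proceed
 * (true only if no handle is left open). */
static bool toy_try_remove(struct toy_device *dev)
{
    int still_open = 0;
    for (size_t i = 0; i < dev->client_count; i++) {
        struct toy_client *c = dev->clients[i];
        if (c->registered && c->handle_open)
            toy_on_query_remove(c);
        if (c->handle_open)
            still_open++;
    }
    return still_open == 0;
}
```

With one registered client and one naive client, the first toy_try_remove() call fails because the naive client still holds its handle; once that handle is closed, the remove can proceed.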

Let me briefly touch upon some debugging techniques to root-cause driver unload problems.

  1. !devhandles <deviceobject address> - this command walks through every process handle table to find out whether any file handle refers to this deviceobject.
  2. To track leaked references on the driverobject and deviceobject, you can follow the technique outlined in the WinDBG documentation, "Advanced Debugging" chapter - Debugging a Failed Driver Unload.
  3. !obtrace <object address> - this new command shows the stack traces recorded by the system when a reference is added or removed. You need to enable object reference tracing using the gflags utility. Read this section to learn more about how to use gflags.exe. For example, to enable tracing for device objects with permanent traces, run: gflags -ko -t Devi -p (leave out the -p if you don't want permanent traces; these stay in memory until the next reboot, so use temporary traces if possible).

Summary

List of primary reasons for a driver to fail to unload:

  1. An application, service or another driver in the system fails to close its handle to a device (or doesn't close it in a timely manner).
  2. Driver fails to delete the deviceobject.
  3. Driver leaks references on the deviceobject or driverobject.

List of rules to remember when troubleshooting a driver unload issue:

  1. Driver must have registered an unload routine.
  2. All handles must be closed before the system can make an attempt to unload a driver.
  3. All deviceobjects must be deleted and extra references dropped before the unload routine of a pnp driver can be called.
  4. The fact that the unload routine is called doesn't mean the driver unload is successful.
  5. For a driver to be unloaded from memory, there shouldn't be any dangling references to the driverobject. Some DDIs take an explicit reference on the driverobject (e.g., IoRegisterPlugAndPlayNotification). If you forget to call the complementary function (IoUnregisterPlugAndPlayNotification in that case), your driver image will be stuck in memory even though the unload routine was called.

More information on avoiding reboots during driver install: