Have you ever looked at the work item APIs and wondered why there are two different types of work items? Or for that matter, why are there so many work item APIs? As Paul wrote last week, the work item API set has grown for Vista. Today I will try to explain how we got into this state.

Up until Windows 2000, there was only one type of work item, WORK_QUEUE_ITEM. You could embed the work item structure in your own structure and it was quite simple to use. All you had to do was call ExQueueWorkItem() and you were done. There was, however, one glaring problem with the way WORK_QUEUE_ITEMs worked:

    You could not safely unload a driver which had queued a work item.
Safe unload is not possible with this type of work item because there is no outstanding reference on your device or driver object. A reference on your device or driver object will keep your driver's image from unloading. Since there is no reference on either object, the image can be unloaded before the work item has run or while the work item is executing. But what if you added your own reference and then released it when the work item ended?

For instance, if you had code that did something like this:

    typedef struct _MY_WORK_ITEM {
        WORK_QUEUE_ITEM WorkItem;
        PDEVICE_OBJECT DeviceObject;
    } MY_WORK_ITEM, *PMY_WORK_ITEM;

    NTSTATUS QueueWorkItem(PDEVICE_OBJECT DeviceObject)
    {
        PMY_WORK_ITEM pItem;

        pItem = (PMY_WORK_ITEM) ExAllocatePoolWithTag(NonPagedPool, sizeof(MY_WORK_ITEM), tag);
        if (pItem == NULL) {
            return STATUS_INSUFFICIENT_RESOURCES;
        }

        ExInitializeWorkItem(&pItem->WorkItem, WorkItemRoutine, pItem);
        pItem->DeviceObject = DeviceObject;
        ObReferenceObject(DeviceObject);
        ExQueueWorkItem(&pItem->WorkItem, DelayedWorkQueue);

        return STATUS_SUCCESS;
    }

    VOID WorkItemRoutine(PVOID Context)
    {
        PMY_WORK_ITEM pItem = (PMY_WORK_ITEM) Context;
        PDEVICE_OBJECT pDevice = pItem->DeviceObject;

        // ... do work ...

        ExFreePool(pItem);
        ObDereferenceObject(pDevice);
    }
The problem is that there is still code left to execute between the call to ObDereferenceObject(pDevice); and the closing }, as seen in this disassembly, so there is still a short window of time in which your driver could be unloaded while it is still executing code.
    0:000> u WorkItemRoutine+0x23
    WorkItemRoutine+0x23

    // Put the parameter into ecx and call ObDereferenceObject
    000843e3 8b4dfc          mov     ecx,dword ptr [ebp-4]
    000843e6 ff1564a00a00    call    dword ptr [wdf01000!_imp_ObfDereferenceObject (000aa064)]

    // We still have to execute this code to return to the caller!  It is during
    // these 3 instructions that the driver can unload
    000843ec 8be5            mov     esp,ebp
    000843ee 5d              pop     ebp
    000843ef c20400          ret     4

To address this problem, a new work item type, PIO_WORKITEM, was added. If the management of the reference were taken care of for the driver in another module, the driver would not have this problem anymore. This is exactly what PIO_WORKITEM and IoQueueWorkItem() do. Upon queueing the work item, the I/O manager takes a reference on the device object and then releases it after the work item routine returns to the I/O manager. This means that all of your driver's work item code runs while the reference is held, including the code to return to the caller, so it is now possible to safely unload a driver using this new work item type.
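
To make the ordering concrete, here is a small user-mode model of that reference handling. It is only a sketch and not the I/O manager's actual implementation: every name in it (device_object, io_work_item, model_queue_work_item, model_worker_thunk) is a hypothetical stand-in, and the "device object" is reduced to a bare reference count.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a device object: just a reference count. */
typedef struct device_object {
    int ref_count;
} device_object;

typedef void (*work_routine)(device_object *device, void *context);

/* Hypothetical stand-in for the opaque work item owned by the "I/O manager". */
typedef struct io_work_item {
    device_object *device;
    work_routine   routine;
    void          *context;
} io_work_item;

/* "I/O manager" side: the reference is taken when the item is queued. */
static void model_queue_work_item(io_work_item *item, device_object *device,
                                  work_routine routine, void *context)
{
    item->device  = device;
    item->routine = routine;
    item->context = context;
    device->ref_count++;
}

/* "I/O manager" side: this runs on the worker thread. The driver's routine
 * has completely returned, epilogue included, before the reference is
 * dropped, and the drop happens in code that is not part of the driver. */
static void model_worker_thunk(io_work_item *item)
{
    /* Copy the fields first: the routine is allowed to free the item. */
    device_object *device  = item->device;
    work_routine   routine = item->routine;
    void          *context = item->context;

    routine(device, context);
    device->ref_count--;
}

/* "Driver" side: records the reference count it observes while running. */
static void driver_routine(device_object *device, void *context)
{
    *(int *)context = device->ref_count;
}

/* Returns 1 if the routine ran with the extra reference held and the
 * reference was released afterwards, 0 otherwise. */
static int demo_reference_held(void)
{
    device_object device = { 1 };   /* one pre-existing reference */
    io_work_item  item;
    int           seen = 0;

    model_queue_work_item(&item, &device, driver_routine, &seen);
    model_worker_thunk(&item);

    return seen == 2 && device.ref_count == 1;
}
```

The point of the model is where the decrement lives: in model_worker_thunk(), outside of the driver's code, so nothing belonging to the driver is left to execute once the reference is gone.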

So, the problem is solved, right? Well, technically yes, but the new PIO_WORKITEM type introduced a regression of sorts. The actual size of the IO_WORKITEM structure is not exposed publicly, which means you can no longer embed a work item structure in your own structure. As a result, you have to allocate a context and allocate the work item separately. This introduces another point of failure and makes the initialization and cleanup code more complex. Here is the previous code snippet modified to use the new work item type:

    typedef struct _MY_WORK_ITEM {
        PIO_WORKITEM WorkItem;
        // ...other context fields...
    } MY_WORK_ITEM, *PMY_WORK_ITEM;

    NTSTATUS QueueWorkItem(PDEVICE_OBJECT DeviceObject)
    {
        PMY_WORK_ITEM pItem;

        pItem = (PMY_WORK_ITEM) ExAllocatePoolWithTag(NonPagedPool, sizeof(MY_WORK_ITEM), tag);
        if (pItem == NULL) {
            return STATUS_INSUFFICIENT_RESOURCES;
        }

        pItem->WorkItem = IoAllocateWorkItem(DeviceObject);
        if (pItem->WorkItem == NULL) {
            ExFreePool(pItem);
            return STATUS_INSUFFICIENT_RESOURCES;
        }

        // ...initialize the rest of pItem...
        IoQueueWorkItem(pItem->WorkItem, IoWorkItemRoutine, DelayedWorkQueue, pItem);

        return STATUS_SUCCESS;
    }

    VOID IoWorkItemRoutine(PDEVICE_OBJECT DeviceObject, PVOID Context)
    {
        PMY_WORK_ITEM pItem = (PMY_WORK_ITEM) Context;

        // ... do work ...

        IoFreeWorkItem(pItem->WorkItem);
        ExFreePool(pItem);
    }
To address the embedded work item "regression", Vista introduced IoSizeofWorkItem() (which you can read about in Paul's article, referenced at the top of this entry). In conclusion, it is not hard to see why there are two different types of work items and so many work item APIs in WDM. The problem set has grown over time and the OS has evolved to solve those problems.
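
The idea behind a size query is that the caller can reserve room for the opaque work item inside its own allocation, which brings back the single point of failure of the original WORK_QUEUE_ITEM code. The following user-mode sketch models that pattern only; it is not kernel code, and all names in it (model_work_item, model_sizeof_work_item, my_context, and so on) are hypothetical. In a driver, the corresponding calls would be IoSizeofWorkItem(), IoInitializeWorkItem(), IoQueueWorkItemEx(), and IoUninitializeWorkItem().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical opaque work item. In a real library the layout would be
 * private and callers would only learn its size at run time. */
typedef struct model_work_item {
    void (*routine)(void *context);
    void  *context;
} model_work_item;

/* Model of the size query. */
static size_t model_sizeof_work_item(void)
{
    return sizeof(model_work_item);
}

/* Model of initializing a work item in caller-provided storage. */
static void model_initialize_work_item(model_work_item *item)
{
    item->routine = NULL;
    item->context = NULL;
}

/* Model of queueing; for simplicity the routine runs immediately. */
static void model_queue_and_run(model_work_item *item,
                                void (*routine)(void *), void *context)
{
    item->routine = routine;
    item->context = context;
    item->routine(item->context);
}

/* The caller's context. The work item storage follows it in the same block. */
typedef struct my_context {
    int value;
} my_context;

/* Offset of the work item: sizeof(my_context) rounded up so that the
 * storage is suitably aligned for any object type. */
#define WORK_ITEM_OFFSET \
    ((sizeof(my_context) + _Alignof(max_align_t) - 1) \
     / _Alignof(max_align_t) * _Alignof(max_align_t))

static model_work_item *context_work_item(my_context *ctx)
{
    return (model_work_item *)((unsigned char *)ctx + WORK_ITEM_OFFSET);
}

/* One allocation, and therefore one failure path, covers both pieces. */
static my_context *allocate_context(void)
{
    my_context *ctx = malloc(WORK_ITEM_OFFSET + model_sizeof_work_item());
    if (ctx == NULL) {
        return NULL;
    }
    ctx->value = 0;
    model_initialize_work_item(context_work_item(ctx));
    return ctx;
}

static void set_value(void *context)
{
    ((my_context *)context)->value = 42;
}

/* Returns the value written by the work routine, or -1 if allocation fails. */
static int demo_single_allocation(void)
{
    my_context *ctx = allocate_context();
    int result;

    if (ctx == NULL) {
        return -1;
    }
    model_queue_and_run(context_work_item(ctx), set_value, ctx);
    result = ctx->value;
    free(ctx);   /* a single free releases the context and the work item */
    return result;
}
```

Compared with the IoAllocateWorkItem() version earlier in this entry, there is no second allocation to check and no second object to free in the cleanup path.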