
Load or Unload

An interesting crash we have seen relates to a scenario in which a driver is unloaded while it is still being loaded. In this article, we’ll analyze what we saw, how we reached our conclusion, and the remedial steps we used to try to mitigate this particular problem.

The Crash

We were recently given a crash dump from a system that had been under test with a file system filter driver that performs isolation – that is, it controls the cache and uses shadow file objects to distinguish between the resources that it controls and the resources that belong to the underlying file system (typically NTFS).

Analyzing the crash with WinDBG, we found two interesting threads.  Here’s the first:

        THREAD fffffa80018b8b50  Cid 0004.0044  Teb: 0000000000000000 Win32Thread: 0000000000000000 RUNNING on processor 2
        Not impersonating
        DeviceMap                 fffff8a0000060f0
        Owning Process            fffffa8001844840       Image:         System
        Attached Process          N/A            Image:         N/A
        Wait Start TickCount      264493         Ticks: 1 (0:00:00:00.015)
        Context Switch Count      60091          IdealProcessor: 2
        UserTime                  00:00:00.000
        KernelTime                00:00:01.263
        Win32 Start Address nt!ExpWorkerThread (0xfffff80002ad8530)
        Stack Init fffff88003195db0 Current fffff88003195230
        Base fffff88003196000 Limit fffff88003190000 Call 0
        Priority 13 BasePriority 12 UnusualBoost 0 ForegroundBoost 0 IoPriority 2 PagePriority 5
        Child-SP          RetAddr           Call Site
        fffff880`03194fc0 fffff800`02db4c57 nt!ObLogSecurityDescriptor+0x50
        fffff880`03195030 fffff800`02db6057 nt!SeDefaultObjectMethod+0x57
        fffff880`03195080 fffff800`02db4ee2 nt!ObpAssignSecurity+0xc7
        fffff880`031950f0 fffff800`02db76ff nt!ObInsertObjectEx+0x1e2
        fffff880`03195340 fffff800`02db6b06 nt!PspInsertThread+0x2f3
        fffff880`031954c0 fffff800`02d65da5 nt!PspCreateThread+0x246
        fffff880`03195740 fffff880`0799b3e2 nt!PsCreateSystemThread+0x125
        fffff880`03195830 fffff880`0799b6d8 Driver!SetupReadWorkQueue+0xe2 [x:\driver\isolate\workerqueue.cpp @ 125]
        fffff880`03195890 fffff880`0799c34a Driver!SetupWorkerQueues+0x22c [x:\driver\isolate\workerqueue.cpp @ 333]
        fffff880`03195970 fffff800`02eb32c7 Driver!DriverEntry+0x72 [x:\driver\isolate\driver.cpp @ 43]
        fffff880`031959a0 fffff800`02eb36c5 nt!IopLoadDriver+0xa07
        fffff880`03195c70 fffff800`02ad8641 nt!IopLoadUnloadDriver+0x55
        fffff880`03195cb0 fffff800`02d65e5a nt!ExpWorkerThread+0x111
        fffff880`03195d40 fffff800`02abfd26 nt!PspSystemThreadStartup+0x5a
        fffff880`03195d80 00000000`00000000 nt!KiStartSystemThread+0x16

This is the thread running DriverEntry.  It is still setting up various global resources: in this case, it is in the middle of creating a work queue for a custom queue package that runs in this driver.

Here is the second thread:

        THREAD fffffa80018b9b50  Cid 0004.0038  Teb: 0000000000000000 Win32Thread: 0000000000000000 RUNNING on processor 1
        Not impersonating
        DeviceMap                 fffff8a0000060f0
        Owning Process            fffffa8001844840       Image:         System
        Attached Process          N/A            Image:         N/A
        Wait Start TickCount      264493         Ticks: 1 (0:00:00:00.015)
        Context Switch Count      52067          IdealProcessor: 2
        UserTime                  00:00:00.000
        KernelTime                00:00:01.357
        Win32 Start Address nt!ExpWorkerThread (0xfffff80002ad8530)
        Stack Init fffff88003180db0 Current fffff880031809e0
        Base fffff88003181000 Limit fffff8800317b000 Call 0
        Priority 13 BasePriority 12 UnusualBoost 1 ForegroundBoost 0 IoPriority 2 PagePriority 5
        Child-SP          RetAddr           Call Site
        fffff880`0317f7e8 fffff800`02e391c4 nt!KeBugCheckEx
        fffff880`0317f7f0 fffff800`02df405d nt!PspUnhandledExceptionInSystemThread+0x24
        fffff880`0317f830 fffff800`02afa06c nt! ?? ::NNGAKEGL::`string'+0x227d
        fffff880`0317f860 fffff800`02af9aed nt!_C_specific_handler+0x8c
        fffff880`0317f8d0 fffff800`02af88c5 nt!RtlpExecuteHandlerForException+0xd
        fffff880`0317f900 fffff800`02b09851 nt!RtlDispatchException+0x415
        fffff880`0317ffe0 fffff800`02ace642 nt!KiDispatchException+0x135
        fffff880`03180680 fffff800`02acd1ba nt!KiExceptionDispatch+0xc2
        fffff880`03180860 fffff880`0799b724 nt!KiPageFault+0x23a (TrapFrame @ fffff880`03180860)
        fffff880`031809f0 fffff880`0799c477 Driver!StopWorkerQueues+0x14 [x:\driver\isolate\workerqueue.cpp @ 351]
        fffff880`03180a20 fffff880`010fae09 Driver!UnloadCallback+0xd3 [x:\driver\isolate\driver.cpp @ 76]
        fffff880`03180a80 fffff880`010f9dcd fltmgr!FltpDoUnloadFilter+0xf9
        fffff880`03180c70 fffff800`02ad8641 fltmgr!FltpSyncOpWorker+0x2d
        fffff880`03180cb0 fffff800`02d65e5a nt!ExpWorkerThread+0x111
        fffff880`03180d40 fffff800`02abfd26 nt!PspSystemThreadStartup+0x5a
        fffff880`03180d80 00000000`00000000 nt!KiStartSystemThread+0x16

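This is the thread that is unloading the driver.  Filter Manager has called the driver’s unload callback, which took a page fault in StopWorkerQueues while the first thread was still inside SetupWorkerQueues.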

Upon seeing this, we noted that driver load and unload are supposed to be serialized against one another by the operating system, because a driver has no way to protect itself against this scenario.  Preventing it properly really does require external serialization.

We did a bit of research and confirmed with our friends in Redmond that this problem is a known issue that was fixed in Windows 8.  Unfortunately, the system under test (and the customer solution itself) still requires support for Windows XP as the primary platform, and Windows 7 as the secondary platform.  Windows 8 is not even on the customer’s radar yet.

Solutions to Consider

One approach to handling this issue before Windows 8 could involve building a multi-driver system.  The first driver would be responsible for starting the second driver in a serialized fashion.  Driver 1 would load Driver 2 via ZwLoadDriver.  When this function returned successfully, Driver 1 would then call Driver 2 (via an IOCTL, FSCTL or exported function) to actually perform the registration as a mini-filter.

Driver 2’s Unload routine would call back to Driver 1 and wait on an event object in Driver 1 to ensure that the registration call had completed.  This would enforce strict ordering between load and unload.  The only purpose of Driver 1 would be to avoid this narrow race condition.

Another potential approach that we considered was to have the DriverEntry function create a device object.  In the Unload routine, we can look at the Flags field of the device object to see if the DO_DEVICE_INITIALIZING bit has been cleared.  If it has not, then we know that there is still a risk that DriverEntry has not yet exited and we should sleep and then check again.

This relies upon the fact that the I/O Manager clears this bit after DriverEntry returns, as the documentation notes:

Note   It is not necessary to clear the DO_DEVICE_INITIALIZING flag on device objects that are created in DriverEntry, because this is done automatically by the I/O Manager. However, your driver should clear this flag on all other device objects that it creates.

Source: http://msdn.microsoft.com/en-ca/library/windows/hardware/ff539265(v=vs.85).aspx (Last Accessed August 2, 2013.)
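
The polling idea can be illustrated with a small user-mode C++ model. This is only a sketch of the logic, not kernel code: ModelDeviceObject, io_manager_finish_load and wait_until_initialized are hypothetical names, std::thread stands in for the system worker threads, and the constant simply mirrors the DO_DEVICE_INITIALIZING bit value from wdm.h. In a real driver, the sleep between checks would presumably be done with something like KeDelayExecutionThread.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Mirrors the DO_DEVICE_INITIALIZING bit value; everything else in this
// block is a hypothetical user-mode stand-in, not a kernel API.
constexpr unsigned long kDeviceInitializing = 0x00000080;

struct ModelDeviceObject {
    // A device object created in DriverEntry starts with the bit set.
    std::atomic<unsigned long> flags{kDeviceInitializing};
};

// Models the I/O Manager clearing the bit after DriverEntry has returned.
void io_manager_finish_load(ModelDeviceObject& dev) {
    dev.flags.fetch_and(~kDeviceInitializing);
}

// Models the check in Unload: sleep and re-check until the bit is clear.
// Returns how many times the caller had to sleep.
int wait_until_initialized(const ModelDeviceObject& dev,
                           std::chrono::milliseconds interval) {
    int polls = 0;
    while (dev.flags.load() & kDeviceInitializing) {
        ++polls;
        std::this_thread::sleep_for(interval);
    }
    return polls;
}

// Runs a modelled load on one thread while the caller plays the unload
// thread. Returns true if the bit was clear when the wait finished.
bool run_flag_poll_model() {
    ModelDeviceObject dev;
    std::thread loader([&dev] {
        // DriverEntry is still busy for a while, then returns.
        std::this_thread::sleep_for(std::chrono::milliseconds(30));
        io_manager_finish_load(dev);
    });
    wait_until_initialized(dev, std::chrono::milliseconds(5));
    const bool cleared = (dev.flags.load() & kDeviceInitializing) == 0;
    loader.join();
    return cleared;
}
```

The model only shows the wait loop. It says nothing about whether polling in an Unload routine is acceptable for a given driver, which would need to be evaluated separately.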

Mitigation

Building a two-driver system to protect against a very narrow race condition might be overkill in a situation like this.  So rather than an outright solution, what can we do to at least minimize the window in which DriverEntry could still be running?

The simplest thing we can do is make sure the driver performs filter registration as its last step, after setting up all of its other internal data structures and queues.  This doesn’t entirely prevent the crash, because an unload can still arrive before DriverEntry returns, but it shrinks the window considerably.  This is ultimately the approach the owner of the driver took to address the problem.
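
To see why the ordering matters, here is a deliberately simplified, single-threaded user-mode C++ model. All names in it (ModelFilter, WorkerQueues, register_and_unload_immediately and so on) are hypothetical and are not Filter Manager APIs. It assumes the worst case, in which the unload request lands the instant registration completes, and only tracks the driver's own state.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for the driver's custom worker queues.
struct WorkerQueues {
    std::vector<int> pending;
};

struct ModelFilter {
    std::unique_ptr<WorkerQueues> queues;
    bool unload_saw_queues = false;

    // Models the queue setup performed during DriverEntry.
    void setup_worker_queues() {
        queues = std::make_unique<WorkerQueues>();
    }

    // Models the unload callback: it expects the queues to exist.
    void unload_callback() {
        unload_saw_queues = (queues != nullptr);
        queues.reset();
    }
};

// Worst case for the model: the unload callback fires as soon as the
// filter has been registered.
void register_and_unload_immediately(ModelFilter& filter) {
    filter.unload_callback();
}

// Registration first, setup afterwards: unload finds no queues.
bool entry_register_first() {
    ModelFilter filter;
    register_and_unload_immediately(filter);
    filter.setup_worker_queues();  // too late for the unload above
    return filter.unload_saw_queues;
}

// Setup first, registration last: unload finds initialized queues.
bool entry_register_last() {
    ModelFilter filter;
    filter.setup_worker_queues();
    register_and_unload_immediately(filter);
    return filter.unload_saw_queues;
}
```

In this model, entry_register_first() returns false and entry_register_last() returns true. The model does not capture whatever the system still has to do after DriverEntry's last statement, which is why this ordering narrows the window rather than closing it.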

However, if that hadn’t been enough, we proposed using a global driver event that is set at the end of DriverEntry.  The Unload routine would wait on that event and then pause for some period of time, because the event is set slightly before DriverEntry actually returns.  This wouldn’t entirely prevent the race condition, but it would further narrow the window in which it could occur.  A short delay (a few seconds) is likely to be sufficient in most production environments.
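
The event-plus-delay idea can be sketched as a user-mode C++ model. This is an illustration under stated assumptions, not driver code: LoadGate and ModelDriver are hypothetical names, std::thread stands in for the system worker threads, and a mutex and condition variable stand in for a kernel event that a real driver would presumably manage with KeInitializeEvent, KeSetEvent and KeWaitForSingleObject.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Models a global "load complete" event owned by the driver.
class LoadGate {
public:
    // Called as the last statement of the modelled DriverEntry.
    void signal_load_complete() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            load_complete_ = true;
        }
        cv_.notify_all();
    }

    // Called first in the modelled Unload: wait for the event, then pause
    // for a grace period to cover the gap between the event being set and
    // DriverEntry actually returning.
    void wait_for_load(std::chrono::milliseconds grace) {
        {
            std::unique_lock<std::mutex> lock(mutex_);
            cv_.wait(lock, [this] { return load_complete_; });
        }
        std::this_thread::sleep_for(grace);
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    bool load_complete_ = false;
};

struct ModelDriver {
    LoadGate gate;
    std::unique_ptr<std::vector<int>> queues;  // stand-in for worker queues

    void driver_entry() {
        // Slow setup, during which an unload request may already arrive.
        std::this_thread::sleep_for(std::chrono::milliseconds(50));
        queues = std::make_unique<std::vector<int>>();
        gate.signal_load_complete();
    }

    // Returns true if the queues existed when unload tore them down.
    bool unload() {
        gate.wait_for_load(std::chrono::milliseconds(10));
        const bool queues_ready = (queues != nullptr);
        queues.reset();
        return queues_ready;
    }
};

// Starts load and unload concurrently and reports whether unload saw
// fully initialized state.
bool run_load_unload_model() {
    ModelDriver driver;
    bool unload_ok = false;
    std::thread loader([&driver] { driver.driver_entry(); });
    std::thread unloader([&driver, &unload_ok] { unload_ok = driver.unload(); });
    loader.join();
    unloader.join();
    return unload_ok;
}
```

In the model, the wait alone is enough for unload to see initialized queues. The grace period is there only to represent the part the driver cannot observe, which is why the approach remains a mitigation.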

Conclusions

Since observing this particular crash, we have followed this structure for our own mini-filters: we do registration at the end of our DriverEntry function.  By doing so, we minimize the likelihood of the crash happening.

We have not explored the potential solutions or mitigations that we proposed any further, but we offer them to our readers for consideration in the event they need to mitigate this problem further.
