
Load or Unload

An interesting crash we have seen relates to a scenario in which a driver is unloaded while it is still being loaded. In this article, we’ll analyze what we saw, how we reached our conclusion, and the steps we considered to mitigate this particular problem.

The Crash

We were recently given a crash dump from a system that had been under test with a file system filter driver that performs isolation – that is, it controls the cache and uses shadow file objects to distinguish between the resources that it controls and the resources that belong to the underlying file system (typically NTFS).

Analyzing the crash with WinDBG, we found two interesting threads.  Here’s the first:

This is the driver entry thread.  It is actually setting up various global resources – in this case it is in the middle of creating a work queue for a custom queue package that runs in this driver.

Here is the second thread:

This is a thread that is unloading the driver.

Seeing these two threads together is surprising. The operating system is supposed to serialize driver load and unload against one another, because a driver has no way to protect itself against this scenario. Preventing it properly requires external serialization.

We did a bit of research and confirmed with our friends in Redmond that this is a known issue that was fixed in Windows 8. Unfortunately, the system under test (and the customer solution itself) still requires support for Windows XP as the primary platform and Windows 7 as the secondary platform. Windows 8 is not even on the customer’s radar yet.

Solutions to Consider

One approach to handling this issue before Windows 8 could involve building a multi-driver system. The first driver would be responsible for starting the second driver in a serialized fashion. Driver 1 would load Driver 2 via ZwLoadDriver. When this function returned successfully, Driver 1 would then call Driver 2 (via an IOCTL, FSCTL or exported function) to actually perform the registration as a mini-filter.

Driver 2’s Unload routine would call back into Driver 1 and serialize on an event object there, so that it proceeds only once the registration call has completed. This would enforce strict ordering between registration and unload. The only purpose of Driver 1 would be to close this narrow race condition.

Another potential approach that we considered was to have DriverEntry create a device object. In the Unload routine, we can check the Flags field of the device object to see whether the DO_DEVICE_INITIALIZING bit has been cleared. If it has not, there is still a risk that DriverEntry has not yet returned, so Unload should sleep and then check again.

This relies upon the fact that the I/O Manager actually clears this bit after DriverEntry returns.

Note   It is not necessary to clear the DO_DEVICE_INITIALIZING flag on device objects that are created in DriverEntry, because this is done automatically by the I/O Manager. However, your driver should clear this flag on all other device objects that it creates.

Source: http://msdn.microsoft.com/en-ca/library/windows/hardware/ff539265(v=vs.85).aspx (Last Accessed August 2, 2013.)
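The polling check described above can be sketched as a small user-mode model in portable C. This is an illustration under stated assumptions, not driver code: model_device, wait_for_driver_entry and simulated_sleep are hypothetical names, the struct stands in for the device object created in DriverEntry, and the sleep callback stands in for a kernel delay such as KeDelayExecutionThread. Only the flag value is taken from wdm.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same value as DO_DEVICE_INITIALIZING in wdm.h. */
#define DO_DEVICE_INITIALIZING 0x00000080u

/* Hypothetical stand-in for the device object created in DriverEntry. */
typedef struct model_device {
    uint32_t flags;             /* models DEVICE_OBJECT.Flags */
    int ticks_until_init_done;  /* simulation only: sleeps before the bit clears */
} model_device;

/* True once the bit is clear, i.e. the I/O Manager has seen DriverEntry return. */
static bool driver_entry_has_returned(const model_device *dev)
{
    return (dev->flags & DO_DEVICE_INITIALIZING) == 0;
}

/*
 * Called from the modeled Unload path. Sleeps and re-checks until the bit is
 * clear or max_polls sleeps have been used. Returns the number of sleeps
 * performed, or -1 if the bit never cleared.
 */
static int wait_for_driver_entry(model_device *dev, int max_polls,
                                 void (*sleep_fn)(model_device *))
{
    assert(dev != NULL && sleep_fn != NULL);
    for (int polls = 0; polls <= max_polls; polls++) {
        if (driver_entry_has_returned(dev)) {
            return polls;
        }
        if (polls < max_polls) {
            sleep_fn(dev);
        }
    }
    return -1;
}

/* Simulation only: pretends the I/O Manager clears the bit after a few sleeps. */
static void simulated_sleep(model_device *dev)
{
    if (dev->ticks_until_init_done > 0 && --dev->ticks_until_init_done == 0) {
        dev->flags &= ~DO_DEVICE_INITIALIZING;
    }
}
```

The poll is bounded here so that the model always terminates. A real Unload routine would have to decide what to do if the bit never clears, which the approach as described leaves open.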

Mitigation

Building a two driver system to protect against a very narrow race condition might be overkill in a situation like this.  So rather than an outright solution, what can we do to at least minimize the window in which DriverEntry could still be running?

The simplest thing we can do is make sure the driver performs filter registration as its last step, after setting up all of its other internal data structures and queues. This doesn’t entirely prevent the crash, but it narrows the window in which it can occur. This is ultimately the approach the owner of the driver took to address the problem.
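The ordering can be illustrated with a minimal user-mode model in portable C. Everything here is a hypothetical stand-in: filter_driver models the driver's global state, the setup helpers model whatever internal initialization a given driver needs, and register_filter takes the place of the real registration calls (FltRegisterFilter followed by FltStartFiltering in a mini-filter). It is a sketch of the structure, not the driver discussed in this article.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the driver's global state. */
typedef struct filter_driver {
    bool globals_ready;
    bool work_queue_ready;
    bool registered;  /* once true, callbacks and unload requests can arrive */
} filter_driver;

static bool init_globals(filter_driver *d)
{
    d->globals_ready = true;
    return true;
}

static void destroy_globals(filter_driver *d)
{
    d->globals_ready = false;
}

/* simulate_failure lets a caller exercise the rollback path. */
static bool create_work_queue(filter_driver *d, bool simulate_failure)
{
    if (simulate_failure) {
        return false;
    }
    d->work_queue_ready = true;
    return true;
}

/* Stands in for FltRegisterFilter followed by FltStartFiltering. */
static bool register_filter(filter_driver *d)
{
    /* Everything the unload path may tear down must already exist. */
    assert(d->globals_ready && d->work_queue_ready);
    d->registered = true;
    return true;
}

/* Model of DriverEntry: all internal setup first, registration last. */
static bool model_driver_entry(filter_driver *d, bool fail_queue)
{
    if (!init_globals(d)) {
        return false;
    }
    if (!create_work_queue(d, fail_queue)) {
        destroy_globals(d);  /* roll back; the filter is not registered yet */
        return false;
    }
    /* Last step: nothing after this line touches driver state. */
    return register_filter(d);
}
```

Because registration is the final statement, any failure in the earlier steps can be rolled back before the filter becomes visible, and the only work left after registration is returning from the function.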

Had that not been enough, we proposed using a global event in the driver that is set at the end of DriverEntry. The Unload routine would wait on that event and then pause for some period of time, to cover the interval between setting the event and DriverEntry actually returning. This wouldn’t entirely prevent the race condition, but it would further narrow the window in which it could occur. A short delay (a few seconds) is likely to be sufficient in most production environments.
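The event-based mitigation can be modeled in user mode with POSIX threads. This is a sketch under assumptions rather than kernel code: load_event is a hypothetical stand-in for a kernel notification event (KeInitializeEvent, KeSetEvent, KeWaitForSingleObject), pause_fn stands in for a delay such as KeDelayExecutionThread, and all other names are invented for the illustration.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a kernel notification event. */
typedef struct load_event {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool signaled;
} load_event;

typedef struct model_driver {
    load_event load_done;
    bool resources_ready;  /* set by the load path, cleared by unload */
    int grace_pauses;      /* how many times unload paused */
} model_driver;

/* Stands in for KeSetEvent. */
static void load_event_set(load_event *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = true;
    pthread_cond_broadcast(&e->cond);
    pthread_mutex_unlock(&e->lock);
}

/* Stands in for KeWaitForSingleObject with no timeout. */
static void load_event_wait(load_event *e)
{
    pthread_mutex_lock(&e->lock);
    while (!e->signaled) {
        pthread_cond_wait(&e->cond, &e->lock);
    }
    pthread_mutex_unlock(&e->lock);
}

/* Model of DriverEntry, run on its own thread. */
static void *model_driver_entry(void *arg)
{
    model_driver *d = arg;
    d->resources_ready = true;      /* all setup, then registration */
    load_event_set(&d->load_done);  /* last statement before returning */
    return NULL;
}

/* Model of Unload. Returns true if teardown found fully initialized state. */
static bool model_unload(model_driver *d, void (*pause_fn)(model_driver *))
{
    load_event_wait(&d->load_done);
    pause_fn(d);                    /* cover the tail end of DriverEntry */
    bool saw_ready = d->resources_ready;
    d->resources_ready = false;     /* teardown */
    return saw_ready;
}

/* Simulation only: counts the pause instead of sleeping. */
static void count_pause(model_driver *d)
{
    d->grace_pauses++;
}

/* Runs load and unload concurrently and reports whether unload waited. */
bool demo_unload_waits_for_load(void)
{
    model_driver d;
    pthread_t loader;

    pthread_mutex_init(&d.load_done.lock, NULL);
    pthread_cond_init(&d.load_done.cond, NULL);
    d.load_done.signaled = false;
    d.resources_ready = false;
    d.grace_pauses = 0;

    if (pthread_create(&loader, NULL, model_driver_entry, &d) != 0) {
        return false;
    }
    bool ok = model_unload(&d, count_pause);
    pthread_join(loader, NULL);

    pthread_cond_destroy(&d.load_done.cond);
    pthread_mutex_destroy(&d.load_done.lock);
    return ok && d.grace_pauses == 1;
}
```

The pause is a counter here so that the model stays fast and deterministic. In a driver, the length of the delay is a judgment call, and the model does not capture the residual window between setting the event and DriverEntry returning to the I/O Manager.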

Conclusions

Since observing this particular crash, we have followed this structure for our own mini-filters: we do registration at the end of our DriverEntry function. By doing so, we minimize the likelihood of the crash happening.

We have not explored the other potential solutions or mitigations that we proposed, but we offer them to our readers for consideration should they need to mitigate this problem further.
