This post was jointly authored by SNoone and PeterGV
We’re constantly learning the subtle details of how KMDF works. We came across an interesting detail today that had us scratching our heads to the point that we had to ask our friends on the WDF development team what was going on. Maybe this will help you at some point, too.
We were using a series of Queues to sort work based on the IOCTL Control Code. When the Default Queue (which we’ll refer to as Queue D) gets a Request, it presents that Request to our EvtIoDeviceControl Event Processing Callback. Based on the Control Code of the IOCTL, we forward the Request (using WdfRequestForwardToIoQueue) to another Queue that we select (we’ll call this Queue S).
Simple, right?
Now, down in Queue S’s EvtIoDeviceControl Event Processing Callback, we sometimes decide that we want to stop Queue D and cancel all the Requests on that Queue. So we call WdfIoQueuePurgeSynchronously, specifying Queue D.
Also simple, right?
Wrong! Every once in a while, the call to WdfIoQueuePurgeSynchronously hung. Using the debugger extensions, we noticed that Queue D and Queue S are both listed as having one driver owned Request outstanding:
Dumping queues of WDFDEVICE 0x79e1c4b8
=====================================
Number of queues: 4
----------------------------------
Queue: 1 !wdfqueue 0x79ebdfb0
    Parallel, Not power-managed, PowerOn, Cannot accept, Can dispatch, Dispatching, ExecutionLevelDispatch, SynchronizationScopeNone
    Number of driver owned requests: 1
    Power transition in progress
    Number of waiting requests: 0
    EvtIoRead: (0xf75086d0) Log!LogControlDeviceEvtIoRead
    EvtIoDeviceControl: (0xf750ae20) Log!LogControlDeviceEvtIoDeviceControl
    EvtIoPurgeComplete: (0xf72d044e) wdf01000!FxIoQueue::_IdleComplete
    EvtIoStop: (0xf750a590) Log!LogControlDeviceEvtIoStop
----------------------------------
Queue: 2 !wdfqueue 0x79ebd2a8
    Sequential, Not power-managed, PowerOn, Passive Only, Can accept, Can dispatch, Dispatching, ExecutionLevelPassive, SynchronizationScopeNone
    Number of driver owned requests: 1
        !wdfrequest 0x7a251c48 !irp 0x866a2f68
    Number of waiting requests: 0
    EvtIoDeviceControl: (0xf750aef0) Log!LogControlDeviceEvtIoDeviceControlSequential
Notice how the first Queue listed (with the handle 0x79ebdfb0), which is Queue D, doesn’t show the handle of the driver-owned Request? Notice how the second Queue listed (with the handle 0x79ebd2a8) DOES show a handle for the outstanding Request? This is a hint.
We were befuddled to the point that we had to ask the WDF dev lead to give us a clue. He basically said that what we were seeing is to be expected. He said you can NEVER call WdfIoQueuePurgeSynchronously from an EvtIoDeviceControl Event Processing Callback. He pointed us to the docs for this function, which read:
“Do not call WdfIoQueuePurgeSynchronously from the following queue object event callback functions:
EvtIoDefault EvtIoDeviceControl EvtIoInternalDeviceControl EvtIoRead EvtIoWrite”
http://msdn.microsoft.com/en-us/library/windows/hardware/ff548449(v=vs.85).aspx
We’ve always read this doc passage to mean:
“Do not call WdfIoQueuePurgeSynchronously from the following queue object event callback functions associated with the Queue being purged:”
And in that context, the restriction made sense. However, this function can never be called from ANY of the listed event processing callbacks, regardless of the Queue with which that Event Processing Callback is associated.
This restriction is aimed directly at the case we were experiencing. Queue D presented the Request to us, and while in Queue D’s event processing callback we forwarded that Request to Queue S. This HAPPENED to result in Queue S synchronously dispatching the Request, and while in Queue S’s event processing callback we attempted to purge Queue D. By matter of coincidence, we were also still inside Queue D’s event processing callback. The purge waits for Queue D to finish with its outstanding Request, and Queue D’s callback can’t return until the purge does… so a deadlock occurs!
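To make the nesting easier to see, here is a small toy model in plain C. It is NOT KMDF code and calls no framework functions. Every name in it (Queue, dispatch, purge_synchronously, in_flight) is made up for illustration, and the in_flight counter is only our assumption about the kind of bookkeeping that would produce a driver-owned count of 1 with no Request handle in the debugger output above. A real synchronous purge would block. The toy version just reports whether the wait could ever end.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

typedef struct Queue Queue;
typedef void (*Handler)(void);

/* Hypothetical stand-in for a framework queue; not a KMDF type. */
struct Queue {
    const char *name;
    int in_flight;    /* callbacks entered but not yet returned (assumed bookkeeping) */
    Handler handler;  /* the queue's event processing callback */
};

static void handler_d(void);
static void handler_s(void);

static Queue queue_d = { "D", 0, handler_d };
static Queue queue_s = { "S", 0, handler_s };
static bool purge_would_hang = false;

/* Present a request: run the queue's callback on the caller's thread. */
static void dispatch(Queue *q) {
    q->in_flight++;
    q->handler();
    q->in_flight--;
    assert(q->in_flight >= 0);
}

/*
 * Toy purge. A real synchronous purge would wait until the queue has no
 * outstanding work. This model is single threaded, so if in_flight is
 * nonzero the only code that could lower it is further up our own call
 * stack, and the wait would never end. Returns true if the purge could
 * complete, false if it would hang.
 */
static bool purge_synchronously(const Queue *q) {
    return q->in_flight == 0;
}

/* Queue S's callback: tries to purge Queue D. */
static void handler_s(void) {
    if (!purge_synchronously(&queue_d)) {
        purge_would_hang = true;
        printf("purge of Queue %s would hang: %d callback(s) still in flight\n",
               queue_d.name, queue_d.in_flight);
    }
}

/* Queue D's callback: forwards to Queue S, which here dispatches synchronously. */
static void handler_d(void) {
    dispatch(&queue_s);
}

/* Runs the scenario from the post; returns true if the purge would have hung. */
static bool simulate_forward_then_purge(void) {
    purge_would_hang = false;
    dispatch(&queue_d);
    return purge_would_hang;
}
```

Calling simulate_forward_then_purge() walks the same path we hit: D’s callback is still on the stack when S’s callback asks for the purge, so the model flags it. Once D’s callback has returned, the same purge_synchronously(&queue_d) call reports that it could complete.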
So… there you have it. Never call WdfIoQueuePurgeSynchronously from ANY of the listed EvtIo Event Processing Callbacks for ANY Queue. Now you know.
And, before you ask: Yes. We’ve already asked our friends on the WDK doc team and the static analysis tools team to make updates that might save others from our fate.