In our debugging seminars, we make a big show of telling our students to not only run !analyze -v when looking at a system crash, but to also actually read the output of the command. Note that I didn’t say to skim it casually. Nor did I say to glaze over while the output flies by (Hmmm… I wonder if they put more snacks out?). But actually read and understand what the bugcheck analysis says. Ultimately, the bugcheck code and description tell you why the system crashed, so if you don’t start out with a complete understanding of the !analyze -v output then you’re hopeless from the start.
It’s my firm belief in this step that makes me want to claw the text off the screen when I see the description accompanying the all too common PAGE_FAULT_IN_NONPAGED_AREA bugcheck:
0: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

PAGE_FAULT_IN_NONPAGED_AREA (50)
Invalid system memory was referenced.  This cannot be protected by try-except,
it must be protected by a Probe.  Typically the address is just plain bad or it
is pointing at freed memory.
In order to understand what my problem is with this description, let’s break it down into individual pieces:
Invalid system memory was referenced.
This is absolutely correct, no problem here. The PAGE_FAULT_IN_NONPAGED_AREA bugcheck only ever occurs when you dereference an invalid kernel address. The Windows Page Fault Handler has to assume that if an invalid kernel address has been dereferenced something seriously bad is going on, so it has no choice but to crash the machine.
This cannot be protected by try-except,
Agreed! As I mentioned, the Page Fault Handler simply bugchecks if an invalid kernel address is dereferenced. This is different from what happens when an invalid user address is dereferenced. In that case, the Page Fault Handler raises an exception that can be caught with a __try/__except block. This allows the O/S and drivers to be resilient to malicious or poorly written applications that supply invalid user buffers to I/O operations.
it must be protected by a Probe.
And so it begins… This statement is a vague half-truth, which makes it all the more annoying. If it were all wrong I could simply tell you to ignore it. If it were all correct I wouldn’t have to bother writing this article and I could be outside making snow angels.
For starters, the Probe that the sentence is referring to is in fact two different functions: ProbeForRead and ProbeForWrite. This would lead you to believe that you could avoid dereferencing invalid kernel memory if you called one of these functions before dereferencing the pointer, right? Sort of! From the docs:
The ProbeForRead routine checks that a user-mode buffer actually resides in the user portion of the address space, and is correctly aligned.
The ProbeForWrite routine checks that a user-mode buffer actually resides in the user-mode portion of the address space, is writable, and is correctly aligned.
All these APIs really do is make sure that a buffer pointer is a user-mode pointer. If the pointer is a kernel-mode pointer, they raise an exception that the caller can catch. Calling these APIs before dereferencing any pointer supplied by user mode is a required step in buffer validation. Otherwise, you run the risk of dereferencing a kernel-mode address provided by a user-mode caller and generating a PAGE_FAULT_IN_NONPAGED_AREA bugcheck.
In any other context these APIs make no sense for a driver. If you protected all of your kernel mode references with a Probe you would never actually dereference a kernel mode pointer. This would surely prevent you from dereferencing invalid kernel memory, so I suppose the statement isn’t entirely inaccurate, but your driver wouldn’t be very useful.
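To make the point concrete, here is a minimal user-mode sketch in C of the kind of check the Probe functions perform. It is a model, not the actual Windows implementation: the names probe_for_read_model, probe_result and MODEL_USER_PROBE_ADDRESS are hypothetical, the boundary value is illustrative only, and where the real routines raise an exception this sketch simply returns a status code. It assumes the alignment argument is a power of two.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical boundary between user and kernel addresses. The value is
 * illustrative only and is an assumption of this sketch. */
#define MODEL_USER_PROBE_ADDRESS UINT64_C(0x00007FFFFFFF0000)

typedef enum {
    PROBE_OK,               /* range lies entirely in user space           */
    PROBE_MISALIGNED,       /* stands in for STATUS_DATATYPE_MISALIGNMENT  */
    PROBE_ACCESS_VIOLATION  /* stands in for STATUS_ACCESS_VIOLATION       */
} probe_result;

/* Models only the range and alignment checks. It never touches the memory,
 * so it says nothing about whether the address is actually valid. */
static probe_result probe_for_read_model(uint64_t address, uint64_t length,
                                         uint64_t alignment)
{
    if (length == 0) {
        return PROBE_OK;    /* zero-length buffers are not checked */
    }

    /* alignment is assumed to be a power of two (1, 2, 4, 8, ...) */
    if ((address & (alignment - 1)) != 0) {
        return PROBE_MISALIGNED;
    }

    uint64_t end = address + length;
    if (end < address || end > MODEL_USER_PROBE_ADDRESS) {
        /* the range wrapped around, or it reaches into kernel space */
        return PROBE_ACCESS_VIOLATION;
    }

    return PROBE_OK;
}
```

Under these assumptions, a call such as probe_for_read_model(0x10000, 0x100, 4) passes, while any address in the kernel half of the address space is rejected no matter how valid the memory behind it is. That is exactly why probing your own kernel pointers gets you nowhere. In a real driver the call would be ProbeForRead or ProbeForWrite, made inside a __try/__except block so that the raised exception can be caught.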
Typically the address is just plain bad or it is pointing at freed memory.
The situation has returned to normal. The bugcheck is ultimately caused by an invalid kernel pointer, so you have all of the usual reasons to look out for when it comes to why the pointer might be bad. Sure, it could be that you didn’t call a Probe function on a pointer from a buggy or malicious application, but in general this guidance will point you in the right direction.
Analyst’s Perspective is a column by OSR Consulting Associate, Scott Noone. When he’s not root-causing complex kernel issues, he’s leading the development and instruction of OSR’s Kernel Debugging Seminar. Comments or suggestions for this or future Analyst’s Perspective columns can be addressed to ap at osr.com.