Critical Regions are one of the more confusing and poorly documented concepts in Windows kernel mode development. Long considered something that only file system developers cared about, most developers just ignore the topic and assume that it doesn’t affect them.
While most of us are safe being blissfully ignorant of Critical Regions, there has been some discussion about them recently on the NTDEV discussion forum. Given that it’s not exactly clear what a Critical Region does, let alone when you need one and when you don’t, it’s not surprising that there’s confusion about their usage. The unfortunate part is that sometimes you must use Critical Regions to avoid introducing a very subtle (but nasty) denial of service vulnerability in your code.
Before we proceed, a word for those of you coming from a user mode background: do not confuse Critical Regions with Critical Sections! Entirely, completely, 100% unrelated. Forget about Critical Sections entirely before proceeding with this article (feel free to use a hammer if necessary).
Your code enters a Critical Region by calling KeEnterCriticalRegion or FsRtlEnterFileSystem (which is just a wrapper for KeEnterCriticalRegion). Your code then leaves a Critical Region by calling KeLeaveCriticalRegion or FsRtlExitFileSystem.
The first important thing to note about Critical Regions is that they are a thread-specific construct. When you enter a Critical Region, you are entering it for the current thread only, and nothing stops any other thread from entering one at the same time. Because of this, Critical Regions are most definitely not a synchronization mechanism.
What good are they then? The answer lies in an insightful comment provided with the FsRtlEnterFileSystem macro (Figure 1).
//++
//
// VOID
// FsRtlEnterFileSystem (
//     );
//
// Routine Description:
//
//     This routine is used when entering a file system (e.g., through its
//     Fsd entry point). It ensures that the file system cannot be suspended
//     while running and thus block other file I/O requests. Upon exit
//     the file system must call FsRtlExitFileSystem.
//
What this comment is trying to say is that entering a Critical Region prevents the current thread from being suspended. Thread suspension is performed by a Normal Kernel Asynchronous Procedure Call (KAPC), which is simply a kernel mode callback directed to a particular thread. By entering a Critical Region, your code prevents Normal KAPCs from executing on the current thread. Note that other things are done via Normal KAPCs as well (e.g. hard error popups), but for our discussion we just care that Critical Regions prevent thread suspension.
Given this, we can make two statements:
- Threads in a Critical Region cannot be suspended
- Threads not in a Critical Region can be suspended (as long as the thread is running at IRQL PASSIVE_LEVEL)
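To make the two statements above concrete, here is a toy user-mode model of the behavior. It is not kernel code and does not reflect the actual kernel implementation; every name in it (ModelThread, model_enter_critical_region and so on) is hypothetical. It assumes that Critical Regions nest via a per-thread counter and that a suspend request arriving inside a Critical Region stays pending until the thread leaves the outermost region.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: all names here are hypothetical, not Windows APIs. */
typedef struct {
    int  apc_disable_depth; /* > 0 while the thread is in a Critical Region */
    bool suspend_pending;   /* a suspend request is queued but not delivered */
    bool suspended;         /* the suspend has actually taken effect */
} ModelThread;

/* Deliver a queued suspend only when the thread is outside all regions. */
static void model_try_deliver(ModelThread *t)
{
    if (t->suspend_pending && t->apc_disable_depth == 0) {
        t->suspend_pending = false;
        t->suspended = true;
    }
}

/* Entering a region only affects the thread that calls it. */
void model_enter_critical_region(ModelThread *t)
{
    t->apc_disable_depth++;
}

/* Leaving the outermost region lets any deferred suspend run. */
void model_leave_critical_region(ModelThread *t)
{
    assert(t->apc_disable_depth > 0);
    t->apc_disable_depth--;
    if (t->apc_disable_depth == 0) {
        model_try_deliver(t);
    }
}

/* Someone with sufficient access asks for the thread to be suspended. */
void model_request_suspend(ModelThread *t)
{
    t->suspend_pending = true;
    model_try_deliver(t);
}
```

In this model, a thread that calls model_enter_critical_region before model_request_suspend arrives keeps running until it calls model_leave_critical_region, while a thread outside any region is suspended immediately.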
At first this sounds pretty hideous. I mean, have you ever given serious consideration to what would happen if a thread running in your driver was suspended? Most driver developers don’t think of this and, if they do, they assume that they’re immune to suspension simply by running in kernel mode. No such luck. Any code in your driver that runs at PASSIVE_LEVEL and not in a Critical Region can be suspended.
What prevents this from being a complete mess is that in order to forcibly suspend a thread, a caller must have sufficient access to the thread in question. Thus, if a process has sufficient rights and wants to pause a running thread, it is allowed to do that. If this causes the process owning the thread to hang or otherwise malfunction, oh well! We can just document that under, “Don’t Do That.”
Why bother having Critical Regions at all then? Well, the problem comes when there’s a knock-on effect that causes problems in other threads or processes. What if suspending a thread in Process A causes threads in Process B to hang? Just because I have the authority to suspend threads in Process A doesn’t mean I have that same authority with Process B, though I effectively achieved the same result. Even worse, imagine an unprivileged application that suspends one of its own threads, causing other privileged applications to hang or crash. This would be very bad indeed.
Thus, as you’re writing your code you need to ask yourself: if the current thread were suspended, would that affect any other threads in the system? Most problem cases will occur when acquiring locks: what happens to other potential users of a lock if you acquire that lock and then your thread is suspended?
Take the file system for example. The file systems use a significant amount of file and volume level locking while processing I/O requests from user mode. What if a thread is currently holding one of these locks when it is suspended? This could potentially result in a system-wide deadlock as other threads come along and try to acquire the same lock to perform I/O operations on the file or volume.
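We solve this problem in one of two ways. The first way is to acquire a lock that also implicitly enters a Critical Region, or that otherwise blocks Normal KAPC delivery while it is held (ExAcquireFastMutex, for example, does so by raising the IRQL to APC_LEVEL). Examples of these types of locks are: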
- Kernel Mutexes
- Fast Mutexes
- Guarded Mutexes
However, if you acquire a lock that does not implicitly enter a Critical Region then you must explicitly call KeEnterCriticalRegion before acquiring the lock and KeLeaveCriticalRegion after releasing it. Examples of locks that do not enter a Critical Region are:
- Executive Resources (ERESOURCE)
- Unsafe Fast Mutexes (Fast Mutexes acquired with ExAcquireFastMutexUnsafe)
If you’re using one of the above as a synchronization mechanism and therefore acquire it from multiple different thread contexts, you really need a Critical Region to keep yourself safe from a DoS.
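The ordering described above for an ERESOURCE can be sketched as follows. This is a minimal illustration rather than production code: SHARED_STATE and IncrementSharedCounter are hypothetical names, and the ERESOURCE is assumed to have been initialized elsewhere with ExInitializeResourceLite. So that the sketch can also be compiled and exercised outside the WDK, the #else branch supplies simplified user-mode stand-ins for the kernel routines. The stand-ins and g_ApcDisableDepth are illustrative only, and the sketch assumes a driver build that uses the /kernel compiler option, which defines _KERNEL_MODE.

```c
#ifdef _KERNEL_MODE
#include <ntddk.h>
#else
/* User-mode stand-ins so the sketch builds without the WDK.
   These are NOT the real implementations. */
#include <assert.h>
#define VOID void
#define TRUE 1
typedef unsigned char BOOLEAN;
typedef unsigned long ULONG;
typedef struct { int Held; } ERESOURCE, *PERESOURCE;

static int g_ApcDisableDepth; /* models "this thread is in a Critical Region" */

static VOID KeEnterCriticalRegion(VOID) { g_ApcDisableDepth++; }
static VOID KeLeaveCriticalRegion(VOID) { g_ApcDisableDepth--; }

static BOOLEAN ExAcquireResourceExclusiveLite(PERESOURCE Resource, BOOLEAN Wait)
{
    (void)Wait;
    /* Mirrors the requirement that Normal KAPC delivery is already
       disabled when an ERESOURCE is acquired. */
    assert(g_ApcDisableDepth > 0);
    Resource->Held = 1;
    return TRUE;
}

static VOID ExReleaseResourceLite(PERESOURCE Resource) { Resource->Held = 0; }
#endif

/* Hypothetical shared state protected by an ERESOURCE. */
typedef struct _SHARED_STATE {
    ERESOURCE Lock;    /* assumed initialized with ExInitializeResourceLite */
    ULONG     Counter;
} SHARED_STATE, *PSHARED_STATE;

VOID IncrementSharedCounter(PSHARED_STATE State)
{
    /* Enter the Critical Region BEFORE taking the lock, so the thread
       cannot be suspended while other threads may be waiting on it. */
    KeEnterCriticalRegion();
    ExAcquireResourceExclusiveLite(&State->Lock, TRUE); /* TRUE: wait for it */

    State->Counter++;

    /* Release the lock first, then leave the Critical Region. */
    ExReleaseResourceLite(&State->Lock);
    KeLeaveCriticalRegion();
}
```

The important part is the nesting: the Critical Region brackets the whole period during which the lock is held, so there is no window in which the thread owns the ERESOURCE and can still be suspended.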
In a nutshell, the Critical Region can be thought of simply as a way to prevent thread suspension. Threads executing in kernel mode are subject to being suspended, which is fine unless your driver creates cross-thread dependencies (e.g. with a lock). In that case, it’s up to you to properly enter and leave a Critical Region at the right times.