LFH Kernel Pool Allocator Challenges the Incumbent

Last reviewed and updated: 11 May 2020

Windows 10 version 1903 (build 18362), otherwise known as 19H1, introduced a new pool allocator in the kernel for the first time in, well, forever. Interestingly, the newly introduced kernel pool allocator is actually the existing user mode Low Fragmentation Heap (LFH) allocator. While there’s something to be said for the “if it ain’t broke” mentality, having a single allocator in the O/S makes a fair amount of sense from a maintainability perspective. Also, the user mode heap allocator has undergone significant revisions over the years to better reflect modern security practices, so it makes sense to share those benefits with kernel mode as well.

Most of us wouldn’t even notice or care that there’s a new pool allocator (except for the fact that it broke !pool, that is, ahem). However, over the years I have debugged so many BAD_POOL_HEADER bugchecks that I was curious about how the new pool allocator responded to some obvious driver bugs. Specifically, I wondered about the following cases:

  • Buffer Overruns
  • Double Frees
  • Use After Frees

So, I seized the unique opportunity to intentionally write buggy code (and, yes, I did at one point end up with a bug in my buggy code that caused it to not be buggy). The buggy code provides IOCTLs to generate each buggy scenario and the code to handle the IOCTLs is shown in Figure 1.

    size = 268;

    // Every IOCTL starts from a fresh non-paged allocation.
    allocation = ExAllocatePoolWithTag(NonPagedPool,
                                       size,
                                       'KLSO');

    DbgPrint("!pool 0x%p (size - 0x%x)\n", allocation, size);

    switch (IoControlCode) {

        case IOCTL_OSRLK_OVERRUN: {

            // Intentional bug: zero twice as many bytes as were allocated.
            DbgPrint("Zeroing 0x%x\n", size * 2);

            RtlZeroMemory(allocation, size * 2);

            ExFreePool(allocation);

            break;

        }

        case IOCTL_OSRLK_DOUBLE_FREE: {

            // Intentional bug: free the same allocation twice.
            DbgPrint("Freeing twice\n");

            ExFreePool(allocation);

            ExFreePool(allocation);

            break;

        }

        case IOCTL_OSRLK_USE_AFTER_FREE: {

            // Intentional bug: write to the allocation after freeing it.
            DbgPrint("Freeing then zeroing\n");

            ExFreePool(allocation);

            RtlZeroMemory(allocation, size);

            break;

        }

    }
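
A quick aside on why the tag in the code reads 'KLSO' while !pool reports it as OSLK: with the Microsoft compiler, the first character of a four-character constant lands in the most significant byte, and on a little-endian machine that byte sits last in memory. The user mode sketch below reproduces this with plain shifts. The names make_tag and tag_as_displayed are made up for the illustration and are not anything from the WDK.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Build the 32-bit value a four-character constant such as 'KLSO' has
 * under the Microsoft compiler: first character in the top byte. */
static uint32_t make_tag(char a, char b, char c, char d)
{
    return ((uint32_t)(unsigned char)a << 24) |
           ((uint32_t)(unsigned char)b << 16) |
           ((uint32_t)(unsigned char)c << 8)  |
            (uint32_t)(unsigned char)d;
}

/* Produce the tag in the order its bytes sit in little-endian memory
 * (least significant byte first), which is how it reads in a dump. */
static void tag_as_displayed(uint32_t tag, char out[5])
{
    assert(out != NULL);
    for (int i = 0; i < 4; i++) {
        out[i] = (char)((tag >> (8 * i)) & 0xFF);
    }
    out[4] = '\0';
}
```

Calling tag_as_displayed(make_tag('K', 'L', 'S', 'O'), buf) leaves "OSLK" in buf, which is the tag to look for in the debugger output.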

I then ran each test to pit the Windows 7 allocator against the Windows 10 19H1 allocator to see which one performed better in detecting the bugs. Note that this was not a rigorous, scientific study involving thousands of iterations. Each one was run about three times max to validate that the behavior was at least somewhat repeatable.

Now, without further ado, the results…

Overrun Challenge

Windows 7

On Windows 7 the system immediately crashed with a BAD_POOL_HEADER:

BAD_POOL_HEADER (19)
The pool is already corrupt at the time of the current request.
This may or may not be due to the caller.
The internal pool links must be walked to figure out a possible cause of
the problem, and then special pool applied to the suspect tags or the driver
verifier to a suspect driver.
Arguments:
Arg1: 0000000000000020, a pool block header size is corrupt.
Arg2: fffffa801a26bde0, The pool entry we were looking for within the page.
Arg3: fffffa801a26bf00, The next pool entry.
Arg4: 000000000412003a, (reserved)

Running !pool on the second argument to the bugcheck walks the pool page and shows us where we went off a cliff:

1: kd> !pool @$bug_param2
Pool page fffffa801a26bde0 region is Nonpaged pool
 fffffa801a26b000 size:  5c0 previous size:    0  (Allocated)  Txrn
 fffffa801a26b5c0 size:  1b0 previous size:  5c0  (Free)       Free
 fffffa801a26b770 size:   c0 previous size:  1b0  (Allocated)  FMsl
 fffffa801a26b830 size:  150 previous size:   c0  (Allocated)  File (Protected)
 fffffa801a26b980 size:   c0 previous size:  150  (Allocated)  FMsl
 fffffa801a26ba40 size:  3a0 previous size:   c0  (Free)       FMic
*fffffa801a26bde0 size:  120 previous size:  3a0  (Free ) *OSLK
		Owning component : Unknown (update pooltag.txt)

fffffa801a26bf00 doesn't look like a valid small pool allocation, checking to see
if the entire page is actually part of a large page allocation...
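
The numbers in this output line up with a little arithmetic. Assuming the usual x64 layout for this allocator, where a 16-byte pool header precedes the caller's buffer and blocks are sized in 16-byte units, the 268-byte request becomes a 0x120-byte block and the next header should sit at exactly the address reported in Arg3. The following user mode sketch does the math. The constants and helper names are assumptions made for illustration, not definitions from the Windows headers.

```c
#include <assert.h>
#include <stdint.h>

#define POOL_HEADER_SIZE 16u   /* assumed: x64 pool header size in bytes */
#define POOL_GRANULARITY 16u   /* assumed: x64 block size granularity    */

/* The rounding trick below only works for power-of-two granularities. */
static_assert((POOL_GRANULARITY & (POOL_GRANULARITY - 1)) == 0,
              "granularity must be a power of two");

/* Bytes a block occupies: request plus header, rounded up. */
static uint32_t pool_block_size(uint32_t request)
{
    uint32_t total = request + POOL_HEADER_SIZE;
    return (total + POOL_GRANULARITY - 1) & ~(POOL_GRANULARITY - 1);
}

/* Address of the pool header that follows the block starting at entry. */
static uint64_t next_pool_entry(uint64_t entry, uint32_t request)
{
    return entry + pool_block_size(request);
}

/* One past the last byte written when length bytes are written to the
 * caller-visible buffer, which starts right after the header. */
static uint64_t write_end(uint64_t entry, uint32_t length)
{
    return entry + POOL_HEADER_SIZE + length;
}
```

With the values from the crash, pool_block_size(268) is 0x120, next_pool_entry(0xfffffa801a26bde0, 268) is 0xfffffa801a26bf00, and write_end(0xfffffa801a26bde0, 536) is 0xfffffa801a26c008. In other words, the zeroing ran straight over the header at fffffa801a26bf00, which is the entry !pool could no longer parse.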

Windows 10 19H1

Running the same test on Windows 10 produced no crash. Running !pool on the freed buffer shows a corruption of the page just like on Windows 7:

1: kd> !pool 0xFFFFCD02FB902050 
Pool page ffffcd02fb902050 region is Nonpaged pool
 ffffcd02fb902000 size:   30 previous size:    0  (Free)       ....

ffffcd02fb902040 doesn't look like a valid small pool allocation, checking to see
if the entire page is actually part of a large page allocation...

But no BAD_POOL_HEADER crash.

Interestingly, I ran the test again and this time I did hit a crash. However, it was an IRQL_NOT_LESS_OR_EQUAL bugcheck in the bowels of the heap allocator on the next allocation:

IRQL_NOT_LESS_OR_EQUAL (a)
An attempt was made to access a pageable (or completely invalid) address at an
interrupt request level (IRQL) that is too high.  This is usually
caused by drivers using improper addresses.
If a kernel debugger is available get the stack backtrace.
Arguments:
Arg1: ffffcd02fb8b5022, memory referenced
Arg2: 0000000000000002, IRQL
Arg3: 0000000000000000, bitfield :
	bit 0 : value 0 = read operation, 1 = write operation
	bit 3 : value 0 = not an execute operation, 1 = execute operation (only on chips which support this level of status)
Arg4: fffff80574452c49, address which referenced memory

0: kd> kc
 # Call Site
00 nt!DbgBreakPointWithStatus
01 nt!KiBugCheckDebugBreak
02 nt!KeBugCheck2
03 nt!KeBugCheckEx
04 nt!KiBugCheckDispatch
05 nt!KiPageFault
06 nt!RtlpHpVsContextAllocateInternal
07 nt!ExAllocateHeapPool
08 nt!ExAllocatePoolWithTag
09 OSRLK!OSRLKEvtIoDeviceControl 

Overrun Challenge Winner: Windows 7. Corruption was detected immediately when the buffer was freed and we were provided a clear bugcheck description.

Double Free Challenge

Windows 7

On Windows 7 the system immediately crashed with a BAD_POOL_CALLER:

BAD_POOL_CALLER (c2)
The current thread is making a bad pool request.  Typically this is at a bad IRQL level or double freeing the same allocation, etc.
Arguments:
Arg1: 0000000000000007, Attempt to free pool which was already freed
Arg2: 0000000000001097, Pool tag value from the pool header
Arg3: 0000000004120009, Contents of the first 4 bytes of the pool header
Arg4: fffffa801b3de9f0, Address of the block of pool being deallocated
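
Arg3 is worth a second look. Assuming the Windows 7 x64 pool header layout, where the first four bytes pack PreviousSize, PoolIndex, BlockSize, and PoolType into one byte each and the two sizes are counted in 16-byte units, the value decodes as sketched below. The struct and function names are hypothetical, and the layout is an assumption that happens to agree with the sizes !pool printed in the overrun test.

```c
#include <assert.h>
#include <stdint.h>

/* Decoded view of the first four bytes of a pool header
 * (assumed Windows 7 x64 layout, one byte per field). */
typedef struct {
    uint32_t previous_size;  /* size of the preceding block, in bytes */
    uint32_t pool_index;     /* raw field value                       */
    uint32_t block_size;     /* size of this block, in bytes          */
    uint32_t pool_type;      /* raw field value, left uninterpreted   */
} pool_header_fields;

/* arg is the 64-bit bugcheck parameter; only its low 32 bits are used. */
static pool_header_fields decode_pool_header(uint64_t arg)
{
    pool_header_fields f;

    assert((arg >> 32) == 0);  /* the header dword fits in 32 bits */

    f.previous_size = (uint32_t)(arg & 0xFF) * 16;
    f.pool_index    = (uint32_t)((arg >> 8) & 0xFF);
    f.block_size    = (uint32_t)((arg >> 16) & 0xFF) * 16;
    f.pool_type     = (uint32_t)((arg >> 24) & 0xFF);
    return f;
}
```

Decoding 0x04120009 gives a block size of 0x120, the same size !pool showed for the OSLK block in the overrun test. Arg4 of that earlier BAD_POOL_HEADER (0x0412003a), though listed as reserved, decodes under the same assumption to a block size of 0x120 and a previous size of 0x3a0, matching the !pool listing.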

Windows 10 19H1

Running the same test on Windows 10 produced no crash. Much like last time, running the test a second time did indeed result in a system crash, though this time it was properly at the point of the second free:

KERNEL_MODE_HEAP_CORRUPTION (13a)
The kernel mode heap manager has detected corruption in a heap.
Arguments:
Arg1: 0000000000000011, Type of corruption detected
Arg2: ffff91030de00100, Address of the heap that reported the corruption
Arg3: ffff91030dd133a0, Address at which the corruption was detected
Arg4: 0000000000000000

Curious about what Arg1 == 0x11 meant, I took a SWAG and grep’d the type information for something related to “heap” and “type”:

0: kd> dt nt!_heap*type*
          ntkrnlmp!_HEAP_SEG_RANGE_TYPE
          ntkrnlmp!_HEAP_FAILURE_TYPE

That was lucky! Dumping _HEAP_FAILURE_TYPE, we see that 0x11 (0n17) maps to heap_failure_segment_lfh_double_free:

0: kd> dt nt!_HEAP_FAILURE_TYPE
   heap_failure_internal = 0n0
   heap_failure_unknown = 0n1
   heap_failure_generic = 0n2
   heap_failure_entry_corruption = 0n3
   heap_failure_multiple_entries_corruption = 0n4
   heap_failure_virtual_block_corruption = 0n5
   heap_failure_buffer_overrun = 0n6
   heap_failure_buffer_underrun = 0n7
   heap_failure_block_not_busy = 0n8
   heap_failure_invalid_argument = 0n9
   heap_failure_invalid_allocation_type = 0n10
   heap_failure_usage_after_free = 0n11
   heap_failure_cross_heap_operation = 0n12
   heap_failure_freelists_corruption = 0n13
   heap_failure_listentry_corruption = 0n14
   heap_failure_lfh_bitmap_mismatch = 0n15
   heap_failure_segment_lfh_bitmap_corruption = 0n16
   heap_failure_segment_lfh_double_free = 0n17
   heap_failure_vs_subsegment_corruption = 0n18
   heap_failure_null_heap = 0n19
   heap_failure_allocation_limit = 0n20
   heap_failure_commit_limit = 0n21

So there is double free detection, but for some reason it didn’t trigger on the first pass.
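
If you hit this bugcheck more than once, the lookup can be captured in a small table. The sketch below hard-codes the names exactly as dumped above from this particular kernel. Other builds may well number them differently, so treat the table as a snapshot rather than a stable interface. The function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Names in enum order, copied from the dt nt!_HEAP_FAILURE_TYPE output. */
static const char *const heap_failure_names[] = {
    "heap_failure_internal",
    "heap_failure_unknown",
    "heap_failure_generic",
    "heap_failure_entry_corruption",
    "heap_failure_multiple_entries_corruption",
    "heap_failure_virtual_block_corruption",
    "heap_failure_buffer_overrun",
    "heap_failure_buffer_underrun",
    "heap_failure_block_not_busy",
    "heap_failure_invalid_argument",
    "heap_failure_invalid_allocation_type",
    "heap_failure_usage_after_free",
    "heap_failure_cross_heap_operation",
    "heap_failure_freelists_corruption",
    "heap_failure_listentry_corruption",
    "heap_failure_lfh_bitmap_mismatch",
    "heap_failure_segment_lfh_bitmap_corruption",
    "heap_failure_segment_lfh_double_free",
    "heap_failure_vs_subsegment_corruption",
    "heap_failure_null_heap",
    "heap_failure_allocation_limit",
    "heap_failure_commit_limit",
};

#define HEAP_FAILURE_COUNT \
    (sizeof(heap_failure_names) / sizeof(heap_failure_names[0]))

/* The dump lists values 0n0 through 0n21. */
static_assert(HEAP_FAILURE_COUNT == 22, "table must match the dumped enum");

/* Map Arg1 of KERNEL_MODE_HEAP_CORRUPTION (13a) to its enum name. */
static const char *heap_failure_name(unsigned long long arg1)
{
    return arg1 < HEAP_FAILURE_COUNT ? heap_failure_names[arg1]
                                     : "(not in the dumped enum)";
}

/* Reverse lookup: enum name to value, or -1 if the name is not listed. */
static int heap_failure_code(const char *name)
{
    for (size_t i = 0; i < HEAP_FAILURE_COUNT; i++) {
        if (strcmp(heap_failure_names[i], name) == 0) {
            return (int)i;
        }
    }
    return -1;
}
```

Here heap_failure_name(0x11) returns heap_failure_segment_lfh_double_free, the same answer the manual dt lookup produced.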

Double Free Challenge Winner: Very close, but Windows 7 wins because the double free was caught the first time, every time we tested. Windows 10 also lost points because the debugger didn’t explain the corruption type and we only found its meaning through a lucky guess.

Use After Free Challenge

Windows 7

Running this test on Windows 7 produced no crash.

Windows 10 19H1

Running this test on Windows 10 produced no crash.

Use After Free Challenge Winner: Tie. To be fair, it would take a lot of extra processing in the allocator to find this bug, so it’s not surprising that neither allocator caught it.
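
To give a feel for what that extra processing might look like, here is a toy user mode sketch (not how either Windows allocator works): fill a block with a known pattern when it is freed and check the pattern when the block is handed out again. All names and the 0xDD fill value are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SLOT_SIZE 64
#define FREE_FILL 0xDD   /* arbitrary pattern chosen for this sketch */

/* A one-block "allocator": a single fixed-size slot plus a busy flag. */
typedef struct {
    unsigned char bytes[SLOT_SIZE];
    bool in_use;
} toy_slot;

/* Put the slot in its initial free state, filled with the pattern. */
static void toy_init(toy_slot *slot)
{
    memset(slot->bytes, FREE_FILL, SLOT_SIZE);
    slot->in_use = false;
}

/* Scan every byte: true only if nothing wrote to the slot while free. */
static bool toy_free_pattern_intact(const toy_slot *slot)
{
    for (size_t i = 0; i < SLOT_SIZE; i++) {
        if (slot->bytes[i] != FREE_FILL) {
            return false;
        }
    }
    return true;
}

/* Hand out the slot, reporting whether its free pattern was disturbed. */
static void *toy_alloc(toy_slot *slot, bool *corrupted)
{
    assert(!slot->in_use);
    *corrupted = !toy_free_pattern_intact(slot);
    slot->in_use = true;
    return slot->bytes;
}

/* Return the slot and refill it so later stray writes become visible. */
static void toy_free(toy_slot *slot)
{
    assert(slot->in_use);
    memset(slot->bytes, FREE_FILL, SLOT_SIZE);
    slot->in_use = false;
}
```

Even in this toy, a stray write after toy_free is only noticed at the next toy_alloc, long after the offending code has run, and every allocation pays for a scan of the whole block.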

Overall Results

Though there are surely benefits to the new allocator, in our opinion the old allocator wins in its ability to detect pool corruption in the cases tested.

What About With Driver Verifier?

Of course, the best way to find pool corruptions is with Driver Verifier and Special Pool. I’m happy to report that both allocators caught all three bugs equally, so no loss in functionality there.