Windows System Software -- Consulting, Training, Development -- Unique Expertise, Guaranteed Results

Unexpected Case of Bugcheck IRQL_UNEXPECTED_VALUE (C8)

Yet another interesting case lands on our doorstep thanks to NTDEV (original post here).

I firmly believe that you have zero chance of diagnosing a non-trivial crash if you don’t understand the bugcheck code. The bugcheck code is, in fact, THE definitive reason for the crash. Of course, just understanding the bugcheck code itself is hardly ever sufficient to diagnose the problem, but it’s fundamental to how you approach the particular crash.

The OP’s crash was fun for me because I’d never seen the bugcheck code before, and I’m always happy to meet a new type of system crash (especially if I wasn’t the one who caused it):

Any time I’m presented with a crash, I try to dwell on the crash code and its arguments for a while before looking at anything else. In this case I was on board and feeling pretty good about the crash as I read the description. It seems reasonable to have a crash that results from someone changing the IRQL without ever restoring it.

Once I got to the arguments, though, I went right off a cliff. As I followed the description and decoded the arguments, I learned that:

  1. The Current IRQL is 0
  2. The Expected IRQL is 0
  3. UniqueValue is 0, so:
    1. Arg2 is an APC’s Kernel Routine and it’s 2
    2. Arg3 is an APC and it’s NULL
    3. Arg4 is an APC’s Normal Routine and it’s 0
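The decoding of Arg1 in the list above can be sketched in C. This is only an illustration, assuming the layout given in the documented description of this bugcheck, `(Current IRQL << 16) | (Expected IRQL << 8) | UniqueValue`, with each field treated as 8 bits wide. The type and function names are hypothetical and not part of any Windows header.

```c
#include <stdint.h>

/* Fields packed into Arg1 of bugcheck 0xC8, assuming the documented layout:
 *   (Current IRQL << 16) | (Expected IRQL << 8) | UniqueValue
 * The 8-bit field widths are an assumption implied by the shift amounts. */
typedef struct {
    uint8_t current_irql;
    uint8_t expected_irql;
    uint8_t unique_value;
} c8_arg1_fields; /* hypothetical name, for illustration only */

/* Split Arg1 into its three fields. */
static c8_arg1_fields decode_c8_arg1(uint64_t arg1)
{
    c8_arg1_fields f;
    f.current_irql  = (uint8_t)((arg1 >> 16) & 0xFF);
    f.expected_irql = (uint8_t)((arg1 >> 8) & 0xFF);
    f.unique_value  = (uint8_t)(arg1 & 0xFF);
    return f;
}
```

With Arg1 equal to 0, as in this crash, all three fields decode to 0, which is exactly why the documented interpretation looks so odd here.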

Presumably this crash only happens if the “Current IRQL” doesn’t match the “Expected IRQL”, but what I just decoded doesn’t support that. The other arguments don’t make sense to me either, because I’d expect some other kind of crash if someone queued a NULL APC or an APC with a Kernel Routine set to 2.

This got me curious as to what the crash code actually meant, so I broke out WinDbg and started poking. The call stack indicated that ndis!ndisExpandStack called some function exported by NT, which then ended up in some optimized code area and crashed the machine:

Based on the name of the NDIS function I had a guess as to what function this was, but to confirm I disassembled ndis!ndisExpandStack:

The theory at this point is that KeExpandKernelStackAndCalloutEx is the one generating the bugcheck code.

Looking at that function I see the source of the 0xC8 bugcheck and the mystery of the arguments is solved:

The bugcheck makes much more sense now. Someone’s stack expansion callback was called at DISPATCH_LEVEL (Arg2 == 2) and returned at PASSIVE_LEVEL (Arg1 == 0). That’s against the rules, thus you get a system crash.
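The rule being enforced can be modeled in ordinary user-mode C. This is a rough sketch of the pattern as described above, not the actual implementation of KeExpandKernelStackAndCalloutEx: the global `g_model_irql` stands in for the processor's IRQL, every name here is hypothetical, and only the two arguments discussed in this post are modeled (IRQL on return as Arg1, IRQL at call as Arg2).

```c
#include <stddef.h>
#include <stdint.h>

#define PASSIVE_LEVEL  0
#define DISPATCH_LEVEL 2

typedef void (*callout_fn)(void *context);

/* Stand-in for the processor's current IRQL (user-mode model only). */
static uint8_t g_model_irql = PASSIVE_LEVEL;

/* What the model reports instead of crashing the machine. */
typedef struct {
    int      crashed; /* nonzero if the IRQL check failed */
    uint64_t arg1;    /* IRQL when the callout returned */
    uint64_t arg2;    /* IRQL when the callout was called */
} model_result;

/* Remember the IRQL, run the callout, and verify the IRQL is unchanged. */
static model_result model_expand_stack_and_callout(callout_fn callout,
                                                   void *context)
{
    model_result r = {0, 0, 0};
    uint8_t irql_at_call = g_model_irql;

    callout(context);

    if (g_model_irql != irql_at_call) {
        r.crashed = 1;
        r.arg1 = g_model_irql;
        r.arg2 = irql_at_call;
    }
    return r;
}

/* Buggy callout: changes the IRQL to PASSIVE_LEVEL and never restores it. */
static void buggy_callout(void *context)
{
    (void)context;
    g_model_irql = PASSIVE_LEVEL;
}

/* Well-behaved callout: returns at the IRQL it was called at. */
static void well_behaved_callout(void *context)
{
    (void)context;
}
```

Setting `g_model_irql` to `DISPATCH_LEVEL` and running `buggy_callout` through the model reports `arg1 == 0` and `arg2 == 2`, the same pair seen in this crash, while `well_behaved_callout` passes the check.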

Personally, I would call this a bug in KeExpandKernelStackAndCalloutEx, seeing as how it is generating an IRQL_UNEXPECTED_VALUE using invalid (unexpected?) arguments. At a minimum, though, the documentation is currently wrong, and I have filed a bug to try to get that addressed.