Windows System Software -- Consulting, Training, Development -- Engineering Excellent, Every Time.

Seeking More from an Unhappy System with Live Kernel Reports

Even though they may feel spontaneous, a system crash is always an explicit decision by a kernel component to bring the machine down. Sometimes the machine is in such a bad state that immediately resetting the entire system is the only reasonable thing to do. For example, if the Memory Manager detects a corruption in the operating system we can’t simply let the corruption pass and possibly make things worse, so it crashes the machine to get out of a bad situation.

No leaving until the dump is solved!

At this point we’d also like to know why the system was in a bad state, so the O/S writes the contents of memory out into a crash dump file. Some poor soul can then be chained to a chair and forced to stare at the dump in WinDbg until the bug is found.

Given their usefulness, it might also be beneficial to have a crash dump not only in cases that are fatal but in ones that are simply undesirable. For example, imagine that NDIS tries to reset your network adapter but the reset takes too long. We don’t necessarily want to crash the machine because of this, but we might want to generate a crash dump so we can analyze the state of the system and figure out what’s causing the delay.

This is the idea behind the Live Kernel Reports feature introduced in Windows 8. I personally didn’t notice this feature until last year, though it instantly intrigued me. Basically, any driver in the system can call the undocumented DbgkWerCaptureLiveKernel Dump API and generate a kernel summary dump and/or a minidump in the C:\Windows\LiveKernelReports folder. This allows each driver to decide what conditions might require some additional scrutiny and non-intrusively generate a crash dump for further study. Check out the folder on your own system right now and you might even see a few dump files in there.

Out of curiosity I wanted to see which Windows drivers used this functionality on a clean install of 19H1 and received quite a few hits:

Note that this only indicates the loaded drivers uses this functionality. A string search in the C:\Windows\System32\Drivers directory uncovered even more.

To be fair, in general these Live Kernel Reports aren’t terribly interesting. After all, they represent non-fatal failures and thus shouldn’t be too much bother to the user. However, I spend a lot of time debugging system crashes that point to no obvious culprit. In those cases I like to think about what’s different about the crashing system that might give me some new clues as I try to solve the riddle. Maybe there’s a component that’s been logging errors to the Event Log for the last month. Or maybe there’s a bunch of minidumps from previous crashes that the user didn’t even notice. The crash dumps in the Live Kernel Reports folder work in the same way. For example, maybe the NIC has been generating Live Kernel Reports since the time the driver was last updated.

But I Want Live Kernel Reports Too!

I also happen to be a dev, and so I want Live Kernel Reports for my drivers too! Given that it’s an entirely undocumented feature I’m not foolish enough to want to ship code that generates them, but I see a huge value here in terms of testing. I’d like to let the test team go crazy beating on our code while the driver silently generates Live Kernel Reports along with error logs for the development team to inspect. Sure, we could just crash the machines, but in many cases that’s unnecessary and we want to see how our code recovers from error conditions anyway.

So, after kicking that around as a, “wouldn’t that be nice?” idea for a while, I finally got around to doing a quick feasibility study by generating a Live Kernel Report from a test driver. I’m not Mr. Reverse Engineer, but I can fumble around enough to come up with a function prototype with enough parameters to get me going. Note that I gave up after the first 6 parameters (mostly due to lack of time and interest, as you’ll soon see that this experiment quickly hit a brick wall):

It took me a bit to get to that point and I was pretty excited when my module linked and loaded on the target. I then called the API and………..nothing happened. No crash, no debug output, no Live Kernel Report, nothing. Never one to give up so easily, I poked around and saw that there were DbgPrintEx calls being made inside this API and this led to two debug print filters that can be set to get more information:

With those flags set I received much more interesting output when I tried to generate the Live Kernel Report:

Note the highlighted message about the “system threshold time” not being met. Presumably the numbers specified are timestamps (in decimal), so we can translate them into system times using the .formats command. Here is the result for the threshold time:

And for the current time:

Clearly the code is trying to avoid generating too many Live Kernel Reports and thus has put some arbitrary deadline in the future for the next time a dump may be written. I searched but could not find a way to bypass this (i.e. a, “no, please do wear out my SSD and fill my drive, I don’t care!” flag) but alas did not find one. This dashed any and all hopes I had for using this as a testing mechanism in the lab.

Of course, I simply could not leave this alone until I saw my driver generate a Live Kernel Report. Using the debugger I NOP’d out the offending check and, hurray, I saw my dump file!

However, not too long later the dump file disappeared on me. Checking out Process Monitor, I can see that sometime after the Live Kernel Report is generated WerFault.exe runs and cleans out the folder.

This doesn’t appear to happen every time and for every dump, but there’s some algorithm behind the scenes to make sure this folder doesn’t grow unbounded.

Not Generically Useful, But Still Useful

Sadly, but not surprisingly, this mechanism is way too undocumented and special purpose built to be generically useful. I’ll continue to look at Live Kernel Reports as part of analyzing systems though and through this experiment I’ve learned that just because there aren’t any dumps in the Live Kernel Reports folder it doesn’t mean that they aren’t being generated.