Even though it may feel spontaneous, a system crash is always an explicit decision by a kernel component to bring the machine down. Sometimes the machine is in such a bad state that immediately resetting the entire system is the only reasonable thing to do. For example, if the Memory Manager detects corruption in the operating system, we can’t simply let the corruption pass and possibly make things worse, so it crashes the machine to get out of a bad situation.
At this point we’d also like to know why the system was in a bad state, so the O/S writes the contents of memory out into a crash dump file. Some poor soul can then be chained to a chair and forced to stare at the dump in WinDbg until the bug is found.
Given their usefulness, it might also be beneficial to have a crash dump not only in cases that are fatal but in ones that are simply undesirable. For example, imagine that NDIS tries to reset your network adapter but the reset takes too long. We don’t necessarily want to crash the machine because of this, but we might want to generate a crash dump so we can analyze the state of the system and figure out what’s causing the delay.
This is the idea behind the Live Kernel Reports feature introduced in Windows 8. I personally didn’t notice this feature until last year, though it instantly intrigued me. Basically, any driver in the system can call the undocumented DbgkWerCaptureLiveKernelDump API and generate a kernel summary dump and/or a minidump in the C:\Windows\LiveKernelReports folder. This allows each driver to decide what conditions might require some additional scrutiny and non-intrusively generate a crash dump for further study. Check out the folder on your own system right now and you might even see a few dump files in there.
Out of curiosity I wanted to see which Windows drivers used this functionality on a clean install of 19H1 and received quite a few hits:
0: kd> x *!_imp_DbgkWerCaptureLiveKernelDump
ffff8ec0`fdb596d0 win32kfull!_imp_DbgkWerCaptureLiveKernelDump =
ffff8ec0`fdeaf378 cdd!_imp_DbgkWerCaptureLiveKernelDump =
fffff803`d2086250 mrxsmb!_imp_DbgkWerCaptureLiveKernelDump =
fffff803`d21b26e8 cldflt!_imp_DbgkWerCaptureLiveKernelDump =
fffff803`d25021c8 srv2!_imp_DbgkWerCaptureLiveKernelDump =
fffff805`1f7db5a8 Wdf01000!_imp_DbgkWerCaptureLiveKernelDump =
fffff805`1f8380f0 WppRecorder!_imp_DbgkWerCaptureLiveKernelDump =
fffff805`1f97c658 ACPI!_imp_DbgkWerCaptureLiveKernelDump =
fffff805`1fa2c1e8 intelpep!_imp_DbgkWerCaptureLiveKernelDump =
fffff805`1fb95040 pdc!_imp_DbgkWerCaptureLiveKernelDump =
fffff805`201264b0 Ntfs!_imp_DbgkWerCaptureLiveKernelDump =
fffff805`2057a158 UsbHub3!_imp_DbgkWerCaptureLiveKernelDump =
fffff805`206a52d8 ndis!_imp_DbgkWerCaptureLiveKernelDump =
fffff805`209ef600 tcpip!_imp_DbgkWerCaptureLiveKernelDump =
fffff805`20b042e8 fwpkclnt!_imp_DbgkWerCaptureLiveKernelDump =
fffff805`20c6a608 volsnap!_imp_DbgkWerCaptureLiveKernelDump =
fffff805`20d2a3b0 USBXHCI!_imp_DbgkWerCaptureLiveKernelDump =
Note that this only shows which of the currently loaded drivers import the function. A string search in the C:\Windows\System32\Drivers directory uncovered even more.
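I didn’t record exactly how I ran that string search, so treat the following as a minimal sketch of one way to do it rather than the tool I used. It relies on the fact that imported function names are stored as plain ASCII in a PE image’s import table, so a raw byte search of a .sys file will find them. The helper names buffer_contains and file_contains are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if the ASCII string `needle` occurs anywhere in buf[0..len),
 * 0 otherwise. A simple linear scan; driver images are small enough
 * that nothing fancier is needed. */
int buffer_contains(const unsigned char *buf, size_t len, const char *needle)
{
    assert(buf != NULL && needle != NULL);

    size_t n = strlen(needle);
    if (n == 0)
        return 1;
    if (len < n)
        return 0;

    for (size_t i = 0; i + n <= len; i++) {
        if (memcmp(buf + i, needle, n) == 0)
            return 1;
    }
    return 0;
}

/* Reads the whole file at `path` into memory and searches it.
 * Returns 1 on a hit, 0 on no hit, -1 if the file could not be read. */
int file_contains(const char *path, const char *needle)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    int result = -1;
    unsigned char *buf = NULL;

    if (fseek(f, 0, SEEK_END) == 0) {
        long size = ftell(f);
        if (size >= 0 && fseek(f, 0, SEEK_SET) == 0) {
            buf = malloc((size_t)size + 1);
            if (buf != NULL && fread(buf, 1, (size_t)size, f) == (size_t)size)
                result = buffer_contains(buf, (size_t)size, needle);
        }
    }

    free(buf);
    fclose(f);
    return result;
}
```

Calling file_contains on each file in the Drivers directory with "DbgkWerCaptureLiveKernelDump" as the needle gives a list of candidates. A hit only means the string appears somewhere in the file, so it is a hint that the driver imports the function, not proof.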
To be fair, in general these Live Kernel Reports aren’t terribly interesting. After all, they represent non-fatal failures and thus shouldn’t be too much bother to the user. However, I spend a lot of time debugging system crashes that point to no obvious culprit. In those cases I like to think about what’s different about the crashing system that might give me some new clues as I try to solve the riddle. Maybe there’s a component that’s been logging errors to the Event Log for the last month. Or maybe there’s a bunch of minidumps from previous crashes that the user didn’t even notice. The crash dumps in the Live Kernel Reports folder work in the same way. For example, maybe the NIC has been generating Live Kernel Reports since the time the driver was last updated.
But I Want Live Kernel Reports Too!
I also happen to be a dev, and so I want Live Kernel Reports for my drivers too! Given that it’s an entirely undocumented feature I’m not foolish enough to want to ship code that generates them, but I see a huge value here in terms of testing. I’d like to let the test team go crazy beating on our code while the driver silently generates Live Kernel Reports along with error logs for the development team to inspect. Sure, we could just crash the machines, but in many cases that’s unnecessary and we want to see how our code recovers from error conditions anyway.
So, after kicking that around as a, “wouldn’t that be nice?” idea for a while, I finally got around to doing a quick feasibility study by generating a Live Kernel Report from a test driver. I’m not Mr. Reverse Engineer, but I can fumble around enough to come up with a function prototype with enough parameters to get me going. Note that I gave up after the first 6 parameters (mostly due to lack of time and interest, as you’ll soon see that this experiment quickly hit a brick wall):
EXTERN_C
_IRQL_requires_(PASSIVE_LEVEL)
NTKERNELAPI
NTSTATUS
DbgkWerCaptureLiveKernelDump(
    _In_z_ PWCHAR ComponentName,
    _In_ ULONG LiveDumpCode,
    _In_ ULONG_PTR LiveDumpParam1,
    _In_ ULONG_PTR LiveDumpParam2,
    _In_ ULONG_PTR LiveDumpParam3,
    _In_ ULONG_PTR LiveDumpParam4,
    _In_ ULONG_PTR I,
    _In_ ULONG_PTR Dont,
    _In_ ULONG Know
    );
It took me a bit to get to that point and I was pretty excited when my module linked and loaded on the target. I then called the API and... nothing happened. No crash, no debug output, no Live Kernel Report, nothing. Never one to give up so easily, I poked around and saw that there were DbgPrintEx calls being made inside this API. This led me to two debug print filters that can be set to get more information:
0: kd> ed nt!Kd_WER_Mask f
0: kd> ed nt!Kd_CRASHDUMP_Mask f
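For anyone wondering why f is the magic value: DbgPrintEx messages carry a severity level, and the documented filtering rule is that a level from 0 through 31 is treated as a bit position that gets tested against the component’s mask. The four standard levels in dpfilter.h are 0 through 3, so a mask of 0xf lets all of them through. Here is a rough user-mode model of that rule. The function names are made up for this illustration, and the sketch ignores the system-wide mask that Windows also consults.

```c
#include <stdint.h>

/* Severity levels as defined in the WDK header dpfilter.h. */
#define DPFLTR_ERROR_LEVEL   0
#define DPFLTR_WARNING_LEVEL 1
#define DPFLTR_TRACE_LEVEL   2
#define DPFLTR_INFO_LEVEL    3

/* Levels 0..31 name a single bit; anything larger is used as the
 * bit field itself (this mirrors the documented DbgPrintEx rule). */
uint32_t level_to_bits(uint32_t level)
{
    return (level < 32) ? (UINT32_C(1) << level) : level;
}

/* Returns 1 if a message at `level` would pass a component mask
 * such as nt!Kd_WER_Mask, 0 if it would be dropped.
 * Simplification: the system-wide mask is not modelled here. */
int message_passes_filter(uint32_t component_mask, uint32_t level)
{
    return (level_to_bits(level) & component_mask) != 0;
}
```

With a mask of 0xf, message_passes_filter returns 1 for all four standard levels. With a mask of 0x1, only DPFLTR_ERROR_LEVEL messages get through.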
With those flags set I received much more interesting output when I tried to generate the Live Kernel Report:
WERKERNELHOST: WerpCheckPolicy: Requested Policy is 2
WERKERNELHOST: System memory threshold met. Memory Threshold 274877906944 bytes, SystemMemory 8589398016 bytes
WERKERNELHOST: System threshold time not met. Threshold time 132051744578202800, Current time 132047486278525109
WERKERNELHOST: WerpCheckPolicy: Requested Policy 2 is higher than granted 0
WERKERNELHOST: CheckPolicy throttled dump creation for Component
DBGK: DbgkWerCaptureLiveKernelDump: WerLiveKernelCreateReport failed, status 0xc0000022.
Note the message about the “System threshold time” not being met. Presumably the two numbers in that message are timestamps (in decimal), so we can translate them into system times using the .formats command. Here is the result for the threshold time:
0: kd> .formats 0n132051744578202800
Evaluate expression:
  Hex:     01d5245c`af8decb0
  Decimal: 132051744578202800
  Octal:   0007251105625743366260
  Binary:  00000001 11010101 00100100 01011100 10101111 10001101 11101100 10110000
  Chars:   ..$\....
  Time:    Sun Jun 16 12:00:57.820 2019 (UTC - 4:00)
  Float:   low -2.58159e-010 high 7.8296e-038
  Double:  7.89244e-300
And for the current time:
0: kd> .formats 0n132047486278525109
Evaluate expression:
  Hex:     01d5207d`391d60b5
  Decimal: 132047486278525109
  Octal:   0007251007647107260265
  Binary:  00000001 11010101 00100000 01111101 00111001 00011101 01100000 10110101
  Chars:   .. }9.`.
  Time:    Tue Jun 11 13:43:47.852 2019 (UTC - 4:00)
  Float:   low 0.000150087 high 7.82905e-038
  Double:  7.88679e-300
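The Time line works because .formats treats the value as a Windows system time: a count of 100-nanosecond ticks since January 1, 1601 (UTC). If you’d rather do the arithmetic yourself, here is a small sketch that converts the two values from the log to Unix time and works out how far away the deadline is. The function names are made up for this illustration.

```c
#include <stdint.h>

/* Windows system time counts 100 ns ticks since 1601-01-01 00:00 UTC. */
#define TICKS_PER_SECOND   10000000ULL
/* Seconds between 1601-01-01 and the Unix epoch, 1970-01-01. */
#define EPOCH_DIFF_SECONDS 11644473600LL

/* Converts a Windows system time to whole seconds since the Unix epoch. */
int64_t filetime_to_unix_seconds(uint64_t ticks)
{
    return (int64_t)(ticks / TICKS_PER_SECOND) - EPOCH_DIFF_SECONDS;
}

/* Seconds from `now_ticks` until `threshold_ticks`.
 * Negative if the threshold has already passed. */
double seconds_until(uint64_t now_ticks, uint64_t threshold_ticks)
{
    if (threshold_ticks >= now_ticks)
        return (double)(threshold_ticks - now_ticks) / 1e7;
    return -((double)(now_ticks - threshold_ticks) / 1e7);
}
```

Feeding in the current time (132047486278525109) and the threshold time (132051744578202800) from the log gives a gap of roughly 425,830 seconds, or a bit over 4.9 days, which agrees with the two Time values above. Whether that gap is a fixed interval or is calculated some other way I can’t tell from this one data point.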
Clearly the code is trying to avoid generating too many Live Kernel Reports and thus has put some arbitrary deadline in the future for the next time a dump may be written. I searched for a way to bypass this (i.e. a, “no, please do wear out my SSD and fill my drive, I don’t care!” flag) but alas did not find one. This dashed any and all hopes I had for using this as a testing mechanism in the lab.
Of course, I simply could not leave this alone until I saw my driver generate a Live Kernel Report. Using the debugger I NOP’d out the offending check and, hurray, I saw my dump file!
However, not long afterwards the dump file disappeared on me. Checking Process Monitor, I could see that sometime after the Live Kernel Report is generated, WerFault.exe runs and cleans out the folder.
This doesn’t appear to happen every time and for every dump, but there’s some algorithm behind the scenes to make sure this folder doesn’t grow unbounded.
Not Generically Useful, But Still Useful
Sadly, but not surprisingly, this mechanism is way too undocumented and special-purpose to be generically useful. I’ll continue to look at Live Kernel Reports as part of analyzing systems though, and through this experiment I’ve learned that just because there aren’t any dumps in the Live Kernel Reports folder doesn’t mean that they aren’t being generated.