
Win7 Crash Redux

In the last issue of The NT Insider we published a write-up describing the analysis of a crash dump in Filter Manager. A couple of readers commented on that analysis and raised solid points that deserve consideration.

This also underscores an important aspect of crash analysis: a second opinion is valuable precisely because it is easy to miss a point that would have helped.

Now on to the specific issues raised by the readers:

  • The value of the RSI register is in fact not null (it is 0000009d00000000).
  • There is a backward jump in the instruction stream that affects the code flow analysis.
  • There is some useful information for determining the type of trap frame, which in turn tells us more about which registers are valid.

x64 Trap Frame Observations

On the x64 platform, a kernel trap frame does not capture all register state:

  • For a processor exception or trap, the OS only captures the volatile registers (RAX, RCX, RDX, R8-R11 and XMM0-XMM5) and the RBP register.
  • For a system call entry, the OS only captures RBP, RSI and RDI.  No other registers are preserved.

The ExceptionActive field of the trap frame tells us which type of trap frame this is: a value of 0 or 1 indicates an exception or trap, in which case the volatile registers plus RBP are stored.
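
As a rough aid, the rule above can be written down as a small lookup. The sketch below is in Python purely for illustration; the function and constant names are ours, and it encodes only the two cases described here. Treating any ExceptionActive value other than 0 or 1 as a system call entry is an assumption made for the sketch, not something taken from the Windows headers.

```python
# Registers the article says are captured for each kind of x64 trap frame.
EXCEPTION_OR_TRAP_REGS = (
    "rax", "rcx", "rdx", "r8", "r9", "r10", "r11",   # volatile integer registers
    "xmm0", "xmm1", "xmm2", "xmm3", "xmm4", "xmm5",  # volatile XMM registers
    "rbp",
)
SYSTEM_CALL_REGS = ("rbp", "rsi", "rdi")


def trusted_registers(exception_active):
    """Return the registers worth trusting for a given ExceptionActive value.

    Assumption: any value other than 0 or 1 is treated as a system call entry.
    """
    if exception_active in (0, 1):
        return EXCEPTION_OR_TRAP_REGS
    return SYSTEM_CALL_REGS


if __name__ == "__main__":
    for value in (0, 1, 2):
        print(value, ", ".join(trusted_registers(value)))
```

Note that RSI is absent from the exception or trap list, so an RSI value read from such a trap frame should not be trusted; a full context record is a different matter.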

Further, the reader observed:

You can very often get reliable nonvolatile registers on x64 from “.frame /r (frame number)”. There is also the newly-documented (but long-present) “.frame /c (frame number)” command that sets your effective context to the values obtained from .frame /r. This works using the unwind metadata generated by the x64 compiler that the debugger stackwalker uses (it also works for Itanium, if you should be debugging that, but not x86). It should _always_ give you correct nonvolatile registers if you start from the context obtained by .cxr, .thread, or .exptr.

This is useful information in general, and we hope it will help our readers further hone their debugging skills.

Code flow analysis

This reader had some valid observations here:

The first constructive point is that if your debugging takes you into the game of back-tracing, you need to study whole functions. This means unassembling not just at addresses before the faulting instruction, nor after, but at all places that fragments of the function have got scattered by optimisation. Basically, you need to raise your debugging to the foothills of reverse engineering. The reverse engineer will see that the faulting instruction, “mov rax,qword ptr [rsi+20h]” at …F141 is picked up for TreeUnlinkMulti by inlining TreeUnlinkMultiDoWalk, which in turn inlines TreeLookup, which in turn inlines TreeFindNodeOrParent. The loop that the analyst has missed is actually from the start of this last subroutine. The code’s overall intention is to walk a given tree, remove the nodes that match a given pair of keys, and return these nodes as a list (linked through the RightChild members only).

The reality is that on x64 we far more often need to backtrack through the code flow in order to find local variables and reconstruct the stack. When doing a thorough analysis, it is indeed important to look at the entire function (the uf command is good for this), but this is a bit more time-consuming (and certainly more daunting to those just approaching kernel debugging).

But these are valid points.

So with this said, let’s go back and revisit our analysis. The context record shows us:

2: kd> .cxr fffff88005f45960
rax=fffffaf7072f96b0 rbx=0000000000000000 rcx=fffffa8008e13318
rdx=fffffa8007e5b550 rsi=0000009d00000000 rdi=0000000000000000
rip=fffff8800106f141 rsp=fffff88005f46330 rbp=fffffa8008e13318
 r8=ffffffffffffffff  r9=ffffffffffffffff r10=fffffffffffffe4a
r11=0000000000000001 r12=fffffa8007e5b550 r13=fffffa8007e34684
r14=0000000000004000 r15=0000000000000000
iopl=0         nv up ei pl nz na po nc
cs=0010  ss=0018  ds=002b  es=002b  fs=0053  gs=002b             efl=00010206
fffff880`0106f141 488b4620        mov     rax,qword ptr [rsi+20h] 

And of course the registers are valid here (they are all captured in the context record), as our reader noted.

This gives us the following stack:

2: kd> kv
  *** Stack trace for last set context - .thread/.cxr resets it
Child-SP          RetAddr           : Args to Child                                                           : Call Site
fffff880`05f46330 fffff880`0106c460 : fffffa80`07a96920 fffffa80`07e5b550 fffffa80`07a96920 00000000`00000000 : fltmgr!TreeUnlinkMulti+0x51
fffff880`05f46380 fffff880`0106cbe9 : fffff880`05f48000 00000000`00000002 00000000`00000000 00000000`00000000 : fltmgr!FltpPerformPreCallbacks+0x730
fffff880`05f46480 fffff880`0106b6c7 : fffffa80`08b93c10 fffffa80`07ca8de0 fffffa80`07b402c0 00000000`00000000 : fltmgr!FltpPassThrough+0x2d9
fffff880`05f46500 fffff800`02da278e : fffffa80`07e5b550 fffffa80`07dfa8e0 fffffa80`07e5b550 fffffa80`07ca8de0 : fltmgr!FltpDispatch+0xb7
fffff880`05f46560 fffff800`02a918b4 : fffffa80`07e34010 fffff800`02d8f260 fffffa80`06d17c90 00000000`ff060001 : nt!IopDeleteFile+0x11e
fffff880`05f465f0 fffff800`02d900e6 : fffff800`02d8f260 00000000`00000000 fffff880`05f469e0 fffffa80`08b93c10 : nt!ObfDereferenceObject+0xd4
fffff880`05f46650 fffff800`02d85e84 : fffffa80`07c3fcd0 00000000`00000000 fffffa80`07a17b10 fffffa80`0a31e701 : nt!IopParseDevice+0xe86
fffff880`05f467e0 fffff800`02d8ae4d : fffffa80`07a17b10 fffff880`05f46940 0067006e`00000040 fffffa80`06d17c90 : nt!ObpLookupObjectName+0x585
fffff880`05f468e0 fffff800`02d1ee3c : fffffa80`08cf07e0 00000000`00000007 fffffa80`00001f01 00001f80`00f40200 : nt!ObOpenObjectByName+0x1cd
fffff880`05f46990 fffff800`02a8b993 : fffffa80`0a31e7e0 00000000`00000000 fffffa80`0a31e7e0 00000000`7ef95000 : nt!NtQueryFullAttributesFile+0x14f
fffff880`05f46c20 00000000`77320eba : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceCopyEnd+0x13 (TrapFrame @ fffff880`05f46c20)
00000000`0121e778 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x77320eba

Then let’s look at the invalid address:

2: kd> !pte 0000009d00000000 
                                           VA 0000009d00000000
PXE at FFFFF6FB7DBED008   PPE at FFFFF6FB7DA013A0   PDE at FFFFF6FB40274000   PTE at FFFFF6804E800000
contains 0000000000000000
not valid

This decodes the address, finds the relevant page table entries and decodes each of them.  From this, we can tell there is nothing within this 512GB memory region (since each PXE entry corresponds to a 512GB region of the address space).
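
The addresses that !pte reports can be reproduced by hand, which is a handy sanity check on the 512GB claim. What follows is a minimal sketch in Python, not debugger code. The function and constant names are ours, and the base values are the fixed page table self-map addresses used by 64-bit Windows 7; that is an assumption that does not carry over to later releases, which randomize these bases.

```python
# Fixed self-map bases on 64-bit Windows 7 (assumption: not valid on
# later Windows releases, which randomize the page table base).
PTE_BASE = 0xFFFFF68000000000
PDE_BASE = 0xFFFFF6FB40000000
PPE_BASE = 0xFFFFF6FB7DA00000
PXE_BASE = 0xFFFFF6FB7DBED000

VA_MASK = (1 << 48) - 1   # only the low 48 bits of an address are translated
ENTRY_SIZE = 8            # each page table entry is 8 bytes
PXE_SPAN = 1 << 39        # bytes of address space covered by one PXE


def entry_addresses(va):
    """Return the virtual addresses of the PXE, PPE, PDE and PTE mapping va."""
    va &= VA_MASK
    return {
        "PXE": PXE_BASE + (va >> 39) * ENTRY_SIZE,
        "PPE": PPE_BASE + (va >> 30) * ENTRY_SIZE,
        "PDE": PDE_BASE + (va >> 21) * ENTRY_SIZE,
        "PTE": PTE_BASE + (va >> 12) * ENTRY_SIZE,
    }


if __name__ == "__main__":
    print("one PXE covers", PXE_SPAN // 2**30, "GB")
    for level, address in entry_addresses(0x0000009D00000000).items():
        print(level, format(address, "016X"))
```

For the faulting RSI value, the four computed addresses agree with the ones in the !pte output above. Because the walk stops at the first invalid level, a zero PXE is enough to rule out the entire 512GB region it covers.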

Thus, while not the null pointer indicated previously, this is still an invalid address – within a large, undefined region of the address space.

If you choose to look at the first function on the stack in its entirety, this is what you see:

2: kd> uf fltmgr!TreeUnlinkMulti
fffff880`0106f0f0 fff3            push    rbx
fffff880`0106f0f2 55              push    rbp
fffff880`0106f0f3 57              push    rdi
fffff880`0106f0f4 4883ec30        sub     rsp,30h
fffff880`0106f0f8 33ff            xor     edi,edi
fffff880`0106f0fa 488be9          mov     rbp,rcx
fffff880`0106f0fd 4883faff        cmp     rdx,0FFFFFFFFFFFFFFFFh
fffff880`0106f101 0f840c010000    je      fltmgr!TreeUnlinkMulti+0x123 (fffff880`0106f213)

fffff880`0106f107 4c89642458      mov     qword ptr [rsp+58h],r12
fffff880`0106f10c 4c8be2          mov     r12,rdx
fffff880`0106f10f 4983f8ff        cmp     r8,0FFFFFFFFFFFFFFFFh
fffff880`0106f113 0f85eb450000    jne     fltmgr! ?? ::FNODOBFM::`string'+0x504 (fffff880`01073704)

fffff880`0106f119 4889742450      mov     qword ptr [rsp+50h],rsi
fffff880`0106f11e 488b31          mov     rsi,qword ptr [rcx]
fffff880`0106f121 4885f6          test    rsi,rsi
fffff880`0106f124 7518            jne     fltmgr!TreeUnlinkMulti+0x4e (fffff880`0106f13e)

fffff880`0106f126 488bdf          mov     rbx,rdi

fffff880`0106f129 488b742450      mov     rsi,qword ptr [rsp+50h]
fffff880`0106f12e 488bc3          mov     rax,rbx

fffff880`0106f131 4c8b642458      mov     r12,qword ptr [rsp+58h]

fffff880`0106f136 4883c430        add     rsp,30h
fffff880`0106f13a 5f              pop     rdi
fffff880`0106f13b 5d              pop     rbp
fffff880`0106f13c 5b              pop     rbx
fffff880`0106f13d c3              ret

fffff880`0106f13e 488bdf          mov     rbx,rdi

fffff880`0106f141 488b4620        mov     rax,qword ptr [rsi+20h]
fffff880`0106f145 483bd0          cmp     rdx,rax
fffff880`0106f148 741b            je      fltmgr!TreeUnlinkMulti+0x75 (fffff880`0106f165)

fffff880`0106f14a 483bd0          cmp     rdx,rax
fffff880`0106f14d 720b            jb      fltmgr!TreeUnlinkMulti+0x6a (fffff880`0106f15a)

fffff880`0106f14f 488b7610        mov     rsi,qword ptr [rsi+10h]
fffff880`0106f153 4885f6          test    rsi,rsi
fffff880`0106f156 74ce            je      fltmgr!TreeUnlinkMulti+0x36 (fffff880`0106f126)

fffff880`0106f158 ebe7            jmp     fltmgr!TreeUnlinkMulti+0x51 (fffff880`0106f141)

fffff880`0106f15a 488b7608        mov     rsi,qword ptr [rsi+8]
fffff880`0106f15e 4885f6          test    rsi,rsi
fffff880`0106f161 74c3            je      fltmgr!TreeUnlinkMulti+0x36 (fffff880`0106f126)

fffff880`0106f163 ebdc            jmp     fltmgr!TreeUnlinkMulti+0x51 (fffff880`0106f141)

fffff880`0106f165 4c896c2460      mov     qword ptr [rsp+60h],r13
fffff880`0106f16a 4c89742428      mov     qword ptr [rsp+28h],r14
fffff880`0106f16f 4c897c2420      mov     qword ptr [rsp+20h],r15
fffff880`0106f174 4c8bee          mov     r13,rsi

fffff880`0106f177 4c396620        cmp     qword ptr [rsi+20h],r12
fffff880`0106f17b 0f854e450000    jne     fltmgr! ?? ::FNODOBFM::`string'+0x4cf (fffff880`010736cf)

fffff880`0106f181 4c8bf6          mov     r14,rsi
fffff880`0106f184 4c3bee          cmp     r13,rsi
fffff880`0106f187 0f8532450000    jne     fltmgr! ?? ::FNODOBFM::`string'+0x4bf (fffff880`010736bf)

fffff880`0106f18d 41b701          mov     r15b,1

fffff880`0106f190 48397500        cmp     qword ptr [rbp],rsi
fffff880`0106f194 753c            jne     fltmgr!TreeUnlinkMulti+0xe2 (fffff880`0106f1d2)

fffff880`0106f196 488b5618        mov     rdx,qword ptr [rsi+18h]
fffff880`0106f19a 488bce          mov     rcx,rsi
fffff880`0106f19d ff15b5e50000    call    qword ptr [fltmgr!_imp_RtlDeleteNoSplay (fffff880`0107d758)]
fffff880`0106f1a3 f0816630fffffeff lock and dword ptr [rsi+30h],0FFFEFFFFh
fffff880`0106f1ab 488b7500        mov     rsi,qword ptr [rbp]
fffff880`0106f1af 49895e10        mov     qword ptr [r14+10h],rbx
fffff880`0106f1b3 498bde          mov     rbx,r14
fffff880`0106f1b6 4c8bee          mov     r13,rsi

fffff880`0106f1b9 4885f6          test    rsi,rsi
fffff880`0106f1bc 75b9            jne     fltmgr!TreeUnlinkMulti+0x87 (fffff880`0106f177)

fffff880`0106f1be 4c8b7c2420      mov     r15,qword ptr [rsp+20h]
fffff880`0106f1c3 4c8b742428      mov     r14,qword ptr [rsp+28h]
fffff880`0106f1c8 4c8b6c2460      mov     r13,qword ptr [rsp+60h]
fffff880`0106f1cd e957ffffff      jmp     fltmgr!TreeUnlinkMulti+0x39 (fffff880`0106f129)

fffff880`0106f1d2 488b5618        mov     rdx,qword ptr [rsi+18h]
fffff880`0106f1d6 488bce          mov     rcx,rsi
fffff880`0106f1d9 ff1579e50000    call    qword ptr [fltmgr!_imp_RtlDeleteNoSplay (fffff880`0107d758)]
fffff880`0106f1df f0816630fffffeff lock and dword ptr [rsi+30h],0FFFEFFFFh
fffff880`0106f1e7 48895e10        mov     qword ptr [rsi+10h],rbx
fffff880`0106f1eb 4983c8ff        or      r8,0FFFFFFFFFFFFFFFFh
fffff880`0106f1ef 498bd4          mov     rdx,r12
fffff880`0106f1f2 488bcd          mov     rcx,rbp
fffff880`0106f1f5 488bde          mov     rbx,rsi
fffff880`0106f1f8 e8b3190000      call    fltmgr!TreeLookup (fffff880`01070bb0)
fffff880`0106f1fd 4885c0          test    rax,rax
fffff880`0106f200 0f85c1440000    jne     fltmgr! ?? ::FNODOBFM::`string'+0x4c7 (fffff880`010736c7)

fffff880`0106f206 488bf7          mov     rsi,rdi

fffff880`0106f209 4584ff          test    r15b,r15b
fffff880`0106f20c 74ab            je      fltmgr!TreeUnlinkMulti+0xc9 (fffff880`0106f1b9)

fffff880`0106f20e 4c8bee          mov     r13,rsi
fffff880`0106f211 eba6            jmp     fltmgr!TreeUnlinkMulti+0xc9 (fffff880`0106f1b9)

fffff880`0106f213 4983f8ff        cmp     r8,0FFFFFFFFFFFFFFFFh
fffff880`0106f217 0f8593440000    jne     fltmgr! ?? ::FNODOBFM::`string'+0x4b0 (fffff880`010736b0)

fffff880`0106f21d 488b19          mov     rbx,qword ptr [rcx]
fffff880`0106f220 4885db          test    rbx,rbx
fffff880`0106f223 750b            jne     fltmgr!TreeUnlinkMulti+0x140 (fffff880`0106f230)

fffff880`0106f225 488bc7          mov     rax,rdi
fffff880`0106f228 4883c430        add     rsp,30h
fffff880`0106f22c 5f              pop     rdi
fffff880`0106f22d 5d              pop     rbp
fffff880`0106f22e 5b              pop     rbx
fffff880`0106f22f c3              ret

fffff880`0106f230 488bcb          mov     rcx,rbx
fffff880`0106f233 e828410000      call    fltmgr!TreeUnlinkNoBalance (fffff880`01073360)
fffff880`0106f238 48897b10        mov     qword ptr [rbx+10h],rdi
fffff880`0106f23c 488bfb          mov     rdi,rbx
fffff880`0106f23f 488b5d00        mov     rbx,qword ptr [rbp]
fffff880`0106f243 4885db          test    rbx,rbx
fffff880`0106f246 74dd            je      fltmgr!TreeUnlinkMulti+0x135 (fffff880`0106f225)

fffff880`0106f248 ebe6            jmp     fltmgr!TreeUnlinkMulti+0x140 (fffff880`0106f230)

fltmgr! ?? ::FNODOBFM::`string'+0x4b0:
fffff880`010736b0 4883caff        or      rdx,0FFFFFFFFFFFFFFFFh
fffff880`010736b4 e8c72e0000      call    fltmgr!TreeUnlinkMultiDoWalk (fffff880`01076580)
fffff880`010736b9 90              nop
fffff880`010736ba e977baffff      jmp     fltmgr!TreeUnlinkMulti+0x46 (fffff880`0106f136)

fltmgr! ?? ::FNODOBFM::`string'+0x4bf:
fffff880`010736bf 4532ff          xor     r15b,r15b
fffff880`010736c2 e9c9baffff      jmp     fltmgr!TreeUnlinkMulti+0xa0 (fffff880`0106f190)

fltmgr! ?? ::FNODOBFM::`string'+0x4c7:
fffff880`010736c7 488bf0          mov     rsi,rax
fffff880`010736ca e93abbffff      jmp     fltmgr!TreeUnlinkMulti+0x119 (fffff880`0106f209)

fltmgr! ?? ::FNODOBFM::`string'+0x4cf:
fffff880`010736cf 488b4608        mov     rax,qword ptr [rsi+8]
fffff880`010736d3 4885c0          test    rax,rax
fffff880`010736d6 7408            je      fltmgr! ?? ::FNODOBFM::`string'+0x4e0 (fffff880`010736e0)

fltmgr! ?? ::FNODOBFM::`string'+0x4d8:
fffff880`010736d8 488bf0          mov     rsi,rax
fffff880`010736db e9d9baffff      jmp     fltmgr!TreeUnlinkMulti+0xc9 (fffff880`0106f1b9)

fltmgr! ?? ::FNODOBFM::`string'+0x4e0:
fffff880`010736e0 488b4610        mov     rax,qword ptr [rsi+10h]
fffff880`010736e4 4885c0          test    rax,rax
fffff880`010736e7 7408            je      fltmgr! ?? ::FNODOBFM::`string'+0x4f1 (fffff880`010736f1)

fltmgr! ?? ::FNODOBFM::`string'+0x4e9:
fffff880`010736e9 488bf0          mov     rsi,rax
fffff880`010736ec e9c8baffff      jmp     fltmgr!TreeUnlinkMulti+0xc9 (fffff880`0106f1b9)

fltmgr! ?? ::FNODOBFM::`string'+0x4f1:
fffff880`010736f1 498bd5          mov     rdx,r13
fffff880`010736f4 488bce          mov     rcx,rsi
fffff880`010736f7 e8941b0000      call    fltmgr!FindNextRightSubtree (fffff880`01075290)
fffff880`010736fc 488bf0          mov     rsi,rax
fffff880`010736ff e9b5baffff      jmp     fltmgr!TreeUnlinkMulti+0xc9 (fffff880`0106f1b9)

fltmgr! ?? ::FNODOBFM::`string'+0x504:
fffff880`01073704 e8a7d4ffff      call    fltmgr!TreeLookup (fffff880`01070bb0)
fffff880`01073709 4885c0          test    rax,rax
fffff880`0107370c 7417            je      fltmgr! ?? ::FNODOBFM::`string'+0x525 (fffff880`01073725)

fltmgr! ?? ::FNODOBFM::`string'+0x50e:
fffff880`0107370e 488bc8          mov     rcx,rax
fffff880`01073711 488bd8          mov     rbx,rax
fffff880`01073714 e8471b0000      call    fltmgr!TreeUnlink (fffff880`01075260)
fffff880`01073719 48897b10        mov     qword ptr [rbx+10h],rdi
fffff880`0107371d 488bc3          mov     rax,rbx
fffff880`01073720 e90cbaffff      jmp     fltmgr!TreeUnlinkMulti+0x41 (fffff880`0106f131)

fltmgr! ?? ::FNODOBFM::`string'+0x525:
fffff880`01073725 488bc7          mov     rax,rdi
fffff880`01073728 e904baffff      jmp     fltmgr!TreeUnlinkMulti+0x41 (fffff880`0106f131)

The current block starts at:

fffff880`0106f13e 488bdf          mov     rbx,rdi

The mistake in the earlier analysis was to miss the backward jump a few instructions later:

fffff880`0106f158 ebe7            jmp     fltmgr!TreeUnlinkMulti+0x51 (fffff880`0106f141)

Thus, we really do have a small block of code under analysis, as shown below:

fffff880`0106f13e 488bdf          mov     rbx,rdi

fffff880`0106f141 488b4620        mov     rax,qword ptr [rsi+20h]
fffff880`0106f145 483bd0          cmp     rdx,rax
fffff880`0106f148 741b            je      fltmgr!TreeUnlinkMulti+0x75 (fffff880`0106f165)

fffff880`0106f14a 483bd0          cmp     rdx,rax
fffff880`0106f14d 720b            jb      fltmgr!TreeUnlinkMulti+0x6a (fffff880`0106f15a)

fffff880`0106f14f 488b7610        mov     rsi,qword ptr [rsi+10h]
fffff880`0106f153 4885f6          test    rsi,rsi
fffff880`0106f156 74ce            je      fltmgr!TreeUnlinkMulti+0x36 (fffff880`0106f126)

fffff880`0106f158 ebe7            jmp     fltmgr!TreeUnlinkMulti+0x51 (fffff880`0106f141)

The reader who pointed out the loop also described the intent of this code fragment:

The code’s overall intention is to walk a given tree, remove the nodes that match a given pair of keys, and return these nodes as a list (linked through the RightChild members only).

The analyst has identified the given tree and has in Figure 7 dumped for us the root node, as the TreeLink member of a _NAME_CACHE_NODE.  See there that the LeftChild member is corrupt but not with the value of RSI at the time of the fault. Execution will have worked some distance into the RightChild subtree until reaching a node that has the faulting RSI as either its LeftChild or RightChild member. Most plausibly, this tree was already corrupt when TreeUnlinkMulti was entered. A race condition, whether inside TreeUnlinkMulti or out, is just one of many ways that links in a tree might get corrupted.
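
To make the reader’s scenario concrete, here is a small Python model of the descent loop at …F141 through …F163. It is a sketch under stated assumptions, not a reconstruction of the fltmgr source: memory is modelled as a dictionary of qwords, the node addresses and keys are hypothetical, and the LeftChild and RightChild names for offsets 8 and 10h are inferred from the reader’s description and the disassembly.

```python
KEY_OFFSET = 0x20    # qword compared against rdx at ...f141
LEFT_OFFSET = 0x08   # followed when the key is below the node's key (jb)
RIGHT_OFFSET = 0x10  # followed otherwise


class InvalidAddress(Exception):
    """Raised when the walk reads an address that is not mapped."""


def read_qword(memory, address):
    """Read one qword from the modelled memory, or fault."""
    try:
        return memory[address]
    except KeyError:
        raise InvalidAddress(hex(address)) from None


def add_node(memory, address, key, left=0, right=0):
    """Lay out a hypothetical node at the offsets the loop uses."""
    memory[address + LEFT_OFFSET] = left
    memory[address + RIGHT_OFFSET] = right
    memory[address + KEY_OFFSET] = key


def find_node(memory, root, key):
    """Model of the inlined descent loop: return the matching node or 0."""
    node = root
    while node != 0:
        node_key = read_qword(memory, node + KEY_OFFSET)  # the faulting load
        if key == node_key:
            return node
        if key < node_key:
            node = read_qword(memory, node + LEFT_OFFSET)
        else:
            node = read_qword(memory, node + RIGHT_OFFSET)
    return 0


if __name__ == "__main__":
    memory = {}
    add_node(memory, 0x1000, key=50, right=0x2000)
    add_node(memory, 0x2000, key=80, right=0x0000009D00000000)  # corrupt link
    try:
        find_node(memory, 0x1000, 90)
    except InvalidAddress as fault:
        print("fault reading", fault)
```

The loop only checks a child pointer for null before following it, so a corrupt non-null link is loaded into RSI and the fault occurs on the next pass at the read of [rsi+20h]. That is why the bad value shows up in RSI even though the node that holds it lies further up the tree.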

Thus, at this point we arrive at much the same conclusion as before: there clearly is data corruption, but it does not seem likely that the corruption occurred in this code.

As noted previously, we’ve seen similar data corruption on a different computer system, also running Windows 7 x64. In the first dump, we observed what appears to be a single bit error in memory. By itself, that led us to suspect the machine. Seeing this on a different computer in similar circumstances makes us suspect there is some source of data corruption in the code. While a race condition is a potential data corruption source, it’s not the only possibility.
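
When comparing a corrupt value against the value it should have held, a quick way to test the single bit error theory is to XOR the two and count the differing bits. The helper below is a minimal Python sketch; the function names are ours and the pointer value in the example is hypothetical, chosen only to show the check.

```python
def differing_bits(observed, expected, width=64):
    """Return the bit positions at which two values differ."""
    mask = (1 << width) - 1
    diff = (observed ^ expected) & mask
    return [bit for bit in range(width) if (diff >> bit) & 1]


def is_single_bit_error(observed, expected):
    """True when the two values differ in exactly one bit."""
    return len(differing_bits(observed, expected)) == 1


if __name__ == "__main__":
    expected = 0xFFFFFA8007E5B550       # hypothetical intended pointer
    observed = expected ^ (1 << 20)     # same pointer with bit 20 flipped
    print(differing_bits(observed, expected))
    print(is_single_bit_error(observed, expected))
```

A value that differs from a plausible pointer in many bits points away from a simple memory fault and toward an overwrite by software.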

Data corruption issues are often the most difficult to track down. Frequently the source of the corruption emerges from a pattern seen across a number of crash dumps, not from a single crash dump. While we still do not know the actual issue here, we’ll be on the lookout for it in the future and invite our readers to share their own observations if they see it as well.
