Windows System Software -- Consulting, Training, Development -- Engineering Excellence, Every Time.

Mitigating the $I30:$Bitmap NTFS Bug

Update 1/26

Our sources at Microsoft provided us the following info:

Microsoft is aware of a recent research blog post discussing a bug that could appear to lead to possible NTFS corruption.  

We have investigated this issue and confirmed that NTFS corruption does not result. When the symptom appears, a flag is set to run CHKDSK on the next reboot. The temporary condition is resolved when CHKDSK runs on the next reboot.  

We evaluate all reported security issues to determine whether the reported issue meets specific criteria for immediate servicing. Solutions to verified security issues that meet our criteria for immediate servicing are normally released via our monthly Update Tuesday cadence.

We encourage our customers to practice industry-standard best practices for security and data protection including a robust strategy to manage security updates, antivirus signatures, and passwords. More information on staying safe online is available at https://www.microsoft.com/en-us/security.

We’ll continue to update with any further details as we get them…

Original post follows

We were surprised by a Tweet claiming that simply accessing a path caused NTFS to report the volume as corrupt:

We met this with great skepticism (how could that possibly be??) but @jonasLk seems to have a strange knack for finding these sorts of things. After a bit of delay the magic path was revealed and, as advertised, triggered a very ugly warning:

TL;DR Is This Serious? And Just Get Me a Fix!

The file or directory is certainly not corrupt at this point. The warning it triggers is very ugly though and definitely causes a chkdsk to happen on the next boot. We also have a system here at OSR that will no longer boot after running a second chkdsk while playing with this. Between the ugly warning and the broken system here we think it’s worth mitigating until there’s a real fix released.

There’s no way to fix this problem without an update to Windows. In the meantime you can download our mitigation filter from GitHub. Signed binaries for x86 and x64 are available for you to install:

Release v1.0.0 · OSRDrivers/i30Flt (github.com)

Source code and installation instructions are available in the repo:

OSRDrivers/i30Flt: This is a simple filter that will block any attempt to access streams beginning with “:$i30:”. This stops the spurious corruption warning triggered on certain Windows 10 versions. (github.com)

I Want to Go Down the Rabbit Hole!

We decided that a bit of debugging was in order and went to the first tool of the trade: Process Monitor (ProcMon)! ProcMon will capture all of the native file system operations that occur prior to the corruption. Usually this will let us start to zero in on what operations trigger the corruption and where the error comes from. By default ProcMon will attempt to translate the native file system operations into their Win32 equivalent. This is not helpful for us so the first thing to do is to check Filter->Enable Advanced Output and disable the translation. Next we zero in on accesses to “$I30” by setting a path filter:

Now we hold our breath and execute the bad cd command (note that I used the “D:” drive in this case):

The offending operation here is reported as a FAST_IO_NETWORK_QUERY_OPEN. A few notes about this as it’s really confusing. Feel free to skip if you’re not interested in the arcane details of the file system interface.

Queries, queries, everywhere…

FAST_IO_NETWORK_QUERY_OPEN has been a file system interface “forever”. It was originally designed to be used by LanMan Server (i.e. SMB Server) to query a bunch of attributes from the file in one query instead of multiple. Its usage has grown though because it’s convenient and it doesn’t necessarily have anything to do with the network.

That’s great and all, but the ProcMon output is misleading and the operation is actually a FAST_IO_QUERY_OPEN. This, again, is a long-standing optimization in the file system interface to do one better than FAST_IO_NETWORK_QUERY_OPEN. Unlike Linux, Windows historically requires you to open the file before querying the attributes. This means over the network you need to send a request to open the file, send a request to get the attributes of the file, and then send a request to close the file. FAST_IO_QUERY_OPEN attempts to query the attributes of the file without actually opening it, thus saving the trip for the open and close requests.

ProcMon incorrectly reports the operation because, unfortunately, Filter Manager translates FAST_IO_QUERY_OPEN into an IRP_MJ_NETWORK_QUERY_OPEN “Pseudo IRP”. FAST_IO_NETWORK_QUERY_OPEN then becomes an IRP_MJ_QUERY_INFORMATION operation with FLTFL_CALLBACK_DATA_FAST_IO_OPERATION set (can’t believe you didn’t know that).

To make things even worse, newer versions of Windows 10 support yet another way to query the attributes of a file without opening it. This was specifically added to speed up stat operations performed by WSL processes. Filter Manager translates this into an IRP_MJ_QUERY_OPEN Pseudo IRP, thus guaranteeing confusion for all future generations.
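If the naming above makes your head spin, it can help to write the mapping down as a small lookup table. The following is only a user-mode C sketch for keeping the names straight; the struct, the table, and the fltmgr_name_for helper are hypothetical names of ours and are not part of the WDK or of Filter Manager.

```c
#include <stddef.h>
#include <string.h>

/*
 * Illustrative only: summarizes the mapping described in the text between
 * what the file system interface calls an operation and what a minifilter
 * sees from Filter Manager. None of these names come from WDK headers.
 */
struct query_mapping {
    const char *fs_operation;     /* name at the file system interface   */
    const char *fltmgr_operation; /* major function a minifilter sees    */
    const char *note;             /* extra detail mentioned in the text  */
};

static const struct query_mapping k_mappings[] = {
    { "FAST_IO_QUERY_OPEN",
      "IRP_MJ_NETWORK_QUERY_OPEN",
      "attribute query without opening the file" },
    { "FAST_IO_NETWORK_QUERY_OPEN",
      "IRP_MJ_QUERY_INFORMATION",
      "FLTFL_CALLBACK_DATA_FAST_IO_OPERATION is set" },
    { "stat-style query (newer Windows 10)",
      "IRP_MJ_QUERY_OPEN",
      "added to speed up stat from WSL processes" },
};

/* Returns the Filter Manager name for an operation, or NULL if unknown. */
static const char *fltmgr_name_for(const char *fs_operation)
{
    size_t count = sizeof(k_mappings) / sizeof(k_mappings[0]);

    for (size_t i = 0; i < count; i++) {
        if (strcmp(k_mappings[i].fs_operation, fs_operation) == 0) {
            return k_mappings[i].fltmgr_operation;
        }
    }
    return NULL;
}
```

Looking up "FAST_IO_QUERY_OPEN" in this table yields "IRP_MJ_NETWORK_QUERY_OPEN", which is exactly the mismatch that makes the ProcMon output confusing.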

That brings us back to the fact that it’s a simple attribute query that triggers the file corrupt error. We thought it might be interesting to see where the error originally starts, so we use the handy NTFS Status Debugging trick to find the first use of STATUS_FILE_CORRUPT_ERROR in NTFS:

0: kd> ed Ntfs!NtfsStatusDebugFlags 2
0: kd> ed Ntfs!NtfsStatusBreakOnStatus 0xC0000102
0: kd> g
Assertion failure - code c0000420 (first chance)
Ntfs!NtfsStatusTraceAndDebugInternal+0x3200b:
fffff801`27f14a7b int     2Ch
0: kd> k
 # Child-SP          RetAddr           Call Site
00 fffff88c`7e5d6940 fffff801`27fed088 Ntfs!NtfsStatusTraceAndDebugInternal+0x3200b
01 fffff88c`7e5d69a0 fffff801`27ff043b Ntfs!NtfsUpdateScbFromAttribute+0x668
02 fffff88c`7e5d6a90 fffff801`2801ed3f Ntfs!NtfsOpenExistingPrefixFcb+0x51b
03 fffff88c`7e5d6ba0 fffff801`2801fc40 Ntfs!NtfsFindStartingNode+0x3ff
04 fffff88c`7e5d6c90 fffff801`2801c17b Ntfs!NtfsCommonCreate+0x580
05 fffff88c`7e5d6f70 fffff801`246cd805 Ntfs!NtfsFsdCreate+0x1db
06 fffff88c`7e5d71f0 fffff801`27326ccf nt!IofCallDriver+0x55
07 fffff88c`7e5d7230 fffff801`2735bbd4 FLTMGR!FltpLegacyProcessingAfterPreCallbacksCompleted+0x28f
08 fffff88c`7e5d72a0 fffff801`246cd805 FLTMGR!FltpCreate+0x324
09 fffff88c`7e5d7350 fffff801`246cedf4 nt!IofCallDriver+0x55
0a fffff88c`7e5d7390 fffff801`24ac3c3d nt!IoCallDriverWithTracing+0x34
0b fffff88c`7e5d73e0 fffff801`24aec3ae nt!IopParseDevice+0x117d
0c fffff88c`7e5d7550 fffff801`24af566a nt!ObpLookupObjectName+0x3fe
0d fffff88c`7e5d7720 fffff801`24a00075 nt!ObOpenObjectByNameEx+0x1fa
0e fffff88c`7e5d7850 fffff801`24805fb5 nt!NtQueryAttributesFile+0x1c5
0f fffff88c`7e5d7b00 00007ffc`bf40c534 nt!KiSystemServiceCopyEnd+0x25
10 000000bd`2995ec78 00007ffc`bcd69b75 ntdll!NtQueryAttributesFile+0x14
11 000000bd`2995ec80 00007ff7`e6223e85 KERNELBASE!GetFileAttributesW+0x85
12 000000bd`2995ed20 00007ff7`e623d6a1 cmd!ChangeDirectory+0x261
13 000000bd`2995efd0 00007ff7`e62305e2 cmd!ChdirWork+0x41
14 000000bd`2995f000 00007ff7`e621c862 cmd!eChdir+0x1e822
15 000000bd`2995f280 00007ff7`e621bea1 cmd!FindFixAndRun+0x242
16 000000bd`2995f720 00007ff7`e622eba0 cmd!Dispatch+0xa1
17 000000bd`2995f7b0 00007ff7`e6228edd cmd!main+0xb3c8
18 000000bd`2995f850 00007ffc`bd4c7034 cmd!__mainCRTStartup+0x14d
19 000000bd`2995f890 00007ffc`bf3bcec1 KERNEL32!BaseThreadInitThunk+0x14
1a 000000bd`2995f8c0 00000000`00000000 ntdll!RtlUserThreadStart+0x21

We can see from this output that the error is first reported in NtfsUpdateScbFromAttribute. If we walk back up a frame we can locate the call site to NtfsStatusTraceAndDebugInternal:

0: kd> .frame /c 1
01 fffff88c`7e5d69a0 fffff801`27ff043b Ntfs!NtfsUpdateScbFromAttribute+0x668
Ntfs!NtfsUpdateScbFromAttribute+0x668:
fffff801`27fed088 mov     r9,qword ptr [rbx+0A8h] ds:002b:ffff808c`92586c18=ffff808c8c01d010
0: kd> ub
Ntfs!NtfsUpdateScbFromAttribute+0x645:
fffff801`27fed065 call    Ntfs!NtfsAttachRepairInfoPriv (fffff801`27f2fdf0)
fffff801`27fed06a movzx   eax,byte ptr [Ntfs!NtfsStatusDebugFlags (fffff801`27f7546c)]
fffff801`27fed071 test    al,al
fffff801`27fed073 je      Ntfs!NtfsUpdateScbFromAttribute+0x668 (fffff801`27fed088)
fffff801`27fed075 mov     edx,0C0000102h
fffff801`27fed07a mov     r8d,302EAh
fffff801`27fed080 mov     rcx,r13
fffff801`27fed083 call    Ntfs!NtfsStatusTraceAndDebugInternal (fffff801`27ee2a70)

Then we can disassemble NtfsUpdateScbFromAttribute to see how we might end up in this case (heavily abbreviated but thankfully there’s only one path!):

0: kd> uf Ntfs!NtfsUpdateScbFromAttribute
Ntfs!NtfsUpdateScbFromAttribute:
...
fffff801`27fecac4 call    Ntfs!NtfsLookupInFileRecord (fffff801`280125b0)
fffff801`27fecac9 test    al,al
fffff801`27fecacb je      Ntfs!NtfsUpdateScbFromAttribute+0x5f5 (fffff801`27fed015)  Branch
...
Ntfs!NtfsUpdateScbFromAttribute+0x5f5:
fffff801`27fed015 mov     eax,dword ptr [rbx+1F0h]
fffff801`27fed01b test    al,4
fffff801`27fed01d jne     Ntfs!NtfsUpdateScbFromAttribute+0xb1 (fffff801`27fecad1)  Branch

Ntfs!NtfsUpdateScbFromAttribute+0x603:
...
fffff801`27fed052 call    Ntfs!NtfsAttachCorruption_BadOrOrphanFRS (fffff801`280de594)
...
fffff801`27fed075 mov     edx,0C0000102h
fffff801`27fed07a mov     r8d,302EAh
fffff801`27fed080 mov     rcx,r13
fffff801`27fed083 call    Ntfs!NtfsStatusTraceAndDebugInternal (fffff801`27ee2a70)

So, if NtfsLookupInFileRecord returns FALSE and a particular flag bit is clear (the test al,4 above jumps back to the normal path when the bit is set), then the file is reported as corrupt and the error is returned. I’ll spare you the full walk through NtfsLookupInFileRecord, but if you step through it you’ll eventually come to NtfsFindInFileRecord, which performs a case sensitive compare between $i30 and $I30. This does not pass, the routine returns FALSE, and chaos ensues. It’s interesting to note that there is a path in NtfsFindInFileRecord that will perform a case insensitive compare. This requires that the caller provide a non-zero value as one of its arguments:

fffff801`28012c19 cmp     byte ptr [rsp+0C0h],0
...
fffff801`28012c2b jne     Ntfs!NtfsFindInFileRecord+0x18f (fffff801`28012ccf)  Branch
...
Ntfs!NtfsFindInFileRecord+0x18f:
fffff801`28012ccf movzx   r8d,byte ptr [rsp+0C0h]
...
fffff801`28012ce3 mov     r10,qword ptr [Ntfs!_imp_FsRtlAreNamesEqual (fffff801`27fa2230)]
fffff801`28012cea call    nt!FsRtlAreNamesEqual (fffff801`24668cf0)

#if (NTDDI_VERSION >= NTDDI_WIN2K)
_Must_inspect_result_
_IRQL_requires_max_(PASSIVE_LEVEL)
NTKERNELAPI
BOOLEAN
FsRtlAreNamesEqual (
    _In_ PCUNICODE_STRING ConstantNameA,
    _In_ PCUNICODE_STRING ConstantNameB,
    _In_ BOOLEAN IgnoreCase,
    _In_reads_opt_(0x10000) PCWCH UpcaseTable
    );
#endif

Unfortunately there does not appear to be a way to change the argument to the function in this case, which rules out any kind of simple patch.
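To make the difference concrete, here is a minimal user-mode sketch of a name compare with an IgnoreCase style switch. It is not the NTFS code and not FsRtlAreNamesEqual: names_equal is a hypothetical helper, and towupper stands in for the kernel's upcase table, which is an assumption that only holds for simple names like these.

```c
#include <stdbool.h>
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical user-mode stand-in for a name comparison with an
 * ignore-case switch. towupper() replaces the kernel's upcase table,
 * so this is only a rough model of the real behavior.
 */
static bool names_equal(const wchar_t *a, const wchar_t *b, bool ignore_case)
{
    size_t len = wcslen(a);

    /* Names of different lengths can never match. */
    if (len != wcslen(b)) {
        return false;
    }

    for (size_t i = 0; i < len; i++) {
        wint_t ca = (wint_t)a[i];
        wint_t cb = (wint_t)b[i];

        if (ignore_case) {
            ca = towupper(ca);
            cb = towupper(cb);
        }
        if (ca != cb) {
            return false;
        }
    }
    return true;
}
```

With ignore_case set to false, $i30 and $I30 do not match, which mirrors the failing lookup described above. With ignore_case set to true they do match, which is the path the caller would need to take to avoid the spurious error.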

Note that there were a ton of changes in NTFS to better support case sensitivity and I suspect that this was a casualty of those changes. If you’re interested you can read all about case sensitivity in James Forshaw’s blog post:

Tyranid’s Lair: NTFS Case Sensitivity on Windows (tiraniddo.dev)

The Mitigation

When all you have is a shotgun everything looks like a clay pigeon…

Someone super wise and clever

We write a lot of file system filters here, so the immediate thought was, “We can fix this with a file system filter!” Granted we usually say this in response to any problem, but in this case it actually seemed like a good idea. So, we threw together a small filter to block any attempt to access a stream that begins with the path “:$i30:”. We do this as a case sensitive compare because that’s all that seems to matter ($I30 does not trigger the corruption warning).

We need to block IRP_MJ_CREATE, IRP_MJ_NETWORK_QUERY_OPEN, and IRP_MJ_QUERY_OPEN to sufficiently block the issue on a stock Windows 10 system. You can find signed binaries and the entirety of the code on GitHub, but here’s the majority of the logic:

    status = FltGetFileNameInformation(
                            Data,
                            FLT_FILE_NAME_OPENED |
                                FLT_FILE_NAME_QUERY_ALWAYS_ALLOW_CACHE_LOOKUP,
                            &fileNameInfo);

    if (!NT_SUCCESS(status)) {

        i30FltTracePrint(ERROR, "FltGetFileNameInformation failed! 0x%x\n",
                         status);

        goto Exit;

    }

    status = FltParseFileNameInformation(fileNameInfo);

    if (!NT_SUCCESS(status)) {

        i30FltTracePrint(ERROR, "FltParseFileNameInformation failed! 0x%x\n",
                         status);

        goto Exit;

    }

    //
    // Check to see if our evil prefix is present. Note that this is CASE
    // SENSITIVE (FALSE as Arg3)
    //
    if (!RtlPrefixUnicodeString(&i30Flti30PrefixPath,
                                &fileNameInfo->Stream,
                                FALSE)) {

        //
        // It's not the evil magic path...Don't trace anything (this is every
        // file opened or queried) just leave
        //
        goto Exit;

    }

    //
    // It's our evil path! Fail it...
    //
    i30FltTracePrint(ERROR, "Denying attempt to access %wZ\n",
                     &fileNameInfo->Name);

    Data->IoStatus.Status      = STATUS_ACCESS_DENIED;
    Data->IoStatus.Information = 0;
    callbackStatus             = FLT_PREOP_COMPLETE;

    //
    // Send an entry to the event log so there's a trace of this happening
    //
    EventWriteOperationBlocked(NULL,
                               Data->Iopb->MajorFunction,
                               fileNameInfo->Name.Length / 2,
                               fileNameInfo->Name.Buffer);
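If you want to play with the matching rule outside of a kernel debugger, the prefix check can be approximated in plain user-mode C. This is only a sketch under our own assumptions: has_prefix_case_sensitive and should_block_stream are hypothetical helpers, wcsncmp stands in for RtlPrefixUnicodeString with FALSE as Arg3, and the input is assumed to be the already parsed stream portion of the name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <wchar.h>

/*
 * Case sensitive prefix test. Roughly models calling
 * RtlPrefixUnicodeString with CaseInSensitive == FALSE, but on
 * NUL-terminated wide strings instead of UNICODE_STRINGs.
 */
static bool has_prefix_case_sensitive(const wchar_t *prefix,
                                      const wchar_t *name)
{
    size_t prefix_len = wcslen(prefix);

    /* wcsncmp stops at the terminator, so a short name simply mismatches. */
    return wcsncmp(prefix, name, prefix_len) == 0;
}

/* Hypothetical helper: TRUE if the stream name is the evil magic path. */
static bool should_block_stream(const wchar_t *stream)
{
    return has_prefix_case_sensitive(L":$i30:", stream);
}
```

A stream of :$i30:$bitmap is blocked by this rule, while :$I30:$bitmap is let through, matching the observation that only the lowercase form triggers the warning.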

Stay Tuned for the Real Fix

We expect a real fix to be pushed by Windows Update at some point. We’ll let you know when that happens so you can delete the filter and stop using it.