
Windows 10: What’s New for FS/Filters?

In July 2016, Microsoft released the latest major update to Windows 10.  There are various names for this release, but Redstone 1, or RS1 for short, seems to be the one that most of the technical community uses.  The release is more officially called the Anniversary Update, and has been designated version 1607.  The changes in RS1 are also present in Windows Server 2016 (S16), as in many cases the binaries are identical between the systems.

To keep you up to speed, we’ll highlight a selected set of changes that are likely to be of interest to people working with Windows file systems, including those developing file system mini-filter drivers.

The process we used to identify these changes was a systematic comparison of the header files (ntifs.h, ntddk.h, wdm.h, and fltKernel.h), followed by some careful observation of a running RS1 system.  While many of these changes will impact file system related drivers including mini-filters, some may impact other drivers as well.  If you still have a legacy file system filter driver, you’ll almost certainly want to be aware of these changes so you can accommodate them as you migrate to the mini-filter model.

Direct Access Memory Device Support

Windows 10 (and Server 2016) now include support for persistent memory storage devices. These NVRAM based devices use normal memory slots, but provide persistent storage, which can be used by a file system in order to obviate the need to do any RAM-based caching, due to the performance of the device itself.

Figure 1 – Persistent Memory
(courtesy of Viking Technology and used with permission)

Support for these new persistent memory devices has been present in Linux for various file systems and is now available in Windows 10.  For those interested in the device driver aspects of this new technology, there are two new drivers:

· A Storage Class Memory bus driver (scmbus.sys)

· An SCM disk driver (scmdisk0101.sys)

SCM devices operate in one of two modes: Block Mode, or Direct Access Storage (DAS) Mode.  In Windows, the mode is chosen when the SCM device is formatted.  In Block Mode, SCM devices appear as “ordinary” storage volumes and thus maintain all existing storage semantics.  This provides perfect application compatibility, but requires I/O operations to traverse (a slightly optimized path through) the Windows storage stack.

DAS Mode is much more interesting.  It is supported in RS1 by the NTFS and ReFS file systems.  The key benefit for applications using file systems that support the DAS Mode interface is that it provides zero-copy access.  Memory mapping a file maps the SCM memory directly into the address space, whether the mapping is done by an application or by the Cache Manager.

There are some behavior changes with the introduction of SCM in DAS Mode as well:

· Potentially different types of storage failure

· NTFS: no encryption, compression or TxF (transaction) support

· ReFS: no integrity streams, no cluster bands, no block cloning

· No Bitlocker support

· No volume snapshots

· No mirrored or parity support (Storage Spaces or Dynamic Volumes)

· Modification Time and USN Journal semantics are altered slightly (“last update” is the date of memory mapping)

· Directory Change Notification occurs at memory mapping time

Some file system filter drivers, notably data transformation filters (encryption/compression/HSM), may be impacted by these changes.  Filters must explicitly indicate if they support direct access storage (by setting the FLTFL_REGISTRATION_SUPPORT_DAX_VOLUME bit in their registration structure Flags field).  Otherwise, the filter cannot attach to SCM volumes.

Driver Level Changes

The DO_DAX_VOLUME bit is set in the device object of an SCM device (this is defined in wdm.h, ntddk.h and ntifs.h):

// DO_DAX_VOLUME - If set, this is a DAX volume i.e. the volume supports mapping a file directly
// on the persistent memory device.  The cached and memory mapped IO to user files wouldn't
// generate paging IO.
//
#define DO_DAX_VOLUME               0x10000000      

File systems that support DAX should indicate this in their file system attributes (defined in ntifs.h):

//
//  When enabled this attribute implies that the volume supports byte addressable
//  mode.  A mode where reads / writes on mapped files happen directly on the
//  storage device, without going through the file system and the storage stack.
//
//  NOTE: This attribute only mean that the file system supports.  It doesn't
//  imply that the storage hardware is capable.  The storage hardware should be
//  a byte addressable persistent memory device, to let one map files directly
//  on the storage device.
//
#define FILE_DAX_VOLUME           0x20000000  // winnt

A file system (and filter driver) can test to see if a volume is a DAX volume by using the new FsRtl routine for this purpose (ntifs.h):

BOOLEAN
FsRtlIsDaxVolume (
    _In_ PFILE_OBJECT FileObject
    );
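
As a rough illustration of how the two bits above relate, the following user-mode C sketch models the checks with plain integers.  The two constants are copied from the header excerpts quoted above; the helper names are hypothetical, and requiring both bits before expecting direct mapping is our own conservative assumption.  In a real driver you would read the values from the device object and the file system attribute information, or simply call FsRtlIsDaxVolume.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the header excerpts quoted in this article. */
#define DO_DAX_VOLUME    0x10000000u  /* device object flag */
#define FILE_DAX_VOLUME  0x20000000u  /* file system attribute flag */

/* Hypothetical helper: true if the device object flags mark a DAX volume. */
static bool device_is_dax(uint32_t device_flags)
{
    return (device_flags & DO_DAX_VOLUME) != 0;
}

/* Hypothetical helper: true if the file system reports DAX capability. */
static bool fs_supports_dax(uint32_t fs_attributes)
{
    return (fs_attributes & FILE_DAX_VOLUME) != 0;
}

/*
 * Assumption for illustration: expect direct mapping (and therefore no
 * paging I/O for user files) only when both the device and the file
 * system advertise DAX.  FILE_DAX_VOLUME alone says nothing about the
 * hardware, as the header comment points out.
 */
static bool expect_direct_mapping(uint32_t device_flags, uint32_t fs_attributes)
{
    return device_is_dax(device_flags) && fs_supports_dax(fs_attributes);
}
```

A filter could use a check like this at instance setup time to decide whether its paging I/O based logic applies to the volume at all.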

Because SCM type devices are memory, and memory is typically addressable in units smaller than the size of a sector, using SCM can introduce new failure modes to applications.  For example, in some SCM type devices a write operation could be interrupted mid-sector due to a system crash or power failure. There is a new I/O stack location bit (defined in wdm.h) to deal with this problem:

#define SL_PERSISTENT_MEMORY_FIXED_MAPPING  0x20    // valid only with persistent memory device and IRP_MJ_WRITE

This bit is optional, but when set indicates the SCM device is using Intel’s block translation table mechanism (defined in the NVDIMM Namespace Specification) that guarantees sector level atomic writes.  This feature provides more compatibility with the way traditional disks fail and thus minimizes the impact of unexpected types of failures when using SCM type devices.

In order to accommodate this new feature for NTFS and ReFS, there is a new Cache Manager routine for initializing the cache for persistent memory devices (defined in ntifs.h):

NTKERNELAPI
VOID
CcInitializeCacheMapEx (
    _In_ PFILE_OBJECT FileObject,
    _In_ PCC_FILE_SIZES FileSizes,
    _In_ BOOLEAN PinAccess,
    _In_ PCACHE_MANAGER_CALLBACKS Callbacks,
    _In_ PVOID LazyWriteContext,
    _In_ ULONG Flags
    );

Naturally, if your file system will support this feature, you will need to use this routine to support direct access storage as well.

The relevant new Cache Manager flag (defined in ntifs.h):

//
// The following flags are valid Flags parameter that CcInitializeCacheMapEx accepts
//

#define CACHE_USE_DIRECT_ACCESS_MAPPING         (0x00000001)

File system filter drivers attached to such devices need to understand that these do not behave like normal file systems.  For example, the pattern of Paging I/O differs from what your filter might be accustomed to handling: specifically, persistent memory devices are directly accessed for cached I/O and thus do not cause any paging I/O activity.

Of course, this is just a brief overview of SCM devices on Windows.  There’s lots more to know about these new devices, which promise a major shift in how certain data is stored on Windows systems.  We’ll write more about SCM devices in a future issue of The NT Insider.

Reparse Point Changes

In the past, NTFS has required that directories be empty prior to applying a reparse point.  Redstone now introduces support for reparse points on non-empty directories.  Note that not all reparse points support this feature: it is a characteristic of the specific reparse point value.

Microsoft has introduced a bit in the reparse point tag that will be used moving forward (from ntifs.h):

//
// The reparse tags are a ULONG. The 32 bits are laid out as follows:
//
//   3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1
//   1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
//  +-+-+-+-+-----------------------+-------------------------------+
//  |M|R|N|D|     Reserved bits     |       Reparse Tag Value       |
//  +-+-+-+-+-----------------------+-------------------------------+
//
// M is the Microsoft bit. When set to 1, it denotes a tag owned by Microsoft.
//   All ISVs must use a tag with a 0 in this position.
//   Note: If a Microsoft tag is used by non-Microsoft software, the
//   behavior is not defined.
//
// R is reserved.  Must be zero for non-Microsoft tags.
//
// N is name surrogate. When set to 1, the file represents another named
//   entity in the system.
//
// D is the directory bit. When set to 1, indicates that any directory
//   with this reparse tag can have children. Has no special meaning when used
//   on a non-directory file. Not compatible with the name surrogate bit.
//
// The M and N bits are OR-able.
// The following macros check for the M and N bit values:
//
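
To make the bit layout concrete, here is a small user-mode C sketch that tests the M, N and D bits.  The bit positions are taken directly from the diagram above (M is bit 31, N is bit 29, D is bit 28); the helper names are hypothetical and are not the macros that ntifs.h provides, and the consistency check is simply our reading of the "not compatible with the name surrogate bit" remark.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions taken from the layout diagram: M=31, R=30, N=29, D=28. */
#define TAG_BIT_MICROSOFT       (1u << 31)
#define TAG_BIT_NAME_SURROGATE  (1u << 29)
#define TAG_BIT_DIRECTORY       (1u << 28)

/* Hypothetical helpers; ntifs.h has its own macros for these tests. */
static bool tag_is_microsoft(uint32_t tag)
{
    return (tag & TAG_BIT_MICROSOFT) != 0;
}

static bool tag_is_name_surrogate(uint32_t tag)
{
    return (tag & TAG_BIT_NAME_SURROGATE) != 0;
}

static bool tag_is_directory(uint32_t tag)
{
    return (tag & TAG_BIT_DIRECTORY) != 0;
}

/* The header comment says D is not compatible with N; reject that pairing. */
static bool tag_bits_are_consistent(uint32_t tag)
{
    return !(tag_is_directory(tag) && tag_is_name_surrogate(tag));
}
```

For example, a tag value of 0xA0000003 has M and N set and D clear, so it describes a Microsoft-owned name surrogate that does not carry the new directory bit.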

There are reparse point tags for which the D bit is not set that may still be used on non-empty directories (they existed prior to this change).  Thus, in order to test for this situation a driver should use the new routine (defined in ntifs.h):

NTKERNELAPI
BOOLEAN
FsRtlIsNonEmptyDirectoryReparsePointAllowed(
    _In_ ULONG ReparseTag
);

In addition, there are new options to control the behavior of opening a directory with such a reparse point (defined in ntifs.h):

//  The following flags control behavior when a reparse point is encountered
//  on a directory that may be non-empty (one whose reparse tag is
//  recognized by FsRtlIsNonEmptyDirectoryReparsePointAllowed):
//
//    OPEN_REPARSE_POINT_REPARSE_IF_CHILD_EXISTS -
//    If the reparse point is on a directory that is not the final path
//    component and the next path component exists, reparse on the directory.
//
//    OPEN_REPARSE_POINT_REPARSE_IF_CHILD_NOT_EXISTS -
//    If the reparse point is on a directory that is not the final path
//    component and the next path component does not exist, reparse on the
//    directory.
//
//    OPEN_REPARSE_POINT_REPARSE_IF_DIRECTORY_FINAL_COMPONENT -
//    If the reparse point is on a directory that is the final path
//    component, reparse on the directory unless FILE_OPEN_REPARSE_POINT
//    is specified.
//
//  Specifying all three of the above flags is legal and simply means always
//  reparse on any directory reparse point.
//

#define OPEN_REPARSE_POINT_REPARSE_IF_CHILD_EXISTS               (0x00000002)
#define OPEN_REPARSE_POINT_REPARSE_IF_CHILD_NOT_EXISTS           (0x00000004)
#define OPEN_REPARSE_POINT_REPARSE_IF_DIRECTORY_FINAL_COMPONENT  (0x00000008)
#define OPEN_REPARSE_POINT_VERSION_EX                            (0x80000000)
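
The three behavior flags are easier to reason about as a decision function.  The user-mode C sketch below is only a model of the rules stated in the header comment, not the file system's actual implementation; the function name and its boolean parameters are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the header excerpt above. */
#define OPEN_REPARSE_POINT_REPARSE_IF_CHILD_EXISTS               0x00000002u
#define OPEN_REPARSE_POINT_REPARSE_IF_CHILD_NOT_EXISTS           0x00000004u
#define OPEN_REPARSE_POINT_REPARSE_IF_DIRECTORY_FINAL_COMPONENT  0x00000008u

/*
 * Hypothetical model: decide whether to reparse on a directory whose tag
 * permits it to be non-empty.
 *   is_final_component     - the directory is the last path component
 *   next_component_exists  - the following path component exists
 *   open_reparse_requested - caller specified FILE_OPEN_REPARSE_POINT
 */
static bool should_reparse(uint32_t flags,
                           bool is_final_component,
                           bool next_component_exists,
                           bool open_reparse_requested)
{
    if (is_final_component) {
        /* Reparse unless the caller asked to open the reparse point itself. */
        return (flags & OPEN_REPARSE_POINT_REPARSE_IF_DIRECTORY_FINAL_COMPONENT) != 0
               && !open_reparse_requested;
    }
    if (next_component_exists) {
        return (flags & OPEN_REPARSE_POINT_REPARSE_IF_CHILD_EXISTS) != 0;
    }
    return (flags & OPEN_REPARSE_POINT_REPARSE_IF_CHILD_NOT_EXISTS) != 0;
}
```

With all three flags set, the function returns true for every intermediate directory, which matches the "always reparse" note in the header comment.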

This is one of the more interesting and potentially significant changes to impact file system mini-filter drivers in RS1 and S16, since it is a change in behavior.  Previous releases did not permit attaching reparse points to non-empty directories.  This release now does.  This mechanism can then be used to detect directories where functionality or content is layered, such as in the new container support.  Filter drivers that have assumed directories with reparse points are empty must change to accommodate this new model if it impacts their functionality.

Buffer Flushing

NtFlushBuffersFileEx (defined in ntifs.h) now supports a new flush flag (defined in ntddk.h, ntifs.h and wdm.h, and even winnt.h) which is implemented by NTFS:

//
//  If set, this operation will write the data for the given file from the
//  Windows in-memory cache.  It will also try to skip updating the timestamp
//  as much as possible.  This will send a SYNC to the storage device to flush its
//  cache.  Not supported on volume or directory handles.  Only supported by the NTFS
//  filesystem.
//

#define FLUSH_FLAGS_FILE_DATA_SYNC_ONLY                 0x00000004

Thus, this ensures that the data is flushed both from the Windows in-memory cache and from the storage device’s cache, and is persistently stored on disk, ideally using the underlying disk primitives (e.g., force unit access, FUA, when available) to optimally ensure that the blocks for this file are stored persistently on disk.

There is a corresponding minor function code for IRP_MJ_FLUSH_BUFFERS (defined in ntddk.h):

#define IRP_MN_FLUSH_DATA_SYNC_ONLY      0x04    //see FLUSH_FLAGS_FILE_DATA_SYNC_ONLY for definition of how this works
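
The restrictions in the flag's header comment can be written down as a simple validity check.  The following user-mode C sketch is a model under stated assumptions: the two constants come from the excerpts above, while the enum, the function names, and the idea of passing an "is NTFS" boolean are hypothetical and only serve to illustrate the rules.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the header excerpts above. */
#define FLUSH_FLAGS_FILE_DATA_SYNC_ONLY  0x00000004u
#define IRP_MN_FLUSH_DATA_SYNC_ONLY      0x04u

/* Hypothetical classification of the handle being flushed. */
typedef enum {
    FLUSH_TARGET_FILE,
    FLUSH_TARGET_DIRECTORY,
    FLUSH_TARGET_VOLUME
} flush_target;

/*
 * Model of the documented restrictions: the data-sync-only flush is not
 * supported on volume or directory handles and only NTFS implements it.
 * Requests without the flag are not restricted by this check.
 */
static bool data_sync_only_flush_is_valid(uint32_t flush_flags,
                                          flush_target target,
                                          bool is_ntfs)
{
    if ((flush_flags & FLUSH_FLAGS_FILE_DATA_SYNC_ONLY) == 0) {
        return true;
    }
    return target == FLUSH_TARGET_FILE && is_ntfs;
}

/* What a filter's flush callback might test on the minor function code. */
static bool is_data_sync_only_minor(uint8_t minor_function)
{
    return minor_function == IRP_MN_FLUSH_DATA_SYNC_ONLY;
}
```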

There is also a new I/O stack location bit for requesting asynchronous flush behavior (wdm.h):

//
//  IRP_MJ_FLUSH_BUFFERS
//

#define SL_FORCE_ASYNCHRONOUS           0x01
//
// SL_FORCE_ASYNCHRONOUS - a flush IRP specific flag in IrpStack to specify that the flush operation needs
// to be async. This behavior is needed by Spaces as Spaces issues flushes to disks in a pool serially and
// does not want to be blocked by disks whose flush operation is slow.
//

This impacts both file systems, which may choose to implement this new operation, as well as file system mini-filters, which should ensure they respect the behavior expected by any component using this interface.

This type of interface, permitting applications to control the caching behavior of their files, is important in high reliability systems such as databases where it is important for correctness to ensure that the data has been committed to storage.  General purpose applications should not use this because of the potential performance impact.

Correlation IDs

Correlation IDs are GUIDs that are used to uniquely identify a device across the volume stack, permitting event correlation.  There is a new routine for obtaining a volume’s correlation ID:

//
//  Routine to get a correlation ID (currently a GUID) that is common across
//  the volume stack and can be used to correlate events.
//

NTSTATUS
FsRtlVolumeDeviceToCorrelationId (
    _In_ PDEVICE_OBJECT VolumeDeviceObject,
    _Out_ GUID *Guid
    );

While there is no documentation about this routine yet, there is a code sample in the CDFS source code:

        //
        // Initialize the correlation ID.
        //
 
        if (NT_SUCCESS( FsRtlVolumeDeviceToCorrelationId( Vcb->TargetDeviceObject, &VolumeCorrelationId ) )) {
 
            //
            // Stash a copy away in the VCB.
            //
 
            RtlCopyMemory( &Vcb->VolumeCorrelationId, &VolumeCorrelationId, sizeof( GUID ) );
        }

Its use is for telemetry.  There is no matching code in the FastFat example, so it is not clear how widespread its usage is.  This is useful for drivers that need to associate specific information with a given volume in a persistent way, even if the file system instance on top of the volume might change, or if a single volume might be presented to the operating system multiple times.

Maximum Path Length Behavior

Note that as of RS1, Windows 10 now has a new registry parameter that lifts the Win32 260-character file name length limitation (MAX_PATH).  While this is not directly a kernel level change, it does indicate that file system components need to be carefully scrutinized to ensure they can handle long paths.

The new registry value (a DWORD) is:

HKLM\SYSTEM\CurrentControlSet\Control\FileSystem LongPathsEnabled

There’s also a Group Policy that can be used to lift the 260 character limit.  Look under:

Computer Configuration > Administrative Templates > System > Filesystem

The value to enable is Enable Win32 Long Paths.

The long paths setting is loaded during execution of the first Win32 file system API and is cached by Win32 for the lifetime of the process. 

Prior to RS1, you could enable long paths for NTFS for UWP apps and specifically manifested Win32 apps.

How much change this means for file systems and mini-filters is subject to debate.  It’s always been possible to use paths longer than MAX_PATH, as long as the path was specified using the extended-length prefix (that is, the path started with \\?\).  So, while file systems and mini-filters have always technically needed to be able to handle long path names, many probably never saw long paths “in the wild.”
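
For test tooling, it can be handy to classify Win32 paths by whether they would have been rejected under the old limit.  The user-mode C sketch below is a minimal illustration; the helper names are hypothetical, and it assumes NUL-terminated wide strings in Win32 form rather than the NT-form names a driver sees.

```c
#include <stdbool.h>
#include <stddef.h>
#include <wchar.h>

#define LEGACY_MAX_PATH 260  /* the Win32 MAX_PATH limit cited in the text */

/* True if the path starts with the \\?\ extended-length prefix. */
static bool has_extended_prefix(const wchar_t *path)
{
    return wcsncmp(path, L"\\\\?\\", 4) == 0;
}

/* MAX_PATH counts the terminating NUL, so 259 usable characters fit. */
static bool exceeds_legacy_limit(const wchar_t *path)
{
    return wcslen(path) + 1 > LEGACY_MAX_PATH;
}

/*
 * Paths that are over the legacy limit but lack the prefix are the ones
 * that only start reaching the file system once LongPathsEnabled is set.
 */
static bool newly_possible_with_long_paths(const wchar_t *path)
{
    return exceeds_legacy_limit(path) && !has_extended_prefix(path);
}
```

Feeding a filter a mix of both kinds of path is a cheap way to find fixed-size name buffers before customers do.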

File Deletion (Disposition)

Microsoft has introduced three new FILE_INFORMATION_CLASS types in Windows (defined in wdm.h).  The first of these concerns file deletion:

FileDispositionInformationEx,            // 64

This involves introducing a new data structure, which is just a single field of flag values (defined in ntddk.h):

#define FILE_DISPOSITION_DO_NOT_DELETE              0x00000000
#define FILE_DISPOSITION_DELETE                     0x00000001
#define FILE_DISPOSITION_POSIX_SEMANTICS            0x00000002
#define FILE_DISPOSITION_FORCE_IMAGE_SECTION_CHECK  0x00000004
#define FILE_DISPOSITION_ON_CLOSE                   0x00000008

typedef struct _FILE_DISPOSITION_INFORMATION_EX {
    ULONG Flags;
} FILE_DISPOSITION_INFORMATION_EX, *PFILE_DISPOSITION_INFORMATION_EX;
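
As a sketch of how these flags combine, the following user-mode C code builds the Flags value for a few common intents.  The flag values are copied from the excerpt above; the stand-in structure and helper names are hypothetical, and in a driver or native application the resulting structure would be passed with the FileDispositionInformationEx information class.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the header excerpt above. */
#define FILE_DISPOSITION_DO_NOT_DELETE    0x00000000u
#define FILE_DISPOSITION_DELETE           0x00000001u
#define FILE_DISPOSITION_POSIX_SEMANTICS  0x00000002u

/* Stand-in for FILE_DISPOSITION_INFORMATION_EX so this compiles anywhere. */
typedef struct {
    uint32_t Flags;
} disposition_info_ex;

/* Build a disposition request; posix only matters when deleting. */
static disposition_info_ex make_disposition(bool delete_file, bool posix)
{
    disposition_info_ex info;

    info.Flags = FILE_DISPOSITION_DO_NOT_DELETE;
    if (delete_file) {
        info.Flags |= FILE_DISPOSITION_DELETE;
        if (posix) {
            info.Flags |= FILE_DISPOSITION_POSIX_SEMANTICS;
        }
    }
    return info;
}

/* What a filter might ask when it sees the new information class. */
static bool is_posix_delete(const disposition_info_ex *info)
{
    const uint32_t both = FILE_DISPOSITION_DELETE | FILE_DISPOSITION_POSIX_SEMANTICS;
    return (info->Flags & both) == both;
}
```

Note that a request with Flags of zero is the "undo the delete intention" case, so a filter tracking pending deletes has to handle both directions.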

Some Windows mini-filter samples have been updated to include support for this new type of disposition, including the delete filter and name change filter.  Unfortunately, the FastFat sample has not been updated to support this new feature.

This change was introduced to manage the more complex delete behavior that arises with the Linux subsystem on Windows.

Specifically, in Linux a file is deleted using unlink.  Once deleted, any open handles to the file remain valid and continue to work.  There is, however, no longer an entry within the directory for that file and thus the name may be reused.

Traditionally in Windows, file deletion is an intention that is not acted upon until the last open handle to the file is closed.  Indeed, in some scenarios, it is actually possible for an application to undo the intention, in which case the deletion does not occur.  Until the last handle is closed, there remains an entry in the directory and the file name cannot be reused.

These semantics do not mesh particularly well with one another.  Thus, Windows has changed to provide more nuanced behavior to bridge between them.  With this new behavior, the directory entry is deleted as soon as the handle where the file was deleted is closed.

As it turns out, however, due to a compatibility issue this functionality was disabled prior to the RS1 release.  The Microsoft team has fixed the compatibility issue, so it is once again enabled in current test builds of Windows, and is expected to be enabled in the next major Windows 10 update (“Redstone 2” AKA RS2).

Rename

RS1 includes two new rename options (defined in wdm.h):

    FileRenameInformationEx,                  // 65
    FileRenameInformationExBypassAccessCheck, // 66

The new information class FileRenameInformationExBypassAccessCheck is comparable to FileRenameInformationBypassAccessCheck.  This is consumed by the I/O Manager and has the effect of disabling the security check associated with the rename operation, which can cause a deletion of the target file.  Note that there is no new data structure, as it utilizes previously unused pad space within the rename structure (defined in ntifs.h):

typedef struct _FILE_RENAME_INFORMATION {
#if (_WIN32_WINNT >= _WIN32_WINNT_WIN10_RS1)
    union {
        BOOLEAN ReplaceIfExists;  // FileRenameInformation
        ULONG Flags;              // FileRenameInformationEx
    } DUMMYUNIONNAME;
#else
    BOOLEAN ReplaceIfExists;
#endif
    HANDLE RootDirectory;
    ULONG FileNameLength;
    WCHAR FileName[1];
} FILE_RENAME_INFORMATION, *PFILE_RENAME_INFORMATION;

And the corresponding new flags (defined in ntifs.h):

#define FILE_RENAME_REPLACE_IF_EXISTS              0x00000001
#define FILE_RENAME_POSIX_SEMANTICS                0x00000002
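
A short user-mode C sketch shows how the legacy ReplaceIfExists boolean maps onto the new Flags field.  The flag values are copied from the excerpt above; the helper names are hypothetical and the code only models the flag arithmetic, not the rename operation itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the header excerpt above. */
#define FILE_RENAME_REPLACE_IF_EXISTS  0x00000001u
#define FILE_RENAME_POSIX_SEMANTICS    0x00000002u

/* Translate caller intent into the Flags value used with FileRenameInformationEx. */
static uint32_t make_rename_flags(bool replace_if_exists, bool posix_semantics)
{
    uint32_t flags = 0;

    if (replace_if_exists) {
        flags |= FILE_RENAME_REPLACE_IF_EXISTS;
    }
    if (posix_semantics) {
        flags |= FILE_RENAME_POSIX_SEMANTICS;
    }
    return flags;
}

/* A destructive POSIX rename is the case that can replace an open target. */
static bool is_destructive_posix_rename(uint32_t flags)
{
    const uint32_t both = FILE_RENAME_REPLACE_IF_EXISTS | FILE_RENAME_POSIX_SEMANTICS;
    return (flags & both) == both;
}
```

Because FILE_RENAME_REPLACE_IF_EXISTS has the value 1, a caller that only sets the old boolean to TRUE ends up expressing the same intent in the low bit of the shared storage.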

This preserves the previous semantics and adds the new POSIX semantics.  Much like the changes in delete, these are introduced to deal with the variation in behavior between Linux and Windows subsystems.  In Windows, a destructive rename will fail if the file is currently opened.  In Linux, it will succeed.  The open handle is still valid and continues to work, even though there is no longer an entry in the directory pointing to it.

For POSIX semantics to work, the application that has it open must have specified FILE_SHARE_DELETE when the file was opened.  Otherwise the rename will fail.

Microsoft has updated some of the mini-filter examples to demonstrate how to handle the new rename type, including the context filter, and name changer filter.  Fortunately, the impact for mini-filters is likely to be minimal, unless your filter needs to understand the new semantic behavior differences. In such a case, you would need to adjust your mini-filter accordingly.

Filter Manager

There were a number of small changes in Filter Manager in Redstone.  These changes were all in fltKernel.h.

The first involves the FLT_VOLUME_PROPERTIES where a previously reserved field has been converted to a flags field:

    USHORT Flags;

One flag is currently defined:

//
//  FLT_VOLUME_PROPERTIES Flags
//
//  VOL_PROP_FL_DAX_VOLUME - If set, this is a DAX volume i.e. the volume supports
//  mapping a file directly on the persistent memory device.  The cached and memory
//  mapped IO to user files wouldn't generate paging IO.
//

#define VOL_PROP_FL_DAX_VOLUME                      0x0001

A new operation for obtaining the attribution handle from the callback data was introduced:

_IRQL_requires_max_(DISPATCH_LEVEL)
PVOID
FLTAPI
FltGetIoAttributionHandleFromCallbackData (
    _In_ PFLT_CALLBACK_DATA Data
    );

And a mechanism for “propagating” IRP extension data between two callback data structures:

_IRQL_requires_max_(DISPATCH_LEVEL)
NTSTATUS
FLTAPI
FltPropagateIrpExtension (
    _In_ PFLT_CALLBACK_DATA SourceData,
    _Inout_ PFLT_CALLBACK_DATA TargetData,
    _In_ ULONG Flags
    );

Note that IRP extensions were first added in Windows 10 (1511).  Neither of these two calls is documented yet.  Attribution was added for Windows 10 (1607) and is used as part of I/O rate management for containers.

As previously mentioned, Filter Manager now has a new flag that a mini-filter uses to indicate that it wishes to be notified about direct access storage volumes:

FLTFL_REGISTRATION_SUPPORT_DAX_VOLUME

Note that a filter which does not set this flag will not be asked to attach to such volumes.
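
The attach rule can be modeled in a few lines of user-mode C.  VOL_PROP_FL_DAX_VOLUME is copied from the excerpt above.  The numeric value used here for FLTFL_REGISTRATION_SUPPORT_DAX_VOLUME is an assumption on our part, since this article does not quote it, so take the real definition from fltKernel.h.  The function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Copied from the FLT_VOLUME_PROPERTIES excerpt above. */
#define VOL_PROP_FL_DAX_VOLUME                 0x0001u

/* ASSUMED value for illustration only; use the definition in fltKernel.h. */
#define FLTFL_REGISTRATION_SUPPORT_DAX_VOLUME  0x00000004u

/*
 * Model of the rule described in the text: a filter that has not declared
 * DAX support is never offered a DAX volume; all other volumes are
 * offered as before.
 */
static bool filter_is_offered_volume(uint32_t registration_flags,
                                     uint16_t volume_property_flags)
{
    bool volume_is_dax = (volume_property_flags & VOL_PROP_FL_DAX_VOLUME) != 0;
    bool filter_supports_dax =
        (registration_flags & FLTFL_REGISTRATION_SUPPORT_DAX_VOLUME) != 0;

    return !volume_is_dax || filter_supports_dax;
}
```

The practical consequence is that a filter which silently stops seeing some volumes after an upgrade should check its registration flags first.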

Filters that perform secondary operations will need to keep the attribution mechanism in mind so that the attribution handle can be properly reflected between otherwise distinct calls.  A failure to do this will interfere with the I/O Rate Control Driver (iorate.sys), which uses this information.  Thus, if your filter driver will be running on Windows Server 2016 systems, it is important to ensure you are properly passing along this information.

Summary

The Windows RS1 and S16 releases introduce a number of interesting new changes and while some are clearly described and documented, some also remain unclear to us at the present time.  Rest assured that as we expand our understanding of them, we will be sure to let you know as well!

Special thanks to Microsoft’s Shoily Rahman for assistance in completing our understanding of the Delete/Rename changes.