Windows System Software -- Consulting, Training, Development -- Engineering Excellence, Every Time.

Is DMA Cache Coherent on ARM?

On NTDEV we had an interesting discussion about interlocked operations, which, being an NTDEV discussion, took many twists and turns along the way.

Out of all the various tangents, one stood out to me as worth highlighting: who is responsible for guaranteeing cache coherency of DMA operations on Windows? For example, imagine a simple case where the processor has some data in a cache that has not yet been flushed out to RAM. If a device is told to DMA to or from that memory location, who guarantees that it is working with the most recent data?

The surprising answer on Windows has always been: the driver responsible for the device. The assumption that DMA operations are not necessarily cache coherent is part of the (loosely defined) Windows Driver Model contract dictated by the kernel and HAL, which is why driver writers are required to call KeFlushIoBuffers before performing a DMA operation.

Of course, most of us have been working on x86-based architectures for quite a while now, and the x86 and x64 just so happen to guarantee cache coherency with respect to DMA. This made the call to KeFlushIoBuffers an annoying bit of trivia that no one cared about, made worse by the fact that on these platforms the API is implemented as an empty macro:

#define KeFlushIoBuffers(Mdl, ReadOperation, DmaOperation)

The Itanium architecture provided no such guarantee in some cases, and thus KeFlushIoBuffers did some real work there, but no one really cared (“yeah, I’ll be sure to add that if my driver ever runs on an Itanium…” *snicker*). However, now along comes a platform that we do (or at least possibly will) care about: ARM.

So, the question becomes: is DMA cache coherent on the ARM? The definitive answer is of course in the ARM documentation, but we can get a quick answer by looking at the ARM implementation of KeFlushIoBuffers. If it does nothing, then DMA is cache coherent, as on the x86. If it does something, then DMA is not cache coherent in at least some cases, as on the Itanium. Graphing the function in IDA Pro (as shown at the left) gives us a pretty definitive answer, I think.

Quite a change from the x86 implementation! I’d hate to think what happens if you don’t call this function on the ARM.

It’s worth noting that Version 3 of the DMA APIs implicitly performs the flushing on your behalf, so the call to KeFlushIoBuffers is not necessary there. In addition, drivers using the Windows Driver Frameworks (WDF) don’t make these calls directly and are therefore safe (as if you needed more reasons to get rid of your WDM code!).