Windows System Software -- Consulting, Training, Development -- Engineering Excellence, Every Time.

WdfWaitLockAcquire and Code Analysis: When SAL Goes Wrong

WdfWaitLockAcquire and Code Analysis: When SAL Goes Wrong

I love Source Code Annotation Language, I really do. I’ve already blogged about how great it is in my previous post, SAL Annotations, Don’t Hate Me Because I’m Beautiful. I even spend time in our Kernel Debugging seminar discussing SAL as a way to avoid having bugs in your code in the first place.

The trouble with loving SAL is that you often find yourself in a position of defending SAL, which is a different thing entirely. I’ve discovered annotations that don’t quite work right, but I love SAL so I’m OK with its shortcomings. I’ve created annotations that no human could ever possibly parse, but see the beauty in the fact that the compiler knows exactly how I want my routine to work.

Is that parameter bytes or characters? SAL knows. Can that function raise an exception? SAL knows. And that’s the true power of SAL, the ability to very finely describe the behavior of your function, its inputs, and its outputs. But, Spider-Man Ben Parker said it best: with great power comes great responsibility.

In a recent NTDEV thread, community member Don Burn stumbled into an interesting problem caused by SAL:

    status = WdfWaitLockAcquire(Lock, NULL);

    if (NT_SUCCESS(status)) {

        WdfWaitLockRelease(Lock);

    }

In every case this is flagging with:

        warning C26165: Possibly failing to release lock in function.

It’s certainly reasonable to assume that if a lock acquire routine fails, you would return from the routine without the lock held. Thus, it would actually be an error to release the lock in that case. However, the Code Analysis warning here is saying that the caller must release the lock, even if the routine fails.

Referring to the documentation, WdfWaitLockAcquire is indeed documented to have a meaningless return value if the second parameter to the function is NULL:

The caller does not have to check the return value if the Timeout pointer is NULL, because in that case WdfWaitLockAcquire returns only after it acquires the lock.

The comment there says that the caller does not have to check the return value, but the Code Analysis warning is effectively saying that the caller must not check the return value. Referring to the SAL for the function we can see the issue:

_When_(Timeout == NULL, _IRQL_requires_max_(PASSIVE_LEVEL))
_When_(Timeout != NULL && *Timeout == 0, _IRQL_requires_max_(DISPATCH_LEVEL))
_When_(Timeout != NULL && *Timeout != 0, _IRQL_requires_max_(PASSIVE_LEVEL))
_Always_(_When_(Timeout == NULL, _Acquires_lock_(Lock)))
_When_(Timeout != NULL && return == STATUS_SUCCESS, _Acquires_lock_(Lock))
_When_(Timeout != NULL, _Must_inspect_result_)
NTSTATUS
FORCEINLINE
WdfWaitLockAcquire(
    _In_
    _Requires_lock_not_held_(_Curr_)
    WDFWAITLOCK Lock,
    _In_opt_
    PLONGLONG Timeout
    )
{

 

I’ll steal my colleague Peter Viscarola’s description of this annotation from the same post as it succinctly describes the issue:

So… note that:

_Always_(_When_(Timeout == NULL, _Acquires_lock_(Lock)))

This indicates that the lock is ALWAYS acquired… unconditionally… when there’s no timeout specified. Always.

So, the function can’t fail to acquire the lock if the Timeout parameter is NULL. That’s fine, but what’s the big deal? The problem is that the function currently doesn’t fail if the Timeout parameter is NULL. However, it’s not unreasonable to think that at some point in the future someone might change this function to return an error (it does return a status, after all). However, by creating a SAL annotation that definitively states that this function will always acquire the lock, it must now be preserved for all time.

Peter’s follow up describes the problem:

While the annotation matches the WDK WDF documentation, *I* don’t think either the annotation OR the documentation is wise.   I just doesn’t make sense to me that a function returns a status and that I MUST effectively cast the return to VOID — and must NOT check the status, and must ASSUME the call always works.  I don’t like this for a lot of reasons… it looks sloppy during code review, for one.  For another, it locks the WDF Dev Team into a specific implementation, which is just plain silly.  Suppose in KMDF V1.x, somebody decides when KMDF Verifier is enabled to check to see if the passed-in handle is valid, and return an error if it’s not.  That’s not inconceivable.  It would even be a good thing, IMHO.  But, most of all, not checking the return status Just Feels Wrong. 

Because the docs indicate that whenever Timeout is NULL the function can NOT fail, it seems to me to be entirely safe to do: 

    status = WdfWaitLockAcquire(Lock, NULL);

    if (!NT_SUCCESS(status)) {

        // trace and exit

        goto done;

    }

But of course, this fails Code Analysis. To fix the Code Analysis warning (Mr. Burn’s initial point), the best alternative is to add:    

_Analysis_assume_lock_not_held_(Lock);

to the error code.  So the code would read: 

    status = WdfWaitLockAcquire(Lock, NULL);

    if(!NT_SUCCESS(status)) {
  
        // trace and exit

        //
        // KMDF V1.15, VS 2015: Long comment here describing why
        // I'm forcing this assumption.  Yuck!
        //
        _Analysis_assume_lock_not_held_(Lock);
    
        goto done;
    }

 

Thus, the annotation not only forces this routine to preserve the behavior for future maintainers, but it also forces the caller of the function into a non-standard coding pattern. If the annotation wasn’t so stringent, we could all happily have a check for failure that never actually executed and be none the wiser. Instead we get more maintenance burden (everyone has to read that long comment and understand it), which is what we’re trying to avoid by using SAL.