Windows System Software -- Consulting, Training, Development -- Engineering Excellence, Every Time.

Some Things Are Best Left to Experts. Really.

I know this is going to sound like I’m “feathering my own nest” but that’s really not my goal here. Instead, I’m trying my best to keep the unwary from getting themselves (and their projects) screwed by rushing in where angels fear to tread.

Over on our NTDEV list, we regularly get driver development questions from folks who know very little about Windows OS architecture and even less about writing Windows drivers. It’s clear that many of these questions are coming from engineers who get paid to write user-mode application code and have had a driver project suddenly thrust upon them. These people are not interested in becoming driver development experts.  Heck, they’re not typically interested in operating systems internals or driver development at all.  Rather, they just want to get the code working, shut their boss up, and go back to their real job.

Like everyone these days, these devs Google around for a sample, hoping to collect just enough information about driver development to just get something working, and then hack/edit/change/modify/test it into what they want it to do.  This seems like a reasonable approach.  You don’t really need to understand anything anymore.  There’s the Internet!  I get it.  Heck, thanks to StackOverflow last week I became an instant expert on something called the Mersenne Twister (which, surprisingly, has nothing at all to do with weather… imagine that).

But I’m here to tell you that this approach is absolutely, positively, a Very Bad Idea™ for Windows kernel-mode software development. It never, ever ends with a good result. This is because of four very important differences between programming in user-mode and programming in kernel-mode:

  • Windows kernel-mode programming is implicitly asynchronous.  Kernel-mode code is typically subject to parallel execution on multiple processors and/or in multiple threads on the same processor.  Sure, I realize that most user-mode devs are familiar with the basics of serialization. But most user-mode devs also don’t tend to write every bit of code with serialization and race conditions in mind.  In kernel-mode, that’s exactly what we do.  All the time.  By necessity.
  • Windows kernel-mode programming implies a significant number of assumptions and constraints.  It needs to take into account a whole variety of implicit edge conditions.  Kernel-mode programmers keep these things in mind as a fundamental part of their development process.  There’s a significant amount of “specialist knowledge” involved.  And before you roll your eyes and start accusing me of self-aggrandizement… let me add that this “specialist knowledge” isn’t very complex or difficult. Anybody could learn it. But learn it you must. Not knowing it or ignoring it leads to problems.  Driver guys, I’m talking about things like IRQL and process/thread context here (as just two basic examples). The result of all this is that you can write kernel-mode code that you test and that appears to work fine, but that fails spectacularly later under only slightly different circumstances. These different circumstances can even include timing, system configuration, or system load.
  • The consequences of failure in kernel-mode are rather more substantial than they are in user-mode.  In user-mode, what’s the worst that can happen?  A user loses the data in their program? Sure, that’s bad.  When you fail in kernel-mode (through a coding error, such as dereferencing a NULL pointer, or a design error, such as re-entering some code that you didn’t realize could be re-entered) the whole system crashes or hangs.  The user risks losing all their data in everything they’re doing on their system.  That’s often worse than “bad.”
  • Getting something demonstrably working in kernel-mode is really the first step in a long process. Once it’s “working” you have to test it thoroughly, and when it fails you have to be able to diagnose the failure and fix it.  Yes… it’s just using the debugger, and yes… you can set breakpoints and single step and examine variables. And you can trace-out data to the console. And, hopefully, that’ll be enough.  Often, however, life isn’t quite that simple (driver guys reading this, think pool scribbles as an example).
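
To make the first point above a little more concrete, here is a minimal user-mode sketch in C using POSIX threads. It is an analogy, not driver code: a real driver would use kernel synchronization primitives (spin locks, for example) and would also have to worry about things like IRQL. The names shared_counter, worker, and run_locked_counter are hypothetical, made up purely for this illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 16

/* Hypothetical shared state: one counter touched by every thread. */
struct shared_counter {
    pthread_mutex_t lock;
    long value;
    long increments_per_thread; /* read-only once the threads start */
};

static void *worker(void *arg)
{
    struct shared_counter *c = arg;

    for (long i = 0; i < c->increments_per_thread; i++) {
        pthread_mutex_lock(&c->lock);
        c->value++; /* read-modify-write: without the lock, updates can be lost */
        pthread_mutex_unlock(&c->lock);
    }
    return NULL;
}

/* Runs nthreads workers; returns the final count, or -1 on failure. */
long run_locked_counter(int nthreads, long increments_per_thread)
{
    assert(nthreads > 0 && nthreads <= MAX_THREADS);

    struct shared_counter c = {
        .value = 0,
        .increments_per_thread = increments_per_thread
    };
    pthread_t threads[MAX_THREADS];
    int started = 0;

    if (pthread_mutex_init(&c.lock, NULL) != 0)
        return -1;

    /* Start the workers; stop early if a thread cannot be created. */
    for (; started < nthreads; started++) {
        if (pthread_create(&threads[started], NULL, worker, &c) != 0)
            break;
    }

    /* Wait for every thread that actually started. */
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);

    pthread_mutex_destroy(&c.lock);
    return started == nthreads ? c.value : -1;
}
```

With the lock in place, run_locked_counter(4, 100000) returns 400000 whenever all four threads start. Take the lock out and the total can come up short, but only sometimes, which is exactly the kind of bug that passes your tests and then bites you later under different timing or load.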

Thus, an application developer might be able to “cowboy” a driver into working. Or, at least, working for a while.  But, because of the issues I’ve listed above, this can easily turn into unending iterations of “it works now, no I found another problem, I fixed that problem and now we’re all set, ooops I just found another problem.”  And no wonder:  If you start with a base of code that’s not necessarily well engineered or particularly well suited to the task at hand, and wind up having to contort that code into ever-more hideous shapes to make it accommodate specific issues as they’re discovered, the outcome isn’t likely to be anything resembling optimal, stable, or supportable in the long term.

Given all the above, what do I suggest?  Well, there are only two approaches that I think make sense:

  • Contract it out. I most strongly recommend against paying someone hourly as they try to solve your problem.  That can lead to surprises when your “simple” project turns into a multi-month learning exercise for whoever you hire, while you pay them for every hour they spend searching Google and trying code from StackOverflow. Rather, I recommend that you get a firm fixed-price quote for your solution that includes support after the code has been written.  If you can describe what you want, and the person you’re contracting knows how to build that solution, there is no reason (other than incompetence) that they shouldn’t be able to very quickly tell you how much it’ll cost you to have them build you that solution.  You might also consider having that quote include some sort of training so that you know how to build the source code you’ve been given and install it where it needs to work. Optionally, you can also opt for training that will allow you to maintain your solution in the future once the code is turned over to you.
  • Learn how Windows works and how to write drivers before trying to extend the operating system through the interface provided by the I/O Manager. Take a class or two.  Spend some time. Heck, you might even like it.  And by taking a class, you’ll avoid spending weeks randomly searching the Internet and trying to determine if the author of the page you’re reading knows more or less about the subject at hand than you do.

And yes, before you say it: Here at OSR we do offer services that meet both of the above criteria.  But, please… ignore that.  Take my advice and hire a consultant in Germany if you prefer.

But, whatever you do, please understand that writing a Windows kernel-mode driver isn’t like learning to write a SQL left-join.  You can spend an afternoon fooling with your SQL query until it finally returns what you need, and “screw it, it looks like it works so it’ll be good enough.”  Off you go, into the sunset, your problem solved.  In the driver world, it just doesn’t work that way.  I hope I’ve managed to make the reasons this is true a bit more clear.