
He’s Gone

It has been a tough week, and a busy one.  I’m getting ready to leave on a vacation, and trying to get all those last-minute things done. 

While my last post discussed the genesis of what I call the Configuration Agent and the WDF QA Universal Setup Job, this week’s focus has been entirely on another of my oddball notions- it started as a desire to build a more precise fault injection engine for testing the KMDF loader.  With that California background, the project name was easy- Fault?  Why, San Andreas, of course! 

At a high level it was a programming model built around IAT filtering of calls from a driver into the kernel.  I wanted precision, so I could specify behavior by thread and IRQL, program each DDI independently, allow a driver to do its own filtering [i.e. get a callback with context for a DDI call under given conditions]- and with enough flexibility and locking behavior that the behavior could be reprogrammed within such a callback. For instance, in one working application of mine a filter routed calls to IoCallDriver from the KMDF runtime, on specific threads, to a callback in the test driver, which then turned off all call logging, as it meant the call was leaving its stack location- I was using the logs from calls to various resource allocation DDI to make sure the framework did not allocate resources under conditions in which we contractually would not fail- so I wanted to stop logging before I called IoCallDriver and to restart it after it returned [I knew the driver handling it was synchronous- otherwise I would have to hook the completion side of things, of course].
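To make that model a little more concrete, here is a minimal sketch in portable user-mode C++ rather than kernel code.  It is not San Andreas: every name in it (FilterTable, Rule, CallContext, Dispatch) is invented for the illustration, thread and IRQL are plain integers, and the real engine's locking is left out entirely.  It only shows the shape of the idea- one programmable rule per DDI, matched by thread and IRQL, which can pass the call through, fail it, or hand it to a callback that is free to reprogram the table.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical model of a per-DDI filter rule; none of these names come from the real engine.
enum class Action { PassThrough, Fail, Callback };

struct CallContext {
    std::string ddi;     // name of the routine being called
    int         thread;  // stand-in for a thread identifier
    int         irql;    // stand-in for the current IRQL
};

struct Rule {
    int    thread;       // -1 matches any thread
    int    maxIrql;      // the rule applies at or below this level
    Action action;
    std::function<void(const CallContext&)> callback;   // used only for Action::Callback
};

class FilterTable {
    std::map<std::string, Rule> rules_;

public:
    void Program(const std::string& ddi, Rule rule) { rules_[ddi] = std::move(rule); }
    void Clear(const std::string& ddi) { rules_.erase(ddi); }

    // Returns true if the real routine should run, false if the call is to be failed.
    bool Dispatch(const CallContext& call)
    {
        std::map<std::string, Rule>::iterator found = rules_.find(call.ddi);
        if (found == rules_.end())
            return true;

        Rule rule = found->second;   // work on a copy, so a callback may reprogram the table

        if (rule.thread != -1 && rule.thread != call.thread)
            return true;             // wrong thread: not filtered
        if (call.irql > rule.maxIrql)
            return true;             // IRQL too high for this rule: not filtered

        switch (rule.action) {
        case Action::Fail:
            return false;
        case Action::Callback:
            assert(rule.callback);   // a callback rule must supply a callback
            rule.callback(call);
            return true;
        default:
            return true;
        }
    }
};
```

In this sketch the IoCallDriver case above would be a Callback rule for one thread whose callback flips a logging flag, and fault injection would be a Fail rule on an allocation routine.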

So the concept went well beyond fault injection, but it’s always been a side-line.  My initial tests wound up being very white box, and a refactoring of the loader code  meant a similar rewrite of the test was necessary, but there was no time to do that…  But with Windows 7 winding up [since we’ve announced dates, that can’t qualify as a secret anymore], I have time to go back and try to at least do a few new things beyond mere maintenance.  Such time is rare and fleeting, so I’ve been busily making the most of it.

So today was busy- I worked a bit of overtime on a COM automation server (in process) exposing a scriptable interface to my engine for some of the other team members to use, and got it at least working well enough to demo.  Still have my packing to do, and dinner to make and so forth, so home I headed.

Then I turned on the television, and immediately heard the news about Michael Jackson.  Whoa…

On those old tapes I’ve been scraping for my archives I’ve got many covers of his material from the time I was playing in bar bands: Billie Jean, Thriller, Beat It- and more.  I’ve not posted them because frankly I never thought I did any of them justice- a lot of my material from those days has rough edges [anyone who listens to what I have posted can hear that], but these were just not good enough.  But add me to the billions of fans, then and since, sure…

The other thought that came early on was that in my previous post I mentioned having learned to keep some of my more controversial opinions to myself- one of those was intense skepticism about the molestation accusations.  I knew I wasn’t likely to sway anyone there- so why bother?

He’s gone, and he was younger than I am- scary stuff.  Life is fleeting, and it was an unexpected reminder of that.

But, it’s time to pack my CDs and games and controllers, write those last few checks for bills coming due during the vacation, eat that dinner, and get ready for a night’s rest prior to that final day of work, so another slapdash post, and back to work…

There was some offsetting good news today- besides finding out I really do still remember enough about COM to roll out an in-process automation server on the QT- without using ATL of course, where would the challenge be in that [answer: in learning something new for a change- you old dog!]?  But of course it falls into that semi-confidential area where I’ll just have to sit on it for a while [not that it’s likely to shatter the world if revealed anyway, but circumspection is worthy of practice even for an old blowhard like myself].

I’ll try to do something more substantial someday- these days I’ve just not got the time to really write anything too terribly technical.  My apologies to those diehards who still try to read this awful stuff I’m churning out of late…

Crazy is as crazy does

When I was an undergraduate at CalTech, my student residence- Ricketts House- had a tradition where we relieved some of the tedium and stress of the annual election for house officers [President, Vice-President, Social Vice-President, Secretary and Treasurer was the lineup if memory serves me well] by nominating and electing people to “Minor Offices”. [Looking at the Wikipedia link, I see I probably fit the Scurve profile more than I had realized- I’ve had many problems since with holding controversial and unpopular opinions- now that I’m old I just keep them to myself more than I used to].

This process was largely temporized on the spot, and usually fueled with socializing, various recreational substances and a great deal of levity.  Some offices had “duties”- for instance, the RLPL (Resident Lecturer on Pornographic Literature) was in charge of the acquired collection of said “literature”- and in self-defense I’ll add that this had been an all-male school until a year or two before I arrived.  Others (such as the “House Christian”) were largely symbolic- often made up and discarded as seemed appropriate to the assembled wits at the time.  [To put the RLPL in context, I remember utilizing the service to read a Playboy interview from the mid-60’s, of Jerry Garcia, so it wasn’t always the porn that was the draw].

My purpose for the aside and the title of this post combine.  During my tenure, while I never held an elected office, I was elected several times to one particular office- “House Crazy <a word I cannot use on MSDN, or email, or say on public broadcast- I suppose “F-bomb” comes close enough>”. 

I’ve always had a quirky sense of humor, and I displayed it a lot during those years, so I suppose I deserved the title- things like taking a philistine swipe at some of the campus art being acquired by building a “sculpture garden” that included discarded urinals I dragged up from the basements [I swear you could find anything down there, if you looked long enough], stacks of chairs, umbrellas, discarded lumber- all tastefully arranged for display of course- and I really liked building things off-the-cuff with precarious and eccentric balance to them, so there was some “art” involved, in at least one sense of the word.  The darbs sort of liked it- I doubt the Master of Student Housing liked it, since we titled the display with his name.  Never can tell, though- some people actually can take a joke [I usually try to]…

Another factor was being a pyromaniac- I can remember holding a drinking glass with some burning gasoline in it [not a lot- maybe a few millimeters, before it ignited] at one point [only because I had to find a place to toss it, to avoid injury to myself- with a bunch of people crowded around it took a while to find one- even at that somebody insisted I was throwing it at them- not the case- at that distance I wouldn’t have missed…], and at another being asked to detonate an explosive mixture someone had painted on an exterior window and then decided they didn’t feel safe either leaving or igniting- in normal form, I used a long rolled up piece of newspaper, ignited one end, set the other against the frame, and then ran, barefoot as always, across the roof tiles, which were often littered with shards of gritty nasty stuff, but my feet were long since tougher than that [of course this wasn’t on the ground floor] to get out of range.  But it was a bit of a letdown, that time- not sure it even shattered the window- where’s the fun in that?

I suppose I did earn the title [and I believe I paid for most of the damages, as well, but that would be even more off-topic stories]…

So today I find other outlets, and wanting to write something before I go on vacation without having posted anything in almost two months, I’ll pick one at random and see if it’s worth talking about.

Endless Dogfood

“Dog Food” is what we call new and unreleased software here at Microsoft.  Using it means you are helping those who are developing the software, as an adjunct tester, and being a helpful sort [as are most of the other ‘softies I know], I participate when I can. But it brings its issues, particularly if you are using it for real work- and if you’re not using it for real work, then just how is it you are helping?

To test KMDF and UMDF, we use a lot of tools- many of which you know.  But to maximize the potential return of our test effort, I want to “dog food” as much of it as possible.  That’s not as simple as it sounds- these are just a few of the issues experience has shown me:

  • Most products under development are built frequently, with multiple versions shared out from their “release shares”.
  • The form in which those shares are presented [in terms of paths to specific processor architectures, for instance] has some variations.
  • Sometimes when a product goes bad, you have to get it from somewhere else, such as a private build.
  • Or you may have to stick with a previous fixed version for a while.
  • But most of the time you want to get as much as possible from the same place, because some people won’t even talk to you if they see you doing much mixing and matching.
  • If that mixing and matching includes things that go into driver packages [and it does for us], then cataloging and signing becomes an issue.
  • Worse yet, since we do our own servicing, we sometimes have to run signed content used to test our older products whose signatures have expired and are no longer valid.

We used to make horrid ad-hoc changes to handle these things- breaking paths down into “parameters” banged together in explicit paths to form a full path, for instance.  Done differently for each automated test, of course.  To get a test run by someone else, you had to communicate a complicated list of these “parameters” to whoever was running it, and they had to get all of them right.

One day I decided I’d had enough, and out of that was born one of those custom designed solutions that is probably as quirky as the problem it solves.  Not the global sort of solution Microsoft sells to people, but the kind of stuff a lot of people in IT or support roles have to build just to survive.

In the amalgam that is my work life, this became part and parcel of a task I was doing at the same time, which was to bring some commonality to all of our test setups, so it was easier to look at a machine on which a problem had been reported and know what you were actually looking at.

Simple Mind, Simple solution

So I started with some basic rules:

  1. One setup job goes out, finds all the tools we use, copies them to the test machine, and sets them up.
  2. Individual tests never go to the network for anything- if someone needs something, we add procurement of that to the setup job.
  3. The setup job will own making sure that everything is signed.
  4. The setup job is going to be very smart- no more complicated instructions.  Our most common test requirements should be its defaults, and if we need something run outside our group, all that should be needed is one reasonably mnemonic string value, used as a parameter, to identify the machine configuration we are testing.
  5. The setup job is going to log where everything that is pulled off the network came from in easily accessible form, including a history of such locations from previous runs of that same job if it gets run more than once after a new OS is installed.
  6. It will be easy to tell it to get the most recent version of anything we use- but it will also be possible to tell it to get a specific version.  Our team will own making sure that knowledge is there in a fashion that we can meet bullet #4 above.  Bet you thought I’d forgotten I’d mentioned dog food earlier, didn’t you?
  7. Everything we use is to come from a supported source- not some file someone stuck on a share somewhere that had no clear ownership [or symbol files, usually…].
  8. Experience says there will always be some need for temporary exceptions to that last one- so put all those eggs in one basket and watch that basket closely.

Well, WTT does provide some facilities aiding in what we wanted, but it was not at all close to everything we needed.  So it was time to invent.

First, I broke the job into phases:

  • An Analysis phase where it takes that single parameter, along with some database of recorded setup knowledge, and determines for itself [by checking the appropriate shares] if all the pieces needed are available, and if they are, which specific places to get them from.  The results of this phase are “cast in concrete”, and the subsequent phases will use this information and only this information- not go looking again [which causes problems when things change between the analysis and when they get around to looking, for instance].
  • A “staging” phase where anything that isn’t already on the machine is copied to the machine.  The places copied from are determined by the preceding analysis.  If it starts out on the network, and you need it on two places on the machine- this phase has to copy it to the machine first, and then from one place on the machine to the other.  It is also not a problem if later jobs also do such copying- as long as they never go to the network again.
  • Finally, a “configuration” phase where tools are installed [from MSI or such already copied in the staging process], registries tweaked, and the machine is made ready for testing.  At this point, any of our jobs should be able to be run in any order [and one benefit here was we can now test either KMDF or UMDF where we had previously been one or the other].
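The three phases can be sketched roughly as follows.  This is an illustration under loud assumptions, not the actual setup job: the type and function names, the in-memory stand-in for the network shares, and the local path layout are all made up here.  What it tries to show is the one property that matters- analysis decides once where everything comes from, and the later phases consume only that decision.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// All hypothetical: a set of paths plays the part of "what exists on the shares right now".
typedef std::set<std::string>                             Network;
typedef std::map<std::string, std::vector<std::string> >  ToolSources;    // tool -> candidate sources, preferred first
typedef std::map<std::string, ToolSources>                KnowledgeBase;  // configuration name -> tools it needs
typedef std::map<std::string, std::string>                Plan;           // tool -> the one source analysis chose

struct StagedItem {
    std::string tool;
    std::string source;     // kept so the run can report where each piece came from
    std::string localPath;
};

// Analysis: from a single configuration name, pick a source for every tool, or fail as a whole.
bool Analyze(const std::string& configuration, const KnowledgeBase& known,
             const Network& network, Plan& plan)
{
    plan.clear();
    KnowledgeBase::const_iterator entry = known.find(configuration);
    if (entry == known.end())
        return false;

    for (const auto& tool : entry->second) {
        bool chosen = false;
        for (const auto& source : tool.second) {
            if (network.count(source) != 0) {
                plan[tool.first] = source;
                chosen = true;
                break;
            }
        }
        if (!chosen) {          // one missing piece means the whole setup cannot proceed
            plan.clear();
            return false;
        }
    }
    return true;
}

// Staging: consult the plan and nothing else; the network is never examined again.
std::vector<StagedItem> Stage(const Plan& plan, const std::string& localRoot)
{
    assert(!localRoot.empty());
    std::vector<StagedItem> staged;
    for (const auto& item : plan) {
        StagedItem record = { item.first, item.second, localRoot + "\\" + item.first };
        staged.push_back(record);
    }
    return staged;
}

// Configuration: works purely from what staging put on the machine.
std::vector<std::string> Configure(const std::vector<StagedItem>& staged)
{
    std::vector<std::string> steps;
    for (const auto& item : staged)
        steps.push_back("install " + item.tool + " from " + item.localPath);
    return steps;
}
```

Because Stage and Configure take only the plan and its products, a share that changes after analysis cannot alter what gets installed, which is the point of "cast in concrete".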

That done, it was time to make it a reality…

Maybe not so simple, after all

First of all I needed that way to record values so they were usable both by WTT and by scripts and command line apps and whatever else without too much intervention.  Some research and experience aforethought turned up the answer, and it was almost as old as I am- the machine’s global environment is accessible to all of them, it can be updated with simple registry operations, and you can even make the changes immediately recognizable to all the pieces that really count by broadcasting a Windows message when you finish the updating [Wei indirectly clued me in on the last part- he observed that if he used the control panel the changes were immediate, even on open command prompts, but not after using one of the early versions of this code].

Then there perhaps would have been the signing issue, except that we had been working on that one- we had found that we could catalog and self-sign everything and that seemed to cover everything [eventually this has had some holes, on pre-Vista OS, but I’m working on them when I can].  But we were using a copy from some WDK stuck on a share- see rule #7 above!

So, I now needed some way to encode directions about where to find things, how those directions changed under various circumstances, and how they differed under certain test requirements, and something that would take that information and create those environment variables.  It also had to meet my logging and reporting requirements.

I wound up with another ancient solution- Windows private profiles [aka INI files].  As I’ve already taken up too much time composing this, I’ll have to continue the story at a later date…

Not all of them

Alas, all that stuff I mentioned in my last post won’t make it into the Windows 7 WDK.

As we get close to the final days of the product cycle, it gets harder to justify code changes.  In this case, my own choices in handling odd cases were my undoing.

In a nutshell, if I can’t identify the KMDF client driver version, I treat it as if it is the same as the runtime.  Since the only version I can’t identify is KMDF 1.9, and a 1.9 driver won’t even load unless the runtime is 1.9 or higher and there is nothing higher, I can’t really claim something is broken besides some text that says what the real version is.  All the rest of the relevant UI function will be correct, as de facto, it will be treated as a 1.9 driver, and that is what it is.
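A minimal sketch of that reasoning, with names invented for the illustration (KmdfVersion, AtLeast and EffectiveClientVersion are not taken from WdfVerifier’s source, and the load rule is modelled only as far as the paragraph above states it):

```cpp
#include <cassert>

struct KmdfVersion {
    unsigned Major;
    unsigned Minor;
};

// The load rule as described above: the runtime must be at least the client's version.
inline bool AtLeast(KmdfVersion have, KmdfVersion need)
{
    return have.Major > need.Major ||
           (have.Major == need.Major && have.Minor >= need.Minor);
}

// The version the tool acts on.  `extracted` is null when identification failed,
// in which case the client is treated as if it matched the runtime.
inline KmdfVersion EffectiveClientVersion(const KmdfVersion* extracted, KmdfVersion runtime)
{
    return extracted ? *extracted : runtime;
}
```

With a 1.9 runtime and an unidentifiable client, the fallback yields 1.9; since the unidentifiable client can only be a 1.9 driver, and it needs a runtime of at least 1.9 to load at all, the fallback and the truth coincide.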

Well, they say there will be service packs for the WDK, perhaps this will make one.  If not, at least I had my fun, and the tool does work pretty much as it should.

I just designed it too well- yeah, that’s the excuse du jour…

Exceptions rule!

I’ve decided to revert to storytelling mode today.  So sit right back and you’ll hear a tale, a tale of my fateful trip (or click something useful- you’re the one deciding, and I’m the one typing to satisfy whatever inner demon has driven me to do this today)…

It began when someone found a bug in the static version of KMDF- the one almost nobody gets to use, and for which we have been trying to discontinue support.  But people were still using it, and we had no clear migration path, and there was a bug.  So we decided to begin testing it.  Poor Neslihan got that task (well at least it wasn’t me, this time).  Soon we had added a static KMDF driver to our list of test drivers used in our daily automated testing.

Then, of course, we got the bugcheck.  As often happens, I played “Johnny-on-the-spot” and (virtually speaking) leapt onto the remote debug session accompanying the report.  Buffer overflow?  In FxDriverEntry?  But only on Itanium (I checked the other architectures that ran the same test, of course- no problems)?  What in blazes is this about?

Well, the method is well documented, and not all that hard to understand- a “cookie” gets initialized, it gets stored to the local stack frame in such a way that code overflowing buffers in the stack frame will overwrite the stored copy.  On exit, code gets called that checks that value to see if that happened.  It’s a little more complicated than that, and I’m not going to be precise, because part of that complication is related to securing the method against the sort of people who like to exploit buffer overflows [so I’m not going to make the method clear for them- they can do their own research without my assistance].  The bugcheck analysis will even tell you the two values- expected and actual, making it easy for people who really think overflows are a bad idea to make a quick guess as to where the overflow came from.

But the circumstances seemed suspicious, so I went and read the source code- H’mm- there’s a default value the cookie is initialized to…  Interesting- that’s the value we have- but it doesn’t match the expected value.

At which time, something I’ve known for years hit me like the proverbial sledgehammer- our entry code calls GsDriverEntry (which supports the stack probes inserted by the GS compiler switch, hence its name) when it finishes.  GsDriverEntry initializes the cookie, which means that that was why the values didn’t match.  Like the sword of Damocles, this had been hanging over our heads for years- the first time the compiler decided to do stack probes in our entry code, everything would break.  Ouch…

I’ll leave a bunch of story out here- things got “interesting” at that point, but the problem eventually got solved, both for us and for other people facing similar issues, as well.  Short answer is that everyone now initializes the cookie right at the start, just as should happen…

But a while back I had occasion to tweak the build process for the WdfVerifier WDK applet, and afterwards I ran it briefly to make sure I didn’t break anything in the process.  Oh, joy of joys- NONE of the 1.9 client drivers are being properly identified.  They are all identified as “Inbox”, which is what I do when the client identification method I described last year fails on me.

Already beginning to panic, I think- did someone change FxDriverEntry, and I didn’t even notice it?  So I go to our source control system, and look at the change history.  The most recent change is the fix for the problem with the stack probes (yes it occurred on static, but what scared me then was it could have happened to any version, because we weren’t doing things properly).  But that just calls library routines to initialize said cookie… Oh, blazes, those routines must be calling an import I wasn’t accounting for!!!  Why???  Because I NULL the IAT to force access violations, and I handle the exception by giving up on getting the true version, and fall back to calling it “Inbox”- which is, after all, exactly what I am seeing.  Oh, well- so much for my having thoroughly considered all the test and product consequences of that change when it was made…

Well, easy enough to find out what that import might be.  One nice trick Ilias showed me is that you can open any binary in WinDbg as a crash dump, and happily resolve symbols and disassemble code.  So I pick a random driver on my dev box, and do so.

What, no imports?  But, but--- hmm- it uses a fixed address (load of a 64-bit immediate value into RAX, since my dev box is an X64, thank you- you x86 dinosaurs can keep your wimpy processors).  What’s with this?

The answer lies in wdm.h, of course- KeQueryTickCount turns out to always be a macro- on x86 and IA64, it accesses KeTickCount [which is an import, and if you followed my earlier tale, it’s clear adding an import to my existing hack-o-matic mechanism is a trivial task]- but on X64, it accesses an essentially hardcoded address in a “SharedUserData” area.  This snippet is from wdm.h, so you can see for yourself…

#define KI_USER_SHARED_DATA 0xFFFFF78000000000UI64

#define SharedUserData ((KUSER_SHARED_DATA * const)KI_USER_SHARED_DATA)

#define SharedInterruptTime (KI_USER_SHARED_DATA + 0x8)
#define SharedSystemTime (KI_USER_SHARED_DATA + 0x14)
#define SharedTickCount (KI_USER_SHARED_DATA + 0x320)

#define KeQueryInterruptTime() *((volatile ULONG64 *)(SharedInterruptTime))

#define KeQuerySystemTime(CurrentCount)                                     \
    *((PULONG64)(CurrentCount)) = *((volatile ULONG64 *)(SharedSystemTime))

#define KeQueryTickCount(CurrentCount)                                      \
    *((PULONG64)(CurrentCount)) = *((volatile ULONG64 *)(SharedTickCount))

My, my, my- this is a problem- no import I can just tweak to point into my code.  For a plus, at least the value can’t change (if it did, existing drivers would, after all, cease to work)- well, at least not easily, so I’ll save any remaining paranoia about that for another time.  But I’m going to access violate trying to access that address in user mode (not to mention the reams of experimental evidence I had just accumulated by noticing the problem in the first place)…

So in reading about exception handling, a phrase happens to catch my eye- putting it in my words, it says that when continuing an exception, you can alter the context record supplied to you with the exception (and I know this process well, but that’s perhaps a story for a different time).  Whoa- that must mean I can change the contents of the registers programmatically, and then tell it to repeat the failed instruction!  Now, I can’t find that illustrated in any of the samples the material I’m reading points me to [nor could I find it in an internet search, although the latter was by no means exhaustive].  But that HAS to be what it means, right?

So I start coding- first the half dozen or so lines needed to handle the KeTickCount import.  Then an exception filter [only for AMD64, of course].  The logic is my usual bit of precise work: If it is an access violation, and if it is a read, and it is a read of that exact address, and one of the integer registers in the context record has that same address, then change that register’s entry in the context record to the address of the proxy I’ve decided to keep in WdfVerifier, and tell the OS to repeat the failed instruction.  I began with RAX, because after all, that’s what I had seen in my investigation, and it seemed the most likely place for a while, but I added the whole set since under the circumstances, it seemed unlikely to do any harm.  Anything that didn’t match that exact pattern, and I gave up- just execute the handler, which does nothing.  The attempt to run the driver entry code to extract the client version will fail, but WdfVerifier keeps running, and the machine itself is still quite safe from my hackery.

Worked the first time, it did (not counting my usual compiles to get rid of typos).  Problem solved.  Total work time from start to finish- something like 5 or 6 hours- good enough- after all, I did have to do some research…  Of course, there’s always somebody who can do it better, faster, and quicker- and they usually jump out of the woodwork if I start talking about things in that fashion, but it’s my story, so I’ll brag now and regret it later…

Exceptions do indeed rule!  I love it when a plan comes together…

A glimpse now at handling the new import…

    //  The descriptors are an array of entries per module, each terminated with an all-0 entry.
    //  Each entry has an RVA for the module image name, and a second for a 0-terminated list of RVAs to the structures used for
    //  resolving names of exported entries from the module (in loading these get resolved to real addresses and are plugged into
    //  the loaded image's import address table in the same order).  So we now know how to write an image loader if we need to...

    //  First, find the descriptor for the KMDF loader, and quit if it isn't there.
    DWORD   StringCopy = (DWORD) -1, WdfBind = (DWORD) -1, InitEvent = (DWORD) -1, TickCount = (DWORD) -1; 

    //  Then look for the Bind and Unbind entry points.  Case counts!

    enum {OutdatedTechnology, HasBind, HasUnbind, OneCoolDriver};
    
    BYTE        TargetIdentification = OutdatedTechnology;

....

            else if  (IsKernel)
            {
                if  (0 == strcmp("RtlCopyUnicodeString", (PSTR) NameDescriptor.Name))
                    StringCopy = ImportsIndex;
                else if  (0 == strcmp("KeInitializeEvent", (PSTR) NameDescriptor.Name))
                    InitEvent = ImportsIndex;
                else if  (0 == strcmp("KeTickCount", (PSTR) NameDescriptor.Name))
                    TickCount = ImportsIndex;
                
                ImportsIndex++;
            }
.....


        DWORD       IATSize, RelocationsSize;
        LONGLONG    OurFakeTickCount = 0x8badf00ddeadbeefI64;   //  ersatz Tick count         
        PVOID*  ImportAddresses = (PVOID*) 
            ImageDirectoryEntryToDataEx(DriverImage, TRUE, IMAGE_DIRECTORY_ENTRY_IAT, &IATSize, &Unused);
        
        PIMAGE_BASE_RELOCATION  Relocations = (PIMAGE_BASE_RELOCATION)
            ImageDirectoryEntryToDataEx(DriverImage, TRUE, IMAGE_DIRECTORY_ENTRY_BASERELOC, &RelocationsSize, &Unused);

        if  (NULL == ImportAddresses)
        {
            FreeLibrary((HMODULE) DriverImage);
            return  true;
        }

        //  Plug them in here, and we are good to go!

        DWORD   OldProtect;

        if  (!VirtualProtect(ImportAddresses, IATSize, PAGE_READWRITE, &OldProtect))
        {
            FreeLibrary((HMODULE) DriverImage);
            return  true;
        }

        memset(ImportAddresses, 0, IATSize);

        ImportAddresses[StringCopy] = FakeOutStringCopy;
        ImportAddresses[WdfBind] = CollectVersion;
        if  (InitEvent != (DWORD) -1)
            ImportAddresses[InitEvent] = FakeEventInit;
        if  (TickCount != (DWORD) -1)
            ImportAddresses[TickCount] = &OurFakeTickCount;

        VirtualProtect(ImportAddresses, IATSize, OldProtect, &OldProtect);

So for some other glimpses, this is one part of the other hackery (complete with my elementary-level annotations):

class AllKMDFDrivers
{
    MachineKey&             Owner;
    KMDFDriverList&         InstalledKMDFDrivers;
    LoadedDrivers&          CurrentlyLoadedDrivers;
    LoaderDiagnosticsFlag   LoaderFlag;
    String                  RuntimeVersion;
    DWORD                   Major, Minor, Build;

    bool            ServiceUsesKMDFLoader(__in PCWSTR ServiceName, __in RegistryKey& Service, __out MyBindInfoAlias& Binding);
    static void     FakeOutStringCopy(__in PVOID, __in PVOID);  //  Fake RtlCopyUnicodeString entry
    static void     FakeEventInit(__in PVOID, __in ULONG, __in ULONG);  //  Fake KeInitializeEvent entry
    static ULONG    CollectVersion(__out MyBindInfoAlias& BindingOut, __in PVOID, __in MyBindInfoAlias& BindingIn, __in PVOID);
#if defined(_AMD64_)
    static LONG     Filterx64Exception(__in EXCEPTION_POINTERS* ExceptionInfo, __in LONGLONG& FakeTickCount);
#endif

While the rest of it looks like this (and again, this is partial)- yes, feel free to hate my stylistic indifference to the herd- perhaps I’ll get fired for this, and all can breathe a vast sigh of relief…

First, the code that calls the filter

        MangledDriverEntry  CrashAndBurn = (MangledDriverEntry) ((PBYTE) DriverImage + EntryRva);

        __try
        {
            CrashAndBurn(Binding, &Binding);
        }
#if !defined(_AMD64_)
        __except(EXCEPTION_EXECUTE_HANDLER)
#else     
        __except(Filterx64Exception(GetExceptionInformation(), OurFakeTickCount))
#endif
        {
        }

and now our filter:

#if defined(_AMD64_)
/**********************************************************************************************************************************

LONG    AllKMDFDrivers::Filterx64Exception(__in EXCEPTION_POINTERS* ExceptionInfo, __in LONGLONG& FakeTickCount)

Ahh, the joys of low-level intervention by the truly incorrigible!  On AMD64, we will get an exception when the code tries to get
the tick count.  This value resides at a known address (which cannot change because if it did, all existing drivers would then fail
to work- although I suppose someone could resort to what I am about to do- bring it on, I'll see if I can keep up with the absurd
arms race).

So, first, verify from the exception record that we are getting an access violation of some sort reading that known address.  Then
see if one of the integer registers has that address (start with RAX, since that's where it currently is, and I bet it doesn't
change much).  If it does, change it to point to the value given to this function (which resides in WdfVerifier, since I made it
a reference, and will let the compiler play enforcer), and then continue execution.

In all other cases, execute the handler, which will do nothing (meaning the version will continue to be unknown).

**********************************************************************************************************************************/

LONG    AllKMDFDrivers::Filterx64Exception(__in EXCEPTION_POINTERS* ExceptionInfo, __in LONGLONG& FakeTickCount)
{

    //  cf definition of SharedTickCount in wdm.h
    static const DWORD64    KnownTickCountAddress = 0xFFFFF78000000320UI64;

    if  (NULL == ExceptionInfo || NULL == ExceptionInfo->ExceptionRecord || NULL == ExceptionInfo->ContextRecord)
        return  EXCEPTION_EXECUTE_HANDLER;

    switch  (ExceptionInfo->ExceptionRecord->ExceptionCode)
    {
    case    EXCEPTION_ACCESS_VIOLATION:
    case    EXCEPTION_IN_PAGE_ERROR:    //  Unlikely, but what the heck...
        if  (2 > ExceptionInfo->ExceptionRecord->NumberParameters)
            return  EXCEPTION_EXECUTE_HANDLER;
        break;

    default:
        return  EXCEPTION_EXECUTE_HANDLER;
    }

    if  (0 != (EXCEPTION_NONCONTINUABLE & ExceptionInfo->ExceptionRecord->ExceptionFlags))
        return  EXCEPTION_EXECUTE_HANDLER;

    if  (EXCEPTION_READ_FAULT != ExceptionInfo->ExceptionRecord->ExceptionInformation[0] ||
        KnownTickCountAddress != ExceptionInfo->ExceptionRecord->ExceptionInformation[1])
        return  EXCEPTION_EXECUTE_HANDLER;

    if  (KnownTickCountAddress == ExceptionInfo->ContextRecord->Rax)
    {
        ExceptionInfo->ContextRecord->Rax = (DWORD64) &FakeTickCount;
        return  EXCEPTION_CONTINUE_EXECUTION;
    }
    else if  (KnownTickCountAddress == ExceptionInfo->ContextRecord->Rbx)
 

The rest I leave as an exercise to the reader, having already put all 0.378 of you through the treadmill today…
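
For anyone taking up that exercise, here is one way the "check each integer register in turn" idea could be expressed as a table instead of a chain of else-ifs. This is only a sketch under stated assumptions: FakeContext and RedirectFirstMatch are hypothetical names, and FakeContext is a stand-in for the integer-register part of the real CONTEXT record so that the snippet builds without Windows headers. It is not the code from the driver.

```cpp
#include <cstdint>

// Hypothetical stand-in for the integer registers of the x64 CONTEXT record.
struct FakeContext
{
    std::uint64_t Rax, Rbx, Rcx, Rdx, Rsi, Rdi;
};

// Returns true if some register held KnownAddress and was redirected to
// Replacement (the caller would then continue execution).  Returns false
// when no register matched (the caller would then execute the handler).
bool RedirectFirstMatch(FakeContext& Context, std::uint64_t KnownAddress, std::uint64_t Replacement)
{
    // Table of pointers to members; RAX comes first since it is the likeliest home.
    static std::uint64_t FakeContext::* const Candidates[] =
    {
        &FakeContext::Rax, &FakeContext::Rbx, &FakeContext::Rcx,
        &FakeContext::Rdx, &FakeContext::Rsi, &FakeContext::Rdi
    };

    for (auto Register : Candidates)
    {
        if (KnownAddress == Context.*Register)
        {
            Context.*Register = Replacement;    // only the first match is patched
            return true;
        }
    }
    return false;
}
```

A real filter would list every general-purpose register the compiler might pick, and would map the two results to EXCEPTION_CONTINUE_EXECUTION and EXCEPTION_EXECUTE_HANDLER respectively.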

Progress

When I first got the idea of digging up my old tapes and digitizing them before I lost them for good, I was chatting with another ‘softie (I believe it may have been Craig Ziegler, who manages the test team for the WDK) at the big party during WinHEC 2008 (so I did do something there besides play Halo 3).  At about that time, the music being played switched to the Donna Summer classic “(She works) Hard For The Money”.  That was one of the tunes I wanted to see if I could capture one of my “covers” of.

Perhaps in part because of that Grateful Dead influence, I was always a very improvisational player- that song was a part of my bands’ repertoire through much of the 80’s, both when I played the bass, and later after I switched to the guitar.  I always liked it, and my approach to it was always laden with drama- lots of energy, catchy motifs [well, I thought they were catchy], strong attacks on key parts- perhaps overly bombastic- but hey, that’s how I thought it should be played and I was the one with the instrument.  While I was aware of the underlying theme the song had, I always treated it more as an anthem for women in the workforce.  In part because my career included an era where things were a lot tougher than they are now.  “She works hard for her money, and you better treat her right”.

I’ve managed to find two instances that I play in my office (and in my car, and on my Xbox, thanks to that fine SanDisk USB thumb drive I got as a WinHEC freebie)- one where I play the bass in a live performance [and I thought it captured the nuance of that night well]- the other with guitar- but it was at a practice, and a fairly early one- so the leads aren’t very inspired, there are presentational inconsistencies [more than usual], and I make a few more flubs than usual.  I know I had at least one solid live performance on guitar, but I may have lost that tape.  Ahh well, now that I’m practicing again, maybe I can recreate those lost glories- not that anyone but me actually cares, but one ought to have some purpose in life, right?

At any rate, while listening to the two of them back-to-back last week, the basic idea of this article came to me.  So perhaps this will be less half-baked than others have been…

I’ve been fortunate enough to have had several women as managers or coworkers [I also employed quite a few at the Laundromat, had more sisters than brothers, and my mother was herself a strong influence on me].  I owe most of them some debt of gratitude.

My first long-term job after graduating from Caltech was at IBM in Owego, NY.  The interviews there were interesting to me- I had two in their Electronic Design Automation group, and it was one of those experiences where you just light up and enjoy it.  I’d been working as an electronics technician on and off since graduation (one of the things I had acquired and needed to rid myself of during my college years was a distaste for computer programming, which had been one of my first loves prior to that- I had decided I’d rather work toward being an EE, instead), and had been thinking about application of computers to digital circuit design and testing.  I’d discussed some of those ideas with coworkers and had the “not possible- way too hard- you’re crazy to think like that” kind of response- but here I was interviewing with a group that was doing exactly that- and it was working quite well.  Since I’d already been thinking along those lines, I could ask intelligent questions, etc.  The interviews went well.

But I did terrible things at Caltech- things like skipping exams because I knew I would get a passing grade without taking them [not a good grade, just a passing one]- so I had a rather poor GPA.  Low enough I did not meet the corporate hiring guidelines.  The hiring manager seemed to consider me a bit of a nerd, and he also judged a lot on those numbers.  His manager was a woman- she had attended grade school near Caltech, and I had also interviewed with her.  She intervened, and I was hired in spite of some resistance.  So I definitely owe a woman my first good start in the high-tech industry.

But the things I used to hear!  That she had to agree not to have a family life in order to have the management job, for instance.  Or later, assertions of tokenism.  In those days, I was pretty naive- so it was hard to judge how much was true, how much was cattiness, how much was just plain made up out of spite and jealousy.  So I may have accepted more of it as truthful than I should have.  But it didn’t seem fair.  So as she progressed (at least as far as a divisional vice president, last I knew) I gave a few silent cheers, and didn’t contribute to or spread all the gossipy stuff.

The working environment certainly wasn’t friendly- that much was true.  One incident I recall later in my tenure there was my mentioning pointedly in an employee poll the plight I observed of one of my fellow engineers.  The men hired at about the same time as she was had been given desirable assignments and good career paths- high-profile design and engineering tasks.  Her assignment was to track equipment being checked in and out of the equipment closet.  When she agitated for something a bit more technical, she was given a project that was clearly doomed to spectacular failure (budget had already been spent, technical approach was flawed in the opinion of most of the experienced engineers on the team, etc).  I had a distinct sense that this was deliberate and retributive and found it reprehensible.  Such opinions of course had to be written, and my writing style is distinct enough it was clear who raised the issue- only deliberate obfuscation of submitted material could have hidden it, and no place I’ve ever been that conducts such surveys resorts to that.  So I shouldn’t have been surprised that the next project that came along that I had a strong interest in was given to her.  But that was probably a good choice, as it turned out- I pushed way beyond my limits in those days- she was a bit more sensible.

Now I don’t think what I was seeing matched the desires of the top management of the company, even then [and I’m equally certain it would not be regarded positively now].  I’m merely discussing perceived changes from my viewpoint.  Cultural changes- both societal and within other aggregations such as a corporation- take time.  Personal change as well [and there’s been a bit of that, fortunately], for that matter.

After coming to Microsoft, I experienced what at the time I would have termed a more “liberal” working environment.  Family leaves, etc.  While women were rare, there wasn’t quite the same sense of class difference.  I tried to take some of those lessons to heart when I was the boss [the Laundromat again].  I tried to provide flexible arrangements, allowed employees to bring children to work when there were day care problems, etc.  The pay was terrible (as I’ve noted elsewhere, the venture bankrupted me in all but the legal sense of seeking the protection), but I wouldn’t have paid a man more.  I always paid more than minimum wage- but not much more.  So I suppose in that sense my employment of them could be deemed exploitation- but that certainly wasn’t my intent.

I’ll skip over a few bits of reminiscence- don’t need them.  Today I find myself working for a woman again [and one could even argue she was promoted to the spot over my head- it happens I don’t have a problem with that, though].

Now I could go on about how well I think that turned out [she’s a good leader, I’ve said it before, and I still believe it, and try to do what I can to help], but that’s not what I intend.  It’s the difference in environments across that span of time, and the direction of the progress as I see it.  Sometimes she works from home- sometimes she has to drop things to attend to family and child care issues.  Nobody sees anything wrong with that, as far as I can tell.  It doesn’t affect her being an effective leader or manager.  The organization supports it, and we as her team support it.

After mentally composing much of that, I found we have a new board member- again a woman, and a computer science professor…

With a daughter of my own soon to enter the work force- it’s good to consider these things.  There has been progress- the journey may not be over, perhaps many are not satisfied with the pace- but at least it’s the right direction.  I consider myself fortunate to work for the company I work for- that it values these things, and makes these efforts- to produce a more inclusive and diverse workforce.  One that respects these things and its employees.

Now Playing: Troy- Ghost in the Shell: Stand-Alone Complex Soundtrack- Monochrome

Posted by BobKjelgaard | 2 Comments

Using C++ in a KMDF driver part 1- a pattern for using contexts as objects

This is an article I’ve started probably close to a dozen times since I started this blog, but never published.  In part because of all the heat the topic of using C++ in the kernel generates, and the rest perhaps because of my reaction to that heat.

So I’ll get one thing off my chest at the start and perhaps that will be enough to let me proceed.

I’ve been writing drivers with C++ in the kernel (or the Win 3.1/9x/ME VMM) pretty much since I began using the language (in the very early 90’s).  I routinely use paged code (paged data not so much- I’ve never had designs where there seemed to be any benefit to it).  That spans more than a decade- in fact, it’s close to two decades now.

I’ve NEVER had a problem with the issues raised in the paper.  I’m not even going to link to it.  If you want to find it- it isn’t hard to find.  On top of that, in a determined effort to find people who had, I didn’t find many- and the one clear case I did find was cured with a #pragma that would have made sense if you were a C programmer and had a basic understanding of where VTABLES and such get emitted to.  So I personally have a sense of mismatch between my own experience and the strongly worded severity there.

I’ve been told that if I persist, I’ll deserve all the paging problems I get [and that nobody will help me with them].  Well, I do get them (always have), and I’ve been developing long enough in this environment that I don’t really need anybody’s help to find the root cause of a bug like that.  The causes are always the ones I remember- acquiring a spin lock in pageable code- making a routine pageable that can be called at elevated IRQL- all the usual ways to screw up in that other language that is the hallmark of a true first-class kernel developer.  But none of them had anything to do with my choice of programming language.  Not once.  Not saying that it never happened across all those years, because in the early ones, the ability to catch a bug like that was severely limited- there was no Driver Verifier, no static analysis tools.  So it may not have been noticed early on.  But now that I have them, it’s still not happening…

Now there can be plenty of reasons for that- but I look at what people take away from the paper, and I do a lot of the things they think aren’t safe.  I use polymorphism and inheritance freely.  I don’t use multiple inheritance a lot, but I have used it and have not observed problems arising from its use.  On the other hand, there are common features that I don’t use as a matter of personal preference or style that may bear on this.  I don’t expose the implementation of functions in header files (meaning the compiler is not going to start out by inlining them and then give up later, dumping them in some unintended segment) with the exception of trivial accessor functions.  I don’t use templates (I’m not sure they were a language feature when I started, but at any rate, during my learning curve I never needed them, so while I can handle code with templates, I don’t use them myself).  I almost always code my own constructors, destructors, copy constructors and assignment operators (I usually have to- I prefer references to pointers and if you have reference members, the compiler can’t generate default code for most of those routines).
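
The remark about reference members can be checked with a few lines of standard C++. The sketch below is an illustration only, with hypothetical names (Logger, UsesReference, CompilerOnly) that are not taken from any driver. It shows which special members the compiler declines to generate once a class holds a reference: the default constructor and copy assignment are implicitly deleted, while the copy constructor is still generated.

```cpp
#include <type_traits>

struct Logger { int LinesWritten = 0; };

// Relies entirely on the compiler for its special members.
struct CompilerOnly { Logger& Log; };

// A reference cannot be left unbound, so there is no implicit default constructor...
static_assert(!std::is_default_constructible<CompilerOnly>::value, "no default constructor");
// ...and it cannot be re-seated, so there is no implicit copy assignment either.
static_assert(!std::is_copy_assignable<CompilerOnly>::value, "no copy assignment");
// Copy construction still works: the copy binds to the same Logger.
static_assert(std::is_copy_constructible<CompilerOnly>::value, "copy constructor is generated");

// Hand-written members fill the gaps.
class UsesReference
{
    Logger& Log;
public:
    explicit UsesReference(Logger& Target) : Log(Target) {}

    // The reference stays bound to its original target, so there is nothing to copy.
    UsesReference& operator=(const UsesReference&) { return *this; }

    void Write() { ++Log.LinesWritten; }
};

static_assert(std::is_copy_assignable<UsesReference>::value, "hand-written assignment is usable");
```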

So now that it’s clear I’m not going to walk the company line on that topic [the opinions I expressed ought obviously to be clearly my own], I’ll proceed to something more useful…

Leveraging the KMDF Object Model

KMDF provides a nice object model, with managed lifetimes.  One of the most delightful observations I had in my first KMDF driver was that it was easy for me to blend this with my usual coding patterns.

As an aside, I’ll note that I never use sample code to learn anything.  I take the reference materials and code to them.  If there are samples, I treat them as the last resort- and I will deliberately change as much as I can of them [in part to see what knowledge of things that can go wrong wasn’t explicitly represented in the sample].  I judge the quality of the reference material by how rarely I have to look at a sample to figure something out [yes, I don’t find much reference material I would call “good”].

So my first KMDF driver was a software bus driver, and I didn’t go near toaster in writing it.  In fact it was one of those “ActiveX” (OLE automation) test drivers I mentioned in our DDC presentation.  Now we had some samples for them, too- and not surprisingly, virtually no reference material.  The samples were fastidious about one thing- all the COM parts were in C++ (not a lot of choice there)- but all the parts using KMDF were in C- precisely because of said paper.  I might add that this was even though no attempt was made to mark any of the code or data pageable in those samples.  Well, I’m a stubborn <expletive of your choice here>, so I decided then and there I was going to write the whole driver in C++ in spite of the objections I received.  I was still in my “trial period” so they could always fire me if they wanted to, but the job market was good enough at that time that I was pretty sure I could find something…

Back to the proper topic- one fine thing is that most of the KMDF macros are agnostic enough they can handle at least straightforward C++.  That was one of the happiest discoveries of that time.

So the basic pattern as I use it:

  1. Declare a static function in your class that takes a WDFOBJECT as input and returns a pointer to an object of your class.
  2. Declare a class-specific placement form new operator that takes as its additional input the type of handle you expect your object to live in the context of (that is, it can be more precise than the preceding function can and you can benefit from stronger typing in C++)
  3. If you need to have your destructor invoked, declare a class-specific delete operator that basically does nothing.
  4. If you have such a delete operator, also declare a static member with a void return that takes a WDFOBJECT as input.
  5. Use the WDF_DECLARE_CONTEXT_TYPE_WITH_NAME macro to get the compiler to write that first function for you.
  6. Have your new operator implementation use the first function to return the address of the underlying context (you basically ignore or validate the size parameter) from the passed-in handle.
  7. When creating the context, if you need your destructor called, use a WDF_OBJECT_ATTRIBUTES structure with the EvtCleanupCallback set to the routine in item 4.
  8. Code that routine to use the routine in item 1 to get the context address out of the object handle- and delete that pointer.  This causes your destructor to be called at cleanup time (which is much more sensible than destroy time) and your do-nothing delete operator will also be invoked (or inlined out of existence if your compiler is any good).

There- you “create” your object when the KMDF object is created (or via a WdfObjectAllocateContext call if you add your object later), and “delete” it when the object dies.  But KMDF manages the memory and most of the object lifetime for you.  Sure works for me (a lot).

The following snippet is from that first driver (I’ve since dropped the usage of “C” on class definitions in my general drive toward anarchic style).  This is slightly convoluted because I have logic allowing only one instance of a device with this driver- so I’ve deliberately intermingled driver-level and device-level usages (always pushing those boundaries- but I think that’s a good way for an SDET to think).  I’ll admit this is slightly doctored (I removed some things related to the COM technology, as I can’t disclose that, plus I tried to include support for the standard bus interface and that just complicates things without illustrating this method), but it should show I practiced what I am preaching…

class CTargetTestBus
{
    static WDFDEVICE                Owner;              //  WDF device that "owns" this bus object

    static void*    operator new(size_t size, WDFDEVICE OwningDevice);
    CTargetTestBus(WDFDEVICE OwningDevice);
    ~CTargetTestBus() {}

    //  Private callbacks (ie, accessed from within this class' code)
    static EVT_WDF_IO_QUEUE_IO_DEFAULT      OnIoDispatchDefault;

    static void operator delete(void*) {}
public:

    NTSTATUS    NewChild(LPCWSTR Name, int InstanceID);
    NTSTATUS    RemoveChild(LPCWSTR Name, int InstanceID);
    void        DoneWithBus();

    static CTargetTestBus*  GetThisTargetTestBus(__in WDFOBJECT Object);    //  WDF macro writes this code
    static CTargetTestBus&  GetTheBus(bool& BusPresent);
    
    //  Driver callbacks (public, because used in DriverEntry)
    static EVT_WDF_DRIVER_DEVICE_ADD        OnDriverDeviceAdd;
    static void             OnDriverUnload(IN WDFDRIVER Driver);

    //  This one is accessed from a member in CChildInfo
    static EVT_WDF_CHILD_LIST_CREATE_DEVICE OnAddNewChild;
};


// Macros to get the context
WDF_DECLARE_CONTEXT_TYPE_WITH_NAME(CTargetTestBus, CTargetTestBus::GetThisTargetTestBus);
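
For readers without a WDK handy, the eight steps can be exercised in plain C++ with a mock in place of the framework. Everything named here is an assumption made for illustration: FakeObject plays the part of a framework object that owns the context memory, its Cleanup pointer plays the part of EvtCleanupCallback, and Bus::FromObject is written by hand where WDF_DECLARE_CONTEXT_TYPE_WITH_NAME would generate the accessor. None of it is a KMDF API; it is a sketch of the pattern rather than the driver's code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a framework object that owns the context memory.
struct FakeObject
{
    alignas(std::max_align_t) unsigned char Context[64];
    void (*Cleanup)(FakeObject*) = nullptr;     // plays the role of EvtCleanupCallback
};

class Bus
{
    int&    LiveCount;      // a reference member, in keeping with the style described above
public:
    explicit Bus(int& Counter) : LiveCount(Counter) { ++LiveCount; }
    ~Bus() { --LiveCount; }

    //  Steps 2 and 6: placement-form new hands back the context memory and
    //  merely validates the size.
    static void* operator new(std::size_t Size, FakeObject* Owner)
    {
        assert(Size <= sizeof(Owner->Context));
        (void) Size;
        return Owner->Context;
    }

    //  Step 3: do-nothing deletes, because the "framework" owns the memory.
    static void operator delete(void*) {}
    static void operator delete(void*, FakeObject*) {}  //  pairs with the placement new

    //  Steps 1 and 5: the accessor the WDF macro would write for you.
    static Bus* FromObject(FakeObject* Object)
    {
        return reinterpret_cast<Bus*>(Object->Context);
    }

    //  Steps 4 and 8: the cleanup routine runs the destructor via delete.
    static void OnCleanup(FakeObject* Object)
    {
        delete FromObject(Object);
    }
};
```

Usage follows step 7: set Device.Cleanup = Bus::OnCleanup on a FakeObject named Device, construct with new (&Device) Bus(Counter), and let the owner invoke Device.Cleanup(&Device) when it tears the object down. The destructor runs at that point, and no memory is allocated or freed by the class itself.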

Now playing:  Me (that old recording of Johnny B Goode again)!

Posted by BobKjelgaard | 2 Comments

Automated Jobs and Library Jobs

I received an IM late yesterday evening (my time) from “James” (via the Web) asking what the difference is between an automated job and a library job.  Unfortunately, I wasn’t around to answer it (I started work about 2 AM PDT today, and the IM was from about 9:30 PM PDT yesterday), but it’s rare I get asked something I can flip an answer to off the top of my head, so…

I’m assuming this is in the context of the Device Test Manager (DTM) in the Windows Logo Kit (WLK).  That is a derivative of a larger internal-use product we call WTT (Windows Test Technologies), so I’ll assume the same answer applies to both, because I’m sure it does (and if I’m wrong, feel free to correct me).

If you’re a programmer, the fastest rough analogy is that an Automated Job is like a C / C++ “main” while the Library job is like a “function” in the runtime library (or some DLL, etc).  The important difference here being that a job can “call” a library job and pass it parameters, while you cannot “call” an automated job- you can schedule it (and give it parameters) just like you can invoke a program (and give it parameters) from a command shell- but we don’t have analogs like piping and redirection of I/O etc in this execution environment.

They are a lot alike- both can be scheduled and given run-time parameters, and possess fairly similar capabilities.

So why not make all Automated jobs Library Jobs?  One strong reason is that Automated Jobs can be constrained [have a set of circumstances spelled out to detail the requirements for running the job- one common one in device testing is the presence of a particular kind of hardware, for instance] while library jobs cannot.  Another one I use occasionally perhaps ventures into the realm of the political- making a job an automated job so that it doesn’t wind up being called as a library job if you expect such a usage to have untoward consequences.
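
To make the analogy concrete for programmers, here is a toy model in C++. All names (Machine, CollectLogs, AutomatedJob, Schedule, and so on) are hypothetical and model only the relationship described above; nothing here corresponds to an actual DTM or WTT interface.

```cpp
#include <string>
#include <vector>

// Hypothetical description of a test machine.
struct Machine { bool HasTargetDevice; };

// "Library job": other jobs can call it and pass it parameters.
// It carries no constraints of its own.
std::string CollectLogs(const std::string& Folder)
{
    return "collected logs from " + Folder;
}

// "Automated job": it cannot be called, only scheduled, and it carries a constraint.
struct AutomatedJob
{
    bool (*Constraint)(const Machine&);
    std::vector<std::string> (*Body)(const std::string& Parameter);
};

bool NeedsTargetDevice(const Machine& Candidate) { return Candidate.HasTargetDevice; }

// The body of the automated job "calls" the library job, like main calling a function.
std::vector<std::string> DeviceTestBody(const std::string& Folder)
{
    return { "ran device test", CollectLogs(Folder) };
}

// The "scheduler": runs the job only on a machine that satisfies its constraint.
std::vector<std::string> Schedule(const AutomatedJob& Job, const Machine& Target, const std::string& Parameter)
{
    if (!Job.Constraint(Target))
        return {};
    return Job.Body(Parameter);
}
```

In this model, scheduling the job on a machine without the target device produces nothing, while CollectLogs remains reusable from any other job body.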

The best practice, especially initially, is to make most jobs library jobs- this gives you the most flexibility as you combine them later.  You can work on them in pieces and then combine the pieces fairly easily.

Well, James- if you’re still searching for that answer, perhaps this will help.  If not, maybe it will help someone else later.

Motivation

What makes a person do what they do?  One thing we try to be at Microsoft (and another good attribute to have) is self-critical- to be able to objectively [to the best of our innate abilities, as it is hard to avoid some subjectivity in this domain] analyze our actions, our strengths, our weaknesses- in many aspects of our work.

Two other key parts of the Microsoft culture bear upon this article, which is in some sense motivated by a nexus of all of these:  the intrinsic value of open and honest communication, and a sense of respect for others with whom we deal that tries to assume the best interpretation when multiple alternatives present themselves.

The final element in today's mix is some of my own motivators- to blog, to work on WDF, and to work for Microsoft.  I'll start with the final one.

Microsoft is an amazing place to work.  The Core Operating Systems Division- the heart and soul of a platform literally used on a global basis.  I was a part of it years ago [and it was a thrill to be there then, when it was the Systems Division, and Steve Ballmer was just the president of the division], and to be back there again- at a place close to the kernel, which is a place I'd always hoped to be but doubted I'd ever achieve.  Those bottom-most layers, where hardware meets software- the foundation that holds up the entire software building.  I should pinch myself to make sure it's really true.

But it's much, much more than just the raw technology.  I am surrounded by people of great skill and intellect, united in one form or another with a common purpose- the broad purpose of "improving human lives through the application of technology" is one way it could be stated at the company level.  But where I sit, it's focused a bit more on the hardware, the people who make that hardware and see having it work on Windows as a means for them to achieve their personal goals [financial, professional, and so on].  We want drivers to work on Windows.  We want to make them easier to use.  We want that whole end-user experience to be better, safer, and more trustworthy.  We want the people who make that hardware and build those drivers to find their jobs getting easier and the process more efficient and effective for all.  We think about it and talk about it in some form practically every day.  Historically, I think a lot of progress has been made, and we aren't resting upon that- we're pressing forward.  We strive to improve and achieve some form of excellence.

Better yet to me, though, is that these are people I can trust- those values of honesty and integrity and ethical behavior aren't shams.  I see that as well in interactions throughout my day.  And I see that respect for diverse and differing viewpoints- it isn't always easy, and in none of this am I trying to say everything is perfect.  But it's a good place to work and good people to be working with, and we're in a reasonable sense making our living trying to do good things.  In a way, it is living a dream [sometimes the technology turns it into a nightmare, but that's the nature of human frailty- we push our limits, sometimes we have to fall back].

So I'm passionate about my job and about Microsoft.  I blog because of this.  When I was trying to solve that installation problem in my initial set of articles, I tried to live up to the corporate standards in my treatment of both sets of customers [the engineer at the company using our technology and the end users who were experiencing this issue]- the fact that they closely align with my own beliefs certainly helps- but that alignment is what helps drive me.  It leads me to take hours out of my day at times investigating other issues, or doing other tasks.

Some background to further explain the first article I wrote over the weekend:  I spent at least a couple of hours reading megabytes of logs to determine what had happened on an end user machine for another support engineer at a hardware company.  It was an experience I'd mentioned before- the problem wasn't just affecting his driver, it had affected others.  Somewhere out there what was probably a perfectly good piece of hardware got discarded as useless- its manufacturer derided as producing garbage.  This person's company was in danger of the same thing.  The first conclusion was unfair, and the second would have been.  That offends a basic sense of justice and fairness I find myself afflicted with...

So I got upset, and I vented.  In response, I got this reply.  It is capable of ambiguous interpretation, whether deliberately or just through a lack of finesse in English [some of the constructions hint at someone for whom it is a second language- although I am capable of that and worse at times, particularly when tired].  So at first, I opted for the less confrontational interpretation, and simply replied with what was on my mind already, anyway.  I'm never at my best when angry, and I try not to let it happen [and to let it pass when it does].  I still need to take some of those more curative steps, but I will get them done.

But at the moment, my inner priorities have me blogging again- in part because of that second interpretation.  My deepest and sincerest apologies up front if I have misjudged and was closer to the truth with my first estimation, by all means.  But the ambiguity exists, the fire is in my heart, and I shall answer:

Registration on MSDN provides access to useful features- my registration, for instance, lets me update my blog from anywhere.  If we intended it as a deterrent, it would be a lot harder than it was.  Sites where I've seen this particular bad advice [just delete KMDF] weren't hard to find [and I really don't remember seeing many registration barriers]- but the threads were old, and in my own view, there were better uses of my time than trying to challenge each utterance individually.  Of course, all that happened BEFORE I saw those logs and some of the real effects of that advice.  Perhaps I should have tried that.  But I can't unwind the clock, and the genie is out of the bottle.

As for the quality of my own posts- I know I'm not perfect.  People that actually take the time to know me know that about me.  But I do know what happens when you remove the KMDF service or delete the binaries, or even stop making it boot start [all of this specifically referring to Vista and above].  I've done all those things in testing it- repeatedly.  On multiple machines.  My parents have a Vista machine, and I wouldn't do it to them.  And I would be every bit as angry at someone who gave them the same advice I was ranting about.

Marketing?  I have seen my blog featured for brief periods on the WinHEC and WHDC sites [and in both cases,  I found it a bit disconcerting- I'm sure I shall always pale in comparison to Doron, or to Mark Russinovich, and I can easily rattle off many more names- I take it as a rotation of sorts meant to give me some success as a blogger, but how well that chance works out is up to me].  I suppose it's a form of marketing, so point taken.  But I can't see yet what's wrong with it- I'm no star, and I doubt I ever shall be.

Sycophants?  Doubt I've met any- one can agree with something, even enthusiastically, and not be a sycophant.  Contrarian- same thing only flipped around.  Both agreement and disagreement have value in a conversation meant to go somewhere and achieve a goal- I see that in my job every day, as well.  With all due respect, that seems to be an unfair and dismissive characterization (whether directed toward me or not).  Real people do these things- they innately deserve more than being dismissed as stereotypes.

Hiding all Microsoft content from the Internet?  Or just MSDN?  Or just my own blog?  If it's the latter, there are times I'd be inclined to agree- I often denigrate my own work- I can be one of my harshest critics.  But perhaps the thought that since it's in those indices (that's the proper plural for index, by the way), anybody can see for themselves what a blowhard I am can be a form of consolation.  If my incompetence rises to the level of termination, perhaps those indices will save a future potential employer from a grave error.  On the other hand, if it was one of those earlier two- that's just totalitarian.  If you don't care about the walled garden, don't look.  Why be bothered that others do?  If the underlying claim is that the content is harmful, I'll just not agree- I used that content at times when I wasn't an employee and didn't particularly like the company or think it was a good thing it held its market position.  It may not have been perfect, but it wasn't deliberately harmful [and yes, claims like that are indeed something that upset me- I just don't see a point in wasting time dealing with them- not unless there's something concrete and not just endless assertion, which is all I was presented with here].

But to go back to thinking the best, I can take this again as a criticism of the ineffective nature of my choices in how to blog.  It's too hard to separate the good stuff from the rest of my verbiage.  That is something that I am aware of, and occasionally try to correct.  But it's hard for me to not be me, especially when blogging- the act of composition drives itself in these directions.

As my final afterthought I'll go back a bit further- my responses on NTDEV a couple of weeks back were fueled by what seemed to me to be an endless stream of negative criticism, shallow, caricatured stereotypes and unfair, unwarranted assumptions about who and what I and my colleagues are.  Calvin Guan was in some ways an innocent- his comment was just the verbal straw that broke the Bob camel's back.  I've worked with him briefly and he's not a bad sort at all.  But my tone may not have been respectful enough...  I did want to counter those opinions though, I felt they delivered a skewed picture [and hence that subsequent post].

So, I'll leave this missive in those indices, and let things fall out as they may.  Tomorrow I'll go to work and try in my own way to make the world a better place.  I'm lucky to have a place where I can do that, and luckier still to have some people about who really accept that as being part of my motivation [yes, I also like having a place to sleep that's warm, and food, and so forth, too- those are additional motivators]...  Beyond that, I'm not sure how concerned I should need to be.

Posted by BobKjelgaard | 2 Comments

Patrick the Prankster

One aspect [at least on occasion] of life at Microsoft is some good-natured [well, at least, I hope that’s what it is] practical joking.  Patrick pulled off a couple of gems recently (I won’t name the targets, but they’d be well-known to the community at large).

The One True Popper

In the first one- a key senior member of our development team went on a well-deserved vacation, including some well-earned time with his latest progeny.  He returned in early January to find a sign on his office door announcing [a paraphrase on my part] it had been child-proofed.  As for the office?  Layers and layers of bubble wrap- the floor, walls, duct taped in several layers about the chair, and more…  For a while there, you could tell when he was in the office by tracking the pops…

The Developer’s Day Hath Come

In the second case, another member of the development team went abroad to give several of our WinHEC presentations [including the one Ilias and I did at the original event] at WinHEC events in Asia.  A picture of him appeared in a Chinese-language web report.  When we attempted to read the article, the weaknesses in machine translation from Chinese to English were very apparent, and were [in English] quite hilarious.

At a team luncheon, he was presented with some more of our prankster’s handiwork- a photoshopped image of a cereal box (WinHECies) with his photo and snippets of several phrases from the article, along with some advertising blurbs [“Help Fight Crashes” “You can make a difference!”].  I’d be tempted to post a copy, but having had to tolerate people [when I was running my laundry business] who pointedly looked down upon Asian immigrants because of similar issues with English, I thought it better not to provide anything to play into those prejudices.  But it was a masterful job [and in good taste- don’t misjudge my concerns].

It would be nice to see an engineer get that sort of star treatment, wouldn’t it?

For my parting thought- I should feel lucky- all he’s done to me lately is obliterate me in Halo…

Posted by BobKjelgaard | 0 Comments

Bad Advice No Matter Where it Came From

Just a short post- in the “don’t do this at home” category.

I was recently asked to assist someone with another KMDF installation issue, and in this case, the Internet proved itself to be everything it shouldn’t be- a treasure trove of unchallenged bad advice.

I don’t care where you read it- deleting wdf01000.sys and wdfldr.sys from %windir%\system32\drivers of a machine is:

  1. So rarely necessary as a problem-solving step you shouldn’t even be thinking about doing it.
  2. Very likely to make things worse than they were before you started.  If you are using any OS beginning with Windows Vista, it can turn your machine into an unusable paperweight, and you may find it very difficult to recover.  Plus, there are enough drivers in general use now that even the earlier OS versions are not immune.

What I was looking at was a machine where someone attempted to “cure” a KMDF 1.5 installation issue this way [I was fortunate enough to have the entire setupapi.dev.log, so I could actually tell within a couple of hours when this was done].  On Windows Vista, no less- now doing that requires overriding system file protection, so it isn’t like we haven’t tried to save you from yourself.  This particular user was very lucky- no boot devices using the framework [and this is the first Vista machine where I’ve ever seen this].

But once you’ve done this, you’ve given yourself one huge problem- none of the already installed drivers work, and no amount of reinstalling them is going to save you. 

Why?

Because our coinstallers won’t even attempt to update an OS where KMDF is a part of the OS unless they were released AFTER the OS was.  They don’t need to- it is already there, built-in.

Nobody’s driver is going to work after those files are deleted, until an update is applied.  If your Vista machine has had the SP1 upgrade installed [and by now, the vast majority of them have], then it is using KMDF 1.7- if you delete our binaries then NOBODY can fix it for you, because that is the current released version [1.9 is still in Beta].  There is no driver on the market you can install that will fix this, because we haven’t given anyone a package that is capable of that.

The only cure is to find the right binaries and put them back where they belong.  Then consider not taking advice from some internet blowhard [if you feel like considering me as one- hey, so be it- but I actually helped engineer the product, and that claim is verifiable- what’s your source’s claim of expertise based upon?].

My apologies for the tone here, but I track access to this blog, and I know “deleting wdf01000.sys” is a search term that pops me up.  So maybe some advice here may save you a world of grief.  There are better ways to solve your problem, whatever it is- even the problems where this was actually suggested as a “working” solution at one time.

Now Playing: Bob Kjelgaard- “Let’s Escape” (not available anywhere)- just me and my acoustic with a song a couple of my college buddies wrote [and I renamed, revised, and arranged]…

Posted by BobKjelgaard | 2 Comments

Well, yes and no...

I got involved (probably quite foolishly) in a discussion on the NTDEV mailing list that really wasn't much of my business yesterday, but I thought I'd expand a bit on my thoughts there, anyway.

It probably sounds in one of my first posts there that I'm completely disagreeing with Peter Viscarola, although in fact, I don't.  I just think there's a lot more to the story than that- both the reasons why things are as they are, and the broad general effect it has.

The Purina Effect

I think I'll structure the rest as a Q & A with some imaginary and perhaps occasionally hostile interviewer- we'll see where that takes me.  Not that there's a need for this, but it's my blog, so I get to make these choices...

  • So why do so many people at Microsoft run the latest of everything?
  1. Because we hire a lot of technophiles- we are people who like new technology and believe innately in its transformative power.  We know such people do well in our environment, and make great contributions.  It's a good trait to have if you're seeking a job here.  You'll get picked on sometimes for using Office 2003 for the same reason some people will give you grief about out of date fashions, or a stodgy car, or...  So the desire for the latest and greatest is inbuilt.
  2. Because so many of us are involved in developing new products, we understand quickly the economics of their test, and that includes the idea that finding problems earlier is fixing them cheaper.  In part due to that first bullet, there's usually a ready pool of people available to help test your new product, and that's a good thing.  Most of the time, the problem is keeping them away from things before you're prepared for their input.  This is also good over the long run- we call it "dog food" [short for "eating the dog food"], and it is well-baked into the corporate culture.  It occurs on voluntary and ad-hoc bases and in planned deployments.  We always use things before you do.  We always will.  [And unfortunately we will still miss things- there is no perfect test process, at least, not that I know of].  I began running Windows 7 within weeks of the release of Windows Vista, and used it for most of my work (and most of my blog posts, except at times like this when I post from home).  Unlike most driver developers, I am used to an OS whose code changes each and every day, and exists in multiple forms at nearly the same time.

Just looking at the last couple of years, I've run beta or earlier versions of a couple of OS, Office, Windows Live, Halo 3, and Halo Wars.  I didn't have the big XBox Live update because I knew I wouldn't have time to work with it [but I know people who did].  I'm not particularly on the bleeding edge, usually, even with all that.

  • Why do things change and old ways no longer work?

Sometimes, it is much as Peter said [with my demurrals above flavoring it a bit]- a lot of focus on the new.  But many times these are done deliberately, and "cool" [in the sense of some form of hipness] may occasionally be a factor, but not one that gets much weight [sticking to the OS- different products, different markets, different factors].

Most of the cases I can think of were the result of analysis of real data- watching and observing how things are really used.  Looking for the problem areas- trying to find ways to minimize the number of steps to accomplish the most important and the most common tasks.  Making things work better, and more easily.  I don't think that should be news- I expect several of the Windows team blogs say so directly or indirectly.  Now we think of that as "cool", but that's not the sense the word was being used in in that thread.

But we don't always know the full impact of changes, and it never hurts to check things in the real world.  That's one reason we have beta tests.  The TARGETPATH issue is (in my opinion) something we will take as a lesson for the future, but I also think this is proof the beta test process works.  Would it have been better for it to be this way in the final version?  Was the entire beta so good that the only conclusion was that we would not change anything from the feedback resulting from the beta, making it some sort of trial balloon we'd pull back, buff up, and then reissue as the final thing?

  • Is your "love of the new" [I won't repeat calling it the Purina Effect, although perhaps they'd be happy I find the name mnemonic for pet food] the sole reason you miss down-level breaks?

No, we also miss them, in my opinion, because:

  1. We develop blind spots in our test procedures and our thinking about how our products are used.  Most test scenarios have large variability and we have to narrow down the problem spaces to develop workable solutions.  Sometimes this process leaves holes.  More on that in a minute.
  2. Peculiar to this case are the mechanisms through which our build tools reach the WDK.  As I'm sure has been discussed on the Engineering Windows 7 blog, we put a lot of time and effort into improving our internal processes, and that definitely included how we build Windows.  The new mechanisms for binplace versus the old targetpath mechanisms have been a big win for us in terms of working with the source code.  But to minimize the impact to product teams, a lot of the initial legwork for it was done by the tool developers, and pushed out to us along with the tool changes.  I did some rework in our test code [things I privately had but weren't part of the main Windows source], so I was aware of some of this work, but I suspect that for most developers, this just happened [and that's good- it saves a lot of R&D money when things work that way].  This "magic elf" phenomenon may have helped develop something of a blind spot.

  • Did you, personally, test the WDK at all?

Actually, yes [and it has its own test organization, as well].  I and other members of our test team have installed it, built samples, run them, read the documentation and the source code, used the tools, and filed bugs as a result of what we've found.  That's happened more than once.  Periodically we have held what we call "bug bashes" where everyone in the organization that includes our team goes to an out-of-office location, splits up into teams, and competes to see who can find the most bugs, or the most interesting bugs, and so on.  But no, we didn't take SP1 DDK code and run it against the latest WDK [and there are multiple reasons that wouldn't work for WDF, anyway].  But I wouldn't be surprised to hear someone's considering that after all of this.

  • Is it your fault this happened?

I certainly ask myself that- or at least, if I shouldn't have done more, or paid more attention, and if I had, maybe I could have realized TARGETPATH would be an issue and flagged it before it went out the door.  But probably not- it was one of those mostly deprecated things that stayed in the WDK long after its need in Windows proper had passed on.  I developed a blind spot.

I bet I'm not the only SDET in Device Platform Technologies (which encompasses the WDK and WDF groups) who's doing that kind of thinking these days, although I certainly haven't conducted a survey on the subject.  I said above I'd get back to test holes, and this is a good place for it.

They exist.  We prefer when they're complicated enough it makes some sense to have missed them, but we're human enough it doesn't always work that way.  In a way, it should be hard to get too big a sense of oneself as a computer programmer.  The blasted thing always does exactly what you tell it to do.  You often don't tell it the right thing, even when you're sure that you have.  But being human, we get that sense sometimes anyway.  Maybe I had more of it than I should have when I made my first post on that thread.

Time will tell.

  • Layoffs?

Not funny.  I've been laid off twice in my career, both times as an SDE, and both times in a difficult economy, and this economy is worse than it was either of those times.  Nobody I know well is affected, but I am on some mailing lists where affected people made their brief farewells.

I sympathize with their situation and hope for the best for them.  I'll add that as far as I'm concerned, whoever hires them gets a good deal, because I can't remember the last time I saw a Microsoft FTE I thought wasn't a good employment catch.  I'll also add that I have a lot of faith in the people who made those decisions- I've seen enough to believe it had to be done, and it wasn't done lightly.

  • So why did you hint at unemployment?

Someone taking that thread and blowing it up into a news story about Microsoft not trusting its own products?   Not that it would (or will- I can't predict the future on something like this) get me terminated, but I wouldn't blame anyone but myself if it did.  For that matter, I may have said more about internal affairs than I should have.  Business procedures (and that includes security procedures) are proprietary information and covered by employment agreements.  By not checking first, I may have doomed myself, and that's life [corporate at any size- the principles don't differ that much from place to place].

I don't think I was discussing anything other than recommended best practices, though, so I'm not really as worried now as I was when I first thought about it.

Next time I intend to discuss something [anything] else- that's a certainty.

When Progress Is Required…

Today’s post is going to be a quick overview of one of the great new features in KMDF 1.9- built-in support for “guaranteed forward progress”.

What is it, and why would I want to have it?

The essential case occurs in storage device drivers.  If one of the system’s paging files resides on that device, then the driver is going to be asked to read and write memory pages to and from the paging file.  Failure of this I/O isn’t an option- the system itself could be dead if it were to fail.

Since it is common for drivers to subdivide requests into smaller pieces (for instance), this can mean that the driver may need IRPs or MDLs or other items to complete this work.  But if it asks for them from the system, those calls may fail.  So, to guarantee forward progress under low-memory conditions (and of course paging is critical in such cases), it is a common practice to pre-allocate IRPs, MDLs and other items that may be needed when memory is tight, and to use them only when this situation occurs.  Basically a rainy day account used to keep things moving under stress, but not touched in the normal course of events.

So what’s this new stuff about?

Well, KMDF has to create WDFREQUEST objects for incoming I/O, and this is a memory allocation.  So, beginning with KMDF 1.9, you can now set a “Forward Progress Policy” on a WDFQUEUE object while setting up your device.

This policy consists of these things:

  • A number of WDFREQUEST objects that the framework is to pre-allocate, use only as needed, and not let go of.
  • An optional event callback that allows you to create additional items for each of these reserved requests as they are created (so you can do all of your pre-allocation when the policy is established).
  • A similar callback for normal requests (so all requests can be assumed to have similar contexts in your IO callbacks).
  • A choice of policies regarding when this feature is utilized (more on that in the next list).

The choices I alluded to (things the framework will do when it fails to get a functional WDFREQUEST for an incoming IRP destined for your queue) are these (they are mutually exclusive):

  1. Always use a reserved request in the event of a failure to wrap an incoming request in a WDFREQUEST (if you have the callback which adds items to a normal request, failure of that callback also triggers this condition- a factoid that came in handy when testing this feature, of course).
  2. Provide an optional “examination” callback which will look at the incoming IRP and decide whether you want to just fail it or use a reserved request.
  3. Examine the IRP to see if it is paging I/O (the OS marks these)- if it is, then use a reserved request, otherwise, fail it.
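To make the three choices concrete, here is a toy model in plain C of the decision taken once the normal WDFREQUEST allocation has already failed.  Every name in it (fp_policy, toy_irp, fp_on_allocation_failure, toy_examine_reads, and so on) is invented for illustration- none of it is the KMDF API, and the real framework does considerably more bookkeeping than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: invented names, not KMDF types or DDIs. */
typedef enum {
    POLICY_ALWAYS_USE_RESERVED,   /* choice 1 in the list above */
    POLICY_EXAMINE_CALLBACK,      /* choice 2 */
    POLICY_PAGING_IO_ONLY         /* choice 3 */
} fp_policy;

#define TOY_MJ_READ  3            /* hypothetical major codes for the sketch */
#define TOY_MJ_WRITE 4

typedef struct {
    bool is_paging_io;            /* stands in for the OS marking on the IRP */
    int  major_code;              /* something an examine callback might inspect */
} toy_irp;

typedef enum { ACTION_USE_RESERVED, ACTION_FAIL } fp_action;

/* Hypothetical examine callback: true means "use a reserved request". */
typedef bool (*examine_fn)(const toy_irp *irp);

/* Sample examine callback: only reads are worth a reserved request. */
bool toy_examine_reads(const toy_irp *irp)
{
    return irp->major_code == TOY_MJ_READ;
}

/* Called only after the normal request allocation has failed. */
fp_action fp_on_allocation_failure(fp_policy policy, examine_fn examine,
                                   const toy_irp *irp)
{
    switch (policy) {
    case POLICY_ALWAYS_USE_RESERVED:
        return ACTION_USE_RESERVED;
    case POLICY_EXAMINE_CALLBACK:
        /* Assumption of this sketch: the callback must be supplied. */
        assert(examine != NULL);
        return examine(irp) ? ACTION_USE_RESERVED : ACTION_FAIL;
    case POLICY_PAGING_IO_ONLY:
        return irp->is_paging_io ? ACTION_USE_RESERVED : ACTION_FAIL;
    }
    return ACTION_FAIL;
}
```

The point of the sketch is only that the policies are mutually exclusive branches of one decision: the first never fails a request for lack of memory, while the other two trade some failures for keeping the reserve available to the I/O that matters most.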

A new DDI was added to set this policy on a queue, and another was added to allow you to see if a given WDFREQUEST is a reserved request or not.

If requests continue to fail, and all the reserved requests are in use, then the incoming requests get pended and remain in a list at the queue- as each reserved request is completed, it will get recycled as needed until there are no more failing requests to guarantee progress on.  So a queue with this configured becomes a counted queue of sorts under low-memory conditions.
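The "counted queue of sorts" behavior can be sketched with three counters.  This is a toy model in plain C with invented names (fp_queue, fp_queue_need_reserved, fp_queue_complete_reserved), not the framework's implementation- it only tracks how many reserved requests are busy and how many IRPs are parked waiting for one.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: invented names, not the KMDF implementation. */
typedef struct {
    int reserved_total;    /* pre-allocated requests, never released */
    int reserved_in_use;   /* reserved requests currently carrying an IRP */
    int pended;            /* IRPs parked at the queue, waiting for one */
} fp_queue;

typedef enum { FP_DISPATCHED_RESERVED, FP_PENDED } fp_result;

void fp_queue_init(fp_queue *q, int reserved_total)
{
    q->reserved_total = reserved_total;
    q->reserved_in_use = 0;
    q->pended = 0;
}

/* An IRP could not get a normal request and the policy chose a reserved one. */
fp_result fp_queue_need_reserved(fp_queue *q)
{
    if (q->reserved_in_use < q->reserved_total) {
        q->reserved_in_use++;
        return FP_DISPATCHED_RESERVED;
    }
    q->pended++;               /* every reserved request is busy: park the IRP */
    return FP_PENDED;
}

/* A reserved request completed.  Returns true if it was immediately
   recycled for a pended IRP, false if it went back to the reserve. */
bool fp_queue_complete_reserved(fp_queue *q)
{
    assert(q->reserved_in_use > 0);
    if (q->pended > 0) {
        q->pended--;           /* same reserved request, new IRP */
        return true;
    }
    q->reserved_in_use--;      /* back into the rainy day account */
    return false;
}
```

With a reserve of two, a third failing IRP is pended rather than failed, and it is picked up as soon as either reserved request completes- so under memory pressure the number of requests in flight is bounded by the size of the reserve.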

For the most part you can treat these requests in the normal fashion- you can forward them to other queues, for instance.  When completed, they return to their owner- the bookkeeping behind all of that makes this a challenging feature from a test perspective.

Details on this are in the Win7 Beta WDK- see WdfIoQueueAssignForwardProgressPolicy and WdfRequestIsReserved as a starting point for further reading- this post is just meant for the mile-high overview [so I feel like I did something besides practice my guitar during today’s long builds].

Now playing: Yuki Kajiura- Madlax OST Volume 2- Lost Command

Posted by BobKjelgaard | 0 Comments

Final Thoughts on KMDF Installation for 2008

I should get at least one post in this month, and of the topics I want to discuss [but haven't had time because I've either been on other tasks or forced away from my blog by inclement weather], this one is the easiest.  I'll cover a few errors that have arisen within the last month or so related to installation of KMDF that are worthy of note.

Things Not To Be Done (But They Were Done, Anyway)

  1. Don't alter the settings of the KMDF runtime service in your installer.  They belong to Microsoft, and you shouldn't be touching them!  We have an unfortunate situation that arose recently where one vendor's installer changed the load group for our runtime, potentially [and not just theoretically, as it turned out] breaking a subset of the previously installed boot devices that were also using KMDF.  This is an automatic blue screen at boot [and very difficult to fix] for the affected user, but it is the previous vendor  [and Microsoft, of course] who gets blamed in the message on that screen.  There are cases where we will tell you what to set them to- specifically a device that ships in an OEM system and has to be used during the earliest (or "text-mode") portion of setup.  But that mechanism doesn't involve an installer, and has so far been disclosed only on a need-to-know basis.  Hands off our stuff, please!
  2. Make sure that you use the DDInstall section name [and this includes any decorations such as NT or NTAMD64] for your device when naming the Coinstallers and WDF sections in your INF.  If you don't do the first, then our coinstaller doesn't get copied or invoked, and if you don't do the latter, then we won't install your device if the coinstaller is invoked.  One case I saw formed both of these from a service name instead, meaning the device would only work if the required version of KMDF was already there.  We are improving the ChkInf tool to catch this error in the future, but until it arrives, you'll have to verify this yourself.
  3. The first name in the KmdfService directive in your DDInstall.WDF section has to match the first name in the corresponding AddService directive in your DDInstall.Services section.  This is particularly critical if you are using the KMDF 1.5 coinstaller, as it can result in a permanently broken installation [similar to this problem] for everyone [until a KMDF 1.7 device comes to that machine and fixes things, or someone takes the steps to fix it manually].  If you check your logs, you'll see tons of errors in %windir%\setupact.log if you do this.  Since the ChkInf work is going on now, I'll make sure I recommend we check this as well...
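For items 2 and 3, an INF fragment along these lines shows the naming that has to line up.  It is a sketch only: the device, service, and section names (MyDevice_Install, MyDevice, MyDevice_Service, MyDevice_wdfsect, and the AddReg/CopyFiles section names) are hypothetical, the NT decoration and the 1.7 coinstaller are just one possible combination, and the supporting sections a real INF needs (SourceDisksFiles, DestinationDirs, the service install section, and so on) are omitted.

```inf
; DDInstall section for the device, with its NT decoration
[MyDevice_Install.NT]
CopyFiles = MyDevice_CopyFiles

; Services section: the first name in AddService is the service name
[MyDevice_Install.NT.Services]
AddService = MyDevice, 0x00000002, MyDevice_Service

; Coinstallers section: named from the decorated DDInstall section,
; not from the service name
[MyDevice_Install.NT.CoInstallers]
AddReg    = MyDevice_CoInstaller_AddReg
CopyFiles = MyDevice_CoInstaller_CopyFiles

[MyDevice_CoInstaller_AddReg]
HKR,,CoInstallers32,0x00010000,"WdfCoInstaller01007.dll,WdfCoInstaller"

[MyDevice_CoInstaller_CopyFiles]
WdfCoInstaller01007.dll

; WDF section: also named from the decorated DDInstall section.
; The first name in KmdfService matches the first name in AddService.
[MyDevice_Install.NT.Wdf]
KmdfService = MyDevice, MyDevice_wdfsect

[MyDevice_wdfsect]
KmdfLibraryVersion = 1.7
```

The two things to check by eye are that the .CoInstallers and .Wdf sections carry exactly the same prefix and decoration as the DDInstall section, and that "MyDevice" appears as the first name in both the AddService and KmdfService directives.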

Please note I named no names- to err is human- to blame is often pointless...

Other Trivia

Well, my capturing the music performances of my 20's and 30's has had some benefit- it's given me two hobbies- a new one is capturing this stuff and processing some of the noise and warts out of it.  The second is I'm playing my guitar again- keep my Guild acoustic in my office, and my left hand already has developed a fine set of calluses.  I try to do my practicing during long builds at 3-5 AM or so to avoid disturbing the neighboring offices- wouldn't want to interfere with the Windows 7 WDK production effort, after all...

Hope all have a fine 2009!

Now Playing: One of my final performances of Chuck Berry's Johnny B Goode (ca 1987)  Narcissism abounds for me these days (but I'll get over it soon enough- I always do)

Posted by BobKjelgaard | 0 Comments

WinHEC Presentations you might want to view

Since I was asked in response to my previous post "so what else is new in WDF?", how about this link to the presentation Eliyas and Peter made on exactly that topic at this year's WinHEC?

I suppose I could also flog this link to the presentation Ilias and I made [but it's not quite so broadly of interest- WDF coinstallers and introduction to WDF logo requirements and tests].

Now Playing: Turning Point (a bar band I played in during the 80's)- The Other Woman (ahh, I could play a decent bass once in a while, back in the day)

Which reminds me, I've put that one up here (about 2 MB WMA) if you really want to hear it yourself.  I've put up a smattering of others available from this page (including a few after I switched to lead guitar and vocals)...

Now Playing: Grateful Dead- Europe '72- Sugar Magnolia

Posted by BobKjelgaard | 0 Comments

Queue Callbacks are NOT Dispatch Routines

One of our SDETs recently made an error in a Win7 WDK sample that prompted a discussion.  After I explained the error, Wei gave me some feedback that my explanation made more sense than the WDK documentation did.  So I'll try to elaborate a bit on what I see as the underlying misconception, and perhaps it will help others avoid this conceptual error.  The title perhaps gives my key point away...

The Wild and Wooly World of the Original Driver Model

I'm with some of the other purists here- some will call this the WDM driver model, but this model is older than that- within Windows, it was in the original release of Windows NT [and it's based on VMS, IIRC- that's even older].  The driver object has a table of dispatch routines, which are what the I/O manager calls to tell the driver it has to process some form of I/O request [in the form of an I/O request packet or IRP].  There is one entry for each "major code" [which is an enumeration of types of I/O request which is somewhat arbitrary, but for compatibility reasons is unlikely to change much- the advent of WDM added several new major codes, for instance, but the previous ones remain].  This isn't meant to be a tutorial, so that's enough explanation for now.

The main things to consider about dispatch routines for this discussion:

  • There are instances [for instance, top level driver] where you can safely assume you are called at passive level and in context of the calling process and thread.
  • You are almost always called at passive level anyway.
  • There is some synchronization of I/O done by the IO subsystem (Serialization on a handle and of non-overlapped I/O, for example), but you cannot generally assume any synchronization between routines or even between a routine and itself.

That last bullet is where a lot of problems come in- unless you design the right kind of test cases, you can miss synchronization issues in a driver.  But the first two bullets also matter, particularly if you are trying to be a good citizen and make code and data paged when they are only relevant at passive level.

WDF Offers Abstractions With More Precise Semantics (If You Accept Them)

WDF handles dispatch routines for you through its request queuing models [and I use the plural deliberately, there are multiple models to let you tailor your driver better].  One thing in particular these provide is the capacity to synchronize request processing in multiple ways:

  • The sequential queue dispatching model guarantees that only one request is processed at a time- a new request is not dispatched until the previous one is completed.  This is the simplest model, but it isn't desirable for a lot of applications.  But if it works for you, it's the simplest mechanism of all to understand.
  • New in KMDF 1.9 is a counted parallel queue, which can have a bounded number of concurrent requests- once this limit is reached, new requests are not dispatched until an existing one completes.  [The sequential queue can be called a counted queue of one request].  If some or all of these requests need the same resource, you will need additional synchronization, of course.
  • A device can use device level synchronization on its queues- this means that only one callback at a time will execute, and a new request will not be dispatched until the current callback completes.  This is not the same as sequential dispatching in that it affects all queues, and the point at which the next request is dispatched is different.  If your callback doesn't complete a request [which is often the case, it is often enough for it to simply set up an operation], then the dispatch will occur sooner than it would in a sequential queue and you have overlapping of requests [potentially quite a bit of it, depending upon how you manage this sort of asynchronous I/O].  This model can also be extended to work items, DPCs, and timers parented to this device.  If your callback DOES complete a request, the dispatch occurs a bit later [because the callback exit and not the completion triggers the dispatch of the next request].  This model is good if you have a device level resource you want to serialize access to, but you don't have to wait for completion to use it for a further operation (something along the lines of a command FIFO, for instance).
  • For further granularity, you can synchronize at queue level instead- in this case, callbacks within a queue are serialized with respect to each other, but not with respect to other queues- again this can be extended to work items, etc, but in this case, make sure they are parented to the queue and not the device.  This model is good if you have independently programmable subdevices, each of which still requires some additional synchronization.
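The difference between these models comes down to what event releases the next request.  Here is a toy model in plain C that contrasts them; the names (dispatch_model, toy_queue, can_dispatch_next) are invented for illustration and are not framework types.  In the real framework the synchronization scope is a separate setting from the queue's dispatch type, so treating them as three exclusive modes is a simplification made only for the comparison.

```c
#include <stdbool.h>

/* Toy model only: invented names, not KMDF types. */
typedef enum {
    MODEL_SEQUENTIAL,        /* next dispatch waits for completion */
    MODEL_COUNTED_PARALLEL,  /* next dispatch waits for a free slot */
    MODEL_CALLBACK_SYNC      /* next dispatch waits for callback return */
} dispatch_model;

typedef struct {
    dispatch_model model;
    int  limit;              /* used by MODEL_COUNTED_PARALLEL only */
    int  incomplete;         /* requests dispatched but not yet completed */
    bool callback_running;   /* a synchronized callback has not returned yet */
} toy_queue;

/* May the next waiting request be handed to the driver right now? */
bool can_dispatch_next(const toy_queue *q)
{
    switch (q->model) {
    case MODEL_SEQUENTIAL:
        return q->incomplete == 0;
    case MODEL_COUNTED_PARALLEL:
        return q->incomplete < q->limit;
    case MODEL_CALLBACK_SYNC:
        /* Completion is irrelevant here: only the callback exit matters. */
        return !q->callback_running;
    }
    return false;
}
```

Consider a callback that has returned after merely starting an operation, leaving one request incomplete.  The sequential queue holds the next request back, while the callback-synchronized queue dispatches it- which is exactly the overlap described above.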

The error I alluded to was related to paging and the last two types of synchronization.  You can specify that synchronization has to happen at passive level.  If you do, then a passive level locking mechanism is used, and you can make your callbacks paged code.  If someone sends you a request at dispatch level, a work item will be used by the framework to dispatch that request to you [the caller will get STATUS_PENDING back].

But if you DON'T state you want passive level execution, then the potential is always there that you have to synchronize with dispatch level code.  That means a spin lock is necessary.  If you are using either of the last two synchronization mechanisms, and have not explicitly used passive level synchronization [this has to be set at the device level or driver level] you MUST NOT make the code paged, because it will ALWAYS be called with the spin lock held [and hence at dispatch level].

Yes, you may know the I/O came to the dispatch routine at passive level, but the callback is not a dispatch routine- if you've asked for a synchronization model that isn't workable at passive level, your callback won't be there, because the framework had to change IRQL to meet your synchronization requirements.  Be careful what you ask for- you might well have received it!
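The rule can be boiled down to a small decision function.  What follows is a toy model in plain C, with invented names (toy_irql, toy_sync_scope, callback_irql, callback_may_be_paged) rather than the framework's own types.  For the unsynchronized case the sketch simply passes through the IRQL the request arrived at, which is an assumption of the model rather than something stated above.

```c
#include <stdbool.h>

/* Toy model only: invented names, not KMDF types. */
typedef enum { LEVEL_PASSIVE, LEVEL_DISPATCH } toy_irql;
typedef enum { SYNC_NONE, SYNC_QUEUE, SYNC_DEVICE } toy_sync_scope;

/* IRQL at which the queue callback runs in this model. */
toy_irql callback_irql(toy_sync_scope scope, bool passive_requested,
                       toy_irql arrival)
{
    if (passive_requested)
        return LEVEL_PASSIVE;    /* passive-level lock; deferred to a work
                                    item if the request arrived too high */
    if (scope != SYNC_NONE)
        return LEVEL_DISPATCH;   /* spin lock held around the callback */
    return arrival;              /* assumption: no lock, no IRQL change */
}

/* A callback may be paged only if it can never run above passive level,
   so evaluate the worst case: a request arriving at dispatch level. */
bool callback_may_be_paged(toy_sync_scope scope, bool passive_requested)
{
    return callback_irql(scope, passive_requested, LEVEL_DISPATCH)
           == LEVEL_PASSIVE;
}
```

Note the first case the model catches: with queue or device level synchronization and no passive level request, the callback runs at dispatch level even when the I/O arrived at passive level- the arrival IRQL never enters into it.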

One afterthought about queues- you can use multiple queues and even take requests from one and forward them to another.  So if, for instance, you have some particular IOCTL that would be perfect for sequential dispatch, you can use an IOCTL handler in a top-level queue, create a sequential queue for just that particular IOCTL and forward the requests from one to the other.  You still get the benefits of cancellation handling and a more focused model when you do this.

I'm still not sure this is clearer, but at least it isn't another Life At Microsoft tag, this time around!

Now Playing: Seatbelts(Yoko Kanno) Cowboy Bebop- "Bad Dog No Biscuit"

Posted by BobKjelgaard | 1 Comments