Larry Osterman's WebLog

Confessions of an Old Fogey

Critical Driver or Cargo Cult Programming?

I've been self-hosting Vista on my laptop since sometime in January.  Every Monday morning, without fail, I installed the latest build available from the "main" Windows branch and tried it.

There have been good builds and bad builds - the first few were pretty painful, but everything since sometime in March has been wonderfully smooth.

But sometime late in May, things changed for the worse.  Weekly builds installed just fine on my main development machine, but my laptop would get about three-quarters of the way through the install and then stop after a reboot, complaining about a problem with the critical system driver <driver>.sys.

Of course, I filed a bug on the problem and moved on - every week I'd update my laptop and it'd fail.  While I was away on vacation, the guys looking into the bug finally figured out what was happening. 

The first part of the problem was easy - something was causing <driver>.sys to fail to load (we don't know what).  But that didn't explain the unbootable system.

Well, the <driver>.sys driver is the modem driver for my laptop.  Eventually one of the setup devs figured out the root cause.  For some totally unknown reason, their inf has the following lines:

[DDInstall.Services]
AddService=<driver>_Service_Inst

[<driver>_Service_Inst]
StartType=0

If you go to msdn and look up DDInstall.Services, you get this page.

If you follow the documentation a bit, you find the documentation for the service install section, which describes the StartType key - it's the same as the start type for Windows services.

In particular, you find:

StartType=start-code
Specifies when to start the driver as one of the following numerical values, expressed either in decimal or, as shown here, in hexadecimal notation.
0x0 (SERVICE_BOOT_START)
Indicates a driver started by the operating system loader.

This value must be used for drivers of devices required for loading the operating system.

0x1 (SERVICE_SYSTEM_START)
Indicates a driver started during operating system initialization.

This value should be used by PnP drivers that do device detection during initialization but are not required to load the system.

For example, a PnP driver that also can detect a legacy device should specify this value in its INF so that its DriverEntry routine will be called to find the legacy device, even if that device cannot be enumerated by the PnP manager.

0x2 (SERVICE_AUTO_START)
Indicates a driver started by the service control manager during system startup.

This value should never be used in the INF files for WDM or PnP device drivers.

0x3 (SERVICE_DEMAND_START)
Indicates a driver started on demand, either by the PnP manager when the corresponding device is enumerated or possibly by the service control manager in response to an explicit user demand for a non-PnP device.

This value should be used in the INF files for all WDM drivers of devices that are not required to load the system and for all PnP device drivers that are neither required to load the system nor engaged in device detection.

0x4 (SERVICE_DISABLED)
Indicates a driver that cannot be started.

This value can be used to temporarily disable the driver services for a device, but a device/driver cannot be installed if this value is specified in the service-install section of its INF file.
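
For readers who prefer code to tables, the five start codes quoted above can be written down as a small Python sketch. The `StartType` enum and the `parse_start_type` helper are names invented for this illustration; they are not part of any Windows or setup API, and the parsing only covers the plain decimal and hexadecimal forms the documentation mentions.

```python
from enum import IntEnum


class StartType(IntEnum):
    """Start codes as listed in the StartType documentation quoted above."""
    SERVICE_BOOT_START = 0x0    # started by the operating system loader
    SERVICE_SYSTEM_START = 0x1  # started during operating system initialization
    SERVICE_AUTO_START = 0x2    # started by the service control manager at startup
    SERVICE_DEMAND_START = 0x3  # started on demand (PnP manager or SCM)
    SERVICE_DISABLED = 0x4      # cannot be started


def parse_start_type(text: str) -> StartType:
    """Parse a StartType value written in decimal ("0") or hexadecimal ("0x3").

    Hypothetical helper: raises ValueError for anything that is not one of
    the five documented codes.
    """
    return StartType(int(text.strip(), 0))


# The modem INF in this post says StartType=0:
print(parse_start_type("0").name)
# What the documentation recommends for an ordinary PnP driver:
print(parse_start_type("0x3").name)
```
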

So in this case, the authors of the modem driver decided that their driver was a boot-time critical driver - a start type which, as the documentation clearly states, is intended only for drivers required to load the operating system.
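
A check for this kind of mistake is easy to sketch. The Python below is a minimal illustration, not a real INF parser and not anything the setup team actually runs: `read_inf_sections`, `boot_start_services`, and the `ExampleModem` service name are all hypothetical, and the reader ignores INF features such as string substitution, quoted values, and line continuation. It simply reports every section that declares `StartType=0` so a human can ask whether that driver really is needed to load the operating system.

```python
def read_inf_sections(text):
    """Tiny INF reader: returns {section_name: {key: value}}, names lowercased.

    Simplification: everything after ';' is treated as a comment, and only
    'key=value' lines are kept.
    """
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.split(";", 1)[0].strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = sections.setdefault(line[1:-1].strip().lower(), {})
        elif current is not None and "=" in line:
            key, value = line.split("=", 1)
            current[key.strip().lower()] = value.strip()
    return sections


def boot_start_services(text):
    """Return the names of sections that declare StartType=0 (boot start)."""
    flagged = []
    for name, entries in read_inf_sections(text).items():
        value = entries.get("starttype")
        if value is None:
            continue
        try:
            code = int(value, 0)  # accepts decimal or 0x-prefixed hex
        except ValueError:
            continue  # not a plain number; out of scope for this sketch
        if code == 0:
            flagged.append(name)
    return flagged


# Hypothetical INF fragment modeled on the lines quoted in this post.
SAMPLE_INF = """\
[DDInstall.Services]
AddService=ExampleModem_Service_Inst

[ExampleModem_Service_Inst]
StartType=0
"""

print(boot_start_services(SAMPLE_INF))  # ['examplemodem_service_inst']
```
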

So I'll leave it up to you to decide - is this an example of cargo cult programming, or did the authors of this modem driver REALLY think that the driver is a critical system driver?

What makes things worse is that this is a 3rd party driver - we believe that their INF is in error, but we can't fix it because it's owned by the 3rd party.  Our only choice is to baddriver it and prevent Vista from loading that particular driver.  The modem chip in question hasn't been made for many, many years, and the vendor for that chip has absolutely no interest in supporting it on Vista, so we can't get it fixed (the laptop is old enough that it's out of OEM support, so there's no joy from that corner either - nobody wants to support this hardware anymore).

Please note: This is NOT an invitation for a "If only the drivers were open source, then you could just fix it" discussion in the comments thread.  The vendor for the modem driver owns the rights to their driver; they get to choose whether or not they want to support it, not Microsoft.

 

  • Not really related in any way, but I quickly tried 5381 before 5384 (beta 2) and in my laptop the 5381 for reasons unknown felt faster and didn't seem to have some of the nasty issues present in B2 which I am running now. I've disabled Indexing and UAP but still 5381 felt faster. But since I have no data to back this up I just might be hallucinating.

    Looking at the performance events is also tough, since if one place causes disk thrashing then it'll affect a dozen other things and the log shows every one of those services or drivers as if it were stalling.
  • This sounds like a job for Drew in the past!

    It's not so hard to make that fix and re-sign everything with your test cert. As long as the test root certificate is installed on your box everything will work.

    For that matter the driver wasn't loading because it doesn't have a signature.

    This might help:
    http://download.microsoft.com/download/9/c/5/9c5b2167-8017-4bae-9fde-d599bac8184a/64bitDriverSigning.doc

    Anyone outside Microsoft can go buy a certificate as explained in the doc and make your scenario work.

    I guess with all the time spent externally evangelizing the driver changes in Vista there wasn't enough attention paid to internal developers. :-(

    My job here is done. Back to being Drew in the present . . .
  • Why not support driver compatibility workarounds like you do for applications?  Just have a compatibility setting that essentially states "for driver X, override setting Y in the INF."

    So, in the case of this modem driver, disallow the use of "StartType=0".

    I don't think you will violate any copyright laws by changing the way you interpret a data file. Otherwise every new version of a compiler could technically be illegal.

    Oh, and please tell me this sort of thing would not happen today, and would have been detected by the WHQL process.  I'd like to think that certification does serve a purpose.
  • Unfortunately this is a 32bit platform and the problem isn't that the driver isn't signed.  I wish it was, that would make it "easy".

    We don't know what's wrong with the driver, and making it a critical boot driver makes it essentially undebuggable (the kernel debugger doesn't work on critical boot driver errors).
  • Presumably there's something different between the environment Vista is presenting to the driver and that presented by the version of Windows it was originally released for?  Or have the StartType codes changed at some point since it was written?

    Either way, I guess it's one of those issues where a 3rd party does something bone-headed and whatever you do, MS get to take the blame :|  
  • > If you go to msdn and look up DDInstall.Services, you get
    > this page.

    You linked to
    http://msdn.microsoft.com/library/default.asp?url=/library/en-us/devinst_r/hh/DevInst_r/inf-format_d402e9dc-1a6f-423c-b80e-43dd5779b4cc.xml.asp

    You need to link to
    http://msdn.microsoft.com/library/default.asp?url=/library/en-us/DevInst_r/hh/DevInst_r/inf-format_10bcb43e-0799-4dff-981f-2d8c4bf8f835.xml.asp

    And this was a lucky day because "sync toc" worked.

    If you agree with customers that MSDN's table of contents could be improved, perhaps you could suggest that to the maker?

    Meanwhile I'll bet you could fix the inf file yourself.  If the driver works when loading on demand, then you'll still be able to use it.  Even cargo cult inf file editing persons have been known to accomplish such feats.
  • Running on a chk build may give you clues about why the driver isn't loading.

    I have no idea why it would be boot start.

    I thought all boot start drivers needed signatures regardless of architecture. Maybe the design has changed (again), though.

    And I realized that I was spreading a piece of bad info in my last comment. I forgot that CI doesn't actually use the cert stores. And they may be planning to remove CI's approving of test-signed code by RTM. I have no idea where the final decision on that fell.
  • Or then again maybe you should ignore what I said about a chk build. I just re-read what you said about the debugger. The kd won't do anything? Crazy. I'd hope that there's a log somewhere . . .
  • Why can't the kernel debugger work?  I haven't debugged critical boot time drivers, as I work on drivers that load a lot later.  Of course... you could have always changed the .inf to have the driver start later, rebuilt the install image, and debugged the driver that way.  Note, I'm not saying to release it that way.  Of course, that debugging would just be for your information, but it might be interesting and point to a bug in windows.
  • Um, not to ask the obvious question, but why don't you change the start type?
  • Doesn't Windows allow you to patch stuff on load? Thus, couldn't you patch the INF file as it is read without modifying the original file?

    [Note: I don't actually know anything about how Windows Patches files for compatibility]
  • He could change it for his single install, but that's not the point.

    The point is that the OEM provided INF sets the start type incorrectly, and Microsoft cannot change it, because they don't own the driver.  The problem comes in including this driver with Vista, as it will cause this same problem if you happen to own that particular device.
  • Baljemmet, something changed, we don't know what.  The starttypes haven't changed since long before NT 3.1 shipped.

    Jeff: Why can't it be debugged? Because the kernel debugger is loaded after the critical drivers are loaded, it can't be used to debug them (at least I can't, others might be able to) :(

    Dispensa, we can't change the start type because it's not our driver.  Maybe it really DOES need to be a critical driver for some reason we're not aware of.  And I can't change the start type after the fact because I can't boot the OS to change the start type because a critical driver isn't loading (Catch 22).

    And Manip, we don't do that appcompat stuff for drivers (as far as I know, I'm not a driver guru).  And the IHV owns their INF file, we don't.  What happens if we decide to unilaterally change their INF file and we break stuff by doing it?  The IHV has told us that they're not supporting this device any more, and the OEM has explicitly removed my laptop from their list of supported machines (for ANY operating system), so there's not much we can do about it at this point.
  • The solution for the catch-22 is, of course, to boot another OS and change the setting from outside. Having just installed Vista, I noticed you can boot from the CD, choose recovery something and get a command prompt. From there, you could mount HKLM of the borked OS and change the start type in the registry.

    Not a long-term solution though..
  • Just wondering - how old is the laptop? And could you please tell us who the vendor is so the rest of us can try to use an alternate vendor if we are planning on buying kit that we intend to use for more than X years? (Where X is the age of your laptop)

    It's not libel if it's true.
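
The recovery-prompt workaround suggested in the comments above (mount the unbootable install's registry and change the start type from outside) could look roughly like the sketch below. It only builds the `reg.exe` command lines and does not run them. Everything specific here is an assumption made for illustration: the hive path on drive `D:`, the `ControlSet001` control set, the `OfflineSystem` mount name, the `ExampleModem` service name, and the `offline_start_type_commands` helper itself. As the commenter says, this is a fix for one machine, not for the INF that ships to everyone.

```python
def offline_start_type_commands(system_hive, service, start_type=3,
                                mount="HKLM\\OfflineSystem",
                                control_set="ControlSet001"):
    """Build (but do not execute) reg.exe commands that change a service's
    Start value in an offline SYSTEM hive.

    start_type=3 is SERVICE_DEMAND_START from the table quoted in the post.
    control_set is an assumption; the active control set on a given install
    may be a different one.
    """
    key = f"{mount}\\{control_set}\\Services\\{service}"
    return [
        # 1. Mount the offline SYSTEM hive under a temporary key.
        ["reg", "load", mount, system_hive],
        # 2. Overwrite the service's Start value (REG_DWORD).
        ["reg", "add", key, "/v", "Start", "/t", "REG_DWORD",
         "/d", str(start_type), "/f"],
        # 3. Unmount the hive so the change is written back.
        ["reg", "unload", mount],
    ]


# Hypothetical paths and names, printed for inspection only.
for cmd in offline_start_type_commands(
        r"D:\Windows\System32\config\SYSTEM", "ExampleModem"):
    print(" ".join(cmd))
```
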