March, 2006

Larry Osterman's WebLog

Confessions of an Old Fogey
  • Larry Osterman's WebLog

    There are times I'm glad Windows doesn't have a public bug database

    • 26 Comments
    From Bruce Schneier's 'blog, I ran into the following Bugzilla bug report.

    Paraphrased, the bug reports that an information disclosure vulnerability in Firefox apparently caused a couple to break up after she discovered a list of web sites he had been visiting while surfing the net.

    It's not clear from the bug if it was an actual information disclosure bug, or if it was a user error, but the thing I found fascinating was the comments that were left in the bug:

    Honey, I would
    think you would be the LAST person to be bothered by this. Not only did was he
    using your computer to be unfaithful, he wasn't smart enough to cover his
    tracks, and you got to know about it BEFORE buying the goods. If you're really
    THAT upset about finding out, take him back and pretend you never knew, or hold
    it over his head and use it to keep him in line.
     

    And:

    I would also like to add that this seems more like a 'feature' at this point.
    There is a button under Tools > Options > Privacy > History.  On the History
    tab, I believe, you will see a button for 'Erase My Relationship', underwhich,
    there are check boxes for '..because he is a cheating bastard', '..because he
    is a sneaky bastard who demeans me by thinking he can go behind my back', and
    '..because he jerked is insultingly stupid, so stupid, he deserved to be snared
    and he would have been a liability to have in your life, long-term'.

    The list of colorful comments goes on and on, only a couple of which have anything to do with the actual bug.

    We don't tend to have anything NEARLY as interesting in our internal bug reporting databases (although there have been some quite "interesting" comments in the comments on the threads).  I'm not sure if this is a good thing or not :)

     

     

    BTW: To cut off any discussion about whether or not it would be a good thing for there to be a public Windows bug database, I've not made up my mind.  The bottom line is that I'm not sure if public bug databases can scale to a project the size of Windows.

     

  • Larry Osterman's WebLog

    Reorgalicious

    • 14 Comments
    Brian Valentine may get all his news from the Official Newspaper of the ExchangeWindows division (no Brian, I haven't forgotten you abandoned us :)), but I get all my news from Slashdot :)

    Today I learned, much to my surprise, that I'd been re-orged once again.  Mushroom city, I tell you, we're all mushrooms.

     I've already gotten several emails from my family asking how the re-org is going to affect me.

    And, as always, I've answered back "Not in the least".  I've been here for 21.5 years, and lived through countless reorgs. In all that time, there has only been one or maybe two re-orgs that have affected what I do on a day-to-day basis.  Microsoft just loves re-orgs, they're a fact of life here.

     

    On the other hand, I'm pretty enthusiastic about working under Steve Sinofsky.  I remember when he started as a developer working on MFC; he and I have exchanged emails in the past, and he's a pretty cool guy.

    Btw, on the /. article (for those that click on the link), I have NO idea where that 60% thingy came from; as best I can figure, the site that published the article pulled that information totally out of their hat.  In addition, if you think about it, it's a nonsensical comment.  According to Wikipedia, Windows contains 40 million lines of code (I have no idea if that's accurate or not).  But assuming that it is, and assuming that Vista had the same amount of code that XP had, that means that Microsoft would be re-writing 24 MILLION lines of code (60% of 40 million).  In two months (Vista's only slipped for 2 months according to this press release).  Now Microsoft programmers are good, but they aren't THAT good.  Anyone who's ever worked on a project that involves more than a thousand or so lines of code understands how utterly laughable that is.

     

    But that's why I read /., it gives me opportunities to spew Coke all over my monitor :)

     

     

     

    PS: Before everyone asks, yes, I did get all the emails on Wednesday along with everyone else, it was just funnier to wait for the /. article.

     

  • Larry Osterman's WebLog

    Why add a throw() to your methods?

    • 19 Comments

    Towards the end of the comments in my last "What's wrong with this code", hippietim asked why I added a throw() attribute to the destructor of the CCoInitializer.  The answer's pretty simple.  If you add a throw() attribute to routines that never throw, the compiler can be clever about code motion and optimization.  Consider the following totally trivial example:

    class MyClass
    {
        size_t CalculateFoo()
        {
            :
            :
        };
        size_t MethodThatCannotThrow() throw()
        {
            return 100;
        };

        void ExampleMethod()
        {
            size_t foo, bar;
            try
            {
                foo = CalculateFoo();
                bar = foo * 100;
                MethodThatCannotThrow();
                printf("bar is %d", bar);
            }
            catch (...)
            {
            }
        }
    };

     

    When the compiler sees this, with the "throw()" attribute, the compiler can completely optimize the "bar" variable away, because it knows that there is no way for an exception to be thrown from MethodThatCannotThrow().  Without the throw() attribute, the compiler has to create the "bar" variable, because if MethodThatCannotThrow throws an exception, the exception handler may/will depend on the value of the bar variable.

    In addition, source code analysis tools like prefast can (and will) use the throw() annotation to improve their error detection capabilities - for example, if you have a try/catch and all the functions you call are marked as throw(), you don't need the try/catch (yes, this has a problem if you later call a function that could throw).
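
    For readers on newer compilers: dynamic exception specifications were deprecated in C++11 and the throw() spelling was removed in C++20, with noexcept taking its place.  Here is a minimal sketch of the same idea in that form.  The class name mirrors the example above; MethodThatMightThrow and the static_assert checks are additions for illustration, not part of the original code.

```cpp
#include <cstddef>

class MyClass
{
public:
    // noexcept is the modern spelling of the empty throw() specification.
    std::size_t MethodThatCannotThrow() noexcept
    {
        return 100;
    }

    // No exception specification: callers (and the compiler) must assume
    // that this method may throw.
    std::size_t MethodThatMightThrow()
    {
        return 100;
    }
};

// The noexcept operator reports at compile time whether an expression is
// declared non-throwing, which is the information the optimizer relies on.
static_assert(noexcept(MyClass().MethodThatCannotThrow()),
              "declared noexcept");
static_assert(!noexcept(MyClass().MethodThatMightThrow()),
              "no exception specification, so potentially throwing");
```

    The static_asserts only check the declaration; if a noexcept function does throw at run time, the program calls std::terminate.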

  • Larry Osterman's WebLog

    What's wrong with this code - part 18, bonus answer

    • 3 Comments

    So the answer to my bonus question was too easy, "hippietim" figured it out in the first comment.  The problem is that the loop:

    for (LONG i = 0 ; i < imageCount ; i += 1)
    {
        hr = images->item(CComVariant(), CComVariant(i), &image);
        if (FAILED(hr))
        {

     

    leaks all the "image" object references except the last one. If you're lucky enough to be running a debug version of the runtime library, the code will assert, but that doesn't always happen.

    The fix, of course, is to move the "image" variable to the correct scope.

    for (LONG i = 0 ; i < imageCount ; i += 1)
    {
        CComPtr<IDispatch> image;
        hr = images->item(CComVariant(), CComVariant(i), &image);
        if (FAILED(hr))
        {
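
    To see the difference between the two scopes without needing ATL, here is a portable sketch.  FakeImage, FakeCollection and FakePtr are hypothetical stand-ins for the COM object, the image collection and CComPtr; the one behavior borrowed from CComPtr is that operator& hands out the raw pointer slot without releasing whatever is already stored there.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a COM object; refs > 0 means someone still
// holds a reference to it.
struct FakeImage
{
    int refs = 1;
    void Release() { --refs; }
};

// Hypothetical stand-in for the collection.  It owns every object it hands
// out so the demo itself never leaks memory; a "leak" here is an object
// whose reference count never returns to zero.
struct FakeCollection
{
    std::vector<std::unique_ptr<FakeImage>> pool;

    void item(FakeImage** out)
    {
        pool.push_back(std::unique_ptr<FakeImage>(new FakeImage()));
        *out = pool.back().get();
    }

    int OutstandingReferences() const
    {
        int count = 0;
        for (const auto& image : pool)
        {
            if (image->refs > 0) count += 1;
        }
        return count;
    }
};

// Hypothetical stand-in for CComPtr: releases in the destructor only.
struct FakePtr
{
    FakeImage* p = nullptr;
    FakePtr() = default;
    FakePtr(const FakePtr&) = delete;
    FakePtr& operator=(const FakePtr&) = delete;
    ~FakePtr() { if (p) p->Release(); }
    FakeImage** operator&() { return &p; }
};

// Pointer declared outside the loop: every call overwrites the previous
// reference, so only the last one is ever released.
int LeakedWithOuterScope(int count)
{
    FakeCollection images;
    {
        FakePtr image;
        for (int i = 0; i < count; i += 1)
        {
            images.item(&image);
        }
    }
    return images.OutstandingReferences();
}

// Pointer declared inside the loop: released at the end of each iteration.
int LeakedWithLoopScope(int count)
{
    FakeCollection images;
    for (int i = 0; i < count; i += 1)
    {
        FakePtr image;
        images.item(&image);
    }
    return images.OutstandingReferences();
}
```

    With five items, LeakedWithOuterScope(5) reports four outstanding references and LeakedWithLoopScope(5) reports none.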

    Other errors pointed out in the comments (mea culpas): Several people (Aaron, Vladimir, Miral) pointed out that instead of exit()ing, I should have used "throw hr;".  They're right, I was mixing metaphors.

    There's one other thing that came up in the comments, it's worth its own post (to increase visibility) so I'll post that tomorrow.

     

  • Larry Osterman's WebLog

    What's wrong with this code, part 18, plus a bonus bad code

    • 18 Comments
    So the last "What's wrong with this code" article was dead easy, I knew it was likely that people would find it such.

     

    patria found the answer in the 4th comment, and I think that Mike Dimmick put it best:

    Well, if you're going to use the Resource Acquisition Is Initialization idiom, use it consistently:
     

    CComPtr (an autoptr for COM objects that auto-addref's and releases the object) uses RAII, but the code didn't consistently use RAII  - instead it pretended that it wasn't using RAII.  Since CComPtr never throws, it's easy to treat it as a super pointer, but Mike's right - you have to be careful about lifetime issues, and that's exactly what went wrong in this example.

    The problem is that when you call CoUninitialize, you need to ensure that you've already released all references to any COM objects you might hold.  If you don't, the DLL that hosts the COM objects is almost certainly going to have been uninitialized by the time those references are finally released.

    So let's present a "fixed" version of the code, using Mike's example of adding an RAII style object to work around the lifetime issue:

    class CCoInitializer
    {
    public:
        CCoInitializer( DWORD dwCoInit )
        {
            HRESULT hr;
            hr = CoInitializeEx( NULL, dwCoInit );
            if (FAILED(hr))
            {
                throw hr;
            }
        }

        ~CCoInitializer() throw()
        {
            CoUninitialize();
        }
    };

    int _tmain(int argc, _TCHAR* argv[])
    {
        HRESULT hr;
        CCoInitializer coInitializer(COINIT_APARTMENTTHREADED);
        CComPtr<IHTMLDocument2> document;
        CComPtr<IHTMLElementCollection> images;
        CComPtr<IDispatch> image;
        LONG imageCount;

        hr = document.CoCreateInstance(CLSID_HTMLDocument);
        if (FAILED(hr))
        {
            exit(hr);
        }
            :
            :
        hr = document->get_images(&images);
        if (FAILED(hr))
        {
            exit(hr);
        }
        hr = images->get_length(&imageCount);
        if (FAILED(hr))
        {
            exit(hr);
        }
        for (LONG i = 0 ; i < imageCount ; i += 1)
        {
            hr = images->item(CComVariant(), CComVariant(i), &image);
            if (FAILED(hr))
            {
                exit(hr);
            }
        }
        return 0;
    }

    While I was fixing the code, I added a bit of additional stuff.  Unfortunately, the new code introduced yet another bug.

    Btw, for those playing along at home, I know that this doesn't actually work; the code to load up the HTML document is in the omitted section :).

    In addition, the absence of code to check for the coInitializer object throwing is NOT a bug.  There's no way of recovering from this exception, and the exception handling paradigm states that if you don't know how to handle an exception, you let your caller handle it.
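
    The RAII fix above leans on one language rule: automatic objects are destroyed in the reverse order of their declaration, so anything declared after coInitializer is released before CoUninitialize runs.  A portable sketch of that ordering follows.  Tracer and TeardownOrder are hypothetical names used only for illustration; the strings stand in for the real Release and CoUninitialize calls.

```cpp
#include <string>
#include <vector>

// Records its name when destroyed, so the teardown order can be inspected.
struct Tracer
{
    std::vector<std::string>& log;
    std::string name;

    Tracer(std::vector<std::string>& l, const std::string& n)
        : log(l), name(n)
    {
    }

    ~Tracer() { log.push_back(name); }
};

std::vector<std::string> TeardownOrder()
{
    std::vector<std::string> log;
    {
        // Same declaration order as in _tmain above.
        Tracer coInitializer(log, "CoUninitialize");   // stands in for CCoInitializer
        Tracer document(log, "document.Release");      // stands in for CComPtr<IHTMLDocument2>
        Tracer images(log, "images.Release");          // stands in for CComPtr<IHTMLElementCollection>
    }   // destructors run here, last declared first
    return log;
}
```

    TeardownOrder() returns images.Release, document.Release, CoUninitialize, in that order, which is why the initializer object has to be declared before any of the smart pointers.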

  • Larry Osterman's WebLog

    What's wrong with this code, Part 18

    • 14 Comments

    This may be the shortest "Bad Code" I've ever done, but it keeps on surprising me how many times I see this problem (people asked me questions about it twice in the past week).

     

    // BadCode18.cpp : Defines the entry point for the console application.
    //

    #include "stdafx.h"
    #include <windows.h>
    #include <tchar.h>
    #include <wininet.h>
    #include <urlmon.h>
    #include <mshtml.h>

    int _tmain(int argc, _TCHAR* argv[])
    {
        HRESULT hr;
        CComPtr<IHTMLDocument2> document;

        hr = CoInitialize(0);
        if (FAILED(hr))
        {
            exit(hr);
        }

        hr = document.CoCreateInstance(CLSID_HTMLDocument);
        if (FAILED(hr))
        {
            exit(hr);
        }

        CoUninitialize();

        return 0;
    }

     

    That's all it takes, I consciously chose not to add stuff to obfuscate the problem.

    Btw, for those who've been reading this blog for a while, I covered this exact same issue in a different form a while ago.

     

  • Larry Osterman's WebLog

    Another year, another post

    • 10 Comments
    Well, this year I didn't miss the anniversary of my first blog post.

    I still can't quite believe it's been two years and over 500 posts (ok, it's only 501, but that's still over 500 :)).  My posting rate's dropped as Vista's getting closer to shipping, I keep letting other things get in the way, but...

    Some of my favorite posts (aka a trip through memory lane):

    One in a million is next Tuesday - an oldie, but a goodie.

    What are these "Threading Models" and why do I care? - a brief introduction to one of the most confusing aspects of COM programming.

    Larry's Rules of Software Engineering #1: Every software engineer should know roughly what assembly language their code generates.

    Larry's Rules of Software Engineering #2: Measuring Testers by test Metrics Doesn't. - this one made it into a book :)

    Me Too! - Bedlam DL3

    How do I divide fractions? - One of the first posts inspired by Valorie, which generated some of the largest number of comments.

    A Parable - Another Valorie inspired post.

    It was 20 years ago today - my 20th anniversary post.

    What does Style Look Like - the last post in my series on programming style - it includes links to the other articles.

    Concurrency - My other major series from last year, which again includes pointers to the other articles in the series.

    How do you play a CD - this is the last in a series of posts I made back in April and May last year where I showed a number of different ways to play the contents of a CD.

    Moving Offices - Again - It's just funny :)

    Remember the Blibbet - Actually I learned the origin of this badge just yesterday. 

    What I did on the 4th of July - proof that Larry makes stupid mistakes.

    Larry goes to Layer Court - a peek into some of the quality processes in Windows.

    Early Easter Eggs and Why no Easter Eggs

    Anyway, that's enough memories :)

     

    I've enjoyed the past two years, and once again, thanks for putting up with me :)

  • Larry Osterman's WebLog

    Fun with names

    • 10 Comments
    The other day, someone sent an email to an internal mailing list asking about a "typo" in the eventvwr.

    It seems they noticed a number of events coming from the "bowser" event source, and they were convinced that it had to be a typo.

     

    Well, it's not :)  The name of the component is bowser, and I wrote it back in NT 3.1...

     

    The bowser is actually the kernel mode portion of the Computer browser service.  It also handles receiving broadcast mailslot messages and handing them off.  When I originally described the functionality, my boss at the time (who was rather opinionated) said "What a dog!  Why don't we call it the bowser?"

    For various technical reasons we didn't want to call the kernel component browser.sys (because it messed up the debugger to have two components with the same name), so the name bowser just stuck.

    Thus was born the name of the "misspelled" system component.  Nowadays the bowser is essentially gone (for instance, I can't find it on my XP SP2 installation), but the name lives on in eventlogs everywhere...

     

  • Larry Osterman's WebLog

    Book Review: Silence on the Wire

    • 5 Comments
    For Christmas, Valorie got me a copy of Michal Zalewski's "Silence on the Wire".  I have a fair amount of respect for Michal as a security researcher, and he's done some really interesting stuff, so I was looking forward to reading it (I have no idea where Valorie found it, I didn't even realize the book existed).

    "Silence on the Wire" describes itself as "a Field Guide to Passive Reconnaissance and Indirect Attacks" (I know that because it's on the front cover of the book).  In it, Michal discusses Information Disclosure vulnerabilities and the various ways that information can leak out from a system, even when that system is protected by a firewall.  He also discusses (although not in as much detail) ways that you can mount indirect attacks against a host.

     

    I finished it a while ago, and found it "interesting".  Overall, it was a reasonably enjoyable read, but I have to be honest and say that I'm not really sure that the book actually met the description on the cover.  There were also several mysterious (to me) diversions during the course of the book.

    For instance, Chapter 2 starts with a huge discussion about how von Neumann computers work, including how memory gates are assembled, etc.  The end of the chapter discusses a way of using detailed timing analysis as a covert channel to detect information leaking from sensitive calculations.  While the hardware discussion was interesting stuff, I'm not sure why it needed to be in a book on passive analysis (and realistically, Charles Petzold did a better job of it in his book "Code").

    There are similar digressions throughout the book (although none as notable as this one).

    One of my favorite portions of the book was the one with the pretty pictures ;).  In it he discusses a fascinating analysis of the pseudo random number generator that's used to generate TCP/IP sequence numbers.  He showed a series of pictures and some analysis for a series of operating systems, ranging from good to not so good.  I do wish he had used more up-to-date operating systems in his analysis, the book was printed in 2005, but he uses examples from Mac OS 9, and Win98 and NT4, and none from Win2K3, or OS X.

    Some of my problems with the book are:

    While he does a good job pointing out ways information can leak out, he doesn't really provide ways of mitigating the flaws.  That's a shame, because it limits the usefulness of the book IMHO. 

    In addition, he doesn't go back and discuss how vendors have responded to vulnerabilities.  A good example of this is his discussion of the GUID.  As originally designed, GUIDs were tied to a particular network adapter, and Michal discusses some of the issues associated with this.  However, starting in Windows 2000, newly created UUIDs no longer have this association with the hardware, and he never mentions that fact.

    This latter issue means that even if a vendor responded and removed a potential vulnerability, a reader won't know about it, which is a shame, because it leads the reader to believe that there are unaddressed security issues in the vendor's product.

    Overall, I enjoyed reading the book, I found much of the information presented to be fascinating (and a bit scary).

  • Larry Osterman's WebLog

    More OOBE experiences - D&D Online

    • 33 Comments

    Back in January, I wrote about the OOBE of my iRiver H10 player, and I've got another horrid first run story today.

    Daniel's been pestering us to get DnD Online, and yesterday it arrived.  I figured I'd install it for him (to save him the trouble).

     

    Man, talk about hideous first run experiences.  First off, the CD installed SLOWLY. Now the machine I'm running this on isn't the fastest on the planet, but I can rip CDs in way less time than the game installed.  My guess is that they were decompressing data on the fly or something.  But slow installations aren't a huge issue, I know how hard it can be to copy tons of data onto a machine.

    My biggest complaints came when I launched the application.

    It popped up a pretty splash screen, and some status text flashed on the screen about checking for web sites, etc.  Then it hung.  I waited for about 5 minutes and no progress, it just hung.  What was worse is that the app didn't show up in the task manager list so I had to find dndlauncher.exe in taskmgr and kill it manually.

    So I restarted.  This time it started and made it through the initial UI, and presented a new loader screen.  The loader started downloading two versions of the client executable.  That was weird: the game's only been online for 7 days and there are already 2 new versions of the client available?  No big deal.  One thing I noticed was that the download was SLOW - 10KB per second according to the progress meter.  Looking at the network traffic in taskmgr, it wasn't receiving any data, the client was just slow.

    And then it hung downloading the client executable.  This time it DID have an entry in the taskbar, but I couldn't right click on it to stop it, I had to go back to the task manager to kill it.

    Third try, this time it got through downloading the client programs, and it started patching game data.  There were 50(!) patches available for the game.  Again, this is a game that's been online for all of 7 days, and there were ALREADY 50 patches for game data?  And once again, the launcher hung downloading the patches.  And I'm still getting 10KB/second download speeds.

    Fourth try (I'm getting pretty annoyed at this point), and it starts downloading more patches.  This time, the patches came in quickly - 75KB/second.  My guess is that their load balancing solution on their patch servers doesn't work, and some of the patch machines were overloaded.

    And again the game hung after downloading all the patches.

    The game also installs a notification area icon, this time I clicked on it.  A menu flashed on the screen really quickly, and then disappeared.  So back to taskmgr to kill the launcher app.

    On the 5th time, I was finally allowed to log in and start the game, but still....  4 hangs of the client app that required taskmgr intervention to recover?  10KB/sec download speeds?

     

    And then there's the notification area icon.  By default, the game installs itself into the notification area, and it's set to download game patches every 4 hours.

    Every 4 hours?  They patch this game frequently enough that you need to check for patches EVERY FOUR HOURS?!!

    Mindboggling.

     

    I've not played the game beyond racing through the character creation mechanism.  This is Daniel's game to play, so I have absolutely no opinions about the relative quality of the game (although it seemed to be very pretty for the 2 minutes I played it).

    I know this is a major new game in its first week or so of retail release, so it's expected that things may be overloaded - there were 10 new characters in the entry area when I logged in, so the game servers are clearly being hammered, but still...

     

  • Larry Osterman's WebLog

    Audio in Vista, the big picture

    • 29 Comments

    So I've talked a bit about some of the details of the Vista audio architecture, but I figure a picture's worth a bunch of text, so here's a simple version of the audio architecture:

    This picture is for "shared" mode, I'll talk about exclusive mode in a future post.

    The picture looks complicated, but in reality it isn't.  There are a boatload of new constructs to discuss here, so bear with me a bit.

    The flow of audio samples through the audio engine is represented by the arrows - data flows from the application, to the right in this example.

    The first thing to notice is that once the audio leaves the application, it flows through a very simple graph - the topology is quite straightforward, but it's a graph nonetheless, and I tend to refer to samples as moving through the graph.

    Starting from the left, the audio system introduces the concept of an "audio session".  An audio session is essentially a container for audio streams, in general there is only one session per process, although this isn't strictly true.

    Next, we have the application that's playing audio.  The application (using WASAPI) renders audio to a "Cross Process Transport".  The CPT's job is to get the audio samples to the audio engine running in the Windows Audio service.

    In general, the terminal nodes in the graph are transports, there are three transports that ship with Vista, the cross process transport I mentioned above, a "Kernel Streaming" transport (used for rendering audio to a local audio adapter), and an "RDP Transport" (used for rendering audio over a Remote Desktop Connection). 

    As the audio samples flow from the cross process transport to the kernel streaming transport, they pass through a series of Audio Processing Objects, or APOs.  APOs are used to provide DSP on the audio samples.  Some examples of the APOs shipped in Vista are:

    • Volume - The volume APO provides mute and gain control.
    • Format Conversion - The format converter APOs (there are several) provide data format conversion - int to float32, float32 to int, etc.
    • Mixer - The mixer APO mixes multiple audio streams
    • Meter - The meter APO remembers the peak and RMS values of the audio samples pumped through it.
    • Limiter - The limiter APO prevents audio samples from clipping when rendering.

    All of the code above runs in user mode except for the audio driver at the very end.
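
    To make the APO list a little more concrete, here is a rough, portable sketch of what each kind of processing step does to a buffer of samples.  None of this is the real APO interface; every name below (Buffer, Int16ToFloat, ApplyVolume, Mix, Measure, Limit) is hypothetical, and the limiter in particular is a crude whole-buffer scale-down that is only meant to show the idea of keeping samples within range.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

typedef std::vector<float> Buffer;

// Format conversion: 16-bit integer samples to float32 in [-1, 1).
Buffer Int16ToFloat(const std::vector<short>& in)
{
    Buffer out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i)
        out[i] = static_cast<float>(in[i]) / 32768.0f;
    return out;
}

// Volume: mute and gain control.
Buffer ApplyVolume(const Buffer& in, float gain, bool mute)
{
    Buffer out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i)
        out[i] = mute ? 0.0f : in[i] * gain;
    return out;
}

// Mixer: sums two streams sample by sample (up to the shorter length).
Buffer Mix(const Buffer& a, const Buffer& b)
{
    std::size_t n = std::min(a.size(), b.size());
    Buffer out(n);
    for (std::size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
    return out;
}

// Meter: reports the peak and RMS values of the samples it is given.
struct MeterReading
{
    float peak;
    float rms;
};

MeterReading Measure(const Buffer& in)
{
    MeterReading reading = { 0.0f, 0.0f };
    if (in.empty())
        return reading;
    double sumSquares = 0.0;
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        reading.peak = std::max(reading.peak, std::fabs(in[i]));
        sumSquares += static_cast<double>(in[i]) * in[i];
    }
    reading.rms = static_cast<float>(std::sqrt(sumSquares / in.size()));
    return reading;
}

// Limiter (simplified assumption): if the buffer would exceed full scale,
// scale the whole buffer down so that the peak lands exactly at 1.0.
Buffer Limit(const Buffer& in)
{
    float peak = Measure(in).peak;
    if (peak <= 1.0f)
        return in;
    Buffer out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i)
        out[i] = in[i] / peak;
    return out;
}
```

    Chaining the steps in the order convert, volume, mix, limit gives a toy version of the graph: mixing a stream at 0.5 with one at 0.75 produces a 1.25 sample, and the limiter brings it back to 1.0 before it would reach the device.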

  • Larry Osterman's WebLog

    Hello Mrs. Osterman

    • 6 Comments
    Absolutely nothing technical today, just a shout-out to someone I love :).

    At some point, about 5 or 6 years ago, Valorie decided that it was time for her to go back to school to get her teaching certificate.  It turns out that her college courses didn't quite meet the entrance requirements for the local schools that offer Masters in Teaching (MiT) programs, so about 6 years ago she started taking an almost full time course load at various local schools.  In addition to working six to eight hours a day in our kids' classroom, she also took two or three classes per semester, filling in the gaps in her previous degree.

    Two years ago, Valorie started in the MiT program at CityU taking a full time masters course load while continuing to work in the 5/6 classroom (this time as a teacher's aide).

    She's now about 3 months away from graduation, and today she started the final major step in finally receiving her degree - today's her first day as a student teacher.

     

    I know it's been a long hard 6 years for her, I've seen how hard she's worked achieving one of her lifelong goals.

     

    So, if you'll excuse the potentially inappropriate paraphrase:

    "So here's to you Mrs. Osterman"

     

    Congratulations sweetheart, it's been a long road but the end is finally in sight.

     

  • Larry Osterman's WebLog

    Annoying coding tricks

    • 33 Comments
    I'm sure I've linked to The Daily WTF before; on it, Alex Papadimoulis collects egregious programming mistakes and distributes them one per day.

    This one isn't really that hideous, but  I ran into this construct the other day while working on some stuff and it just flat-out annoys me :) (the code's been heavily sanitized to protect the innocent)

        static BOOL fWasntDoingSomething;
        BOOL fDontDoSomething;

        fDontDoSomething = DecideIfWeShouldntDoSomething();

        fOldValue = InterlockedExchange( &fWasntDoingSomething, fDontDoSomething);
        if ( fOldValue ^ fWasntDoingSomething)
        {
            :
            :
        }
     

    I almost don't even know where to begin on this one.  It's three lines of code (and 2 lines of variable declarations), chock full of badness.

    But the thing that really got my goat (and the thing that caused me to write this post) was the use of XOR when != would do just as well.  By using XOR, the author of the code guaranteed that whoever was looking at the code would have to sit and think about what the code was doing - for some reason, the logic table for XOR isn't sitting at the front of my short-term memory.

    And then there's the variable names.  I don't know about y'all, but I just HATE trying to wrap my head around negative Boolean variables, especially when they're used as double negatives (!fDontDoSomething).  They always make me need to think twice when I see them. 

    Wouldn't it have been SO much better if the code had been:

        static BOOL fWasDoingSomething;
        BOOL fDoSomething, fOldValue;

        fDoSomething = DecideIfWeShouldDoSomething();

        fOldValue = InterlockedExchange( &fWasDoingSomething, fDoSomething);
        if ( fOldValue != fWasDoingSomething)
        {
            :
            :
        }

    ?

    Ah, I feel MUCH better now :)  Venting always helps :)
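
    For anyone who wants to convince themselves that the two spellings really are interchangeable here, a small portable sketch follows.  The BOOL typedef is a stand-in for the Win32 one so the code builds anywhere, and the three function names are hypothetical.

```cpp
// Stand-in for the Win32 BOOL typedef (an int) so the sketch builds anywhere.
typedef int BOOL;

// The original spelling: nonzero when the two flags differ.
bool ChangedXor(BOOL oldValue, BOOL newValue)
{
    return (oldValue ^ newValue) != 0;
}

// The readable spelling of the same test.
bool ChangedNotEqual(BOOL oldValue, BOOL newValue)
{
    return oldValue != newValue;
}

// Checks every pair of values in [low, high] and reports whether the two
// spellings ever disagree.
bool OperatorsAgree(int low, int high)
{
    for (int a = low; a <= high; a += 1)
    {
        for (int b = low; b <= high; b += 1)
        {
            if (ChangedXor(a, b) != ChangedNotEqual(a, b))
            {
                return false;
            }
        }
    }
    return true;
}
```

    Both forms treat a BOOL as a plain integer, so two different nonzero values (say 1 and 2) count as "changed" under either one; the only real difference is how long the reader has to stare at the line.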

     

  • Larry Osterman's WebLog

    Psychic Perf Analysis, or "RegFlushKey actually DOES flush the registry key"

    • 19 Comments
    One of Raymond's more endearing features is what he calls "Psychic Debugging"; it even made his Wikipedia entry (wow, he even has a Wikipedia entry, complete with picture :))

    There's a variant of Psychic Debugging called "Psychic Perf Analysis".  It works like this:

    I get an IM from Ryan, one of the perf guys.

    Ryan: "Hey Larry, we just found a great perf bug that caused a 3 second slowdown in Windows boot time"

    Me: "Let me guess, they were calling RegFlushKey in a service startup path."

    <long pause>

    Ryan: "Who told you?"

     

    One of the things people don't realize about RegFlushKey is that it actually flushes the data that backs the registry key (doh!).  Well, flushing the data means that you need to write it to disk, and the semantics of RegFlushKey ensure that the data's actually been committed - in other words, the RegFlushKey API is going to block until all the disk writes needed to ensure that the data backing the key is physically on the disk have completed.  This can take hundreds and hundreds of milliseconds.

    In Ryan's case, it was complicated because the service was calling RegFlushKey from a DllMain function (Doh!) which held the loader lock, which meant that all the other services in that process were blocked, and there were other services that depended on those services, and...  You get the picture, it REALLY wasn't pretty.

    The documentation for RegFlushKey explicitly says that "In general, RegFlushKey rarely, if ever, need be used", and it's right.

    Why did I know that this was a problem?  Well, when we first deployed the new audio stack into Vista, we were blocked from RI'ing into winmain because the audio service degraded the boot time of Windows by 3/4 of a second (yes, we measure boot time performance to the millisecond, and changes that degrade the system boot performance aren't allowed in).  When I looked at the perf logs of the boot process, I noticed a significant number of writes occurring during the start of the audiosrv service.  I chased it down further, and realized that the writes correlated almost perfectly with some code that was modifying the registry.  I dug deeper and discovered a call to RegFlushKey that we had mistakenly added.  Removing the call to RegFlushKey fixed the problem.

  • Larry Osterman's WebLog

    Useful service tricks - Debugging service startup

    • 22 Comments
    For the better part of the past 15 years, I've been working on one service or another for the Windows platform (not always services for windows, but always services ON windows).

    Over that time, I've developed a bag of tricks for working with services, I mentioned one of them here.  Here's another.

    One of the most annoying things to have to debug is a problem that occurs during service startup.  The problem is that you can't attach a debugger to the service until it's started, but if the service is failing during startup, that's hard.

    It's possible to put a Sleep(10000) to cause your service startup to delay for 10 seconds (which gives you time to attach the debugger during start), that usually works, but sometimes service startup failures only happen on boot (for autostart services).

    First off, before you start, you need to have a kernel debugger attached to your computer, and you need the debugging tools for windows (this gets you the command line debuggers).  I'm going to assume the debuggers are installed into "C:\Debuggers", obviously you need to adjust this for your local machine.

    One thing to keep in mind: As far as I know, you need to have the kernel debugger hooked up to debug service startup issues (you might be able to use ntsd.exe hooked up for remote debugging but I'm not sure if that will work).

    This of course begs the next question: "The kernel debugger?  Why on earth do I need a kernel debugger when I'm debugging user mode code?".  You're completely right.  But in this case, you're not actually using the kernel debugger.  Instead, you're running using a user mode debugger (ntsd.exe in my examples) that's running over the serial port using facilities that are enabled by the kernel debugger.  It's not quite the same thing.

    There are multiple reasons for using a debugger that's redirected to a kernel debugger.  First off, if your service is an autostart service, it's highly likely that it starts long before a user logs on.  So an interactive debugger won't really be able to debug the application.  Secondly, services by default can't interact with the desktop (heck, they often run in a different TS session from the user (this is especially true in Vista, but it's also true on XP with Fast User Switching), so they CAN'T interact with the desktop).  That means that when the debugger attempts to interact with the user, it flat-out can't, because the desktop is sitting in a different TS session.

    There are a couple of variants of this trick, all of which should work.

    Let's start with the simplest:

    If your service runs with a specific binary name, you can use the Image File Execution Options registry key (documented here) to launch your executable under the debugger.  The article linked shows how to launch using Visual Studio.  For a service, you want to use the kernel debugger, so instead of using "devenv /debugexe" for the value, use "C:\Debuggers\NTSD.EXE -D", which will redirect the output to the kernel debugger.

     

    Now for a somewhat more complicated version - You can ask the service controller to launch the debugger for you.  This is useful if your service is a shared service, or if it lives in an executable that's used for other purposes (if you use a specific -service command line switch to launch your exe as a service, for example).

    This one's almost easier than the first.

    From the command line, simply type:

    sc config <your service short name> binpath= "c:\debuggers\ntsd.exe -d <path to your service executable> <your service executable options>"

     

    Now restart your service and it should pick up the change.

     

    I suspect it's possible to use the ntsd.exe as a host process for remote debugging, I've never done that (I prefer assembly language debugging when I'm using the kernel debugger), so I don't feel comfortable describing how to set it up :(

    Edit: Answered Purplet's question in the comments (answered it in the post because it was something important that I left out of the article).

    Edit2: Thanks Ryan.  s/audiosrv/<your service>/

     
