May, 2005

Larry Osterman's WebLog

Confessions of an Old Fogey
  • Larry Osterman's WebLog

    Why does Windows share the root of your drive?

    • 25 Comments

    Out of the box, a Windows system automatically shares the root of every hard drive on the machine as <drive>$ (so you get C$, D$, A$, etc.).

    The shares are ACL'ed so that only members of the local administrators group can access them, and they're hidden from the normal enumeration UI (they're included in the enumeration APIs but not in the UI, as are all shares with a trailing $ in their name).
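    Incidentally, if you want to see that for yourself, here's a minimal sketch (not production code) that lists every share on the local machine with NetShareEnum - the admin shares whose names end in $ come back from the API even though the browse UI never displays them:

    #include <windows.h>
    #include <lm.h>
    #include <stdio.h>
    #pragma comment(lib, "netapi32.lib")

    //
    // Minimal sketch: enumerate all shares on the local machine, including the
    // "hidden" ones whose names end in '$'.
    //
    int __cdecl main()
    {
        PSHARE_INFO_1 shareInfo = NULL;
        DWORD entriesRead = 0, totalEntries = 0, resumeHandle = 0;
        NET_API_STATUS status;

        do
        {
            status = NetShareEnum(NULL,                     // Local machine.
                                  1,                        // SHARE_INFO_1 level.
                                  (LPBYTE *)&shareInfo,
                                  MAX_PREFERRED_LENGTH,
                                  &entriesRead,
                                  &totalEntries,
                                  &resumeHandle);
            if (status == NERR_Success || status == ERROR_MORE_DATA)
            {
                for (DWORD i = 0; i < entriesRead; i += 1)
                {
                    wprintf(L"%s\n", shareInfo[i].shi1_netname);
                }
                NetApiBufferFree(shareInfo);
                shareInfo = NULL;
            }
        } while (status == ERROR_MORE_DATA);
        return 0;
    }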

    One question that came up yesterday was why Windows does this in the first place.

    The answer is steeped in history.  It goes way back to the days of Lan Manager 1.0, and is a great example of how using your own dogfood helps create better products.

    Lan Manager was Microsoft's first attempt at competing directly with Novell in networking.  Up until that point, Microsoft produced an OEM-only networking product called MS-NET (I have a copy of the OEM adaptation kit for MS-NET 1.1 in my office - it was the first product I ever shipped at Microsoft).

    But Lan Manager was intended as a full solution.  It had a full complement of APIs to support administration, supported centralized authentication, etc.

    One of the key features for Lan Manager was, of course, remote administration.  The server admin could sit in their office and perform any administrative tasks they wanted to on the computer.

    This worked great - the product was totally living up to our expectations...

    Until the day that the development lead for Lan Manager (Russ (Ralph) Ryan) needed to change a config file on the LanMan server that hosted the source code for the Lan Manager product.  And he realized that none of the file shares on the machine allowed access to the root directory of the server!  He couldn't add a new share remotely, because the UI for adding file shares required that you navigate through a tree view of the disk - and since the root wasn't shared, he could only add shares that lived under the directories that were already shared.

    So he had to trudge from his office to the lab and make the config change to the server.

    And thus a new feature was born - by default, Lan Manager (and all MS networking products to this day) shares the root of the drives automatically to ensure that remote administrators have the ability to access the entire drive.   And we'd probably have never noticed it unless we were dogfooding our products.

    Nowadays, with RDP and other more enhanced remote administration tools, it's less critical, but there are a boatload of products that rely on the feature.

    Note 1: You can disable the automatic creation of these shares by following this KB article.

    Note 2: The test lead for the Lan Manager product was a new hire, fresh from working at Intel, who went by the name of Henry (Brian) Valentine.

  • Larry Osterman's WebLog

    Turning the blog around - End of Life issues.

    • 51 Comments

    I'd like to turn the blog around again and ask you all a question about end-of-life issues.

    And no, it's got nothing to do with Terri Schiavo.

    Huge amounts of text have been written about Microsoft's commitment to platform stability.

    But platform stability comes with an engineering cost.  It gets expensive maintaining old code - typically it's not written to modern coding standards, and the longer it exists, the more heavily patched it becomes.

    For some code that's sufficiently old, the amount of engineering that's needed to move the code to a new platform can become prohibitively expensive (think about what would be involved in porting code originally written for MS-DOS to a 128-bit platform).

    So the older an API gets, the greater the temptation to find a way of ending its viable lifetime.

    On the other hand, you absolutely can't break applications.  And not just the applications that are commercially available - If a customer's line-of-business application fails because you decided to remove an API, you're going to have to put the API back.

    So here's my question: Under what circumstances is it ok to remove an API from the operating system?  Must you carry them on forever?

    This isn't just a Microsoft question.  It's a platform engineering problem - if you're committed to a stable platform (in other words, on your platform, you're not going to break existing applications on a new version of the platform), then you're going to have to face these issues.

    I have some opinions on this (no, really?) but I want to hear from you folks before I spout off on them.

  • Larry Osterman's WebLog

    Why I removed the MSN desktop search bar from IE

    • 16 Comments

    I was really quite excited to see that the MSN Desktop Search Team had finally released the final version of their MSN Desktop Search toolbar.

    I've been using it for quite a while, and I've been really happy with it (except for the minor issue that the index takes up 220M of virtual memory, but that's just VA - the working set of the index is quite reasonable).

    So I immediately downloaded it and enabled the toolbar on IE.

    As often happens with toolbars, the toolbar was in the wrong place.  No big deal, I unlocked the toolbar and repositioned it to where I want it (immediately to the right of the button bar, where it takes up less real-estate).

    Then I locked the toolbar.  And watched as the MSN desktop search toolbar repositioned itself back where it was originally.

    I spent about 10 minutes trying to figure out a way of moving the desktop search bar next to the button bar, to no success.  By positioning it in the menu bar, I was able to get it to move into the button bar when I locked the toolbar, but it insisted on being positioned to the left of the button bar, not the right.

    Eventually I gave up.  I'm not willing to give up 1/4 inch of screen real-estate to an IE toolbar - it doesn't give me enough value to justify the real-estate hit.

    Sorry guys.  I'm still using the desktop search stuff (it's very, very cool), including the taskbar toolbar, but not the IE toolbar.  I hate it when my toolbars have a mind of their own.

    Update: Someone on the CLR team passed on a tip: The problem I was having is because I run as a limited user.  But it turns out that if you exit IE and restart it, the toolbar sticks where you put it!

    So the toolbar's back on my browser.

  • Larry Osterman's WebLog

    How do you know what a particular error code means?

    • 16 Comments

    So you're debugging your program, and all of a sudden you get this weird error code - say error 0x00000011.  How do you know what that error means?

    Well, one way is to memorize the entire Win32 error return code set, but that's got some issues.

    Another way, if you have the right debugger extension, is to use the !error extension - it will return the error text associated with the error.  There's a similar trick for dev studio (although I'm not sure what it is, since I don't use the devstudio debugger).

    But sometimes you're not running under windbg or devstudio and you've got a Win32 error code to look up.

    And here's where the clever trick comes in.  You see, there's a complete list of error codes built into the system.  It's buried in the NET.EXE command that's used for network administration.

    If you type "NET HELPMSG <errorno>" on the command line, you'll get a human readable version of the error code.  (Note that NET HELPMSG wants the error number in decimal, so the 0x00000011 above becomes 17.)

    So:

    C:\>net helpmsg 17
    The system cannot move the file to a different disk drive.

    It's a silly little trick, but I've found it extraordinarily useful.

     

  • Larry Osterman's WebLog

    SPOILER WARNING ABOUT THE AMAZING RACE - Yes, there is justice in the world...

    • 21 Comments

    Massive Edit: Spoiler Warning (Sorry about that - I figured front page on MSNBC.COM was enough :()

     

     

     

    Uchenna and Joyce won.

    And they did it with style - even though they were at the finish line, they STILL waited to scrounge enough money to pay their cabbie.

    Wow.  That was awesome.

    And Rob and Amber didn't win :)  The look on Rob's face when Uchenna and Joyce got on their plane to Miami was priceless.  I was actually talking to Valorie about how Rob was obnoxious but likable.  When I saw that he wasn't even willing to acknowledge Uchenna on the plane (Amber did), he lost ALL respect.

    Yeah, I'm hooked :)  Go figure.

    Edit: I'm REALLY sorry about the spoiler thingy :(

    Edit2: 2nd try at improving the "spoiler-free-ness" of the post.  Not that it's obvious that I don't do this often.

     

     

  • Larry Osterman's WebLog

    My New Monitor's Here! My New Monitor's Here!

    • 41 Comments

    With apologies to Steve Martin.

    I just got my new monitor (after the whole Office Move thingy I decided I didn't want to move the big monitors again).  It's a Dell 2001FP which does 1600x1200 natively.

    Oh man, I don't know WHAT I was thinking of in waiting to get this puppy.  I have a 2001FP at home on my home machine, but I hadn't realized just how nice it was as a work monitor.

    This is one SWEET monitor.  It's SO crisp.

     

    Now all I need to do is to figure out how I can justify a second :)

     

  • Larry Osterman's WebLog

    Error Codes, again...

    • 18 Comments

    One of the tech writers in my group just asked a question about documenting error codes.

    I've written about my feelings regarding documenting error codes in the past, but I've never actually written about what it means to define error codes for your component.

    The critical aspect of error codes is recognition of the fact that error codes are all about diagnosability.  They're about providing enough information to someone to figure out the cause of a problem.  This is true whether you use error codes or exceptions, btw - they're all mechanisms for diagnosing failures.

    Error codes serve two related purposes.  You need to be able to provide information to the developer of an application that allows that developer to diagnose the cause of a failure (or to let the developer of an application determine the appropriate corrective action to take in the event of a failure).  And you need to be able to provide information to the user of the application that hosts your control to allow them to diagnose the cause of a failure.

    The second reason above is why there are APIs like FormatMessage, which allows you to determine a string version of system errors.  Or waveOutGetErrorText, which does the same thing for the multimedia APIs (there's a similar mixerGetErrorText, etc.).  These APIs allow you to get a human readable error string for any system error.
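    Just to illustrate, here's a minimal sketch of the FormatMessage case - error 17 is an arbitrary example input, and the error handling is deliberately thin:

    #include <windows.h>
    #include <stdio.h>

    //
    // Minimal sketch: print the system message text for a Win32 error code.
    // Error 17 (ERROR_NOT_SAME_DEVICE) is just an example input.
    //
    void PrintWin32Error(DWORD errorCode)
    {
        WCHAR messageBuffer[512] = {0};
        DWORD length = FormatMessageW(FORMAT_MESSAGE_FROM_SYSTEM | FORMAT_MESSAGE_IGNORE_INSERTS,
                                      NULL,             // Use the system message table.
                                      errorCode,
                                      0,                // Default language.
                                      messageBuffer,
                                      sizeof(messageBuffer) / sizeof(messageBuffer[0]),
                                      NULL);
        if (length != 0)
        {
            wprintf(L"Error %lu: %s\n", errorCode, messageBuffer);
        }
        else
        {
            wprintf(L"No message text found for error %lu\n", errorCode);
        }
    }

    The multimedia versions (waveOutGetErrorText and friends) have the same basic shape - you hand them a code and a buffer and get back a human readable string.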

    One of the basic requirements for any interface is that you define the errors that will be returned by that interface.  It's a fundamental part of the contract (and every interface defines a contract).

    Now your definition of errors can be simple ("Returns an HRESULT which defines the failure") or it can be complex ("When the frobble can't be found, it returns E_FROBBLE_NOT_FOUND").  But you need to define your error codes.

    When you define your error codes, you essentially have three choices:

    1. You can choose to simply let the lower level error code bubble up to your caller.
    2. You can choose to define new error codes for your component.
    3. You can completely define the error codes that your component returns.

    There are pros and cons to each of these choices.

    The problem with the first choice is that often times the low level error code is meaningless.  Or worse, it may be incorrect.  A great example of this occurs if you mess up the AEDebug registry key for an application.  The loader will attempt to access this registry key, and if there is an error (like an entry not found), it will bubble the failure up to the caller.  Which can result in your getting an ERROR_FILE_NOT_FOUND error when you try to launch your application, even though the application is there - the problem is that the AEDebug registry key pointed to a debugger that wasn't found.  But bubbling the failure up has killed diagnosability - the actual problem had to do with the parsing of a registry key, but the caller has no way of knowing that.  This is also yet another example of Joel's Law of Leaky Abstractions - the lower level information leaked to the higher level.

    The problem with the second choice is actually that it hides the information from the lower level abstraction.  It's just the opposite - sometimes you WANT the abstraction to leak, because there is often useful information that gets lost.  For instance, in the component on which I'm working, RPC_X_ENUM_VALUE_OUT_OF_RANGE, RPC_X_BYTE_COUNT_TOO_SMALL, and a couple of other RPC errors are mapped to E_INVALIDARG.  While E_INVALIDARG is reasonably accurate (these are all errors in the arguments), RPC returned specific information about the failure that hiding the error masks.  So there has been a loss of specificity about the error, which once again hinders diagnosability - it's harder to debug the problem from the error.  On the other hand, the errors that are returned are domain specific.

    The third choice (locking down the set of error codes returned) is what was done in my linked example.  The problem with this is that it locks you into those error codes forever.  You will NEVER have an opportunity to change them, even if something changes underneath.  So when the time comes to add offline storage to your file system, you can't add a "tape index not found" error to the CreateFile API because it wasn't one of the previously enumerated error codes.

    The first is a recipe for confusion, especially when the lower level error codes apply to another domain - what do you do if CreateThread returns ERROR_PATH_NOT_FOUND?  The third option is simply an unmitigated nightmare for the long term viability of your system.

    My personal choice is #2, even with the error hiding potential.  But you need to be very careful to ensure that your choice of error codes is appropriate - you need to ensure that you provide enough diagnostic information for a developer to determine the cause of the failure while retaining enough domain specific information to allow the user to understand the cause of the failure.
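    To make choice #2 concrete, here's a hedged sketch of what that mapping might look like.  The E_FROBBLE_* codes and the OpenFrobble function are made up for illustration - they're not real Windows APIs:

    #include <windows.h>

    //
    // Domain-specific errors defined by the (hypothetical) component, carved out
    // of FACILITY_ITF.
    //
    const HRESULT E_FROBBLE_NOT_FOUND  = MAKE_HRESULT(SEVERITY_ERROR, FACILITY_ITF, 0x0201);
    const HRESULT E_FROBBLE_STORE_FULL = MAKE_HRESULT(SEVERITY_ERROR, FACILITY_ITF, 0x0202);

    HRESULT OpenFrobble(LPCWSTR frobbleName, HANDLE *frobbleHandle)
    {
        *frobbleHandle = CreateFileW(frobbleName, GENERIC_READ, FILE_SHARE_READ, NULL,
                                     OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
        if (*frobbleHandle == INVALID_HANDLE_VALUE)
        {
            DWORD lastError = GetLastError();
            switch (lastError)
            {
            case ERROR_FILE_NOT_FOUND:
            case ERROR_PATH_NOT_FOUND:
                return E_FROBBLE_NOT_FOUND;             // Domain specific, but hides which path failed.
            case ERROR_DISK_FULL:
                return E_FROBBLE_STORE_FULL;
            default:
                return HRESULT_FROM_WIN32(lastError);   // Fall back to bubbling the error up.
            }
        }
        return S_OK;
    }

    Notice the trade-off the article describes: the caller gets an error in the component's own vocabulary, but the switch statement is exactly where the lower level detail gets masked.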

    Interestingly enough CLR Exceptions handle the leaky abstraction issue neatly by defining the Exception.InnerException property which allows you to retain the original cause of the error.  This allows a developer attempting to diagnose a failure to see the ACTUAL cause of the failure, while allowing the component to define a failure that's more germane to its problem domain.

  • Larry Osterman's WebLog

    Exceptions as repackaged error codes

    • 15 Comments
    One of the comments on my philosophy of error codes post from last week indicated that all the problems I listed with error codes were solved by exceptions.

    The thing that I think the writer missed is that CLR (and Java) exceptions serve two totally different design patterns w.r.t. error handling.

    You see, CLR exceptions solve both the "how do I report an error" problem, AND the "what information should be contained in my error report" problem.  The first part of the solution has to do with the asynchronous nature of exceptions - any statement can potentially throw an exception, and a caller is expected to catch the exception.  The second part is about what information is carried along with the error information.

    IMHO, the System.Exception object is just another kind of error code object - it's functionally equivalent to an HRESULT combined with the IErrorInfo interface.  Its job is to provide sufficient context to the caller that the caller can determine some kind of reasonable behavior based on the error.

    In fact, you could almost consider an exception hierarchy based off of System.Exception as a modern implementation of an X.400/X.500 OM error structure (X.400/X.500 OM errors are complex nested structures that describe the source of the error, suggested recovery modes, etc.).

    The interesting thing about X.400/X.500 error codes is that they were sufficiently complicated that they were almost completely unusable.  Most people who manipulated them took the highly complex data structure and mapped it to a simple error code and operated off of that error code.  Why?  Because it was simpler - checking against a particular error code number was far easier than parsing the OM_error structure.

    The good news for the "System.Exception as an uber error code" approach is that it's relatively easy to determine what kind of failure occurred from the strong type information that the CLR provides, which means that the "deconstruct the rich information into a simpler version" pattern I just mentioned isn't likely to happen.

    But you should never believe that "exceptions" somehow solve the "how do I return sufficient information to the caller of my API" problem - exceptions per se do not, even though an object hierarchy derived from System.Exception has the potential of solving it.  But the "throw an exception to handle your errors" design pattern doesn't.

    As a trivial counter example, consider the following C++ class (I'm just typing this in, don't expect this to work):

    class Win32Wrapper
    {
    public:
        // Reports failure by throwing the Win32 error code.
        HANDLE Open(LPCWSTR FileName)
        {
            HANDLE fileHandle = CreateFileW(FileName, GENERIC_READ, FILE_SHARE_READ, NULL,
                                            OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
            if (fileHandle == INVALID_HANDLE_VALUE)
            {
                throw (GetLastError());
            }
            return fileHandle;
        };

        // Reports failure by returning the Win32 error code.
        DWORD OpenError(LPCWSTR FileName, OUT HANDLE *FileHandle)
        {
            *FileHandle = CreateFileW(FileName, GENERIC_READ, FILE_SHARE_READ, NULL,
                                      OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
            if (*FileHandle == INVALID_HANDLE_VALUE)
            {
                return GetLastError();
            }
            else
            {
                return NO_ERROR;
            }
        };
    };

    This rather poorly implemented class has two methods.  One of them uses exception handling for error propagation, the other returns the error code.  The thing is that as far as being able to determine corrective action, the two functions are totally equivalent.  Neither of them give the caller any information about what to do about the error.

    So claiming that the "throw an exception as a way of reporting an error" paradigm somehow solves the "what should I do with this error" problem is naive.

  • Larry Osterman's WebLog

    Moving offices - again

    • 19 Comments

    Editor's Note: This was posted last Thursday evening, and was promptly lost in a blog rollback.  Apologies to those who have already read it.

    I moved my office today.

    I hate office moves.

    I don't know how many times I've done it (it's been well over a dozen, maybe as many as 20).  Moves, like reorgs, are simply a fact of life at Microsoft.  As we grow (either by hiring new people or by constructing new buildings), people get shuffled around.  Sometimes they're small (one or two people).  Sometimes they're huge (over a thousand people).  The really big ones can take three or four days to accomplish.  We had one move in Exchange where they literally emptied everything from the building, moved it all into the parking garage, and moved it all back to the new location.  Microsoft tries not to do the three or four day moves, because they lose work days during the move; for big moves, they try to do them over the weekend.

    Most of the time, I get a fairly long time in my new office - usually over a year, once I actually had three years in a single office.  But sometimes I don't quite get that long.  My shortest time was two weeks (but I've already told that story).

    I've always hated office moves.

    This time I'd really just gotten settled into the new office (I moved there back in December) - I had only 3 boxes left to unpack from the last move when I got told that I'd be moving yet again.

    This move was actually a good thing - our team lost 5 offices a couple of weeks ago, so we needed to move people up to some vacant offices upstairs.  In addition, one of the leads from our team left the division to take a really awesome job over in COSD (Core OS Division).  So his office, a north facing office on the 4th floor with a panoramic view of Mt. Baker and the Cascades, was available.

    Since four of the people who I work with the most are also on the 4th floor, this move made a huge amount of sense - the people I work with most will be closer to me (except my boss and a couple of the other people I work with daily are still on the other floor).

    But that doesn't change the fact that I hate office moves.

    When I move offices, I usually pack 14 moving boxes worth of stuff.  I've got it down to a system at this point - I pack up off and on during the week before the move, and I find someone in my building who isn't moving.  On the day of the move, I pack up the rest of my stuff (usually just a final box), and move all my Lego models and my artwork into the non-moving person's office.  The final thing is to power down all my equipment and put tags on everything (including the speakers and the mice).  I also make sure I get my chair and anti-static mat - the movers have forgotten them sometimes.

    Then I leave for the night (or weekend, depending on the size of the move).  Overnight, gremlins come in, move my stuff to the new office, and plug everything in.

    The next morning, I come in, refreshed from a good night's sleep, and start by fixing the things the gremlins got wrong (you can count on gremlins getting something wrong - maybe it's the KVM being plugged into the wrong monitor plug on the video adapter, maybe it's the speakers from my test machine being plugged into my dev machine (or maybe it's both sets of speakers plugged into my test machine)).  You can never quite tell what's going to happen, but it's almost always trivial to fix.  I also make sure that I got all my boxes (sometimes they get lost or misplaced).  I unpack the final box I packed during the move (since it's invariably the stuff I use the most). 

    I then go to the non-moving person, get my Legos and set them up again.  Sometimes this can be a pain, because I have a bookshelf that's dedicated to the Legos - but it's got three shelves in it, and the corporate furniture standards say you only get two shelves in a bookshelf - so I've got to scrounge up a 3rd shelf somewhere.  But I get the Legos put up, hook up my office boombox (a 15 year old Yamaha office stereo that's still going strong), and I'm good to go.

    Next, over the next couple of weeks, I unpack my boxes, and distribute the stuff out where it belongs (manipulative puzzles go on the guest table, books on the other bookshelf, you get the idea).

    All in all, it's a pain in the neck but it really only costs me about a half a day of absolute downtime (plus any extra days off from multi-day moves) - the rest is aggregated over enough working days that I don't care.  It's a pain, but it's not THAT big a deal.

    But I still hate office moves.

    Especially this one, which was a phone-only move.

    You see, a phone-only move is one where the IT department moves your phone (they did that a day early too :)) but that's it.  You're responsible for getting your stuff to where it goes.

    What this means is that I spent all day today moving my stuff from my old office to my new office.  Back and forth with load after load of stuff.  No boxes, because for a phone-only move, they don't give you boxes.  You've got to schlep your stuff yourself.  I started just after I posted today's blog post and finished at 7:30 this evening.

    Fortunately for me, someone in the area of my old office had brought their personal handtruck into work - this thing is a thing of beauty - not only is it a normal handtruck, it's got 4 wheels and a tray that snaps on turning it into a two level cart.

    It was also convenient that the service elevator is about 20 feet from my old office and about 10 feet from my new one.  That made things easier.  But the fire doors by the service elevator kept on closing.  And I couldn't prop them open, because they were fire doors.

    That came in real handy when I was moving the Star Destroyer.  It just barely fit on the cart but it was far better than carrying it by hand (I've had to do that a couple of times).  But the bookcase didn't fit, which meant I had to carry it by hand (not a big deal, it was just bulky once I took the shelves off).

    The books and Legos went up before lunch, the afternoon was spent hauling computers and setting them up again (again, the handcart was my friend - I've got a 20 inch CRT and a 19 inch CRT and they're HEAVY).  By about 5:30PM, I was totally wiped - towards the end of the afternoon, I was dripping sweat like it was coming out of a leaky faucet (sorry if that's TMI).  Valorie and Daniel came at about 6:30 to help me move the last bits of stuff.  I'd still not be done if they hadn't come and provided that last burst of energy.

    On the other hand, except for some framed artwork that I need to put up tomorrow, everything I had up in my old office is up in the new office.  And I've got some more wall space to put up some stuff that wasn't up in the old office, like my dead mouse collection.

    I still need to spend a bit of time working on the new office to make sure that I can fit everything - right now, for example, my laptop doesn't have a home, and I need to fix that.  I also need to clear out some room - the desk is too cramped for my tastes right now (and I need to fiddle with the height of it and my monitor).

    And I still hate office moves.  Especially phone-only office moves.  The ONLY saving grace that they have is that my office is now totally set up - I don't have to do the unpacking thingy, which is nice.

    Before someone asks, if I'm moving from one building to another (and I haven't done that since 2003), I take the Legos home with me and bring them back the next day.  I've not yet had to do that with the Star Destroyer, I think it'll survive the trip (it's fairly well architected).  I've moved the Statue of Liberty five times now without too many horrible mishaps (although I did have to rebuild her torch twice before I realized how weak her arm was). 

    I forgot - did I mention that I hate office moves?  Just wanted to make sure everyone knew that.

     

  • Larry Osterman's WebLog

    Larry's Rules of software engineering, Part 4 - Writing servers is easy, writing clients is HARD.

    • 24 Comments

    Over the past 20 years or so, I've written both (I wrote the first NT networking client and I wrote the IMAP and POP3 servers for Microsoft Exchange), so I think I can state this with some authority.  I want to be clear - it's NOT easy to write a server - especially a high performance server.  But it's a heck of a lot easier to write a server than it is to write a client.

     

    Way back when, when I joined the NT project (back in 1989ish), my job was to write the network file system (redirector) for NT 3.1.

    Before that work item was assigned to me, it was originally on the plate of one of the senior developers on the team.  The server was assigned to another senior developer.

    When I first looked at the schedules, I was surprised.  The development schedule for both the server AND the client was estimated to be about 6 months of work.

    Now I've got the utmost respect for the senior developers involved.  I truly do.  And the schedule for the server was probably pretty close to being correct.

    But the client numbers were off.  Way off.  Not quite an order of magnitude off, but close.

    You see, the senior developer who had done the scheduling had (IMHO) forgotten one of the cardinal rules of software engineering:

    Writing servers is easy, writing clients is hard.

    If you think about it for a while, it actually makes sense.  When you're writing a server, the work involved is just to ensure that you implement the semantics in the specification - that you issue correct responses for the correct inputs.

    But when you write a client, you need to interoperate with a whole host of servers.  Each of which was implemented to ensure that it implements the semantics in the specification.

    But the thing is, the vast majority of protocol specifications out there don't fully describe the semantics of the protocol.  There are almost always implementation specifics that leak through the protocol abstraction.  And that's what makes the life of a client author so much fun. 

    These leaks can be things like the UW IMAP server not allowing more than one connection to SELECT a mailbox at a time when the mailbox was in the MBOX format.  This is a totally reasonable architectural restriction (the MBOX file format doesn't allow the server to support multiple clients simultaneously connecting to the mailbox), and the IMAP protocol is silent on this (this is not quite true: there are several follow-on RFCs that clarify this behavior).  So when you're dealing with an IMAP server, you need to be careful to only ever use a single TCP connection (or to ensure that you never SELECT the same mailbox on more than one TCP connection).

    They can be more subtle.  For example the base HTML specification doesn't really allow for accurate placement of elements.  But web site authors often really want to be able to exactly place their visual elements.  Some author figured out that if you insert certain elements in a particular order, they can get their web site laid out in the form they want.  Unfortunately, they were depending on ambiguity in the HTML protocol (and yes, HTML is a protocol).  That ambiguity was implemented in one way with one particular browser. 

    But every other browser had to deal with that ambiguity in the same way as the first browser if they wanted to render the web site properly.  It's all nice and good to say to the web site author "Fix your darned code", but the reality is that it doesn't work.  The web site author might not give a hoot about whether the site looks good for your browser, as long as it looks good on the browser that's listed on the site, they're happy campers. 

    The server (in this case the web site author) simply pushes the problem onto the client.  It's easier - if the client wants to render the site correctly, they need to be ambiguity-for-ambiguity compatible with the existing browser.

    Ambiguity is a huge part of what makes writing clients so much fun.  In fact, I'm willing to bet that every single client for every single network protocol implemented by more than one vendor has had to make compromises in design forced by ambiguities in the design of the protocol (this may not be true for protocols like DCE RPC where the specification is so carefully specified, but it's certainly true for most other protocols).  Even a well specified protocol like IMAP has had 114 clarifications made to the protocol between RFC 2060 and RFC 3501 (the two most recent versions of the protocol).  Not all the clarifications were to resolve ambiguities (some resolved spelling errors and typos), but the majority of them were to deal with ambiguities.

    Clients also have to deal with multiple versions of a protocol.  A CIFS client needs to be able to understand how to talk to at least 7 different versions of the protocol, and it needs to be able to implement its host OS semantics on every one of those versions.  For the original NT 3.1 redirector, more than 3/4ths of the specification for the redirector was taken up with how each and every single Win32 API would be implemented against various versions of the server.  And each and every one of those needed specific code paths (and test cases) in the client.  For the server, each of the protocol dialects was essentially the same - you needed to know how to implement the semantics of the protocol on the server's OS. 

    For the client, on the other hand, you had to pick and choose which of the protocol elements was most appropriate given the circumstances.  As a simple example, for the IMAP protocol, clients have two different access mechanisms - you can access the messages in a mailbox by UID or by sequence number.  UIDs have some interesting semantics (especially if the client's going to access the mailbox offline), but sequence numbers have different semantics.  The design of the client heavily depends on this choice - there are things you can't do if you use UIDs but there's a different set of things you can't do if you use sequence numbers.  It's a really tough design decision that will quite literally reflect the quality of your client - is your IMAP client nothing more than a POP3 client on steroids, or does it fully take advantage of the protocol?  Another decision made by clients: Do they fetch the full RFC 2822 header from the server and parse it on the client, or do they fetch only the elements of the header that they're going to display?
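    To make the UID-versus-sequence-number choice concrete, here's roughly what the two access styles look like on the wire (the tags, flags, and UID values are made up; the command syntax is straight out of RFC 3501):

    C: a001 FETCH 1:3 (FLAGS)
    S: * 1 FETCH (FLAGS (\Seen))
    S: * 2 FETCH (FLAGS ())
    S: * 3 FETCH (FLAGS (\Answered))
    S: a001 OK FETCH completed
    C: a002 UID FETCH 4827313:4827315 (FLAGS)
    S: * 1 FETCH (FLAGS (\Seen) UID 4827313)
    S: * 2 FETCH (FLAGS () UID 4827314)
    S: * 3 FETCH (FLAGS (\Answered) UID 4827315)
    S: a002 OK UID FETCH completed

    The sequence numbers on the left can change every time messages are expunged; the UIDs stay stable across sessions, which is why the choice matters so much for clients that want to work offline.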

    So when you're thinking about writing networking software, just remember the rule:

    Writing servers is easy, writing clients is hard.

    You'll be happy you did.

  • Larry Osterman's WebLog

    Grocery Wars

    • 0 Comments

    While we were going to the lecture on Monday, Todd Bowra mentioned a Star Wars parody called (Grocery) Store Wars.

     

    Coincidentally, someone sent Valorie a link to it yesterday (go figure that).  So, in honor of tomorrow's debut, I present...

    Store Wars

  • Larry Osterman's WebLog

    What's wrong with this code, part 12 - Retro Bad Code

    • 23 Comments
    I was at a security tech talk last week discussing some fascinating stuff, and it reminded me of an interview question that my manager used to give to people who said that they understood x86 Assembly language.

    I realized that it would make an interesting "retro" "what's wrong with this code" so, here goes.

     

    When you're writing code for low level platforms (REALLY low level platforms), one of the first things you need to start handling is processor interrupts.  On the x86 family of processors, when an interrupt is generated (either hardware or software), the processor generates a trap frame on the current stack with the following information (I may have the CS and IP backwards, check your processor manual): the FLAGS register, the code segment (CS) of the interrupted code, and the instruction pointer (IP) of the interrupted code.

    Your code is now executing, but you're running on the application's stack.  So the very first thing that has to happen in your interrupt handler is that you've got to switch to your own stack (you need to do this because you don't know how much memory is remaining on the user stack - if you overflow the user's stack, you've got problems).

    So you'd have to write code that switches from the user's stack to your kernel mode stack.

    My boss used to use an example that was the system call dispatcher for a theoretical operating system, which maintained a 32bit pointer to the kernel mode stack in a global variable, I'll continue that tradition.


        <code to establish DS>
        MOV    [Saved_SS], SS
        MOV    [Saved_SP], SP
        MOV    SP, WORD PTR [Dos_Stack]
        MOV    SS, WORD PTR [Dos_Stack]+2
        <code to dispatch off of the value in AH>

    The problem is that there's a MASSIVE bug in this code.  And it's not because of the use of global variables for the saved SS and SP - assume that the reentrancy issues are handled in the <code to establish DS> section above.

  • Larry Osterman's WebLog

    Threat Modeling, Part 3 - Process

    • 3 Comments
    Continuing the discussion on threat modeling that I started in this post (and continued in this one)...

    There's a critical third part of threat modeling, and that's the process of threat modeling.

    Threat modeling is a discipline - you need to start the threat modeling process early in your feature's lifetime.

    Hold on, what's a feature?  I keep on referring to it, but I don't ever define it.  For this discussion, a "feature" is just that - it's a feature of a product.  Features can be coarse grained (The "My Pictures" shell area), or they can be fine grained (adding transparent PNG support to Trident).  A feature can affect one source file, it could affect a hundred different DLLs, it's up to your development team to determine what level of granularity you want to work on.

    For each team, there's a "natural" boundary for a feature, but it's ultimately up to the team doing the design to decide what that boundary is.  If you make it too fine grained ("Make a new folder" vs. "Rename this folder" vs. "Copy this folder") you may end up doing too much work (which will reduce the overall quality of the product).  On the other hand, if you make it too broad ("ntoskrnl.exe") you're likely to have more components than are manageable.  The bottom line is that your team needs to make a call about what you threat model.

    And this quandary leads to the unfortunate thing about defining the process for threat modeling.  While the end product (the threat model) is fairly well defined, the process is squishy.  There is no "one true way" of going about performing threat models, for each development team, the process is slightly different.

    For example, on some teams, a program manager collects the entrypoints and interfaces and writes the DFDs.

    On other teams, the entire threat modeling process is handled by a single developer.  On other teams, the responsibility is split - the PM does the entrypoints and interfaces, and a developer writes the DFDs. 

    Still other teams spread the responsibility around, requiring each developer and PM to do the DFDs for the entrypoints for which they're responsible.

    It all depends on the dynamics of your team - if your PMs are tightly tied into the development process (or if you don't have PMs at all), then it might make sense to have them handle some of the work.  But maybe not...   IMHO, If your PM doesn't know how the internals work, it's probably not a good idea to offload the process to them, let a developer who can write do the work.

    Once you've decided how you're going to apportion the workload of generating the threat model, you need to start the process.  Start by enumerating your entrypoints, then drill down to figure out what assets are involved.  Then go back and see if you need more entrypoints (or assets).  The threat modeling process is inherently iterative - in the beginning, if you haven't come up with a new entrypoint or a new asset at each meeting, you're likely not looking at the problem in "the right way".

    One thing that every group I've encountered has done during the threat model process is what I call the "Big Threat Brainstorming Meeting".  The BTBM comes fairly late in the threat modeling process, when the rest of the threat model is relatively mature - you've already had three or four iterations in a smaller scope (maybe a couple of people), and you think you've got most of the entrypoints and assets enumerated and the DFDs completed.  In the BTBM, you get the entire development team for the feature - dev, test, and PM - and the team brainstorms the threats against the various pieces of the feature.  You iterate down each of the entrypoints and resources for the feature, and try to figure out if there are threats against them.  For each threat, you need to identify the entrypoint and resource associated with the threat (even at this point, you may still find new entrypoints or protected resources), and spend some time figuring out if the threats are mitigated or not.

    In many ways, the "BTBM" is the core of the threat modeling process.  You really must engage the entire team for this one, because everyone has a slightly different view of their component, and it's not always clear what the interactions between the different components are - the developers who actually own the code have a much better idea than the person who's writing the threat model.

    Once you've had the "big meeting" you still need to write up the results of the meeting and generate threat trees for all the threats that are discovered.  And then it's time for yet another iteration of the threat model review.  You may need to do a second generation BTBM, it depends on how comfortable you are with the completeness of your threat model.

    Eventually, the changes per iteration damp down, and you've got a finished threat model, right?

    Well, no.  Actually you don't.  What you have is an approximation of a finished threat model.  Because as you make changes to the code during your march to ship, the design is going to change.  And when the design changes, then the threat model has to change too.  You need to ensure that when you change the design of any of the entrypoints to your feature, you go back and revisit the threat model to ensure that the new design hasn't invalidated it.  But the really cool thing about having the threat model there while you're making the changes is that updating the threat model forces you to revisit the design change, and that's going to make sure that you think about the security ramifications of your fix WHEN you make the fix, not after you've shipped.

    I can't say how cool running through this process is.  When we had our BTBM the other day, we came up with a boatload of threats (almost all mitigated) AND we found a bunch of vulnerabilities that we hadn't considered before.  It was an invaluable experience for everyone involved.  As a result of this, we have a much higher probability that we really do understand the threats to our component.

    And now we get to do it all over again for our deliverables for the next milestone.  Yay!

     

  • Larry Osterman's WebLog

    End of Life Issues

    • 23 Comments
    Wow.  Yesterday I asked y'all when it was ok to end-of-life an API.

    I'm astonished at the quality of the responses.  For real.

    Almost everyone had great ideas about it - some were more practical than others but everyone had great ideas. 

    Several people suggested a VM based solution, that's an interesting idea, but somewhat impractical with the current state of technology.  And it's beyond the scope of what I'm trying to discuss.  In particular, I was wondering what the criteria were for end-of-lifeing an API.  Also, Mike Dimmick pointed out one of the huge problems with VM solutions - it doesn't work for plug-ins.

    A number of people talked about end-of-lifeing with the release of a new platform.  But the problem there is that the Windows appcompat contract is a BINARY contract, not a source level contract.  So we can't release a new version of Windows that doesn't support APIs, since that would break existing applications.

    A couple of people suggested that open-source would be the solution to the problem.  But open sourcing old APIs doesn't fix the problem, it just pushes it onto someone else.  And by releasing the source to a high level API, it locks down the interface at a lower level, which is almost always a bad idea, because it removes your architectural flexibility - what happens if there are four different APIs that use a single low level API, and you want to end-of-life one of them but keep the other three?  If you open source that one, then you freeze the low level API, which removes your ability to innovate in the other three APIs (this is a grotesque simplification of the issues, I may write more on this one later).

    Michael Kaplan wrote about a possible solution, but that doesn't really solve how you REMOVE APIs - it just describes how you can make changes without removing the APIs.

    At least one person said that the software wasn't relevant, that it was the data that mattered.  At some level, that's right, but it's a naive view of the world - the reality is that for most businesses, if their line-of-business application doesn't work, they don't care if their data hasn't been lost - their business is still just as shut down.  And many businesses DON'T have the source for their LoB applications.  Or they don't have the money to bring those applications up-to-date.  It can cost millions and millions of dollars to update a suite of LoB applications.  And those updates invariably involve UI changes, which means retraining costs.  Which most businesses aren't willing to take on.  So the existing apps have to continue to run.

    Dana Epp was one of the first people to pick up on what I think is the most critical aspect of end-of-lifeing an API - you need to have a credible alternative to the API - one that offers comparable functionality with similar ease of use.  If you don't have an easy alternative for customers to adopt, then they won't adopt it.

    But (IMHO) the most important thing that everyone missed (although Dana came close) has to do with the binary contract.  You must keep existing binaries working.

    And that means that your ONLY opportunity for removing an API comes when you know that every application running on your system is going to be recompiled.  That happens when you switch hardware platforms.  So there have really only been three opportunities in the life of Win32 to do this - Alpha, PPC, X64.

    Any other time has the possibility of breaking applications.

    So to me, an API can be removed if:

    1. You can guarantee that there are no existing applications that call the API.
    2. You have a credible alternative API that is comparably easy to use.

    Criteria 1 only occurs with the release of a new hardware platform, Criteria 2 is just a result of careful planning.

    While I was doing the "How to play a CD" series, Drew noticed that my sample didn't work on his version of x64.  Why?

    Well, because we thought we could end-of-life the MCI APIs for x64.  It turns out that virtually all of the MCI API set has been replaced with newer, more functional APIs and there were some architectural issues with the MCI APIs that made it attractive to remove them.  So we did.

    And then we realized that we'd messed up.  Why?  Audio CD playback.

    It turns out that we didn't have a reasonable alternative to the MCI APIs for CD playback - everything else had credible replacements but not CD playback.  And it turns out that a number of apps that were being ported to x64 relied on the MCI APIs for CD audio playback.

    So we put all the MCI APIs back before we released, and addressed the architectural issues.  Because appcompat trumps architecture every time.

    Having said that, a number of the people making comments were absolutely right - for Longhorn, there WILL be some scenarios that won't continue to work, as a result of some of the high level architectural changes we're making.  But we're being a lot more careful about ensuring that those scenarios truly ARE corner cases.

    Having said all this stuff about when it's ok to end-of-life code, I MUST include Adrian's comment verbatim:

    A lot of the comments here are looking at this from the point of view of somebody with existing applications upgrading to a newer version of the OS. It's an important consideration, but it's not the one I am faced with day to day.

    As developers, we rarely get to choose which platforms we target. The market determines that. I and many other Windows developers I know are building *new* applications, but the market demands that they run on older platforms as well as the "current" ones. A non-trivial number of people still run Windows 98 and NT 4, especially as you look at the international market.

    The users of an obsolete API are not necessarily applications that are five or ten years old. They may be brand new applications that rely on an API that was superseded long ago, because it's available across all of the platforms that must be supported.

    Sure, sometimes your application will dynamically use a newer, better API if it's available or fall back to the old one if it's not. But when the old API is sufficient and universal, it will often be used rather than the "current" one, even in sparkling new code.

    Consider Avalon, a whole new presentation layer. How long will it take for it to completely replace GDI? If I'm writing a new general-purpose application today, I *have* to use GDI. Even if I were targeting a release date that coincides with Longhorn, I couldn't afford to ignore Windows XP, Windows 2000, Windows NT 4.0, Windows Me, and Windows 98. (In reality, we even ensure that the basic functionality of our app works even in Windows 95 and NT 3.1.) If I had unlimited resources, I *might* try to develop a parallel GDI/Avalon version. But when is the last time you were on a development project with unlimited resources?

    I don't have a general answer to Larry's question, but using GDI as an example, I'd say you could retire it when Longhorn (or perhaps XP with Avalon extensions) is used about as much as Windows 98 is used today. That's probably four releases after Longhorn, or more than a decade away.

    Adrian's spot-on - it's not enough that we ensure that the API has been superseded.  We also need to ensure that the replacement either is available on previous platforms or has been around for enough time that it's a credible alternative.

     

     

  • Larry Osterman's WebLog

    Playing audio CDs, part 11 - Why isn't my sample ready for prime time?

    • 7 Comments

    As I mentioned in my previous post, the code I've provided will play back audio CDs.  But it's not ready for prime time yet.

    There are four major problems with the code.

    First off, the error handling isn't 100% up-to-snuff.  In particular, there are several error checks that are missing (if an allocation fails in the constructor of the CDRomReadData class, for example).  The error handling in general isn't robust enough for production code, and if there are failures during the read loop, the memory used to hold buffers is leaked.  That was an intentional omission to keep the size of the code down.

    Next, the code totally ties up the caller's thread.  That means it can't easily be adopted for use except in very limited scenarios.  And it's totally unsuitable for use in a Windows application (or a non-Windows application that uses STA apartment model COM objects).  For production systems, that's almost always a complete non-starter.

    Third, the code as written is utterly laptop unfriendly.  Laptops have a rather different power consumption profile than desktop systems.  For a laptop, battery life is king.  And, because of the laws of physics, that means that anything that involves moving parts is bad.  The bigger the moving part, the worse it is.  So writing code that involves moving heads on a hard disk is bad, but writing code that keeps a relatively heavy CD spinning in a CD drive is even worse.  For a laptop, it's far better to either spin the CD up once and transfer the entire audio data into memory, or to read the CD at 1x - slowly stepping the heads forward.  The absolute worst thing you can do is to spin up the drive and let it spin down again repeatedly - it doesn't take that much power to keep the disc spinning, but starting the disc up is horribly expensive.

    The final version of this sort-of does the 1x read thingy - it moves the heads slowly forward.  But that means that the CD is spinning for the entire length of the track, which leads to increased power consumption.  With two or three buffers, this isn't that bad (again, it doesn't take too much power to keep the drive spinning).  But if you increase the CDROM_READAHEAD_DEPTH too high, you can actually get into a situation where playing back the audio samples takes so long that the CD drive decides to spin down the disk.  And that means that the next read, the drive spins up again.   And that's bad.  On a laptop, the "read it all into memory" version may actually be better from a power standpoint (it may cause the system to page however, which is bad - there are always trade-offs).

    The fourth reason that this code isn't ready for prime time is what is known as "stitching".

    You see, audio CDs were never designed to be played in a computer.  Instead, they were intended to be played back in a commercial CD player, with a simple track next/track previous command.  On those devices, it wasn't critical that data be able to be read reliably.

    So it turns out that if you ask a CD-ROM drive to read bytes on the audio CD from block 253962 to 253972, you might actually get the contents of blocks 253961 to 253971, or the contents of blocks 253963 to 253973.  You can't predict what the actual data that's read from the disk will be.  This limitation doesn't happen with data CD-ROMs, because the CD-ROM data format was designed for accurate positioning.

    Because you can't reliably read the data, you need to "stitch" together the samples you read with the samples that you last read.  That involves DSP and sample matching logic that's beyond my ability to describe in the context of a blog post.  But essentially the idea is that you match the samples at the start of the incoming data block with the samples at the end of the previous data block and look for overlaps.  If you find overlaps, then you slide the incoming block over until the overlapped samples line up.
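    Just to give a feel for the shape of the problem, here's a naive sketch of that overlap search.  A real implementation is much more robust (it handles partial matches, jitter in both directions, and noisy data); the function name and parameters are made up for illustration:

    //
    // Naive sketch of the overlap search: look for the point in the newly read
    // block where the tail of the previous block starts to repeat.  Returns the
    // offset (in samples) where the repeated tail begins, or -1 if no overlap
    // was found within the search window.
    //
    int FindOverlap(const short *previousTail, int tailLength,
                    const short *newBlock, int newBlockLength,
                    int searchWindow)
    {
        for (int offset = 0; offset < searchWindow && offset + tailLength <= newBlockLength; offset += 1)
        {
            bool match = true;
            for (int i = 0; i < tailLength; i += 1)
            {
                if (newBlock[offset + i] != previousTail[i])
                {
                    match = false;
                    break;
                }
            }
            if (match)
            {
                return offset;      // New, non-duplicated data starts at offset + tailLength.
            }
        }
        return -1;                  // No overlap found - fall back to using the block as read.
    }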

    If you don't really care about full fidelity rendering, then this probably doesn't matter.  But if you do, then you need to care about this issue.

    Apple's got a pretty good description of their implementation of DAE and stitching (which showed up in Mac OS 9) in their Tech Note 1187.

    And to be fair - Anonymous posted a pretty good description of the stitching issue in their comment on the first post in this series.

  • Larry Osterman's WebLog

    Playing Audio CDs, part 9 - Fixing Glitches

    • 9 Comments
    When we last left playing back audio, we had playback working, but it was glitching like CRAZY - literally every packet read had a glitch during playback.

    So lets see if we can fix the glitching problem.

    As I mentioned yesterday, the root cause of the glitches is that we're waiting on the wave playback to complete - we're doing all our I/O through a single buffer.

    So what happens if you don't wait for the wave writes to complete?  It's a simple change to make, maybe it will help with playback.

     

    HRESULT CDAENoWaitPlayer::PlayTrack(int TrackNumber)
    {
        HRESULT hr;
        HANDLE waveWriteEvent = CreateEvent(NULL, FALSE, FALSE, NULL);
        HWAVEOUT waveHandle = OpenWaveForCDAudio(waveWriteEvent);
        if (waveHandle == NULL)
        {
            return E_FAIL;
        }

        TrackNumber -= 1; // Bias the track number by 1 - the track array is )ORIGIN 0.

        CAtlList<CDRomReadData *> readDataList;

        for (DWORD i = 0 ; i < (_TrackList[TrackNumber]._TrackLength / DEF_SECTORS_PER_READ); i += 1)
        {
            CDRomReadData *readData = new CDRomReadData(DEF_SECTORS_PER_READ);
            if (readData == NULL)
            {
                printf("Failed to allocate a block\n");
                return E_FAIL;
            }
            readData->_RawReadInfo.DiskOffset.QuadPart = ((i * DEF_SECTORS_PER_READ) + _TrackList[TrackNumber]._TrackStartAddress)*
                                                          CDROM_COOKED_BYTES_PER_SECTOR;
            readData->_RawReadInfo.TrackMode = CDDA;
            readData->_RawReadInfo.SectorCount = DEF_SECTORS_PER_READ;
            hr = CDRomIoctl(IOCTL_CDROM_RAW_READ, &readData->_RawReadInfo, sizeof(readData->_RawReadInfo), readData->_CDRomData,
                                readData->_CDRomDataLength);
            if (hr != S_OK)
            {
                printf("Failed to read CD Data: %d", hr);
                return hr;
            }
            MMRESULT waveResult;
            readData->_WaveHdr.dwBufferLength = readData->_CDRomAudioLength;
            readData->_WaveHdr.lpData = (LPSTR)readData->_CDRomData;
            readData->_WaveHdr.dwLoops = 0;

            waveResult = waveOutPrepareHeader(waveHandle, &readData->_WaveHdr, sizeof(readData->_WaveHdr));
            if (waveResult != MMSYSERR_NOERROR)
            {
                printf("Failed to prepare wave header: %d", waveResult);
                return HRESULT_FROM_WIN32(waveResult);
            }
            waveResult = waveOutWrite(waveHandle, &readData->_WaveHdr, sizeof(readData->_WaveHdr));
            if (waveResult != MMSYSERR_NOERROR)
            {
                printf("Failed to write wave header: %d", waveResult);
                return HRESULT_FROM_WIN32(waveResult);
            }
            //
            // Add this buffer to the end of the read queue.
            //
            readDataList.AddTail(readData);

            //
            // See if the block at the head of the list is done. If it is, free it and all the other completed
            // blocks at the head of the list.
            //
            // Because we know that the wave APIs complete their data in order, we know that the first buffers in the list
            // will complete before the last - the list will effectively be sorted by completed state
            //
            while (!readDataList.IsEmpty() && readDataList.GetHead()->_WaveHdr.dwFlags & WHDR_DONE)
            {
                CDRomReadData *completedBlock;
                completedBlock = readDataList.RemoveHead();
                waveOutUnprepareHeader(waveHandle, &completedBlock->_WaveHdr, sizeof(completedBlock->_WaveHdr));
                delete completedBlock;
            };
        }
        //
        //    We're done, return
        //
        return S_OK;
    }

    So what changed?  First off, instead of allocating a single block, we allocate a CDRomReadData object for every single read.  We take advantage of the fact that the wave APIs queue their writes and simply drop the request into the waveOutWrite API when the data's been read.

    Since the blocks of data are much smaller than the length of the track, the odds are high that we'll be done with some of the blocks before we finish reading the audio track, so we use the fact that the wave header has a flag that indicates that the wave writer is done with a block to let us know when it's ok to free up the block.

    So when I tested this version of the playback, the glitches were gone (good!).  But the playback stopped after a relatively short time - certainly before the end of the track (I was using Ravel's Bolero as my test case - the clarinet solo at the beginning of the crescendo is a great test to listen for glitches).  But Bolero's about 15 minutes long, and the playback was finishing up after about 1 minute or so.

    Why?  Because my CDROM drive is faster than the audio playback - the audio data had been read and queued for the entire track, but it hadn't finished playing back.  If you think about it, reading the data was done at 48x (or whatever the speed of the CDROM drive is), but the playback is done at 1x.
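
    To put rough numbers on it: a 15 minute track is 15 x 60 x 75 = 67,500 sectors, and even at an effective extraction rate of only 15x (drives rarely sustain their rated speed for DAE), that's all read in about a minute - while playing it back still takes the full 15 minutes.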

    So we need to add one more piece to the routine - just before we return S_OK, we need to put the following loop:

        //
        // Now drain the requests in the queue.
        //
        while (!readDataList.IsEmpty())
        {
            if (readDataList.GetHead()->_WaveHdr.dwFlags & WHDR_DONE)
            {
                CDRomReadData *completedBlock;
                completedBlock = readDataList.RemoveHead();
                waveOutUnprepareHeader(waveHandle, &completedBlock->_WaveHdr, sizeof(completedBlock->_WaveHdr));
                delete completedBlock;
            }
            else
            {   
                Sleep(100);
            }
        };

    This loop waits until all the queued reads complete (and frees them).

    But there's still a problem with the code - on my machine, playing Bolero, there were 4056 CDRomReadData objects on the readDataList queue that had to be freed during the final rundown loop.  The PlayCDWMP application took up 156M of memory.  Essentially we'd read all the audio data into memory during the read process.  Not good.  Note that we didn't LEAK the memory (I fixed that problem) - we know where every bit of the memory is.  But we've allocated WAY more than we should have.
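
    The arithmetic backs that up: assuming DEF_SECTORS_PER_READ is 16, each CDRomReadData carries 16 x 2352 = 37,632 bytes of raw audio, and 4056 of those blocks comes to roughly 150M - essentially the entire track sitting in memory, waiting its turn at the wave device.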

    Next time, let's see if we can work on fixing the memory management problem.

     

    Edit: Raymond gently reminded me :) that Sleep(0) throws the system into a CPU bound loop, so I changed the sleep to actually sleep for some time.

    Edit2: For those that complained, added the waveOutUnprepareHeader call.

     

  • Larry Osterman's WebLog

    Playing Audio CDs, part 8 - Simple DAE Playback

    • 11 Comments
    Ok, time to get down and dirty in the "CD Playback" series.

    Up until now, we've just been reading metadata from the CD.  Now it's time to read the actual audio data and play it back.

    First, a bit about playback.  To do the playback, we'll be using the waveOutXxx APIs.  There are a boatload of multimedia APIs available, but the reality is that for a task this simple, the wave APIs are probably the best suited for the work.

    There are four wave APIs that we care about here: waveOutOpen, waveOutWrite, waveOutPrepareHeader and waveOutUnprepareHeader.  We won't use waveOutUnprepareHeader in this example because we never free the buffer in question - we always use the same buffer for wave writes.  waveOutOpen opens the wave device for rendering with a specified audio format (in this case, 44.1kHz, stereo, 16 bits/sample), waveOutPrepareHeader sets a buffer up for writing, and waveOutWrite queues the buffer to the internal wave playback queue (all wave buffers are queued when you call waveOutWrite, and are each rendered in turn).

    So on with the code.

    First off, we've got to add a new class to hold the data read from the CDROM, the CDRomReadData.

    struct CDRomReadData
    {
        CDRomReadData(DWORD SectorsPerRead)
        {
            _CDRomDataLength = SectorsPerRead*CDROM_RAW_BYTES_PER_SECTOR;
            _CDRomAudioLength = SectorsPerRead*CDROM_RAW_BYTES_PER_SECTOR;

            _CDRomData = new BYTE[_CDRomDataLength];
            ZeroMemory(&_RawReadInfo, sizeof(_RawReadInfo));
            ZeroMemory(&_WaveHdr, sizeof(_WaveHdr));
            _WaveHdr.dwBufferLength = _CDRomDataLength;
            _WaveHdr.lpData = (LPSTR)_CDRomData;
            _WaveHdr.dwLoops = 0;
        }
        ~CDRomReadData()
        {
            delete []_CDRomData;
        }
        WAVEHDR _WaveHdr;
        RAW_READ_INFO _RawReadInfo;
        DWORD _CDRomDataLength;
        DWORD _CDRomAudioLength;
        BYTE * _CDRomData;
    };
    The WAVEHDR structure is used to hold state data for the waveOutWrite API.  The RAW_READ_INFO structure is used by the CD ROM driver to hold information about CD reads.

    Next, we need a function to open the wave device:

    HWAVEOUT CDAESimplePlayer::OpenWaveForCDAudio(HANDLE EventHandle)
    {
        WAVEFORMATEX waveFormat;
        waveFormat.cbSize = 0;
        waveFormat.nChannels = 2;
        waveFormat.nSamplesPerSec = 44100;
        waveFormat.wBitsPerSample = 16;
        waveFormat.nBlockAlign = waveFormat.nChannels * waveFormat.wBitsPerSample / 8;
        waveFormat.nAvgBytesPerSec = waveFormat.nSamplesPerSec * waveFormat.nBlockAlign;
        waveFormat.wFormatTag = WAVE_FORMAT_PCM;
        HWAVEOUT waveHandle;

        MMRESULT waveResult = waveOutOpen(&waveHandle, WAVE_MAPPER, &waveFormat, (DWORD_PTR)EventHandle, NULL,
                                            CALLBACK_EVENT | WAVE_ALLOWSYNC | WAVE_FORMAT_DIRECT);
        if (waveResult != MMSYSERR_NOERROR)
        {
            printf("Failed to open wave device: %d\n", waveResult);
            return NULL;
        }
        //
        // Swallow the "open" event.
        //
        WaitForSingleObject(EventHandle, INFINITE);
        return waveHandle;
    }

    There's at least one "tricky" bit here.  The function takes a handle to an auto-reset event that's used to signal when a wave operation completes - this gets used later on in the process.  We also hard code the CD audio format - 44,100 samples per second, 16 bits per sample, stereo.  The "tricky" bit comes with the call to WaitForSingleObject - the wave APIs set the event to the signalled state whenever a "wave message" occurs.  Since one of those messages (WOM_OPEN) is generated on any wave open, we have to swallow that event before we return - otherwise the caller would be out of step with the wave driver.

    And now, finally, what we've all been waiting for: CD Audio Playback.

    HRESULT CDAESimplePlayer::PlayTrack(int TrackNumber)
    {
        HRESULT hr;
        HANDLE waveWriteEvent = CreateEvent(NULL, FALSE, FALSE, NULL);
        HWAVEOUT waveHandle = OpenWaveForCDAudio(waveWriteEvent);
        if (waveHandle == NULL)
        {
            return E_FAIL;
        }

        CDRomReadData *readData = new CDRomReadData(DEF_SECTORS_PER_READ);
        for (DWORD i = 0 ; i < (_TrackList[TrackNumber]._TrackLength / DEF_SECTORS_PER_READ); i += 1)
        {
            readData->_RawReadInfo.DiskOffset.QuadPart = ((i * DEF_SECTORS_PER_READ) + _TrackList[TrackNumber]._TrackStartAddress)*
                            CDROM_COOKED_BYTES_PER_SECTOR;
            readData->_RawReadInfo.TrackMode = CDDA;
            readData->_RawReadInfo.SectorCount = DEF_SECTORS_PER_READ;
            hr = CDRomIoctl(IOCTL_CDROM_RAW_READ, &readData->_RawReadInfo, sizeof(readData->_RawReadInfo), readData->_CDRomData,
                                                readData->_CDRomDataLength);
            if (hr != S_OK)
            {
                printf("Failed to read CD Data: %d", hr);
                return hr;
            }
            MMRESULT waveResult;
            readData->_WaveHdr.dwBufferLength = readData->_CDRomAudioLength;
            readData->_WaveHdr.lpData = (LPSTR)readData->_CDRomData;
            readData->_WaveHdr.dwLoops = 0;

            waveResult = waveOutPrepareHeader(waveHandle, &readData->_WaveHdr, sizeof(readData->_WaveHdr));
            if (waveResult != MMSYSERR_NOERROR)
            {
                printf("Failed to prepare wave header: %d", waveResult);
                return HRESULT_FROM_WIN32(waveResult);
            }
            waveResult = waveOutWrite(waveHandle, &readData->_WaveHdr, sizeof(readData->_WaveHdr));
            if (waveResult != MMSYSERR_NOERROR)
            {
                printf("Failed to write wave header: %d", waveResult);
                return HRESULT_FROM_WIN32(waveResult);
            }
            //
            // Wait until the wave write completes.
            //
            WaitForSingleObject(waveWriteEvent, INFINITE);
        }

        return S_OK;
    }

    Some things to note: First off, the error checking in this code is horrendous.  It leaks memory, and doesn't check for memory allocation failures.  But the loop is extremely straightforward - it simply opens a wave device, then loops over the number of blocks in the track, reading each block and handing it to the wave APIs to play back.  Note the call to WaitForSingleObject at the bottom of the loop - that's waiting on the WOM_DONE message that's generated whenever a wave write completes; we need to ensure that the wave write completes before we read the next block into the buffer.

    The code is gross, but it DOES play the data on the CD.  However, if you compile and run the code, you'll notice that it glitches like crazy.  The reason for that is really simple: We're doing everything synchronously, and that means that we don't have any opportunity to overlap the wave writes with the CD reads.  And that stinks.

    Tomorrow, we'll start to fix the problem.

    Edit: Fixed typo in destructor (it's a nop, but people are complaining...).

    Edit2: Fixed waveOutOpen WAVEFORMATEX structure nAvgBytesPerSec calculation.  The original version was bits/second, not bytes/second.  Thanks Elliot!

  • Larry Osterman's WebLog

    Playing Audio CDs, part 10 - Glitch Free, Low Memory

    • 9 Comments
    So yesterday I wrote an example that removed the glitching from my DAE CD playback example.

    But it had some major drawbacks - for example, it consumed huge amounts of system memory, and had absolutely horrendous latency problems - if you wanted to pause playback, you would have to wait until all 10 minutes' worth of queued audio samples had played before the pause took effect.

    Is it possible to rewrite the example to save memory and improve latency?

    Of course it is (otherwise why would I be writing this?).  The key is to notice that by the time a block has finished playing, the player has had time to read the next block - you don't need a new block for every read; you can recycle the read blocks instead.

    And that brings us to the next version of the PlayTrack method.

    HRESULT CDAENoWaitLowMemPlayer::PlayTrack(int TrackNumber)
    {
        HRESULT hr;
        HANDLE waveWriteEvent = CreateEvent(NULL, FALSE, FALSE, NULL);
        MMRESULT waveResult;
        CDRomReadData *readData = NULL;
        HWAVEOUT waveHandle = OpenWaveForCDAudio(waveWriteEvent);
        if (waveHandle == NULL)
        {
            return E_FAIL;
        }

        TrackNumber -= 1; // Bias the track number by 1 - the track array is )ORIGIN 0.

        CAtlList<CDRomReadData *> readDataList;

        for (DWORD i = 0 ; i < CDROM_READAHEAD_DEPTH ; i += 1)
        {
            readData = new CDRomReadData(DEF_SECTORS_PER_READ);
            if (readData == NULL)
            {
                printf("Failed to allocate a block\n");
                return E_FAIL;
            }

            readData->_WaveHdr.dwBufferLength = readData->_CDRomAudioLength;
            readData->_WaveHdr.lpData = (LPSTR)readData->_CDRomData;
            readData->_WaveHdr.dwLoops = 0;

            waveResult = waveOutPrepareHeader(waveHandle, &readData->_WaveHdr, sizeof(readData->_WaveHdr));
            if (waveResult != MMSYSERR_NOERROR)
            {
                printf("Failed to prepare wave header: %d", waveResult);
                return HRESULT_FROM_WIN32(waveResult);
            }
            readData->_WaveHdr.dwFlags |= WHDR_DONE;
            readDataList.AddTail(readData);
        }

        for (DWORD i = 0 ; i < (_TrackList[TrackNumber]._TrackLength / DEF_SECTORS_PER_READ); i += 1)
        {
            //
            // Get a free block from the read queue. Since WAVE writes complete in order, the queue is sorted by wave write completion status.
            // If the head of the queue isn't done, spin waiting until it IS done.
            //
            while (true)
            {
                if (!readDataList.IsEmpty() && readDataList.GetHead()->_WaveHdr.dwFlags & WHDR_DONE)
                {
                    readData = readDataList.RemoveHead();
                    break;
                }
                else
                {
                    Sleep(10); // Sleep for a bit to release the CPU.
                }
            };
            //
            //  Read the data from the disk.
            //
            readData->_RawReadInfo.DiskOffset.QuadPart = ((i * DEF_SECTORS_PER_READ) + _TrackList[TrackNumber]._TrackStartAddress)*
                                                                                         CDROM_COOKED_BYTES_PER_SECTOR;
            readData->_RawReadInfo.TrackMode = CDDA;
            readData->_RawReadInfo.SectorCount = DEF_SECTORS_PER_READ;
            hr = CDRomIoctl(IOCTL_CDROM_RAW_READ, &readData->_RawReadInfo, sizeof(readData->_RawReadInfo),
                            readData->_CDRomData, readData->_CDRomDataLength);
            if (hr != S_OK)
            {
                printf("Failed to read CD Data: %d", hr);
                return hr;
            }
            //
            // Write it to the audio device.
            //
            waveResult = waveOutWrite(waveHandle, &readData->_WaveHdr, sizeof(readData->_WaveHdr));
            if (waveResult != MMSYSERR_NOERROR)
            {
                printf("Failed to write wave header: %d", waveResult);
                return HRESULT_FROM_WIN32(waveResult);
            }
            //
            // And add this buffer to the end of the read queue.
            //
            readDataList.AddTail(readData);
        }
        //
        // We're done playing, drain the requests in the queue.
        //
        while (!readDataList.IsEmpty())
        {
            if (readDataList.GetHead()->_WaveHdr.dwFlags & WHDR_DONE)
            {
                CDRomReadData *completedBlock;
                completedBlock = readDataList.RemoveHead();
                waveOutUnprepareHeader(waveHandle, &completedBlock->_WaveHdr, sizeof(completedBlock->_WaveHdr));
                delete completedBlock;
            }
            else
            {
                Sleep(100);
            }
        };

        return S_OK;
    }

    This version uses significantly less memory - in fact, it's pretty much glitch free with CDROM_READAHEAD_DEPTH set to 2 (I thought I'd need 3 buffers for this example, but two seems to work (but there may be glitches on startup)).  It also improves the latency problem - at no time are more than CDROM_READAHEAD_DEPTH blocks' worth of data queued to the wave writer.  So if you pause playback, the playback will stop quickly.
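
    Rough latency math (again assuming DEF_SECTORS_PER_READ is 16): each buffer holds 16/75ths of a second of audio, a bit over 200 milliseconds, so with a readahead depth of 2 there's never much more than 400 milliseconds of audio queued at the wave device.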

    I've also restructured the code a bit to clarify the relationship between the buffer and the waveOutPrepareHeader/waveOutUnprepareHeader APIs.  The actual inner loop simply grabs a buffer from the queue of ready buffers (the readDataList), reads the audio data, calls waveOutWrite on the data, and adds the block back to the queue.

    I took the small liberty of overloading the WHDR_DONE flag in the code that primes the loop - I set the bit on newly allocated buffers to pretend that they've already been played, which makes the loop that pulls blocks from the queue simpler.

    I was taken to task over the previous version for not calling waveOutUnprepareHeader.  The commenters were right - even though waveOutUnprepareHeader is functionally a NOP on every supported version of Windows, it's more complete to include it in the code.

    I do want to stress that this is NOT production code though.  Tomorrow, I'll write a bit about what it would take to change this simple example into something that could be used in a production system.

  • Larry Osterman's WebLog

    More CD Audio Trivia

    • 6 Comments

    A Co-worker pointed this out to me:

    http://www.cdrfaq.org/faq02.html#S2-29:

    The general belief is that it was chosen because the CD designers wanted to have a format that could hold Beethoven's ninth symphony. They were trying to figure out what dimensions to use, and the length of certain performances settled it.

    There are several different versions of the story. Some say a Polygram (then part of Philips) artist named Herbert von Karajan wanted his favorite piece to fit on one disc. Another claims the wife of the Sony chairman wanted it to hold her favorite symphony. An interview in the July 1992 issue of _CD-ROM Professional_ reports a Mr. Oga at Sony made the defining request. (This is almost certainly Norio Ohga, who became President and COO of Sony in 1982 and has been a high-level executive ever since.)

    The "urban legends" web site has some interesting articles for anyone wishing to puruse the matter further. The relationship of Beethoven's ninth to the length is noted "believed true" in the alt.folklore.urban FAQ listing, but no particular variant is endorsed.

    http://www.urbanlegends.com/misc/cd/cd_length_skeptical.html http://www.urbanlegends.com/misc/cd/cd_length_karajan.html http://www.urbanlegends.com/misc/cd/cd_length_origin.html

    Another entry:

    http://www.snopes2.com/music/media/cdlength.htm

    Searching the net will reveal any number of "very reliable sources" with sundry variations on the theme.


    He also pointed me to: http://www.chipchapin.com/CDMedia/cdrom3.php3, which is a great primer on CD audio, including why CD data sectors are 2048 bytes while CD audio sectors are 2352 bytes.

     

  • Larry Osterman's WebLog

    Playing Audio CDs, part 7 - DAE Table of contents.

    • 14 Comments
    So now this series comes to the "fun" part, DAE.

    DAE stands for "Digital Audio Extraction" - it means reading the raw audio data from the CDROM.

    Over the next couple of articles, this will turn into the most complicated code I've ever attempted to drop into the blog, so bear with me - it's a bit of a wild ride.

    The first thing to know about DAE is that to be able to use it, you need the DDK.  The code in this example depends on NTDDCDRM.H, which contains the definitions for the CDROM IOCTLs.  One other HUGE caveat: The IOCTLs defined here are subject to change - they're based on preliminary documentation, so YMMV.
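
    For reference, here's roughly what the shared declarations for this series look like.  Treat this as a sketch: the two sector sizes are the standard red book values, but DEF_SECTORS_PER_READ and CDROM_READAHEAD_DEPTH are just numbers chosen for these examples, and the RAW_READ_INFO/CDROM_TOC structures, the CDDA track mode and the IOCTL_CDROM_* codes all come from NTDDCDRM.H.

    // Sketch of the declarations the samples in this series assume.
    // NTDDCDRM.H (from the DDK) supplies RAW_READ_INFO, CDROM_TOC, TRACK_DATA,
    // the CDDA track mode, and the IOCTL_CDROM_* control codes.
    #include <windows.h>
    #include <ntddcdrm.h>                           // CDROM IOCTLs - requires the DDK

    // If your DDK headers already define the two sector-size constants, drop these.
    #define CDROM_COOKED_BYTES_PER_SECTOR   2048    // data ("cooked") sector payload
    #define CDROM_RAW_BYTES_PER_SECTOR      2352    // red book audio sector

    #define DEF_SECTORS_PER_READ            16      // sectors per raw read (illustrative)
    #define CDROM_READAHEAD_DEPTH           2       // buffers kept in flight (see part 10)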

     

     

    So, with that, here's the initialization and table of contents reading logic:

    #define CD_BLOCKS_PER_SECOND 75 // A useful constant that's NOT in the DDK

    DWORD MSFToBlocks( UCHAR msf[4] )
    {
        DWORD cBlock;

        cBlock =
            ( msf[1] * ( CD_BLOCKS_PER_SECOND * 60 ) ) +
            ( msf[2] * CD_BLOCKS_PER_SECOND ) +
            msf[3];

        return( cBlock - 150);
    }

    MSFToBlocks converts from an MSF array (4 bytes, representing Hours, Minutes, Seconds and Frames, where a frame is 1/75th of a second of audio - one sector's worth) into a block count.  It assumes that there are always 0 hours in an MSF array (which apparently is true on CD audio tracks).  Before anyone asks, I'm not sure where the 150 comes from.
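
    For example, an Address of 00:03:02:45 (3 minutes, 2 seconds, 45 frames) works out to 3 x 4500 + 2 x 75 + 45 - 150 = 13,545 blocks.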

    HRESULT CDAESimplePlayer::OpenCDRomDrive(LPCTSTR CDRomDrive)
    {
        HRESULT hr = S_OK;
        _CDRomHandle = CreateFile(CDRomDrive, GENERIC_READ, FILE_SHARE_READ|FILE_SHARE_WRITE, NULL, OPEN_EXISTING,
                                  FILE_FLAG_OVERLAPPED|FILE_ATTRIBUTE_NORMAL, NULL);
        if (_CDRomHandle == INVALID_HANDLE_VALUE)
        {
            hr = HRESULT_FROM_WIN32(GetLastError());
            printf("Error %x opening CDROM drive %s", hr, CDRomDrive);
        }

        return hr;
    }

    OpenCDRomDrive opens the CDRom drive.  Please note that we're opening the file for overlapped access, this is important later on in the series.  The format of a CDRom drive string is "\\.\<drive letter>:".


    HRESULT CDAESimplePlayer::CDRomIoctl(DWORD IOControlCode,
                                         void *ioctlInputBuffer,
                                         DWORD ioctlInputBufferSize,
                                         void *ioctlOutputBuffer,
                                         DWORD &ioctlOutputBufferSize)
    {
        HRESULT hr = S_OK;
        OVERLAPPED overlapped = {0};
        overlapped.hEvent = CreateEvent(NULL, TRUE, FALSE, NULL);
        if (overlapped.hEvent == NULL)
        {
            hr = HRESULT_FROM_WIN32(GetLastError());
            printf("Error %d getting event handle\n", hr);
            goto Exit;
        }

        if (!DeviceIoControl(_CDRomHandle, IOControlCode, ioctlInputBuffer, ioctlInputBufferSize, ioctlOutputBuffer, ioctlOutputBufferSize,
             &ioctlOutputBufferSize, &overlapped))
        {
            hr = HRESULT_FROM_WIN32(GetLastError());
            if (hr == HRESULT_FROM_WIN32(ERROR_IO_PENDING))
            {
                if (!GetOverlappedResult(_CDRomHandle, &overlapped, &ioctlOutputBufferSize, TRUE))
                {
                    hr = HRESULT_FROM_WIN32(GetLastError());
                    printf("Error %d waiting for CDROM IOCTL\n", hr);
                }
                else
                {
                    hr = S_OK;
                }
            }
            else
            {
                printf("Error %d IOCTLING CDROM\n", hr);
            }
        }
    Exit:
        if (overlapped.hEvent)
        {
            CloseHandle(overlapped.hEvent);
        }
        return hr;
    }

    Please note the use of a stack-allocated OVERLAPPED structure (plus GetOverlappedResult) to turn the asynchronous IOCTL into a synchronous one.  Otherwise this is just a wrapper around the DeviceIoControl API.


    HRESULT CDAESimplePlayer::Initialize()
    {
        DWORD driveMap = GetLogicalDrives();
        int i;
        CString DriveName;
        for (i = 0 ; i < 32 ; i += 1)
        {
            if (driveMap & 1 << i)
            {
                DriveName.Format(_T("%c:"), 'A' + i);
                if (GetDriveType(DriveName) == DRIVE_CDROM)
                {
                    break;
                }
            }
        }

        _CDRomDriveName= CString(_T("\\\\.\\")) + DriveName;

        return S_OK;
    }

    Initialize is easy - just walk through the drives until you hit one that's a CD ROM drive, then remember the drive letter.

    HRESULT CDAESimplePlayer::DumpTrackList()
    {
        HRESULT hr;
        hr = OpenCDRomDrive(_CDRomDriveName);
        if (hr != S_OK)
        {
            printf("Failed to open CDRom Drive %s: %x\n", _CDRomDriveName, hr);
            goto Exit;
        }
        CDROM_TOC tableOfContents;
        DWORD tocSize = sizeof(tableOfContents);
        hr = CDRomIoctl(IOCTL_CDROM_READ_TOC, NULL, 0, (void *)&tableOfContents, tocSize);
        if (hr != S_OK)
        {
            printf("Failed to read CDRom Table of contents: %x\n", _CDRomDriveName, hr);
            goto Exit;
        }

        for (int i = tableOfContents.FirstTrack - 1 ; i < tableOfContents.LastTrack ; i += 1)
        {
            CString trackName;
            DWORD trackLengthInBlocks = MSFToBlocks(tableOfContents.TrackData[i+1].Address) - MSFToBlocks(tableOfContents.TrackData[i].Address);
            DWORD trackLengthInSeconds = trackLengthInBlocks / CD_BLOCKS_PER_SECOND;
            DWORD trackLengthInMinutes = trackLengthInSeconds / 60;
            DWORD trackLengthInHours = trackLengthInMinutes / 60;
            DWORD trackLengthFrames = trackLengthInBlocks % CD_BLOCKS_PER_SECOND;
            DWORD trackLengthMinutes = trackLengthInMinutes - trackLengthInHours*60;
            DWORD trackLengthSeconds = trackLengthInSeconds - trackLengthInMinutes*60;

            trackName.Format(_T("Track %d, Starts at %02d:%02d:%02d:%02d, Length: %02d:%02d:%02d:%02d"), tableOfContents.TrackData[i].TrackNumber,
                                        tableOfContents.TrackData[i].Address[0],
                                        tableOfContents.TrackData[i].Address[1],
                                        tableOfContents.TrackData[i].Address[2],
                                        tableOfContents.TrackData[i].Address[3],
                                        trackLengthInHours,
                                        trackLengthMinutes,
                                        trackLengthSeconds,
                                        trackLengthFrames
                                );
            printf("%s\n", trackName);
            CDRomTrack track;
            track._TrackStartAddress = MSFToBlocks(tableOfContents.TrackData[i].Address);
            track._TrackNumber = tableOfContents.TrackData[i].TrackNumber;
            track._TrackControl = tableOfContents.TrackData[i].Control;
            track._TrackLength = MSFToBlocks(tableOfContents.TrackData[i+1].Address) - track._TrackStartAddress;

            this->_TrackList.Add(track);
        }
    Exit:
        return hr;
    }

    Ok, now the meat of the code - we issue the IOCTL_CDROM_READ_TOC IOCTL to the drive and retrieve a table of contents structure.

    The TOC contains the basic information about the disc - the track number of the first and last track, plus an array of TRACK_DATA structures.  The TRACK_DATA structure's where all the fun is.  We only really care about the start address of each track - the rest isn't significant for this example.

    One thing to note is that there's an implicit array overrun - while the TOC runs from the first track to the last track, the track data actually runs to one more than the last track - that's to allow the length calculation of the last track to work correctly.

    The calculation of the track length is more tortuous than it needs to be; I was trying to set it up so that someone stepping through it in the debugger (me :)) would be able to see what was going on.
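
    Walking through it with a concrete value: a track of 13,545 blocks is 13,545 / 75 = 180 seconds (with 13,545 mod 75 = 45 frames left over), 180 / 60 = 3 minutes, and 0 hours - so it prints as a length of 00:03:00:45.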

    Tomorrow, I'll start with the playback logic.  Today was easy, tomorrow, it starts getting complicated.

  • Larry Osterman's WebLog

    Tracing the Journey of Man

    • 10 Comments

    Nothing technical today, sorry :(

    Went to a fascinating lecture last night with friends of the family (the Bowras).  It was part of National Geographic's "National Geographic Live!" series.  The NG Live series is a set of three or four lectures given every spring in which National Geographic lecturers present information about projects the society is working on.  The Bowras have invited me to come along with them to a couple of the previous lectures in the series; the one with Robert Ballard (as in the guy who found the Titanic) was especially fascinating.

    Last night's lecture was Spencer Wells presenting the material in his book (and NatGeo special) Tracing the Journey of Man.

    It was a fascinating lecture.  Spencer Wells is one of the people who did the research that determined that all humans descend from a pool of approximately 2000 individuals who lived in Africa sometime about 60,000 years ago.

    He figured this out by measuring the amount of genetic variation in human beings - what I hadn't realized is that humans are very different from other primates - the genetic diversity in most primates is around 25%, while humans have only a two or three percent genetic diversity.  The only way that they could explain this is that there had to have been a tiny pool of original ancestors for all humans alive on the earth.

    By tracing genetic markers that live on the Y chromosome, his team was also able to determine that human beings actually spread from two different locations - one in Africa, the other, about a thousand or so years later, from Australia.  His team figured that about 60,000 years ago a group of people left Africa and followed the coastline across the Indian subcontinent and then crossed a land bridge into Australia.  From there, humans spread out from both population centers to cover the world (that took about 15,000 years).

    One of the things I loved was how he figured this spread out.  He couldn't find any oral tradition of a great journey in the tales of the Australian Aborigines, so he figured that if they HAD made the journey, he'd be able to find genetic markers along the route.

    And in fact, when he went to India and started sampling the population, he discovered a genetic marker that the people he sampled shared in common with the Aborigines.  And that marker dated from somewhere about 55,000 years ago.

    So he deduced a testable theory from his hypothesis, then performed the experiment and confirmed it.  Very, very cool.

    My favorite moment was during the Q&A question when someone asked: "Have you had any resistance from local authorities to your research?  And can you present it in Kansas?".  That got a HUGE round of applause.

    My biggest complaint about it was that the last 1/3 of the lecture was essentially a sales pitch for NatGeo's new Genographic Project.

  • Larry Osterman's WebLog

    Daniel got the part!

    • 7 Comments

    Somehow I forgot to mention that we just got notice that Daniel was cast in the Seattle Children's Theater summer season production of the musical "Honk!".  They'll be doing 5 weeks of rehearsals and then 4 days of production afterwards.

    Daniel was in last summer's production of "Joseph and the Amazing Technicolor Dreamcoat"; I'm looking forward to seeing what part he gets in Honk!.

    Over the summer, if you get a chance, it's absolutely worth watching the SCT drama school shows.  They do pretty challenging productions - for example, they're doing "As You Like It" this summer.   They also do some original stuff (including "Tommorrow was Better", which was partly written by Daniel (and others) during SCT's Original Works class in the fall semester).

    And if you've got kids from about 3-15 years old, then I STRONGLY encourage you to stop by one of SCT's mainstage productions.  These are absolutely unbelievable.  This year's just about run out but next year is going to be great.  These guys do SERIOUS drama for kids.  It's not "theater by children", it's "theater for children".  They hire some of the most talented actors and directors in the Seattle area and they put on truly challenging theater.  SCT is, without a doubt, one of the (if not the) top children's theaters in the country.  They do Shakespeare, they do musicals, and they do a LOT of original works.   Some of their stuff is fluff, some of it is deadly serious.  But all of it is high quality.

    Edit: Forgot to include my normal plug for SCT, thanks for reminding me Adi.

     

  • Larry Osterman's WebLog

    No post today, prepping for my presentation :( <EOM>

    • 4 Comments

    And community server seems to believe that I need to provide content.

    So, to make it happy, I'll provide content.

    It won't be interesting content, but it WILL be content.

    Sorry about that :)

     

  • Larry Osterman's WebLog

    Too busy to write something interesting today

    • 5 Comments

    I'm busy preparing for a techtalk on our new Longhorn feature, to be given in my building at 1PM in the Okanogon conference room.  Any MS people who see this should feel free to attend.

    This is a warmup for my Friday DevTalk at the Olympic Room on the 27th at 11:30AM.

    Sorry for the short notice.

     

  • Larry Osterman's WebLog

    What's wrong with this code, part 12 - Retro Bad Code Answers

    • 8 Comments

    In the last article, I looked at a prototype code snippet to enter a system call.

    But the code had a bug in it (no, really?  Why would I be asking what was wrong with it otherwise?).

    Not surprisingly, it wasn't that hard to find; Peter Ibbotson found it in the first comment - if you set SP before you set SS, then you introduce a window where a hardware interrupt could occur which would pre-empt your code and trash random pieces of user memory.

    Several people quite correctly pointed out that writing to the SS segment would lock out interrupts for the next instruction, which would inherently protect the MOV SP instruction.

    But in reality, the answer is a bit subtler than that.

    You see, you can prevent hardware interrupts from occurring simply by turning off the "allow interrupts" flag with a CLI instruction - that will disable all hardware interrupts (software interrupts don't matter, since you own the code).

    And the x86 architecture mandates that after a software or hardware interrupt occurs the interrupt flag is disabled.  So the code in question is ALREADY called with interrupts disabled.

    So why is all this important?

    Because there's one interrupt that is NOT disabled by the CLI instruction, that's the NMI (or Non Maskable Interrupt).  You can't disable NMI's, under any circumstances.

    So how can you switch stacks if an NMI could come along and interrupt your code?  Well, that's where the MOV SS behavior comes into play.  While the NMI interrupt can't be disabled, it CAN be deferred - and the MOV SS sequence defers the NMI interrupt until after the NEXT instruction has finished executing.

    Btw, Universalis mentioned in the comments of the last post that this behavior wasn't present on the 8088, but my version of the 8088 hardware reference manual states differently:

     "A MOV (move) to segment register instruction and a POP segment register instruction are treated similarly: No interrupt is recognized until after the following instruction.  This mechanism protects a program that is changing to a new stack (by updating SS and SP).  If an interrupt were recognized after SS had been changed, but before SP had been altered, the processor would push the flags, CS and IP onto the wrogn area of memory.  It follows from this that whenever a segment register and another value must be updated together, the segment register should be changed first followed immediately by the instruction that changes the other value.

    So that's why the MOV SS needs to come first.  But why did people care, given that NMIs weren't that common anyway?

    Well, it turns out that one very well known OEM produced a product with a wireless keyboard (and little square keys) that tied the keyboard interrupt to the NMI line on the processor.  So every time the user hit a key on a keyboard they would be generating an NMI.

    Another clever issue with interrupts had to do with a bug in (I believe) the first steppings of the 286 processor (it might have been the 8088 though).  As I'd mentioned before, when you executed an interrupt, the interrupt handler was called with interrupts disabled.  But this processor had a bug in it - if an interrupt (software, of course) occurred with interrupts disabled, then the processor would enable interrupts briefly during the interrupt translation.

    So you had a situation where you could get a hardware interrupt executed even though you'd disabled interrupts.  Not pretty at all.  And before people ask, no, I don't remember how one worked around it :(

    Kudos: Peter Ibbotson for being the first, but everyone else commenting pretty much agreed with him. 
