Larry Osterman's WebLog

Confessions of an Old Fogey

The dreaded "beeping death"

Anyone who's been at Microsoft for long enough (long enough to use DOS on a day-to-day basis) remembers the deadly "beeping death".

The "beeping death" was an artifact of the MS-NET product that we deployed for networking here at Microsoft, and I was the developer responsible for the "beeping death".

What was the beeping death?  Well, it occurred because of a confluence of a whole lot of different parts of the networking system.

First, you need to understand how connection-oriented network protocols work (this is a VERY rough description).  When the client sends a request to the server, the server acknowledges receipt of the packet with a packet called an "ACK".  In addition, on a connection-oriented protocol, the connection itself is kept alive by the client periodically sending a message (called a "keep-alive") to the server - that way, even if there's no network traffic, the client will eventually discover if the server has gone away.

Secondly, the MS-NET product (and the DOS Lan Manager product after it) used the NetBIOS API layer to talk to the network adapter.  But in reality, it didn't.  The MS-NET product instead talked to an abstract networking API layer called the "session" layer, which was a part of the MS-NET product.  From an API standpoint, it was extremely similar to the NetBIOS API layer, but it wasn't quite the same.

Third, there were two different implementations of the session layer.  One (session.exe) was a sample version that was shipped with the OEM kit for MS-NET.  The other was called minses.exe.  Minses.exe provided a minimal session layer that was intended to interface with NetBIOS.  So it functioned as a mapping layer between the MS-NET components and the actual networking stack.

Now one of the cool features of minses was that on synchronous networking calls, the minses would beep the PC speaker while the call was outstanding.  That would let the user know that the system was still thinking about their request, and it hadn't forgotten them.
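
As a rough illustration of that "beep while the call is outstanding" behavior, here is a minimal Python sketch. It is not how minses was written (DOS had no threads, and the post doesn't say how minses timed its beeps); the helper name `call_with_beep`, the `beep` callback and the delay values are all assumptions made up for this example.

```python
import threading

def call_with_beep(blocking_call, beep, first_beep_after=0.05, interval=0.05):
    """Run blocking_call() and invoke beep() periodically until it returns.

    Hypothetical helper: the thread stands in for whatever mechanism the
    real session layer used to notice that a call was still outstanding.
    """
    done = threading.Event()

    def beeper():
        delay = first_beep_after
        # Event.wait() returns False on timeout and True once done is set,
        # so a call that finishes quickly never produces a beep.
        while not done.wait(delay):
            beep()
            delay = interval

    worker = threading.Thread(target=beeper, daemon=True)
    worker.start()
    try:
        return blocking_call()
    finally:
        done.set()      # the call is no longer outstanding
        worker.join()   # make sure no beep fires after we return
```

If `blocking_call` never returns, nothing ever sets `done`, and the beeper keeps going forever.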

Fourth, the Microsoft corporate network at the time was (and still is, to my knowledge) the largest, most complicated corporate network on the planet.  We have branch offices in hundreds of countries, there are hundreds of thousands of computers on the network, it's a REALLY big network (Raymond tells this story about the network back in the 1990s).  It's a REALLY big, really complicated network.  That means that there are a bazillion failure points on the network, which means that connectivity often went down.

And finally, the networking solution we used back then was based on Ungermann-Bass smart network cards.  These cards were pretty cool actually - when you started the system, the OS downloaded the entire network stack onto the card, which meant that system memory didn't get consumed by the networking stack.  With this fifth piece, the networking guys reading this should start saying "Uh oh"...

 

Now that I've set the stage for the confluence of features, let's see what happens when this system gets deployed in real life...

In the normal case, everything works fine - you never ever hear the beep, because responses come back before the beep comes out.   But that's the most uninteresting problem (for networking environments, the normal case is usually profoundly uninteresting - it's when things start failing that things get exciting)...

And the "beeping death" scenario was no different - it gets interesting when you start looking at the ways that things can fail.

Let's consider some of the failure modes:

    1) Connectivity fails on an intervening network node between the client and the server.

In that case, the client hangs waiting on the network to time out.  This could take several seconds, sometimes even as much as a minute.  Bad, but not the end of the world, because the timeouts within the transport detect the connectivity problem and fail the request.

    2) The client crashes (this IS MS-DOS we're talking about).

In this case, the connection is held alive (remember - the actual network transport is running on the UB card, not taking up system memory), and that ties up some resources on the server, but it's still not the end of the world (from a client's perspective).

    3) The server computer is really busy.

In this case, the client waits until the server comes back to it.  That may take time and can be really annoying.

    4) The server crashes, or otherwise freezes (breaks into the kernel debugger, etc).

In this case, the server disappears.  If the timing of the request was correct, the server's crash tears down the connections and the clients get networking failures.  If, on the other hand, you're unlucky, the network card might have received the client's request and handed it to the server, but the server hadn't responded to the client.  In that case, there were no outstanding network requests for the client.  Because the transport is running entirely on the network adapter, it has no way of knowing that the host operating system is dead.  The transport just sits there, quietly acknowledging the keep-alives.  It sits on the network card until the operating system is rebooted.  It can get even more heinous when the server process crashes and is restarted - in that case, the card would sometimes "forget" existing connections until the operating system was rebooted (or power was recycled on the server).

If you happened to be one of the poor clients stuck in this state, you sat there blocked on a synchronous network receive, waiting for the frozen server to respond to your request.  Since the server process was gone (or the machine was in the debugger, or...), the client never had an opportunity to detect the failure.  And, since DOS was a single-threaded operating system, and the networking requests were executed in the kernel, the user had no choice but to reboot their client.
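
To make the failure concrete, here is a toy Python model of it. Everything in it is an illustrative assumption (the `SmartCard` class, its method names, the tick-based loop); it is not the Ungermann-Bass firmware or the MS-NET client, just the shape of the problem: the card answers keep-alives on its own, so the transport looks healthy even though no response will ever come from the host.

```python
class SmartCard:
    """Toy stand-in for a network card that runs the whole transport on-board."""

    def __init__(self):
        self.host_alive = True
        self.pending = []  # requests handed up to the host, not yet answered

    def on_keepalive(self):
        # The card answers keep-alives itself; it never consults the host OS.
        return "ACK"

    def on_request(self, request):
        # Receipt is acknowledged at the transport level regardless of
        # whether the host will ever get around to processing the request.
        self.pending.append(request)
        return "ACK"

    def poll_response(self):
        # A real response only exists if the host is alive to produce one.
        if self.host_alive and self.pending:
            return "RESPONSE to " + self.pending.pop(0)
        return None


def client_wait(card, max_ticks):
    """Wait up to max_ticks for a response; return (response, keepalives_acked)."""
    keepalives_acked = 0
    for _ in range(max_ticks):
        if card.on_keepalive() == "ACK":
            keepalives_acked += 1  # the connection still looks fine
        response = card.poll_response()
        if response is not None:
            return response, keepalives_acked
    return None, keepalives_acked
```

With `host_alive` set to `False`, every keep-alive is still acknowledged and `client_wait` only returns because the model caps the number of ticks. The real client had no such cap.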

 

I got indescribable amounts of flak for the beeping death, because it seemed that every time any server crashed anywhere at Microsoft, some set of clients would start beeping forever...  Fortunately, many of the changes I made for DOS Lan Manager 2.0 removed the beeping death (it allowed the client to detect a hung server and tear down the connection to the server even if the underlying network claimed that things were just fine).
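
The post doesn't say how DOS Lan Manager 2.0 detected a hung server, so the following is only a guess at the general idea, sketched in Python: put a deadline on the request itself instead of trusting the transport's view of the connection. The function name and all of its parameters (`poll_response`, `transport_alive`, `now`, `teardown`) are hypothetical.

```python
def wait_for_response(poll_response, transport_alive, now, timeout, teardown):
    """Wait for a reply, but give up once the request itself has timed out.

    poll_response()   -> the reply, or None if nothing has arrived yet
    transport_alive() -> True while the transport still claims to be connected
    now()             -> current time, in the same units as timeout
    teardown()        -> drops the connection to the server
    """
    deadline = now() + timeout
    while True:
        response = poll_response()
        if response is not None:
            return response
        if not transport_alive():
            teardown()
            raise ConnectionError("transport reported the connection lost")
        if now() >= deadline:
            # The transport still says everything is fine, but the server
            # never answered this request: treat it as hung and disconnect.
            teardown()
            raise TimeoutError("server did not answer the request in time")
```

The second check is the one that matters here: the client can fail the request even while the keep-alives are still being acknowledged. The loop spins without sleeping to keep the sketch short.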

 

Networking can be fun :)

  • "the Microsoft corporate network at the time was (and still is, to my knowledge) the largest, most complicated corporate network on the planet. We have branch offices in hundreds of countries, there are hundreds of thousands of computers on the network"

    That's a pretty bold statement. Having worked as a performance engineer at what's now the 2nd largest bank in the U.S., with a fairly thorough worldwide presence, we easily had beyond hundreds of corporate sites. Add to that thousands of branches as well as standalone ATMs, every type of network connection you can come up with (from dial-up for ATMs and card verification to OC-192s), I think I'm safe in saying that I doubt MS' network could compare.
  • You may be right, there may be bigger, more complicated networks out there, but there aren't many.

    Given your description, I think that Microsoft's network is probably on the same order of magnitude in size as yours is.
  • I read an interesting article sometime back, in one of the ACM journals, about using sound to debug software. The researchers had taken some real world programs, and inserted routines to emit sound at various places. So, for eg, if you want to know whether a particular code path is hit without doing a bunch of logging, or hooking up the debugger, just insert a routine there to emit (for eg) a beep. If there is an infinite loop there, the beep will sound continuously.

    Sounded interesting..
  • My friend, who is in the PC video playing programming business, uses "debug beeps" all the time. That's how he catches dropped frames, etc.
  • There was another kind of "beeping death" in Windows 95/98...
    When resources got really low those error messages from the 16 Bit layer started popping up (those with the white background, system font and "Retry", "Ignore" and "Cancel" buttons). On some occasions the system locked up completely and you could hear a clicking noise from the speaker whenever you moved the mouse or pressed a key.

    I wonder how you explain this... ;-)

  • When I saw this post's title, I thought it was about the Win 9x scenario, too. Shows how much of a newbie I am. ;)
  • Speaking of "error message beeps", NT has AFAIK always emitted beeps (using beep.sys) whenever the input queue gets full. It was especially easy to provoke this in NT4 (after the move of the handler into kmode :-< ) with even a moderately loaded system, if you just moved the mouse around. Mouse messages would queue up, the system would not drain the queue, and from there on every kind of user input would result in "bleep".

    You can go "bleep" yourself, Windows box! :-)
  • Input queue full, indeed.

    I've just had to disable beep.sys on my brand new Thinkpad X41 because the darn thing would beep (LOUDLY) pretty much every other time you used the mouse or trackpoint to scroll (which the thing couldn't keep up with). Incredibly, incredibly annoying on what's otherwise a very nice product.

    From the posts I've found, there are many Thinkpad users out there with the same problem, though the solution of turning off beep.sys isn't very well known.
  • Here's the beeping death:
    http://support.microsoft.com/kb/187518/EN-US/
  • Just a quick touch on the 'largest intranet'. I don't think Microsoft's had the largest intranet ever, really.

    I thought the military was the first, and Schlumberger is the second (first in terms of commercial).

    http://www.slb.com/content/about/history.asp

    It makes sense, Schlumberger is one of the world's largest oil and gas companies, and the volume of data that they churn through is much more sizeable than any software corporation's (seismic alone.. Oh my!).

    Anyhow, just thought I'd throw some trivia. :)