Also known as "Larry mounts a DDOS attack against every single machine running Windows NT"
Or: No stupid mistake goes unremembered.
I was recently in the office of a very senior person at Microsoft debugging a problem on his machine. He introduced himself, and commented "We've never met, but I've heard of you. Something about a ping of death?"
Oh. My. Word. People still remember the "ping of death"? Wow. I thought I was long past the ping of death (after all, it's been 15 years), but apparently not. I'm not surprised when people who were involved in the PoD incident remember it (it was pretty spectacular), but to have a very senior person who wasn't even working at the company at the time remember it is not a good thing :).
So, for the record, here's the story of Larry and the Ping of Death.
First I need to describe my development environment at the time (actually, it's pretty much the same as my dev environment today). My primary development machine ran a version of NT, and it ran a kernel debugger connected to my test machine over a serial cable. When my test machine crashed, I would use the kernel debugger on my dev machine to debug it. There was nothing debugging my dev machine, because NT was pretty darned reliable at that point and I didn't need a kernel debugger 99% of the time. In addition, the corporate network wasn't a switched network - as a result, each machine received datagram traffic from every other machine on the network.
Back in that day, I was working on the NT 3.1 browser (I've written about the browser here and here before). As I was working on some diagnostic tools for the browser, I wrote a tool to manually generate some of the packets used by the browser service.
One day, as I was adding some functionality to the tool, my dev machine crashed, and my test machine locked up.
*CRUD*. I can't debug the problem to see what happened because I lost my kernel debugger. Ok, I'll reboot my machines, and hopefully whatever happened will hit again.
The failure didn't hit, so I went back to working on the tool.
And once again, my machine crashed.
At this point, everyone in the offices around me started to get noisy - there was a great deal of cursing going on. What I hadn't realized was that every machine had crashed at the same time as my dev machine. And I do mean EVERY machine. Every single machine in the corporation running Windows NT had crashed. Twice (with just enough time between crashes for people to start getting back to work).
I quickly realized that my test application was the cause of the crash, and I isolated my machines from the network and started digging in. I quickly root caused the problem - the broadcast that was sent by my test application was malformed and it exposed a bug in the bowser.sys driver. When the bowser received this packet, it crashed.
I quickly fixed the problem on my machine and added the change to the checkin queue so that it would be in the next day's build.
I then walked around the entire building and personally apologized to every single person on the NT team for causing them to lose hours of work. And 15 years later, I'm still apologizing for that one moment of utter stupidity.
Sorry, I see it is published, just not automatically updated by automated tools. Sorry.
Larry, I have to be honest, I'm glad that Windows Vista shipped with WDS, it seems to be completely stable, quick, and the UI is asynchronous (even when enumerating old NT Browser systems).
The instability and synchronous enumeration of the old browser list caused lots of application freezes on old versions of Windows (e.g. a Save File dialog in an MS Office application when the user wanted to store the file on a server). Some people blamed the network, others blamed their "slow" computer... ;)
I recently saw an oddity on a colleague's PC running Windows XP: network name lookup (i.e. Start > Run > \\servername) had completely stopped working.
When we looked at netdiag /test:winsock /v, it showed that there were a HUGE number of registered NetBT bindings, over 200. This is because he uses the laptop for commissioning Windows Mobile 5.0 devices, i.e. installing software on them then shipping them to the customer. ActiveSync in WM 5.0 is implemented using RNDIS - the device emulates a USB-connected network adapter. Each different device has its own serial number, so USB sees it as a different device. Guess what happens after you've plugged 100 different devices into the computer? You have 100 network adapters, bound to both TCP and UDP. Windows doesn't clean them up because they might eventually come back.
The workaround was to set the DEVMGR_SHOW_NONPRESENT_DEVICES environment variable, launch Device Manager, select View/Show Hidden Devices and delete every one of the 'Windows Mobile-based Device #nnn' devices under Network Adapters. Having done this, file sharing suddenly started working again.
I'd better do this soon on my PC, I'm up to Device #48. Anyone know of an automated way to delete these devices?
(Sorry, Larry, I know it's a bit tangential, is bowser involved in any way?)
Mike: Not to my knowledge. The browser is disabled by default on XP as far as I know.
Here is the story of my own DoS attack.
We have a series of computers we use to do distributed resource builds. These computers take the raw game files (textures, models, etc.) and process them, making them ready for the game.
We had just made a series of changes to improve the performance of the system and released the new software. That night we got a "nice" email from IS saying they had shut down our build servers because they had taken down the phone system.
It turns out that the programs we use to process the data contained diagnostic code that sent around 40-50 UDP broadcast packets every time the program started.
Oh, did I mention that these build computers are all high speed multi-CPU, multi-core computers?
Oh, did I mention that the programs to process the data only take a very short amount of time, so they get run a LOT?
Oh, did I mention that these high speed computers were all sitting in the server room on a 1 Gb network?
I did a lot of apologizing for taking down the company phone system.
I read bowser.sys and thought, "King Koopa has now invaded my OS kernel! All hope is lost!"
@Mike: One thing to try is to add a registry key to:
with Value name: IgnoreHWSerNumVVVVPPPP and Value DWORD:0x1
Where VVVV = USB Vendor ID in Hex
PPPP = USB Product ID in Hex
This key prevents the USB layer from creating individual per serial number nodes under HKLM\System\CCS\Enum\USB\. You will have to reboot after this change. Note that the Found New HW Wizard will no longer prompt you for the driver for each newly found device after this change.
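As a rough illustration of the naming pattern above, here is a minimal Python sketch that builds the value name from the two IDs. The function name and the sample IDs are made up for the example, and writing the value to the registry is deliberately left out.

```python
def ignore_serial_value_name(vendor_id: int, product_id: int) -> str:
    """Build a value name of the form IgnoreHWSerNumVVVVPPPP.

    VVVV and PPPP are the USB vendor and product IDs as 4-digit hex.
    Upper-case hex digits are a choice made for this sketch.
    """
    for label, value in (("vendor_id", vendor_id), ("product_id", product_id)):
        # USB vendor and product IDs are 16-bit numbers.
        if not 0 <= value <= 0xFFFF:
            raise ValueError(f"{label} must fit in 16 bits, got {value:#x}")
    return f"IgnoreHWSerNum{vendor_id:04X}{product_id:04X}"


# Hypothetical IDs, for illustration only.
print(ignore_serial_value_name(0x1234, 0x00AB))  # IgnoreHWSerNum123400AB
```

The zero padding matters: a product ID such as 0xAB has to appear as 00AB so that the name always carries eight hex digits.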
I'm not sure about the exact scenario that you're describing, but if the mechanism relies on the USB serial number (as opposed to the MAC address in the USB network adapter) it might help. (Our HW has a USB serial number, and in production testing, the registry quickly fills up with the \Enum\USB nodes for each device connected if you do not use this key...)
Larry: Sorry for the totally-off-topic.
I hope that it's just good natured ribbing. After all, most developers probably wouldn't have known what was happening and would have just continued. Someone from IT would have had the unenviable task of tracking down the source of the disruption. Now -that- would have been embarrassing.
The way brains focus in on the task at hand, it's not surprising that you didn't catch it the first time. You have to step out of the box you're in and change your context.
Again, as long as it's good natured, it's fine to keep bringing it up though. That's what good friends are for. ;-)
Matt, Glad I wasn't the only one thinking Mario Bros.
OK, if we're discussing our own DoS tales, I'll tell mine.
The first time I configured a corporate intranet, I made two DNS servers query each other first and then query the ISP. So if one of them received a query from an ordinary client, then a chain reaction started with each server querying the other back and forth and both of them sending queries to the ISP until they finally got an answer back. After a while things settled down. When I figured out what was happening, first I fixed it, and then I asked the ISP if maybe the reason why things settled down might be that they blacklisted us. They said no, they hadn't observed any problem. Whew. Anyway it lasted less than an hour and I figured out a less recursive configuration.
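For anyone who wants to see why that configuration misbehaves, here is a toy Python model of it. It is only a sketch under loud assumptions: the server names, the `count_queries` helper, and the depth limit are all invented for the example, queries are handled one at a time, and caching and real timeouts are ignored, so it does not reproduce what actual DNS servers would do on the wire.

```python
def count_queries(forwarders, isp, start, max_depth):
    """Count upstream queries caused by one client lookup of an uncached name.

    forwarders maps each server to the upstreams it tries, in order.
    max_depth is a stand-in for a server eventually giving up on a peer.
    """
    sent = 0

    def resolve(server, depth):
        nonlocal sent
        if depth >= max_depth:
            return False  # the peer "times out" without answering
        for upstream in forwarders[server]:
            sent += 1  # one query goes out to this upstream
            if upstream == isp:
                return True  # assume the ISP always answers
            if resolve(upstream, depth + 1):
                return True
        return False

    resolve(start, 0)
    return sent


# Each server asks the other first, then the ISP (the looping setup).
LOOPED = {"dns1": ["dns2", "ISP"], "dns2": ["dns1", "ISP"]}
# Each server asks only the ISP (one possible non-looping setup).
FIXED = {"dns1": ["ISP"], "dns2": ["ISP"]}

print(count_queries(LOOPED, "ISP", "dns1", max_depth=4))  # 5
print(count_queries(FIXED, "ISP", "dns1", max_depth=4))   # 1
```

In this model the looping setup spends one query per bounce between the two servers before anyone asks the ISP, so the cost of a single lookup grows with however long the servers keep trying each other.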
Sounds like some kind of epic adventure inside Microsoft:
Deep in the bowels of Microsoft is a lone programmer, sparring with a particularly merciless code fault. Long ago the daylight had forsaken him; the cold night was without stars and moon; he slowly began to sink into the dreary gloom of despair. His mood worsened towards the brink of failure.
As the night wore on, a minstrel came forward and proclaimed, "I will sing to you of Larry of the Third NT, and the Ping of Death."
And when he heard that he laughed aloud for sheer delight, and he stood up and cried "O great glory and splendour! And all my wishes have come true!" and then he wept.
Heh, that sounds like something that happened to me back in high school. Only it might not have been an accident. It might have been a Perl script, running on a secret Linux server, iterating over the school's IP range. It might have been pinging each address with a malformed packet and it may have bluescreened every Windows 9x computer in the school. However, that is just wild speculation on my part. Nothing that I know anything about.
I still remember testing an early version (a beta) of Operations Manager (which later became MOM and is now OpsMgr, but was still Mission Critical Software's product at that time) that had a bug: instead of notifying the network administrator with a NET SEND, it would notify EVERY SINGLE USER in the domain. So, testing it in the production environment flooded everybody in the company with alert popups.... OK, it did not actually crash anything, but still... the CEO of the company I was working at did not quite like that either...
heh! i brought down my corporate network one day, crashing every Win 3.1 machine... probably 30 or so people.
We had BNC cabling (was that 10-baseT? I forget) configured as a ring with every machine on it... I was playing with a screwdriver in my machine (putting an 8 port serial card in) and accidentally shorted the network... Immediate swearing including the ferociously bad tempered and intimidating CEO (at the time, I was 21) who stormed out of his office swearing "Who the @#$% did that! What the $^%^ caused that".
He then saw me with screwdriver in hand... "Was that you? Do you know how much %^&ing work I've lost?" Fortunately another guy I worked with, who I hadn't liked very much until that point, said, "Nope, wasn't him, must've been the Novell server crashing. It does that sometimes."
Ah, fond memories!