Larry Osterman's WebLog

Confessions of an Old Fogey
  • Larry Osterman's WebLog

    Farewell to one of the great ones

    Yesterday was the last day at Microsoft for David Weise.  I've written about David (in passing) in the past, but never in detail.

    David started at Microsoft in 1986, when Microsoft acquired Dynamical Systems Research.  Before founding DSR, he was a member of the ORIGINAL MIT blackjack team - not the latecomers that you see in all the movies, but the original team, back in the 1970s.  According to Daniel Weise (David's twin brother), they ran it like an investment company - MIT people could invest money in the blackjack team, and the blackjack team would divide their winnings up among them.  Apparently RMS was one of the original investors; during David's going-away party, Daniel joked that the FSF was founded on David's blackjack winnings :)

    After leaving Princeton with a PhD in molecular biophysics, David, Chuck Whitmer, Nathan and Cameron Myhrvold, and a few others founded DSR to create a "Topview" clone.  For those new to the industry, Topview was a text based multitasking shell that IBM created that was going to totally revolutionize the PC industry - it would wrest control of the platform from Microsoft and allow IBM to maintain its rightful place as leader of the PC industry.  Unfortunately for IBM, it was an utter flop.

    And, as Daniel pointed out, it was unfortunate for DSR.  Even though their product was twice as fast as IBM's and 2/3rds the size, when you base your business model on being a clone of a flop, you've got a problem.

    Fortunately, at the time, Microsoft was also worried about Topview, and they were looking for a company that understood the Topview environment so that if it was successful, Microsoft would have the ability to integrate Topview support into Windows.

    Finding DSR may have been one of the best acquisitions that Microsoft ever made.  Not only did they find the future CTO (and founder of Microsoft Research) Nathan Myhrvold, but they also hired David Weise.

    You see, the DSR guys were wizards, and David was a wizard's wizard.  He looks at programs and makes them smaller and faster.  It's absolutely magical to watch him at his work.

    I (and others) believe that David is single-handedly responsible for making Microsoft over a billion dollars.  He's also (IMHO) the person who is most responsible for the success of Windows 3.0.

    Everywhere David worked, he dramatically improved the quality of the product.  He worked on the OS/2 graphics drivers and they got faster and smarter.  He (and Chuck) figured out tricks that even the designers of the hardware didn't realize could be done.

    And eventually, David found himself in the Windows group with Aaron Reynolds and Ralph Lipe (and several others).

    David's job was to move the graphics drivers in Windows into protected mode on 286 and better processors (to free up precious memory below 640K for Windows applications).  He (and Chuck) had already figured out how to get normal Windows applications to use expanded memory for their code and data, but now he was tackling a harder problem - the protected mode environment is subtler than expanded memory - if you touched memory that wasn't yours, you'd crash.

    David succeeded (of course).  But David, being David, didn't stop with the graphics drivers.

    He (along with Murray Sargent, creator of the SST debugger) also figured out how to get normal Windows applications running in protected mode.

    Which totally and utterly and irrevocably blew apart the 640K memory barrier.

    I remember wandering over to the Windows group over in Building 3 to talk to Aaron Reynolds about something to do with the MS-DOS redirector (I was working on DOS Lan Manager at the time).  I ran into David, and he called me into his office "Hey, look at what I've got working!".

    He showed me existing Windows apps running in protected mode on the 286.  UNMODIFIED Windows 1.0 applications running in protected mode.

    He then ran me around the rest of the group, and they showed me the other stuff they were working on.  Ralph had written a new driver architecture called VxD.  Aaron had done something astonishing (I'm not sure what).  They had display drivers that could display 256 color bitmaps on the screen (the best OS/2 could do at the time was 16 colors).

    My jaw was dropping lower and lower as I moved from office to office.  "Oh my goodness, you can't let Steve see this, he's going to pitch a fit" (those aren't quite the words I used, but this is a family blog).

    You see, at this time, Microsoft's systems division was 100% focused on OS/2 1.1.  All of the efforts of the systems division were totally invested in OS/2 development.  We had invested literally tens of millions of dollars on OS/2, because we knew that it was the future for Microsoft.  OS/2 at the time just ran a single DOS application at a time, and it had only just recently gotten a GUI (in 1989).  It didn't have support for many printers (only about 5, all made by IBM, and (I believe) the HP Laserjet).

    And here was this little skunkworks project in building three that was sitting on what was clearly the most explosive product Microsoft had ever produced.  It was blindingly obvious, even at that early date - Windows 3.0 ran multiple DOS applications in virtual x86 machines.  It ran Windows applications in protected mode, breaking the 640K memory barrier.  It had a device driver model that allowed for development of true 32bit device drivers.  It supported modern displays with color depths greater than had been available on PC operating systems. 

    There was just no comparison between the two platforms - if they had to compete head-to-head, Windows 3.0 would win hands down.

    Btw, David had discussed it with Steve (I just learned that yesterday).  As David put it, he realized that this was potentially an issue, so he went to Steve and told him about it.  Steve asked David to let him know when he'd made progress.  That night, David was up until 5AM working on the code; he got it working and left it running on his machine.  He left a note on SteveB's office door saying that he should stop by David's office.  When David got in the next day (at around 8AM), he saw that his machine had crashed, so he knew that Steve had come by and seen it.

    He went to Steve's office, and they had a chat.  Steve's only comment was that David should tell his manager and his manager's manager so that they'd not be surprised at the product review that was going to happen later that day.  At the product review, Steve and Bill greenlighted the Windows 3.0 project, and the rest was history.  My tour was apparently a couple of days after that - it was finally ok to let people know what the Windows 3.0 team was doing.

    At its release, Windows 3.0 was the most successful software project in history, selling more than 10 million copies a month, and it's directly responsible for Microsoft being where it is today.

    And, as I mentioned above, David is responsible for most of that success - if Windows 3.0 hadn't run Windows apps in protected mode, then it wouldn't have been the unmitigated success it was.

    David's spent the last several years working in linguistics - speech generation, etc.  He was made a distinguished engineer back in 2002, in recognition of his contribution to the industry.  The title of Distinguished Engineer is the title to which all Microsoft developers aspire; being named DE is literally the pinnacle of a developer's career at Microsoft.  Other DEs include Dave Cutler, Butler Lampson, Jim Gray, and Anders Hejlsberg.  This is unbelievably rarefied company - these are the people who have literally changed the world.

    And David absolutely belongs in their company.

    David's leaving to learn more about the state of molecular biology today.  He wants to finally be able to use his PhD; the field has changed so much since he left it, and it's amazing what's happening in it these days.

    As I said as I was leaving his goodbye party:

    "Congratulations, good luck, and, from the bottom of my heart, thank you".

    Bonne Chance David, I wish you all the best.  When you get your Nobel Prize, I'll be able to say "I knew him back when he worked at Microsoft".


    Edit: Corrected David's PhD info based on Peter Woit's blog post here.  Sorry David, and thanks Peter.

    Edit2: Grey->Gray :)  Thanks Jay

  • Larry Osterman's WebLog

    How did we make the DOS redirector take up only 256 bytes of memory?

    In one of my early posts, I mentioned a status review we had with BillG for the DOS Lan Manager redirector (network filesystem).

    I also talked to Robert Scoble about this in the last of my Channel9 videos.  One thing that somehow got missed in both the original article (later updated) and the video was our reaction to Bill's feedback.

    The simple answer is that we fixed the problem.  My team didn't do much with the transports and network drivers (because they were out of our scope), but we were able to do something about the footprint of the redir.exe program (it was a T&SR application).

    When we were done with it, I had managed to shrink the below-640K running size of the redirector to 128 bytes, beyond which I couldn't figure out how to go.

    The question that obviously comes up is: How did you manage to do that?  Raymond, please forgive me for what I'm about to disclose, for within this tale lie dragons.  This discussion is for historical purposes ONLY.  I don't recommend it as a practice.

    The MS-DOS redirector was actually originally written as a part of the MSDOS.SYS (IBMDOS.COM) binary.  For obvious reasons (not every user in the world had a network card, especially in 1984), the redirector was split out from the DOS binary after the product shipped.  In fact, when I took over the redirector project, the binary used to link with hundreds of unresolved external symbol errors (because the redirector linked with some, but not all of the MS-DOS binaries).  One of the first things that I did while working on the project was to clean this up so that the redirector would cleanly link without relying on the MS-DOS objects.  But being a part of MS-DOS, it was written in "tiny" mode - the code and data were commingled internally.

    The first thing I did when trying to shrink the footprint of the redirector was to separate the code and data segments.  Today, this seems utterly obvious, but in 1986, it was a relatively radical idea, especially for real-mode software.  Once I had split the data and code, I was able to make the data segment relocatable.  This change was critical, because it enabled us to do a boatload of things to reduce our footprint.  One thing to keep in mind about the redirector was that even though the data (and eventually code) was relocatable, the motion wasn't dynamic.

    The first thing I did was to lay out the redirector's code and data as follows:


    Initialization Code


    Initialization Data/Dynamic data (allocated after startup)

    By laying out the code and data this way, I could slide the data over the initialization code after the initialization was done.  It wasn't that much of a real savings, however, since the original redirector simply started the dynamic data at the start of the initialization code (and left it uninitialized).

    The next thing to do was to take advantage of a quirk in the 286 processor.  The 8086 could only address one megabyte of memory, and all the memory above 640K was reserved for system ROMs. (A quick aside: DOS could (and did) take advantage of more than 640K of RAM - DOS could address up to 1M of RAM, all the processor could support, if it wasn't for the system ROMs.  In particular, there were several 3rd party memory cards that allowed mapping memory between 640K and 0xB0000, which was the start of video memory.)  With the addition of the 286 processor, the machine could finally address more than 1M of RAM.  It turns out that if the machine had more than 640K of RAM, most systems mapped the memory above 640K to above 1M.  Unfortunately, there were a number of applications that depended on the fact that the 8086 could only address 1M of RAM, and performed arithmetic that assumed that physical address 0xFFFF0+0x30=0x000020.  To control this, the PC/AT and its successors defined a software controllable pin called the "A20 line" - if it was disabled, memory access between 1M and 1M+64K was redirected to 0; if it was enabled, then it was mapped to real memory.  This is really complicated, but the effect was that if you enabled the A20 line, an application could have access to 64K of additional memory that didn't impact any running MS-DOS applications!  This 64K was known as the "High Memory Area", or HMA.
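
    The wraparound arithmetic is easier to see with numbers.  The following is a minimal sketch in Python (obviously not what anyone ran on a PC/AT); the function names are made up for illustration, and the A20 gate is modelled simply as a mask to 20 address bits.

```python
def linear_address(segment: int, offset: int) -> int:
    """Real-mode address arithmetic: segment * 16 + offset.

    With segment 0xFFFF the result can exceed 1M (0x100000).
    """
    return (segment << 4) + offset


def physical_address(segment: int, offset: int, a20_enabled: bool) -> int:
    """Model the A20 gate (illustrative only).

    With A20 disabled, the address wraps to 20 bits, as on the 8086.
    With A20 enabled, the full linear address reaches real memory.
    """
    linear = linear_address(segment, offset)
    if a20_enabled:
        return linear
    return linear & 0xFFFFF


# The example from the text: 0xFFFF0 + 0x30.
wrapped = physical_address(0xFFFF, 0x0030, a20_enabled=False)
unwrapped = physical_address(0xFFFF, 0x0030, a20_enabled=True)

# The High Memory Area spans FFFF:0010 through FFFF:FFFF.
hma_start = linear_address(0xFFFF, 0x0010)
hma_end = linear_address(0xFFFF, 0xFFFF)
hma_size = hma_end - hma_start + 1

print(hex(wrapped), hex(unwrapped), hma_size)
```

    With the A20 line disabled the access lands at 0x20; with it enabled it lands just above the 1M line.  The HMA works out to 65,520 bytes, which is the "64K" (minus 16 bytes) referred to above.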

    Because the powers that be knew that this would be a highly contentious piece of real estate (everyone would want to party on it), Microsoft (or Intel, or IBM, I'm not sure who) wrote a specification and a driver called HIMEM.SYS.  HIMEM.SYS's purpose was to arbitrate access to that 64K chunk of RAM.

    Well, for the DOS Lanman redirector, we wanted to use that area, so if we were able to reserve the region via himem.sys, we moved the data (both dynamic and static) up to that memory.  On every entry to the redirector, we enabled the A20 line (via himem.sys), and on every exit, we disabled the A20 line.

    That saved about 30K of the 60K MS-DOS footprint, so far so good. 

    The next step in the process was to remove our dependencies on himem.sys.  Around this time, Lotus, Intel and Microsoft had defined a specification for an expanded memory manager, known as LIM.  This allowed a 3rd party memory card to bank swap memory into the 0xA0000->0xFFFFF memory region.  Marlin Eller joined the team about that time, and he wrote the code to move the data segment for the DOS redirector into LIM (if himem.sys wasn't available, and LIM was).  After finishing that work, he moved on to other projects within Microsoft.  That's where things stood for Lan Manager 1.5, the data had been removed, but nothing else.  A HUGE improvement, but we weren't satisfied.

    So far, we were just moving the data around; we hadn't done anything to deal with the 30K or so of code.

    The next thing we did was to split the redirector up still further:

    "Low" code
    "Low" data


    Initialization Code


    Initialization Data/Dynamic data (allocated after startup)

    We added a low code and data segment.  The "low" code segment contained all the external hooks into the redirector (interrupt handlers, etc), and code to enable the HMA and LIM segments.  We then moved the data into LIM memory, and the code into the HMA.  This was a bit trickier, but we managed.

    So we now had a low code segment that was about 2K or so, and the code and data was moved up out of the 640K boundary.  Normally, I'd be satisfied with this, but I love a challenge.

    The next step was to look long and hard at the low code.  It turns out that most of the low code didn't really NEED to be low, it was just convenient.  Since the code had been moved into the HMA, all I needed to do was to have a low-memory stub with enough code to enable the HMA, and dispatch to the corresponding function in high memory.

    The other thing I realized was that the MS-DOS PSP (Program Segment Prefix, the equivalent of a task in MS-DOS) contained 128 bytes of OS stuff, and 128 bytes of command line (this is where Raymond starts cringing).  Since the redirector didn't use the command line, I figured I could re-use that 128 bytes of memory for my stub to enable the high memory area.  And that's what I did - I used the 128ish bytes of command line to hold the interrupt dispatch routines for all the entrypoints to the redirector (there were about 4 of them), and pointers to the corresponding routines in the high memory area, and the code to enable (and disable) the HMA.

    And voila, I had a 0 footprint redirector.  The only negative that came from this was that applications that enumerated the "running" processes didn't handle the "code-in-the-command-line" thing.

    Btw, the work I did here was pretty much totally clean.  I used the linker to define the segments that were relocated, I didn't do any of the other sleazy things that MS-DOS programmers did to make their code small (like combining multiple instructions together relying on the relative offset of a jump instruction to form the first byte of a different instruction).  It was actually a pretty cool piece of work.

    Oh, and this description doesn't really give the full flavor of what had to be done to get this to work.  A simple example: because I had to move the data over the code that was performing the move, I needed to first move the initialization code out of the way (past the end of the data), jump to the moved initialization code, move the data over the original initialization code, then terminate the application.
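
    The ordering of those steps can be sketched as a toy simulation.  This is Python operating on a bytearray, not real-mode assembly, and the function name, layout and sizes are all invented for illustration; the only thing it is meant to show is why the initialization code has to be copied out of the way before the data slides down over it.

```python
def slide_data_over_init(memory: bytearray, init_start: int,
                         init_len: int, data_len: int) -> int:
    """Simulate sliding the data over the initialization code.

    Assumed layout: [resident][init code][data][free space].
    Returns the offset just past the relocated data, i.e. how much
    would stay resident after the program terminates.
    """
    data_start = init_start + init_len
    data_end = data_start + data_len
    assert data_end + init_len <= len(memory), "need room past the data"

    # Step 1: copy the initialization code past the end of the data,
    # so the code doing the move is not overwritten mid-move.
    memory[data_end:data_end + init_len] = memory[init_start:data_start]

    # Step 2: on real hardware, execution would jump to the moved copy here.

    # Step 3: slide the data down over the original initialization code.
    memory[init_start:init_start + data_len] = memory[data_start:data_end]

    # Step 4: terminate, keeping only what lies below the end of the data.
    return init_start + data_len


# 4 bytes resident, 4 bytes of init code, 6 bytes of data, 4 bytes free.
memory = bytearray(b"RRRR" + b"IIII" + b"DDDDDD" + bytes(4))
resident_end = slide_data_over_init(memory, init_start=4, init_len=4, data_len=6)
print(resident_end, bytes(memory[:resident_end]))
```

    After the call, the resident portion is the original resident bytes followed directly by the data, and the initialization code no longer occupies any of it.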

    But we eventually (for Lan Manager 2.2) had a 0 footprint redirector.  It took some time, and it didn't work for every configuration, but we DID make it work.


  • Larry Osterman's WebLog

    Transferring a pointer across processes

    I seem to be "Riffing on Raymond" more and more these days, I'm not sure why, but..

    Raymond Chen's post today on the type model for Win64 got me to thinking about one comment he made in particular:

    Notice that in these inter-process communication scenarios, we don't have to worry as much about the effect of a changed pointer size. Nobody in their right mind would transfer a pointer across processes: Separate address spaces mean that the pointer value is useless in any process other than the one that generated it, so why share it?

    Actually, there IS a really good reason for sharing handles across processes.  And the Win64 team realized that and built it into the product (both the base team and the RPC team).  Sometimes you want to allocate a handle in one process, but use that handle in another.  The most common case where this occurs is inheritance - when you allocate an inheritable handle in one process, then spawn a child process, that handle is created in the child process as well.  So if a Win64 process spawns a Win32 process, all the inheritable handles in the Win64 process will be duplicated into the Win32 process.

    In addition, there are sometimes reasons why you'd want to duplicate a handle from your process into another process.  This is why the DuplicateHandle API has an hTargetProcessHandle parameter.  One example of this is if you want to use a shared memory region between two processes.  One way of doing this would be to use a named shared memory region, and have the client open it.  But another is to have one process open the shared memory region, duplicate the handle to the shared memory region into the other process, then tell the other process about the new handle.

    In both of these cases (inheritable handles and DuplicateHandle), if the source process is a 64bit process and the target process is a 32bit process, then the resulting handle is appropriately sized to work in the 32bit process (the reverse also holds, of course).

    So we've established that there might be a reason to move a handle from one process to another.  And now, the RPC team's part of the solution comes into play.

    RPC (and by proxy DCOM) defines a data type called __int3264.  An __int3264 is functionally equivalent to the Win32 DWORD_PTR (and, in fact, the DWORD_PTR type is declared as an __int3264 when compiled for MIDL).

    An __int3264 value is an integer that's large enough to hold a pointer on the current platform.  For Win32, it's a 32 bit value; for Win64, it's a 64 bit value.  When you pass an __int3264 value from one process to another it either gets truncated or extended (either signed or unsigned).

    __int3264 values are passed on the wire as 32bit quantities (for backwards compatibility reasons).
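
    To make the truncate/extend behavior concrete, here is a rough sketch of the integer arithmetic involved.  It is plain Python, not MIDL or the RPC runtime, and the helper names are hypothetical; it only shows what happens to the bits when a pointer-sized value is squeezed into 32 bits and widened again.

```python
MASK32 = 0xFFFFFFFF


def truncate_to_32(value: int) -> int:
    """64-bit sender to 32-bit wire form: keep only the low 32 bits."""
    return value & MASK32


def zero_extend_to_64(value32: int) -> int:
    """Unsigned widening on a 64-bit receiver: high 32 bits become 0."""
    return value32 & MASK32


def sign_extend_to_64(value32: int) -> int:
    """Signed widening on a 64-bit receiver: bit 31 is copied upward."""
    value32 &= MASK32
    if value32 & 0x80000000:
        return value32 | 0xFFFFFFFF00000000
    return value32


# A small handle-like value survives the round trip unchanged.
small = zero_extend_to_64(truncate_to_32(0x0000000000000ABC))

# Anything that needs more than 32 bits loses its high bits.
lossy = truncate_to_32(0x0000000100000ABC)

# Signed and unsigned widening only differ when bit 31 is set.
signed = sign_extend_to_64(0x80000004)
unsigned = zero_extend_to_64(0x80000004)

print(hex(small), hex(lossy), hex(signed), hex(unsigned))
```

    The practical upshot is that this scheme works for values that fit in 32 bits, and silently drops the high bits of anything that doesn't.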

    So you can allocate a block of shared memory in one process, force dup the handle into another process, and return that new handle to the client in an RPC call.  And it all happens automagically.

    Btw, one caveat: In the current platform SDK, the HANDLE_PTR type is NOT RPC'able across byte sizes - it's a 32bit value on 32bit platforms and a 64bit value on 64bit platforms, and it does NOT change size (like DWORD_PTR values do).  The SDK documentation on process interoperability is mostly correct, but somewhat misleading in this aspect. It says "The 64-bit HANDLE_PTR is 64 bytes on the wire (not truncated) and thus does not need mapping" - I'm not going to discuss the "64 bytes on the wire" part, but most importantly it doesn't indicate that the 32-bit HANDLE_PTR is 32 bits on the wire.

    Edit: Removed HTML error that was disabling comments...


  • Larry Osterman's WebLog

    Audio in Vista, the big picture


    So I've talked a bit about some of the details of the Vista audio architecture, but I figure a picture's worth a bunch of text, so here's a simple version of the audio architecture:

    This picture is for "shared" mode, I'll talk about exclusive mode in a future post.

    The picture looks complicated, but in reality it isn't.  There are a boatload of new constructs to discuss here, so bear with me a bit.

    The flow of audio samples through the audio engine is represented by the arrows - data flows from the application, to the right in this example.

    The first thing to notice is that once the audio leaves the application, it flows through a very simple graph - the topology is quite straightforward, but it's a graph nonetheless, and I tend to refer to samples as moving through the graph.

    Starting from the left, the audio system introduces the concept of an "audio session".  An audio session is essentially a container for audio streams; in general there is only one session per process, although this isn't strictly true.

    Next, we have the application that's playing audio.  The application (using WASAPI) renders audio to a "Cross Process Transport".  The CPT's job is to get the audio samples to the audio engine running in the Windows Audio service.

    In general, the terminal nodes in the graph are transports.  There are three transports that ship with Vista: the cross process transport I mentioned above, a "Kernel Streaming" transport (used for rendering audio to a local audio adapter), and an "RDP Transport" (used for rendering audio over a Remote Desktop Connection).

    As the audio samples flow from the cross process transport to the kernel streaming transport, they pass through a series of Audio Processing Objects, or APOs.  APOs are used to provide DSP on the audio samples.  Some examples of the APOs shipped in Vista are:

    • Volume - The volume APO provides mute and gain control.
    • Format Conversion - The format converter APOs (there are several) provide data format conversion - int to float32, float32 to int, etc.
    • Mixer - The mixer APO mixes multiple audio streams
    • Meter - The meter APO remembers the peak and RMS values of the audio samples pumped through it.
    • Limiter - The limiter APO prevents audio samples from clipping when rendering.

    All of the code above runs in user mode except for the audio driver at the very end.
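
    To give a feel for what the APOs listed above do to the samples, here's a toy sketch in Python.  It is not the Vista APO API (APOs are COM objects, and none of these function names exist in Windows); the order of the stages is an assumption, and the "limiter" is just a hard clamp, which is the crudest possible stand-in for a real one.

```python
import math
from typing import List, Sequence, Tuple


def int16_to_float(samples: Sequence[int]) -> List[float]:
    """Format conversion: 16-bit integer samples to floats in [-1.0, 1.0)."""
    return [s / 32768.0 for s in samples]


def apply_volume(samples: Sequence[float], gain: float,
                 mute: bool = False) -> List[float]:
    """Volume: gain control plus mute."""
    if mute:
        return [0.0] * len(samples)
    return [s * gain for s in samples]


def mix(streams: Sequence[Sequence[float]]) -> List[float]:
    """Mixer: sum the corresponding samples of every stream."""
    return [sum(frame) for frame in zip(*streams)]


def meter(samples: Sequence[float]) -> Tuple[float, float]:
    """Meter: report the peak and RMS values of the samples."""
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return peak, rms


def limit(samples: Sequence[float], ceiling: float = 1.0) -> List[float]:
    """Limiter stand-in: clamp samples into [-ceiling, ceiling]."""
    return [max(-ceiling, min(ceiling, s)) for s in samples]


# Two application streams flowing toward the device.
stream_a = apply_volume(int16_to_float([16384, -16384, 8192]), gain=2.0)
stream_b = apply_volume([0.75, 0.25, 0.5], gain=1.0)

mixed = mix([stream_a, stream_b])
limited = limit(mixed)
peak, rms = meter(limited)
print(mixed, limited, peak)
```

    Mixing the two streams pushes two of the samples past full scale, and the last stage pulls them back into range before they would reach the device.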

  • Larry Osterman's WebLog

    Why does Windows share the root of your drive?


    Out of the box, a Windows system automatically shares the root of every hard drive on the machine as <drive>$ (so you get C$, D$, A$, etc).

    The shares are ACL'ed so that only members of the local administrative group can access them, and they're hidden from the normal enumeration UI (they're included in the enumeration APIs but not in the UI, as are all shares with a trailing $ in their name).
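
    The naming convention is simple enough to sketch.  The Python below is purely illustrative (the function names are made up, and it doesn't call any Windows API); it just shows the <drive>$ naming and the "trailing $ means hidden from the UI" rule described above.

```python
from typing import Iterable, List


def default_admin_shares(drive_letters: Iterable[str]) -> List[str]:
    """Build the automatic root share name for each drive letter."""
    return [f"{letter.upper()}$" for letter in drive_letters]


def shares_shown_in_ui(share_names: Iterable[str]) -> List[str]:
    """Filter out shares whose name ends in '$', as the browsing UI does."""
    return [name for name in share_names if not name.endswith("$")]


# The enumeration API would report everything; the UI hides the $ shares.
all_shares = default_admin_shares("cd") + ["Public", "Builds"]
visible = shares_shown_in_ui(all_shares)
print(all_shares, visible)
```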

    One question that came up yesterday was why Windows does this in the first place.

    The answer is steeped in history.  It goes way back to the days of Lan Manager 1.0, and is a great example of how using your own dogfood helps create better products.

    Lan Manager was Microsoft's first attempt at competing directly with Novell in networking.  Up until that point, Microsoft produced an OEM-only networking product called MS-NET (I have a copy of the OEM adaptation kit for MS-NET 1.1 in my office - it was the first product I ever shipped at Microsoft).

    But Lan Manager was intended as a full solution.  It had a full complement of APIs to support administration, supported centralized authentication, etc.

    One of the key features for Lan Manager was, of course, remote administration.  The server admin could sit in their office and perform any administrative tasks they wanted to on the computer.

    This worked great - the product was totally living up to our expectations...

    Until the day that the development lead for Lan Manager (Russ (Ralph) Ryan) needed to change a config file on the LanMan server that hosted the source code for the Lan Manager product.  And he realized that none of the file shares on the machine allowed access to the root directory of the server!  He couldn't add a new share remotely, because the UI for adding file shares required that you navigate through a tree view of the disk - and since the root wasn't shared, he could only add shares that lived under the directories that were already shared.

    So he had to trudge from his office to the lab and make the config change to the server.

    And thus a new feature was born - by default, Lan Manager (and all MS networking products to this day) shares the root of the drives automatically to ensure that remote administrators have the ability to access the entire drive.   And we'd probably have never noticed it unless we were dogfooding our products.

    Nowadays, with RDP and other more enhanced remote administration tools, it's less critical, but there are a boatload of products that rely on the feature.

    Note1: You can disable the automatic creation of these shares by going to this KB article.

    Note2: The test lead for the Lan Manager product was a new hire, fresh from working at Intel who went by the name of Henry (Brian) Valentine.

  • Larry Osterman's WebLog

    Did you know that OS/2 wasn't Microsoft's first non Unix multi-tasking operating system?


    Most people know about Microsoft’s official timeline for its operating-system-like products:

    1.      Xenix - Microsoft’s first operating system, which was a version of UNIX that we did for microprocessors. 

    2.      MS-DOS/PC-DOS, a 16 bit operating system for the 8086 CPU

    3.      Windows (not really an operating system, but it belongs in the timeline).

    4.      OS/2, a 16 bit operating system written in joint development with IBM.

    5.      Windows NT, a 32 bit operating system for the Intel i386 processor, the MIPS R4000 and the DEC Alpha

    But most people don’t know about Microsoft’s other multitasking operating system, MS-DOS 4.0 (not to be confused with PC-DOS 4.0)

    MS-DOS 4.0 was actually a version of MS-DOS 2.0 that was written in parallel with MS-DOS 3.x (DOS 3.x shipped while DOS 4 was under development, which is why it skipped a version).

    DOS 4 was a preemptive real-mode multitasking operating system for the 8086 family of processors.  It had a boatload of cool features, including movable and discardable code segments, movable data segments (the Windows memory manager was a version of the DOS 4 memory manager).  It had the ability to switch screens dynamically – it would capture the foreground screen contents, save it away and switch to a new window.

    Bottom line: DOS 4 was an amazing product.  In fact, for many years (up until Windows NT was stable), one of the DOS 4 developers continued to use DOS 4 on his desktop machine as his only operating system.

    We really wanted to turn DOS 4 into a commercial version of DOS, but...   Microsoft at the time was a 100% OEM shop – we didn’t sell operating systems directly to end users, we sold them to hardware vendors who shipped them with their hardware.  And in general the way the market worked in 1985 was that no computer manufacturer was interested in a version of DOS if IBM wasn’t interested.  And IBM wasn’t interested in DOS 4.  They liked the idea of multitasking however, and they were very interested in working with that – in fact, one of their major new products was a product called “TopView”, which was a character mode window manager much like Windows.  They wanted an operating system that had most of the capabilities of DOS 4, but that ran in protected mode on the 286 processor.  So IBM and Microsoft formed the Joint Development Program that shared development resources between the two companies.  And the DOS 4 team went on to be the core of Microsoft’s OS/2 team.

    But what about DOS 4?  It turns out that there WERE a couple of OEMs that had bought DOS 4, and Microsoft was contractually required to provide the operating system to them.  So a skeleton crew was left behind to work on DOS and to finish it to the point where the existing DOS OEMs were satisfied with it.


    Edit: To fix the title which somehow got messed up.


  • Larry Osterman's WebLog

    How do I open ports in the Windows Firewall?


    One of the side-projects I recently was assigned to work on was to switch the Windows Media Connect project from using the home-brewed HTTP server that was originally coded for the product, to using HTTP.SYS, which is included in XP SP2.  This was as a part of a company-wide initiative to remove all home-brewed HTTP servers (and there were several) and replace them with a single server.  The thinking was that having a half dozen HTTP servers in the system was a bad idea, because each of them was a potential attack vector.  Now with a single server, we have the ability to roll out fixes in a single common location.

    The HTTP.SYS work was fascinating, and I’ll probably write about it more over time, but I wanted to concentrate on a single aspect of the problem.

    I got the server working relatively quickly until we picked up a new version of XP SP2.  That one featured additional improvements to the firewall, and all of a sudden, the remote devices couldn’t retrieve content from the web server.  The requests weren’t getting to our service at all.  What was weird was that they WERE getting the content directory (the names of the files on the machine) but when they tried to retrieve them, they failed. 

    Well, we had suspected that this was going to happen; the new build of SP2 moved HTTP.SYS behind the firewall (it had been in front of the firewall previously).  So now we needed to open a hole in the firewall for our process.  The UPnP hosting service had already opened its port, which is why the content directory was available.  Over the next several posts, I’ll go through the process that I went through to discover how to do this.  Everything I needed was documented, but it wasn’t always obvious.

    The first thing we had to deal with was the fact that we only wanted to open the firewall on local subnet addresses.  To prevent users’ multimedia content from going outside their home, WMC will only accept connections from IP addresses that are in the private network IP address range (192.168.x.x) and the AutoIP address range (169.254.x.x).  We also open up the local address ranges of 10.x.x.x and 172.16.x.x (with a netmask of 0xff, 0xf0, 0, 0).  So we only wanted to open the firewall on private IP addresses; it would be a “bad” thing if we opened the WMC port to public addresses, since that could potentially be used as an attack vector.
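
    The address ranges above translate directly into a membership test.  Here's a small sketch using Python's standard ipaddress module; it isn't the WMC code, and the names ALLOWED_NETWORKS and is_allowed_peer are invented for illustration.  The 0xff, 0xf0, 0, 0 netmask is written as the equivalent /12 prefix.

```python
import ipaddress

# The ranges named in the text, expressed as networks.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.168.0.0/16"),  # private network range
    ipaddress.ip_network("169.254.0.0/16"),  # AutoIP range
    ipaddress.ip_network("10.0.0.0/8"),      # local range
    ipaddress.ip_network("172.16.0.0/12"),   # netmask 255.240.0.0
]


def is_allowed_peer(address: str) -> bool:
    """Return True if the address falls inside one of the allowed ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in ALLOWED_NETWORKS)


for candidate in ["192.168.1.20", "169.254.10.1", "172.31.255.1",
                  "172.32.0.1", "8.8.8.8"]:
    print(candidate, is_allowed_peer(candidate))
```

    Note that the /12 mask covers 172.16.x.x through 172.31.x.x, so 172.31.255.1 is accepted while 172.32.0.1 is not.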

    The Windows firewall has been documented since Windows XP; the first MSDN hit for “internet connection firewall” returns this page that documents the API.  For XP SP2, there’s a new firewall API; if you use an MSDN search for “firewall API” the first hit is this page which describes the XP SP2 firewall API in great detail.  For a number of reasons (chief among which was that when I wrote the code the firewall API hadn’t been published), my implementation uses the original firewall API that’s existed since Windows XP.  As a result, my code and the techniques I describe in the next couple of posts should work just fine on Windows XP as well as on XP SP2.

    Anyway, on with the story.  So, as always, I started with the API documentation.  After groveling through the API for a while, I realized I was going to need to use the INetSharingConfiguration interface’s AddPortMapping API to add the port.  I’d want to use the INetSharingConfiguration API on each of the IP addresses that WMC was using.

    So to add a mapping for my port, I simply called INetSharingConfiguration::AddPortMapping specifying a name (in my case I used the URL for the IP address), internal and external port (the same in my case), and a string with the local IP address.  That API returned an INetSharingPortMapping object, which we have to Enable to make it effective.

    Tomorrow: How do we get the INetSharingConfiguration?

    Edit: Clarified IP addresses used for WMC after further investigation.

    Edit: Updated link



    Breaking Up (shared services) Is(n't) Hard To Do

    The last time I wrote, I talked about shared services. One of the problems of working with shared services is that sometimes one service in the process gets in the way of other services.

    For the audio service, it lives in the "networking services" service host (because the networking services svchost is used for all services that run as LocalSystem).  But, because it runs in the same process as the networking functionality, it means that it can be quite difficult to debug the audio service, especially if you're using a source level debugger - if your debugger has to talk to the network, and portions of the networking stack are suspended (by the debugger) it can be hard to make things work...

    It turns out that there's a remarkably clever trick that can be used to split a normally shared service into its own process. 

    From a Windows command prompt, simply type:

    C:\>sc config <servicename> type= own

    To move the service back into its normal shared config, type:

    C:\>sc config <servicename> type= share

    The SC tool should be in the system32 directory on all XP installations; if not, it's in the platform SDK (I believe).  Obviously you need to be an administrator to make this work.

    I can't take credit for this; it was shown to me by one of the NT perf guys, but I like it sufficiently that it's worth sharing.


    One more caveat: Before people start trying this on their XP system, please note that there's a reason that those services are in the same process.  Splitting them up will cause your system to use a LOT more memory, and WILL make your system unstable.  I'm posting this trick because it can be quite useful for people who are developing their own shared services.

    Several of the built-in services assume that they're in the same address space as other services, and if they start running in separate processes, they will crash in strange and mysterious ways.  If you're not careful, you can render your machine unbootable.

    Just don't go there, it wouldn't be prudent.


    Mirra, first impressions


    We've got something like 7 computers in use in my house these days, and I've been looking for a centralized backup solution for the home for a while.

    Eventually, I settled on a Mirra - a small form-factor appliance-like backup server.  It comes in four sizes, 80GB, 160GB, 250GB, and 400GB.  I ordered the 400GB based on the amount of stuff we've got saved on the various machines that will have to be backed up.

    I've not yet had a chance to use all the features of the product (in particular, I've not used the remote access functionality), but I did set it up and get it running on two of the machines over the weekend.

    I have to say that I'm impressed.  IMHO, these guys have been taking lessons from Apple in terms of out-of-box experience (Personally, I think that Apple does OOBE better than any other PC hardware company).

    You open the Mirra box, and you see a cardboard inset, with a folded card-stock flyer and the power cord and an ethernet cord.

    On the cover of the flyer are the words "Welcome to Mirra".  You open it up, and it unfolds into a four-page story telling you that you're about to enter into a new world where you don't have to worry about your data.  On the back of each of the four pages is one of the four steps to setting up the Mirra - the first tab has you plugging in the appliance (you need to plug it into AC and into an ethernet port), the second tab has you installing the software on your PC, the third has you configuring the PC, the fourth is "Relax".

    I LOVED this experience - it's exactly the balance that a computer-related appliance should strike - simple instructions, clearly spelled out, easy for Abby to get right.  The actual Mirra device is a small form-factor PC.  I didn't crack the case to see what was running inside it, but it's got video, keyboard, mouse, and USB ports on the case (the video and USB are covered over with plastic).  The small form-factor PC is perfect for "plug it in and forget about it".

    I had some difficulties getting the software installed on the first machine I tried: it didn't recognize the firewall I was running (Microsoft One-Care Beta1), and I had to manually configure it.  On the other hand, the manufacturer's web site was extremely helpful getting past this hurdle, and once set up, it immediately started copying files.

    I next went to each of the four accounts on that machine and set the software up on each of them.  It worked seamlessly for all four accounts, including all the limited user accounts.  This alone impressed the heck out of me - there aren't that many software products out there that consider the FUS (fast user switching) and LUA scenarios, but clearly the Mirra guys had.

    I then went upstairs to my computer, and installed it.  This machine doesn't have One-Care installed on it, and the Mirra detected the XP SP2 firewall and opened the relevant ports in the firewall (the firewall is enabled and I didn't need to do anything about it).  The machine then started hammering our home network copying off all the files on the machine.

    I still need to get it installed on the kids' computers; that'll be interesting since the kids' computers don't have access to the internet.

    The Mirra backup software runs as two services, running in two processes (I'm not sure why, since both services run as LocalSystem).  However, once configured, the Mirra backup software will run without requiring any process in the user's session.  If I were doing it, I'd have used just one process to run both services, but...

    As I commented to Valorie "This is the software I would have designed".  I was utterly impressed that they seem to have nailed several critical scenarios that are usually overlooked.

    One negative (not relevant to me, but probably to others) is that this is a Windows-only product - they don't seem to have Mac or Linux clients.

    In general, though, I'm pretty impressed.



    APIs you never heard of - the Timer APIs


    It's time for another "APIs you never heard of" article :)

    This time, I'd like to talk about the time* APIs.

    The time* APIs are a set of 7 APIs built into the Windows multimedia extensions (winmm.dll).  They provide a rudimentary set of timer functions for Windows applications.  At this point, except for two of the APIs, they exist only for historical purposes; the core OS now provides significantly higher quality APIs for timers.

    The time APIs fall into three rough categories:

    1. Timer information (timeGetDevCaps, timeGetTime and timeGetSystemTime)
    2. Timer callback functions (timeSetEvent and timeKillEvent)
    3. Timer frequency functions (timeBeginPeriod and timeEndPeriod)

    The first two categories are obsolete (arguably timeGetDevCaps still has valid uses).  The timeGetTime API is effectively identical to the GetTickCount() API, and timeGetSystemTime simply returns the exact same value that timeGetTime would have returned, packed into an MMTIME structure.

    The timeSetEvent and timeKillEvent APIs have been replaced with the Win32 Timer Queue functions; I'm not sure if I know of any reason to ever call the MME versions of these functions :).  In fact, timeSetEvent will call the PulseEvent API, which is fundamentally flawed.  There is one difference between timeSetEvent and the Win32 timer queue functions - timeSetEvent will call timeBeginPeriod to set the timer resolution to the resolution specified in the call to timeSetEvent.  Even with this, you're better off calling timeBeginPeriod and using the Win32 Timer Queue functions (because the Win32 timer queue functions are far more flexible).

    But then there's the timeBeginPeriod and timeEndPeriod APIs.  These are actually fun APIs, especially in the multimedia or gaming space, because they allow you to change the resolution of the internal scheduler, which can lower (or raise) the resolution with which the internal clock runs.

    This has a number of side effects - it increases the responsiveness of the system to periodic events (when event timeouts occur at a higher resolution, they expire closer to their intended time).  But that increased responsiveness comes at a cost - since the system scheduler is running more often, the system spends more time scheduling tasks, context switching, etc.  This can ultimately reduce overall system performance, since every clock cycle the system is processing "system stuff" is a clock cycle that isn't being spent running your application.  For some multimedia applications (video, for example) the increased system responsiveness is worth the system overhead (for instance, if you're interested in very low latency audio or video, you need the system timers to run at a high frequency).
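    One way to see the effect of timer resolution for yourself is to measure how long a short sleep really takes.  The sketch below is portable C++ and deliberately does not call timeBeginPeriod or timeEndPeriod; AverageSleepMicroseconds is a hypothetical helper name.  On Windows you could, as an experiment, run it once as-is and once bracketed by timeBeginPeriod(1) and timeEndPeriod(1) and compare the averages; the numbers you get will depend on the OS version and hardware.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Returns the average wall-clock time, in microseconds, that a sleep of
// `requested` actually took over `iterations` attempts.
double AverageSleepMicroseconds(std::chrono::microseconds requested, int iterations)
{
    assert(iterations > 0);
    using Clock = std::chrono::steady_clock;
    std::chrono::duration<double, std::micro> total{0};
    for (int i = 0; i < iterations; ++i)
    {
        const auto start = Clock::now();
        std::this_thread::sleep_for(requested);   // wakes no earlier than requested
        total += Clock::now() - start;            // includes any scheduler overshoot
    }
    return total.count() / iterations;
}
```

    The gap between the requested 1000 microseconds and the measured average is the overshoot that a finer scheduler resolution is meant to shrink.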

    Edit: Added comment about timeSetEvent calling timeBeginPeriod to set the resolution.

    Edit2 (years later): Updated the link to GetTickCount...


    How do I divide fractions?


    Valorie works as a teacher's aide in a 6th grade classroom at a local elementary school.

    They've been working on dividing fractions recently, and she spent about two hours yesterday working with one student trying to explain exactly how division of fractions works.

    So I figured I'd toss it out to the blogosphere to see what people's answers are.  How do you explain to a 6th grader that 1/2 divided by 1/4 is 2?

    Please note that it's not sufficient to say: Division is the same as multiplication by the inverse, so when you divide two fractions, you take the second one, invert it, and multiply.  That's stating division of fractions as an axiom, and not a reason.

    In this case in particular, the teacher wants the students to be able to graphically show how it works.

    I can do this with addition and subtraction of numbers (both positive and negative) using positions on a number line. Similarly, I can do multiplication of fractions graphically - you have a whole, divide it into 2 halves.  When you multiply the half by a quarter, you are quartering the half, so you take the half, divide it into fours, and one of those fours is the answer.

    But how do you do this for division?

    My wife had to type this part because we have a bit of, um, discussion, about how simple this part is....

    How can you explain to 9-11 year old kids why you multiply by the reciprocal without resorting to the axiom? It's easy to show graphically that 1/2 divided by 1/4 is 2 quarters because the kids can see that there are two quarters in one half. Equally so, the kids can understand that 1/4 divided by 1/2 is 1/2 of a half because the kids can see that only half of the half is covered by the original quarter. The problem comes in when their intuition goes out.  They can solve it mathematically, but the teacher is unwilling to have them do the harder problems “on faith” and the drawing is really confusing the kids. Having tried to draw the 5/8 divided by 3/10, I can assure you, it is quite challenging. And no, the teacher is not willing to keep the problems easy. And no, don't get me started on that aspect of this issue.
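    For the programmers reading along, here is one possible way to state the harder case without the reciprocal rule: cut both fractions into pieces of the same size and then count pieces.  This is only a sketch of that idea in C++17, not the method the teacher uses; Fraction, Reduce and DivideByCommonDenominator are hypothetical names, and it only handles positive fractions.

```cpp
#include <cassert>
#include <numeric>

struct Fraction
{
    long num;
    long den;
};

// Reduce a fraction to lowest terms.
Fraction Reduce(Fraction f)
{
    const long g = std::gcd(f.num, f.den);
    return {f.num / g, f.den / g};
}

// Divide a by b by cutting both into equal-sized pieces (a common denominator)
// and asking how many of b's pieces fit into a's pieces.
Fraction DivideByCommonDenominator(Fraction a, Fraction b)
{
    assert(a.num >= 0 && a.den > 0 && b.num > 0 && b.den > 0);
    const long common = std::lcm(a.den, b.den);      // size of the shared grid
    const long aPieces = a.num * (common / a.den);   // a measured in grid pieces
    const long bPieces = b.num * (common / b.den);   // b measured in grid pieces
    return Reduce({aPieces, bPieces});
}
```

    For 5/8 divided by 3/10 the shared grid has 40 pieces: 5/8 covers 25 of them and 3/10 covers 12, so the question becomes "how many groups of 12 pieces fit into 25 pieces?", which is 25/12.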

    I'm a big believer that if one method of instruction isn't working, you try to find another way to explain the concept. I visited my usual math sites and found that most people don't try to graph this stuff until 10th grade or adulthood. Most of the sites have just had this “go on faith” response (show the kids the easy ones, and let them “go on faith” that it will hold true for all cases). I really wish I could figure out a way to show successive subtraction, but even that gets difficult on the more complicated examples.

    What I am hoping is that someone out there can provide me with the “aha!” I need to come up with a few more ways to explain this. What this has been teaching me is that I've been doing this “on faith” most of my life and never stopped to think about why myself.

    Any ideas/suggestions would be much appreciated.



    Why doesn't Mozilla (Firefox) like the Microsoft OCA web site?


    In my previous post about OCA, the comments thread has a long discussion started by Shannon J Hager about Mozilla’s behavior when you attempt to access the OCA web site.  If you attempt to access this web site using Firefox (or other Mozilla variants), you get the following dialog box:

    Which is weird, because of course the web site works just fine in IE.  No big deal, right – Microsoft’s well known for sleazing the rules for its own products, so obviously this is Microsoft’s fault – they probably did something like hard coding in trust to the Microsoft issuing CA.  But I was kinda surprised at this, so I spent a bit of time checking it out...

    The way that SSL certificate verification is supposed to work is that if the issuer of a certificate isn’t trusted, then the code validating the certificate is supposed to check the parent of the issuer to see if IT is trusted.  If the parent of the issuer isn’t trusted, it’s supposed to check the grandparent of the issuer, and so forth until you find the root certificate authority (CA).

    The issuing CA of the certificate on the winqual web site is the “Microsoft Secure Server Authority”, it’s not surprising Mozilla doesn’t trust that one.  The parent of the issuing CA is the “Microsoft Internet Authority”, again, no surprise that Mozilla doesn’t trust it.

    But the grandparent of the issuing CA is the “GTE CyberTrust Root”.  This is a well known CA, and Mozilla should be trusting it.  And what do you know, Mozilla DOES claim to trust that root CA:

    Well, Cesar Eduardo Barros actually went and checked using openssl to see why the CA isn’t trusted.  He tried:

    $ openssl s_client -connect -showcerts

    depth=0 /C=US/ST=Washington/L=Redmond/O=WHDC (Old WHQL)/OU=Microsoft/
    verify error:num=20:unable to get local issuer certificate
    verify return:1
    depth=0 /C=US/ST=Washington/L=Redmond/O=WHDC (Old WHQL)/OU=Microsoft/
    verify error:num=27:certificate not trusted
    verify return:1
    depth=0 /C=US/ST=Washington/L=Redmond/O=WHDC (Old WHQL)/OU=Microsoft/
    verify error:num=21:unable to verify the first certificate
    verify return:1
    Certificate chain
    0 s:/C=US/ST=Washington/L=Redmond/O=WHDC (Old WHQL)/OU=Microsoft/
    i:/DC=com/DC=microsoft/DC=corp/DC=redmond/CN=Microsoft Secure Server Authority
    -----END CERTIFICATE-----
    Server certificate
    subject=/C=US/ST=Washington/L=Redmond/O=WHDC (Old WHQL)/OU=Microsoft/
    issuer=/DC=com/DC=microsoft/DC=corp/DC=redmond/CN=Microsoft Secure Server Authority
    No client certificate CA names sent
    SSL handshake has read 1444 bytes and written 324 bytes
    New, TLSv1/SSLv3, Cipher is RC4-MD5
    Server public key is 1024 bit
    Protocol : TLSv1
    Cipher : RC4-MD5
    Session-ID: [...]
    Master-Key: [...]
    Key-Arg : None
    Start Time: [...]
    Timeout : 300 (sec)
    Verify return code: 21 (unable to verify the first certificate)

    Decoding the certificate it gave me above (openssl x509 -text) I get the same information Mozilla gives me and a bit more, but no copy of the issuer. The only suspicious thing in there is:

    Authority Information Access:
    CA Issuers - URI:
    CA Issuers - URI:http://corppki/aia/msssa1(1).crt

    Getting that URI gives me a blank HTML page with a 0.1 second redirect to itself. (The CRL one seems valid, however.)

    So I was confused: why wasn’t OpenSSL able to verify the certificate?  So I started asking the security PMs here at Microsoft what was up.  One of the things they told me was that Microsoft doesn’t hard code ANY intermediate certificates in our browser.  Instead, our browser relies on the referral information in the certificate to chase down the CA hierarchy.

    So why can’t Mozilla do the same thing?  Is there something wrong with our certificates that’s preventing this from working?  I kept on pestering and the PMs kept on digging.  Eventually I got email from someone indicating “IE is chasing 48.2 AIA”.

    Well, this isn’t very helpful to me, so I asked the security PM in question to explain it in English.  Apparently the root cause of the problem is that IE is following the Authority Information Access 48.2 OID to find the parent of the certificate, while Mozilla isn’t.

    Inside the Microsoft certificate is the following:

    And if you follow the URL given there, you’ll find the parent CA for the certificate on the winqual web site.  So now it’s off to figure out if the IE behavior is according to standard, or if it’s another case of Microsoft ignoring web standards in favor of proprietary extensions.

    A few minutes of googling discovers that the AIA 48.2 field is also known as the id-ad-caIssuers OID.  The authoritative reference for this OID is RFC 2459 (the RFC that defines the X.509 certificate infrastructure).  It describes this field as:

    The id-ad-caIssuers OID is used when the additional information lists CAs that have issued certificates superior to the CA that
    issued the certificate containing this extension. The referenced CA Issuers description is intended to aid certificate users in
    the selection of a certification path that terminates at a point trusted by the certificate user.

    In other words, IE is correctly chasing the AIA 48.2 references in the certificate to find the root issuing CA of the certificate. Since it didn’t have direct knowledge of the issuing CA, it correctly looked at the AIA 48.2 field of the certificate for the winqual web site and chased the AIA 48.2 references to the root CA.  It appears that Mozilla (and OpenSSL and GnuSSL) don’t follow this link, which is why they pop up the untrusted certificate dialog.
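    The difference between the two behaviors can be modeled in a few lines.  The following is a toy sketch in C++17 with no cryptography and no real networking: Cert, Validator and DemoValidates are hypothetical names, the "network" is just a map, and the example.invalid URLs are made up.  It is not how IE, Mozilla or OpenSSL are implemented; it only shows why a validator that follows the caIssuers reference can reach a trusted root when one that doesn't follow it gives up.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Toy certificate: names only, no keys or signatures.
struct Cert
{
    std::string subject;
    std::string issuer;
    std::string caIssuersUrl;   // stand-in for the AIA id-ad-caIssuers URI; empty if absent
};

struct Validator
{
    std::set<std::string> trustedRoots;       // subjects of trusted root CAs
    std::map<std::string, Cert> localCerts;   // intermediates already known, keyed by subject
    std::map<std::string, Cert> network;      // pretend web: URL -> certificate served there
    bool followAia = false;

    // Find the issuer's certificate locally or, if allowed, via the AIA URL.
    const Cert* FindIssuer(const Cert& cert) const
    {
        auto local = localCerts.find(cert.issuer);
        if (local != localCerts.end())
            return &local->second;
        if (followAia && !cert.caIssuersUrl.empty())
        {
            auto fetched = network.find(cert.caIssuersUrl);
            if (fetched != network.end() && fetched->second.subject == cert.issuer)
                return &fetched->second;
        }
        return nullptr;
    }

    // Walk issuer -> parent -> grandparent until a trusted root is named.
    bool ChainsToTrustedRoot(Cert cert) const
    {
        assert(!cert.issuer.empty());
        for (int depth = 0; depth < 8; ++depth)   // bounded so a loop cannot hang us
        {
            if (trustedRoots.count(cert.issuer) != 0)
                return true;
            const Cert* parent = FindIssuer(cert);
            if (parent == nullptr)
                return false;                     // "unable to get local issuer certificate"
            cert = *parent;
        }
        return false;
    }
};

// Builds a chain shaped like the one in the post and validates the server cert.
bool DemoValidates(bool followAia)
{
    Validator v;
    v.followAia = followAia;
    v.trustedRoots = {"GTE CyberTrust Root"};
    v.network["http://example.invalid/ssa.crt"] =
        {"Microsoft Secure Server Authority", "Microsoft Internet Authority", "http://example.invalid/ia.crt"};
    v.network["http://example.invalid/ia.crt"] =
        {"Microsoft Internet Authority", "GTE CyberTrust Root", ""};
    const Cert server{"server certificate", "Microsoft Secure Server Authority", "http://example.invalid/ssa.crt"};
    return v.ChainsToTrustedRoot(server);
}
```

    With followAia set the walk reaches the GTE CyberTrust Root after two fetches; without it, the walk stops at the first unknown issuer, which mirrors the openssl output above.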

    Issue solved.  Now all someone has to do is to file bugs against Mozilla and OpenSSL to get them to fix their certificate validation logic :).

    Btw, I want to give HUGE kudos to Cesar Eduardo Barros for tirelessly trying to figure this out, and to Michael Howard and the lead program manager for NT security for helping me figure this out.  If you look at the info from the certificate that Cesar posted above, he correctly caught the AIA 48.2 fields inside the CA; it was a huge step in the right direction, and all that remained was to figure out what it really meant.

    Edit: Fixed picture links.

    Edit2: Fixed line wrapping of reference from RFC2459.


    So you need a worker thread pool...


    And, for whatever reason, NT’s built-in thread pool API doesn’t work for you.

    Most people would write something like the following (error checking removed to reduce typing (and increase clarity)):

    class WorkItem
    {
        LIST_ENTRY m_listEntry;
    };

    class WorkerThreadPool
    {
        HANDLE m_heventThreadPool;
        CRITICAL_SECTION m_critsWorkItemQueue;
        LIST_ENTRY m_workItemQueue;

        void QueueWorkItem(WorkItem *pWorkItem)
        {
            //   Insert the work item onto the work item queue.
            EnterCriticalSection(&m_critsWorkItemQueue);
            InsertTailList(&m_workItemQueue, &pWorkItem->m_listEntry);
            LeaveCriticalSection(&m_critsWorkItemQueue);
            //   Kick the worker thread pool
            SetEvent(m_heventThreadPool);
        }
        void WorkItemThread()
        {
            while (1)
            {
                // Wait until we’ve got work to do
                WaitForSingleObject(m_heventThreadPool, INFINITE);
                //  Remove the first item from the queue.
                EnterCriticalSection(&m_critsWorkItemQueue);
                workItem = RemoveHeadList(&m_workItemQueue);
                LeaveCriticalSection(&m_critsWorkItemQueue);
                // Process the work item if there is one.
                if (workItem != NULL)
                {
                    <Process Work Item>
                }
            }
        }
    };

    I’m sure there are gobs of bugs here, but you get the idea.  Ok, what’s wrong with this code?  Well, it turns out that there’s a MASSIVE scalability problem in this logic.  The problem is the m_critsWorkItemQueue critical section.  It turns out that this code is vulnerable to a condition called “lock convoys” (also known as the “boxcar” problem).  Basically the problem occurs when there is more than one thread waiting on the m_heventThreadPool event.  What happens when QueueWorkItem calls SetEvent on the thread pool event?  All the threads in the thread pool immediately wake up and block on the work queue critical section.  One of the threads will “win” and will acquire the critical section, pull the work item off the queue and release the critical section.  All the other threads will then wake up, one will successfully acquire the critical section, and all the others will go back to sleep.  The one that woke up will see there’s no work to do and will block on the thread pool event.  This will continue until all the work threads have made it past the critical section.

    Essentially this is the same situation that you get when you have a bunch of boxcars in a trainyard.  The engine at the front of the cars starts to pull.  The first car moves a little bit, then it stops because the slack between its rear hitch and the front hitch of the second car is removed.  And then the second car moves a bit, then IT stops because the slack between its rear hitch and the front hitch of the 3rd car is removed.  And so forth – each boxcar moves a little bit and then stops.  And that’s just what happens to your threads.  You spend all your valuable CPU time executing context switches between the various threads and none of the CPU time is spent actually processing work items.

    Now there are lots of band-aids that can be applied to this mechanism to make it smoother.  For example, the m_heventThreadPool event could be an auto-reset event, which means that only one thread would wake up for each work item.  But that’s only a temporary solution - if you get a flurry of requests queued to the work pool, you can still get multiple worker threads waking up simultaneously.

    But the good news is that there’s an easier way altogether.  You can use NT’s built-in completion port logic to manage your work queues.  It turns out that NT exposes a really nifty API called PostQueuedCompletionStatus that essentially lets NT manage your worker thread queue for you!

    To use NT’s completion ports, you create the port with CreateIoCompletionPort, remove items from the completion port with GetQueuedCompletionStatus and add items (as mentioned above) with PostQueuedCompletionStatus.

    PostQueuedCompletionStatus takes 3 user specified variables, one of which can be used to hold a 32 bit integer (dwNumberOfBytesTransferred), and two of which can be used to hold pointers (dwCompletionKey and lpOverlapped).  The contents of these parameters can be ANY value; the API blindly passes them through to GetQueuedCompletionStatus.

    So, using NT’s completion ports, the worker thread class above becomes:

    class WorkItem
    {
    };

    class WorkerThreadPool
    {
        HANDLE m_hcompletionPort;
        void QueueWorkItem(WorkItem *pWorkItem)
        {
            PostQueuedCompletionStatus(m_hcompletionPort, 0, (ULONG_PTR)pWorkItem, NULL);
        }

        void WorkItemThread()
        {
            while (1)
            {
                DWORD numberOfBytes;
                WorkItem *pWorkItem;
                LPOVERLAPPED lpOverlapped;
                // The completion key carries the work item pointer.
                GetQueuedCompletionStatus(m_hcompletionPort, &numberOfBytes, (PULONG_PTR)&pWorkItem, &lpOverlapped, INFINITE);
                // Process the work item if there is one.
                if (pWorkItem != NULL)
                {
                    <Process Work Item>
                }
            }
        }
    };

    Much simpler.  And as an added bonus, since NT’s managing the actual work queue in the kernel, it allows NT to eliminate the lock convoy in the first example.
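    If you want to play with the post/get shape of this design on a machine without the Win32 headers, here is a rough portable analogue in C++17.  It is a stand-in, not a completion port: the queue lives in user mode behind a std::mutex, and WorkQueue, Post, Get and RunPool are hypothetical names.  What it does share with the version above is that the queue itself hands each posted item to one waiting thread, and that an empty item plays the role of the NULL work item.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

class WorkQueue
{
public:
    // Rough analogue of PostQueuedCompletionStatus: enqueue, then wake ONE waiter.
    void Post(std::function<void()> item)
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_items.push_back(std::move(item));
        }
        m_ready.notify_one();
    }

    // Rough analogue of GetQueuedCompletionStatus: block until an item is available.
    std::function<void()> Get()
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_ready.wait(lock, [this] { return !m_items.empty(); });
        std::function<void()> item = std::move(m_items.front());
        m_items.pop_front();
        return item;
    }

private:
    std::mutex m_mutex;
    std::condition_variable m_ready;
    std::deque<std::function<void()>> m_items;
};

// Runs itemCount trivial work items on threadCount workers and
// returns how many items were actually processed.
int RunPool(int threadCount, int itemCount)
{
    assert(threadCount > 0 && itemCount >= 0);
    WorkQueue queue;
    std::atomic<int> processed{0};
    std::vector<std::thread> workers;
    for (int i = 0; i < threadCount; ++i)
    {
        workers.emplace_back([&queue] {
            // An empty std::function is the shutdown signal, like a NULL work item.
            while (auto item = queue.Get())
                item();
        });
    }
    for (int i = 0; i < itemCount; ++i)
        queue.Post([&processed] { ++processed; });
    for (int i = 0; i < threadCount; ++i)
        queue.Post(nullptr);   // one shutdown signal per worker
    for (auto& worker : workers)
        worker.join();
    return processed.load();
}
```

    Calling RunPool(4, 100) should process all 100 items, because the shutdown signals are queued behind the real work and each worker consumes exactly one of them.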


    [Insert std disclaimer: This posting is provided "AS IS" with no warranties, and confers no rights]


    How to lose customers without really trying...


    Not surprisingly, Valorie and I both do some of our holiday season shopping at ThinkGeek.  But no longer.  Valorie recently placed a substantial order with them, but instead of processing her order, they sent the following email:

    From: ThinkGeek Customer Service
    Sent: Thursday, November 15, 2007 4:28 AM
    To: <Valorie's Email Address>
    Subject: URGENT - Information Needed to Complete Your ThinkGeek Order

    Hi Valorie,

    Thank you for your recent order with ThinkGeek, <order number>. We would like to process your order as soon as possible, but we need some additional information in order to complete your order.

    To complete your order, we must do a manual billing address verification check.

    If you paid for your order via Paypal, please send us a phone bill or other utility bill showing the same billing address that was entered on your order.

    If you paid for your order via credit card, please send us one of the following:

    - A phone bill or other utility bill showing the same billing address that was entered on your order

    - A credit card statement with your billing address and last four digits of your credit card displayed

    - A copy of your credit card with last four digits displayed AND a copy of a government-issued photo ID, such as a driver's license or passport.

    To send these via e-mail (a scan or legible digital photo) please reply to or via fax (703-839-8611) at your earliest convenience. If you send your documentation as digital images via email, please make sure they total less than 500kb in size or we may not receive your email. We ask that you send this verification within the next two weeks, or your order may be canceled. Also, we are unable to accept billing address verification from customers over the phone. We must receive the requested documentation before your order can be processed and shipped out.

    For the security-minded among you, we are able to accept PGP-encrypted emails. It is not mandatory to encrypt your response, so if you have no idea what we're talking about, don't sweat it. Further information, including our public key and fingerprint, can be found at the following


    At ThinkGeek we take your security and privacy very seriously. We hope you understand that when we have to take extra security measures such as this, we do it to protect you as well as ThinkGeek.

    We apologize for any inconvenience this may cause, and we appreciate your understanding. If you have any questions, please feel free to email or call us at the number below.


    ThinkGeek Customer Service

    1-888-433-5788 (phone)

    1-703-839-8611 (fax)

    Wow.  We've ordered from them in the past (and placed other large orders with them), but we've never seen anything as outrageous as this.  They're asking for exactly the kind of information that would be necessary to perpetrate identity theft against Valorie, and they're holding our order hostage if we don't comply.

    What was worse is that their order form didn't even ask for the CVV code on the back of the credit card (the one that's not imprinted).  So not only didn't they follow the "standard" practices that most e-commerce sites follow when dealing with credit cards, but they felt it was necessary for us to provide exactly the kind of information that an identity thief would ask for.

    Valorie contacted them to let them know how she felt about it, and their response was:

    Thank you for your recent ThinkGeek order. Sometimes, when an order is placed with a discrepancy between the billing and the shipping addresses, or with a billing address outside the US, or the order is above a certain value, our ordering system will flag the transaction. In these circumstances, we request physical documentation of the billing address on the order in question, to make sure that the order has been placed by the account holder. At ThinkGeek we take your security and privacy very seriously. We hope you understand that when we have to take extra security measures such as this, we do it to protect you as well as ThinkGeek.
    Unfortunately, without this documentation, we are unable to complete the processing of your order. If we do not receive the requested documentation within two weeks of your initial order date, your order will automatically be cancelled. If you can't provide documentation of the billing address on your order, you will need to cancel your current order and reorder using the proper billing address for your credit card. Once we receive and process your documentation, you should not need to provide it on subsequent orders. Please let us know if you have any further questions.

    The good news is that we have absolutely no problems with them canceling the order, and we're never going to do business with them again.  There are plenty of other retailers out there that sell the same stuff that ThinkGeek does who are willing to accept our business without being offensive about it.


    Edit to add:  Think Geek responded to our issues, their latest response can be found here.


    My office guest chair


    Adam writes about his office guest chair.

    Microsoft's a big company, and, like all big companies, has all sorts of silly rules about what you can have in your office.   One of them is that for office furniture, you get:

    1. A desk chair
    2. One PED (sort of a mobile filing cabinet)
    3. One curved desk piece (we have modular desk pieces with adjustable heights)
    4. One short straight desk piece
    5. One long straight desk piece
    6. One white board
    7. One cork board
    8. One or two hanging book shelves with THREE shelves (not 4)
    9. One guest chair.

    If you're a manager, you can get a round table as well (presumably to have discussions at).

    In my case, most of my office stuff is pretty stock - except I got my manager to requisition a round table for his office for me (he already had one).  I use it to hold my manipulative puzzles.  I also have two PEDs.

    But I'm most proud of my guest chair.  I have two of them.  One's the standard Microsoft guest chair.  But the other one's special.  You see, it comes from the original Microsoft campus at 10700 Northup Way, and is at least 20 years old.

    I don't think that it's the original chair I had in my original office way back then - that was lost during one of my moves, but I found the exact match for the chair in a conference room the day after the move and "liberated" it. 

    But I've had this particular chair since at least 1988 or so.  The movers have dutifully moved it with me every time.

    Daniel loves it when he comes to my office since it's comfy - it's padded and the standard guest chairs aren't.

    Edit: Someone asked me to include a picture of the chair:


    The application called an interface that was marshalled for a different thread.

    Another one from someone sending a comment:

    I came across your blog and was wondering if what to do when encountering above error message in an application program.

    The error occurs once in a while when printing out from a Windows application.

    Is there some setting missing in computer administration or in the Registry or can it only be solved in the code?

    Appreciate your help!

    Yech.  This one's ugly.  It's time for Raymond's Psychic Powers(tm) of detection.

    If you take the error message text and look inside winerror.h, you'll see that the error message mentioned is exactly the text for the RPC_E_WRONG_THREAD error.

    If you then do an MSDN search for RPC_E_WRONG_THREAD, the first hit is: "INFO: Explanation of RPC_E_WRONG_THREAD Error".  Essentially, the error's a side effect of messing up threading models.  I wrote about them about 18 months ago in "What are these threading models, and why do I care?". 

    So, knowing that the app's dutifully reporting RPC_E_WRONG_THREAD to the user, what had to have happened to cause this error?

    It means that the application did a CoCreateInstance of a Single Threaded Apartment COM object in one thread, but used it in another thread.

    Given the comment that it only happens once in a while, we can further deduce that the application called CoCreateInstance from a thread in a pool of worker threads, and attempted to use it in a function queued to that pool of threads (otherwise it would fail all the time and the author of the app would have found the problem).  Given that it only happens when printing (an operation that's usually handled in a background thread), this makes sense.

    Unfortunately for the person who asked the question, they don't really have any choice but to contact the vendor that created the app and hope that they have an update that fixes the problem, because there's no workaround you can do outside the app :(
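
    For anyone on the app author's side of this, here's a rough portable sketch of the rule being broken.  It is not COM: ApartmentBoundObject, Print and kOk are made-up names, and the caller's thread id is passed in explicitly so the check is easy to see.  The only real thing in it is the numeric value of RPC_E_WRONG_THREAD from winerror.h.

```cpp
#include <cassert>
#include <cstdint>
#include <thread>

// Value of RPC_E_WRONG_THREAD as defined in winerror.h.
constexpr std::uint32_t kRpcEWrongThread = 0x8001010Eu;
constexpr std::uint32_t kOk = 0;  // stands in for S_OK

// Hypothetical stand-in for an object bound to a single threaded apartment:
// it remembers the thread that created it and rejects calls from any other.
class ApartmentBoundObject {
public:
    ApartmentBoundObject() : owner_(std::this_thread::get_id()) {}

    // 'caller' is an explicit parameter purely for illustration; it is an
    // assumption of this sketch, not how COM learns who is calling.
    std::uint32_t Print(std::thread::id caller) const {
        return caller == owner_ ? kOk : kRpcEWrongThread;
    }

private:
    std::thread::id owner_;
};
```

    Calling Print with the creating thread's id succeeds, and calling it with any other id gets the wrong-thread error, which is the "works most of the time" behavior you'd expect when a pool usually (but not always) hands the work item back to the thread that created the object.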

  • Larry Osterman's WebLog

    New Audio APIs for Vista


    In an earlier post, I mentioned that we totally re-wrote the audio stack for Windows Vista.  Today I want to talk a bit about the APIs that came along with the new stack.

    There are three major API components to the Vista audio architecture:

    • Multimedia Device API (MMDEVAPI) - an API for enumerating and managing audio endpoints.
    • Device Topology - an API for discovering the internals of your audio card's topology.
    • Windows Audio Session API (WASAPI) - the low level API for rendering audio.

    All the existing audio APIs have been re-plumbed to use these APIs internally; for Vista, all audio goes through these three APIs.  For the vast majority of the existing audio applications, things should "just work"...

    In general, we don't expect that anyone will move to these new APIs.  They're documented for completeness, but the reality is that unless you're dealing with extremely low latency audio (sub 20ms), or writing a control panel applet for a specific audio adapter, you're not likely to ever want to deal with them (the new APIs really are very low level APIs - using the higher level APIs is both easier and less error prone).


    MMDEVAPI is the entrypoint API - it's a COM class that allows applications to enumerate endpoints and "activate" interfaces on them.  Endpoints fall into two general types: Capture and Render (you can consider Capture endpoints as microphones and line in; Render endpoints are things like speakers).  MMDEVAPI also allows the user to manage defaults for each of the types.  As I write this, there are actually three different sets of defaults supported in Vista: "Console", "Multimedia", and "Communications".  "Console" is used for general purpose audio, "Multimedia" is intended for audio playback applications (media players, etc), and "Communications" is intended for voice communications (applications like Yahoo! Messenger, Microsoft Communicator, etc).

    Windows XP had two sets of defaults (the "default" default and the "communications" default); we're adding a 3rd default type to enable multimedia playback.  Consider the following scenario.  I have a Media Center computer.  The SPDIF output from the audio adapter's connected to my home AV receiver, I have a USB headset that I want to use for VOIP, and there are stereo speakers connected to the machine that I use for day-to-day operations.  We want to enable applications to make intelligent choices when they choose which audio device to use - the default in this scenario is to use the desktop speakers, but we want to allow Communicator (or Messenger, or whatever) to use the headset, and Media Center to use the external receiver.  We may end up changing these sets before Vista ships, but this gives a flavor of what we're thinking about.

    MMDEVAPI supports an "activation" design pattern - essentially, instead of calling a class factory to create a generic object, then binding the object to another object, with activation, you can enumerate objects (endpoints in this case) and "activate" an interface on that object.  It's a really convenient pattern when you have a set of objects that may or may not have the same type.
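
    To make the activation pattern a bit more concrete, here's a small portable sketch of the shape of it.  None of these names are the real MMDEVAPI interfaces: Endpoint, IVolume, IMeter, FixedVolume and EnumerateEndpoints are all invented for illustration.  The point is only that you enumerate first and then ask each object for the interface you want, and an object that doesn't support it simply says no.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical interfaces an endpoint might expose.
struct IVolume { virtual ~IVolume() = default; virtual float Level() const = 0; };
struct IMeter  { virtual ~IMeter()  = default; virtual float Peak()  const = 0; };

struct FixedVolume : IVolume {
    float Level() const override { return 0.5f; }
};

// Hypothetical endpoint: callers enumerate these, then "activate" an interface.
class Endpoint {
public:
    Endpoint(std::string name, bool hasVolume)
        : name_(std::move(name)), hasVolume_(hasVolume) {}

    const std::string& Name() const { return name_; }

    // Default: the endpoint does not support the requested interface.
    template <typename Interface>
    std::unique_ptr<Interface> Activate() const { return nullptr; }

private:
    std::string name_;
    bool hasVolume_;
};

// Only endpoints that actually have a volume control hand one out.
template <>
inline std::unique_ptr<IVolume> Endpoint::Activate<IVolume>() const {
    if (!hasVolume_) return nullptr;
    return std::make_unique<FixedVolume>();
}

// Made-up device list standing in for real enumeration.
inline std::vector<Endpoint> EnumerateEndpoints() {
    return { Endpoint("Speakers", true), Endpoint("SPDIF out", false) };
}
```

    A caller walks the result of EnumerateEndpoints() and calls Activate<IVolume>() on each entry; the speakers return a usable object and the SPDIF output returns nullptr, without the caller ever needing to know the concrete type behind either endpoint.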

    Btw, you can access the category defaults using wave or mixer messages, this page from MSDN describes how to access them - the console default is accessed via DRVM_MAPPER_PREFERRED_GET and the communications default is accessed via DRVM_MAPPER_CONSOLEVOICECOM_GET.

    Device Topology:

    Personally, I don't believe that anyone will ever use Device Topology, except for audio hardware vendors who are writing custom control panel extensions.  It exists for control panel type applications that need to be able to determine information about the actual hardware. 

    Device Topology exposes collections of parts and the connections between those parts.  On any part, there are zero or more controls, which roughly correspond to the controls exposed by the audio driver.  One cool thing about device topologies is that topologies can connect to other topologies.  So in the future, it's possible that an application running on an RDP server may be able to enumerate and address the audio devices on the RDP client - instead of treating the client as an endpoint, the server might be able to enumerate the device topology on the RDP client and manipulate controls directly on the client.  Similarly, in the future, the hardware volume control for a SPDIF connector might manipulate the volume on an external AV receiver via an external control connection (1394 or S/LINK).

    One major change between XP and Vista is that Device Topology will never lie about the capabilities of the hardware - before Vista, if a piece of hardware didn't have a particular control the system tried to be helpful and provide controls that it thought ought to be there (for instance if a piece of hardware didn't have a volume control, the system helpfully added one).  For Vista, we're reporting exactly what the audio hardware reports, and nothing more.  This is a part of our philosophy of "don't mess with the user's audio streams if we don't have to" - emulating a hardware control when it's not necessary adds potentially unwanted DSP to the audio stream.

    Again, the vast majority of applications shouldn't need to use these controls.  For most applications, the functionality provided by the primary APIs (mixerLine, wave, DSound, etc) is going to be more suitable for their needs.


    WASAPI is the "big kahuna" for the audio engine.  You activate WASAPI on an endpoint, and it provides functionality for rendering/capturing audio streams.  It also provides functions to manage the audio clock and manipulate the volume of the audio stream.

    In general, WASAPI operates in two modes.  In "shared" mode, audio streams are rendered by the application and mixed by the global audio engine before they're rendered out the audio device.  In "exclusive" mode, audio streams are rendered directly to the audio adapter, and no other application's audio will play.  Obviously the vast majority of applications will operate in shared mode; that's the default for the wave APIs and DSound.  One relatively common scenario that WILL use exclusive mode is rendering content that requires a codec that's present in the hardware that Windows doesn't understand.  A simple example of this is compressed AC3 audio rendered over a SPDIF connection - if you attempt to render this content and Windows doesn't have a decoder for it, then DSound will automatically initialize WASAPI in exclusive mode and will render the content directly to the hardware.
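
    As a toy illustration of the difference between the two modes (and nothing more than that), here's a sketch.  The summing-and-clamping mixer is my assumption about the simplest thing a mixer could do, not a description of what the real audio engine does, and Buffer, MixShared and RenderExclusive are made-up names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Buffer = std::vector<float>;  // samples, nominally in [-1.0, 1.0]

// Shared mode (sketch): every application's stream is summed sample by sample
// and the result is clamped before it goes to the device.
inline Buffer MixShared(const std::vector<Buffer>& streams, std::size_t frames) {
    Buffer out(frames, 0.0f);
    for (const Buffer& s : streams)
        for (std::size_t i = 0; i < frames && i < s.size(); ++i)
            out[i] += s[i];
    for (float& v : out)
        v = std::max(-1.0f, std::min(1.0f, v));
    return out;
}

// Exclusive mode (sketch): one stream owns the device and is passed through
// untouched, so nobody else's audio is heard.
inline Buffer RenderExclusive(const Buffer& owner) { return owner; }
```

    The thing to notice is that in the shared path the samples the device sees are not the samples any one application wrote, while in the exclusive path they are bit-for-bit the same, which is why compressed content like AC3 has to go the exclusive route.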

    If your application is a pro audio application, or is interested in extremely low latency audio then you probably want to consider using WASAPI, otherwise it's better to stick with the existing APIs.

    Tomorrow: Volume control (a subject that's near and dear to my heart) :)

  • Larry Osterman's WebLog

    Turning the blog around - End of Life issues.


    I'd like to turn the blog around again and ask you all a question about end-of-life issues.

    And no, it's got nothing to do with Terri Schiavo.

    Huge amounts of text have been written about Microsoft's commitment to platform stability.

    But platform stability comes with an engineering cost.  It gets expensive maintaining old code - typically it's not written to modern coding standards, the longer that it exists, the more heavily patched it becomes, etc.

    For some code that's sufficiently old, the amount of engineering that's needed to move the code to a new platform can become prohibitively expensive (think about what would be involved in porting code originally written for MS-DOS to a 128bit platform).

    So for every API, the older it gets, the more the temptation exists to find a way of ending its viable lifetime.

    On the other hand, you absolutely can't break applications.  And not just the applications that are commercially available - If a customer's line-of-business application fails because you decided to remove an API, you're going to have to put the API back.

    So here's my question: Under what circumstances is it ok to remove an API from the operating system?  Must you carry them on forever?

    This isn't just a Microsoft question.  It's a platform engineering problem - if you're committed to a stable platform (in other words, on your platform, you're not going to break existing applications on a new version of the platform), then you're going to have to face these issues.

    I have some opinions on this (no, really?) but I want to hear from you folks before I spout off on them.

  • Larry Osterman's WebLog

    What's wrong with this code, part 10

    Ok, time for another "what's wrong with this code".  This one's trivial from a code standpoint, but it's tricky...

    // ----------------------------------------------------------------------
    // Function:
    // CThing1::OnSomethingHappening()
    // Description:
    // Called when something happens
    // Return:
    // S_OK if successful
    // ----------------------------------------------------------------------
    HRESULT CThing1::OnSomethingHappening()
    {
        HRESULT hr;

        <Do Some Stuff>
        // Perform some operation...
        hr = PerformAnOperation();
        if (FAILED(hr))
            hr = ERROR_NOT_SUPPORTED;
        IF_FAILED_JUMP(hr, Error);

    Exit:
        return hr;

    Error:
        goto Exit;
    }

    Not much code, no?  So what's wrong with it?

    As usual, answers and kudos tomorrow.

  • Larry Osterman's WebLog

    Why add a throw() to your methods?


    Towards the end of the comments in my last "What's wrong with this code", hippietim asked why I added a throw() attribute to the destructor of the CCoInitializer.  The answer's pretty simple.  If you add a throw() attribute around routines that never throw, the compiler can be clever about code motion and optimization.  Consider the following totally trivial example:

    class MyClass
    {
        size_t CalculateFoo();      // body elided
        size_t MethodThatCannotThrow() throw()
        {
            return 100;
        }
        void ExampleMethod()
        {
            size_t foo, bar;
            try
            {
                foo = CalculateFoo();
                bar = foo * 100;
                MethodThatCannotThrow();
                printf("bar is %zu", bar);
            }
            catch (...)
            {
            }
        }
    };

    When the compiler sees this, with the "throw()" attribute, the compiler can completely optimize the "bar" variable away, because it knows that there is no way for an exception to be thrown from MethodThatCannotThrow().  Without the throw() attribute, the compiler has to create the "bar" variable, because if MethodThatCannotThrow throws an exception, the exception handler may/will depend on the value of the bar variable.

    In addition, source code analysis tools like prefast can (and will) use the throw() annotation to improve their error detection capabilities - for example, if you have a try/catch and all the functions you call are marked as throw(), you don't need the try/catch (yes, this has a problem if you later call a function that could throw).
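
    A side note for anyone reading this with a newer compiler: since C++11 the same promise is usually spelled noexcept, and the noexcept operator lets you ask the compiler what it has been told.  Here's a small sketch of that; the body of CalculateFoo is a placeholder I made up, since the original leaves it out.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

class MyClass {
public:
    // Placeholder body (an assumption); no promise is made about exceptions.
    std::size_t CalculateFoo() { return 7; }

    // Modern spelling of the throw() promise.
    std::size_t MethodThatCannotThrow() noexcept { return 100; }
};

// The noexcept operator reports, at compile time, what the compiler may assume.
static_assert(noexcept(std::declval<MyClass&>().MethodThatCannotThrow()),
              "marked as non-throwing");
static_assert(!noexcept(std::declval<MyClass&>().CalculateFoo()),
              "no promise was made, so the compiler must assume it can throw");
```

    The two static_asserts are the whole point: they show that the annotation is information the compiler (and any analysis tool) can query, which is what makes the optimization and the try/catch diagnostics described above possible.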

  • Larry Osterman's WebLog

    One in a million is next Tuesday


    Back when I was a wee young lad, fresh from college, I thought I knew everything there was to know.


    I’ve since been disabused of that notion, rather painfully.

    One of the best lessons came very early on, back when I was working on DOS 4.  We ran into some kind of problem (I’ll be honest and say that I don’t remember what it was).

    I was looking into the bug with Gordon Letwin, the architect for DOS 4.  I looked at the code and commented “Maybe this is what was happening?  But if that were the case, it’d take a one in a million chance for it to happen”.

    Gordon’s response was simply: “In our business, one in a million is next Tuesday”.

    He then went on to comment that at the speeds which modern computers operate (4.77 MHz remember), things happened so quickly that something with a one in a million chance of occurrence is likely to happen in the next day or so.
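
    The arithmetic behind Gordon's comment is easy to sketch.  Assuming each pass through the code is an independent trial with the same chance p of hitting the bug (both of which are simplifying assumptions of mine), the chance of seeing it at least once in n passes is 1 - (1 - p)^n.  ChanceOfAtLeastOnce is just a name I picked for that formula.

```cpp
#include <cassert>
#include <cmath>

// Probability that an event with per-trial chance p happens at least once in
// n independent trials: 1 - (1 - p)^n, computed via log1p for small p.
inline double ChanceOfAtLeastOnce(double p, double n) {
    return 1.0 - std::exp(n * std::log1p(-p));
}
```

    With p at one in a million, a million passes through the code gives you roughly a 63% chance of having hit it, and ten million passes pushes that past 99.99%.  A loop that runs even a few hundred times a second gets there well before next Tuesday.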

    I’m not sure I’ve ever received better advice in my career. 

    It has absolutely stood the test of time – no matter how small the chance of something happening, with modern computers and modern operating systems, essentially every possible race condition or deadlock will be found within a reasonable period of time.

    And I’ve seen some absolute doozies in my time – race conditions on MP machines where a non interlocked increment occurred (one variant of Michael Grier’s “i = i + 1” bug).   Data corruptions because you have one non protected access to a data structure.  I’m continually amazed at the NT scheduler’s uncanny ability to context switch my application at just the right time as to expose my data synchronization bug.  Or to show just how I can get my data structures deadlocked in hideous ways.
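
    Here's the "i = i + 1" bug replayed deterministically, so you can see the unlucky schedule without needing two real threads.  Incrementer, RunLostUpdate and RunSerialized are names I made up for the sketch; each Incrementer plays the part of one thread doing the increment as a separate load and store.

```cpp
#include <cassert>

// One "thread" executing i = i + 1 as separate load and store steps.
struct Incrementer {
    int loaded = 0;
    void Load(const int& shared) { loaded = shared; }
    void Store(int& shared) const { shared = loaded + 1; }
};

// Replays the unlucky schedule: both threads load before either one stores.
inline int RunLostUpdate() {
    int i = 0;
    Incrementer a, b;
    a.Load(i);
    b.Load(i);   // the context switch landed between a's load and a's store
    a.Store(i);
    b.Store(i);  // overwrites a's result with the same value
    return i;
}

// The schedule everyone imagines: each increment finishes before the next starts.
inline int RunSerialized() {
    int i = 0;
    Incrementer a, b;
    a.Load(i); a.Store(i);
    b.Load(i); b.Store(i);
    return i;
}
```

    RunLostUpdate() comes back with 1 even though two increments ran, and RunSerialized() comes back with 2.  An interlocked increment (or a std::atomic fetch_add) closes the window by making the load, add and store one indivisible step.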

    So nowadays, whenever anyone comments on how unlikely it is for some event to occur, my answer is simply: “One in a million is next Tuesday”.

    Edit: To fix the spelling of MGrier's name.

    Edit:  My wife pointed out the following and said it belonged with this post:

  • Larry Osterman's WebLog

    Larry's rules of software engineering #2: Measuring testers by test metrics doesn't.


    This one’s likely to get a bit controversial :)

    There is an unfortunate tendency among test leads to measure the performance of their testers by the number of bugs they report.

    As best as I’ve been able to figure out, the logic works like this:

    Test Manager 1: “Hey, we want to have concrete metrics to help in the performance reviews of our testers.  How can we go about doing that?”
    Test Manager 2: “Well, the best testers are the ones that file the most bugs, right?”
    Test Manager 1: “Hey that makes sense.  We’ll measure the testers by the number of bugs they submit!”
    Test Manager 2: “Hmm.  But the testers could game the system if we do that – they could file dozens of bogus bugs to increase their bug count…”
    Test Manager 1: “You’re right.  How do we prevent that then? – I know, let’s just measure them by the bugs that are resolved “fixed” – the bugs marked “won’t fix”, “by design” or “not reproducible” won’t count against the metric.”
    Test Manager 2: “That sounds like it’ll work, I’ll send the email out to the test team right away.”

    Sounds good, right?  After all, the testers are going to be rated by an absolute value based on the number of real bugs they find – not the bogus ones, but real bugs that require fixes to the product.

    The problem is that this idea falls apart in reality.

    Testers are given a huge incentive to find nit-picking bugs – instead of finding significant bugs in the product, they try to find the bugs that increase their number of outstanding bugs.  And they get very combative with the developers if the developers dare to resolve their bugs as anything other than “fixed”.

    So let’s see how one scenario plays out using a straightforward example:

    My app pops up a dialog box with the following:


                Plsae enter you password:  _______________ 


    Where the edit control is misaligned with the text.

    Without a review metric, most testers would file a bug with a title of “Multiple errors in password dialog box” which then would call out the spelling error and the alignment error on the edit control.

    They might also file a separate localization bug because there’s not enough room between the prompt and the edit control (separate because it falls under a different bug category).

    But if the tester has their performance review based on the number of bugs they file, they now have an incentive to file as many bugs as possible.  So the one bug morphs into two bugs – one for the spelling error, the other for the misaligned edit control. 

    This version of the problem is a total and complete nit – it’s not significantly more work for me to resolve one bug than it is to resolve two, so it’s not a big deal.

    But what happens when the problem isn’t a real bug – remember – bugs that are resolved “won’t fix” or “by design” don’t count against the metric so that the tester doesn’t flood the bug database with bogus bugs artificially inflating their bug counts. 

    Tester: “When you create a file when logged on as an administrator, the owner field of the security descriptor on the file’s set to BUILTIN\Administrators, not the current user”.
    Me: “Yup, that’s the way it’s supposed to work, so I’m resolving the bug as by design.  This is because NT considers all administrators as interchangeable, so when a member of BUILTIN\Administrators creates a file, the owner is set to the group to allow any administrator to change the DACL on the file.”

    Normally the discussion ends here.  But when the tester’s going to have their performance review score based on the number of bugs they submit, they have an incentive to challenge every bug resolution that isn’t “Fixed”.  So the interchange continues:

    Tester: “It’s not by design.  Show me where the specification for your feature says that the owner of a file is set to the BUILTIN\Administrators account”.
    Me: “My spec doesn’t.  This is the way that NT works; it’s a feature of the underlying system.”
    Tester: “Well then I’ll file a bug against your spec since it doesn’t document this.”
    Me: “Hold on – my spec shouldn’t be required to explain all of the intricacies of the security infrastructure of the operating system – if you have a problem, take it up with the NT documentation people”.
    Tester: “No, it’s YOUR problem – your spec is inadequate, fix your specification.  I’ll only accept the “by design” resolution if you can show me the NT specification that describes this behavior.”
    Me: “Sigh.  Ok, file the spec bug and I’ll see what I can do.”

    So I have two choices – either I document all these subtle internal behaviors (and security has a bunch of really subtle internal behaviors, especially relating to ACL inheritance) or I chase down the NT program manager responsible and file bugs against that program manager.  Neither of which gets us closer to shipping the product.  It may make the NT documentation better, but that’s not one of MY review goals.

    In addition, it turns out that the “most bugs filed” metric is often flawed in the first place.  The tester that files the most bugs isn’t necessarily the best tester on the project.  Often times the tester that is the most valuable to the team is the one that goes the extra mile and spends time investigating the underlying causes of bugs and files bugs with detailed information about possible causes of bugs.  But they’re not the most prolific testers because they spend the time to verify that they have a clean reproduction and have good information about what is going wrong.  They spent the time that they would have spent finding nit bugs and instead spent it making sure that the bugs they found were high quality – they found the bugs that would have stopped us from shipping, and not the “the florblybloop isn’t set when I twiddle the frobjet” bugs.

    I’m not saying that metrics are bad.  They’re not.  But basing people’s annual performance reviews on those metrics is a recipe for disaster.

    Somewhat later:  After I wrote the original version of this, a couple of other developers and I discussed it a bit at lunch.  One of them, Alan Ludwig, pointed out that one of the things I missed in my discussion above is that there should be two halves of a performance review:

    MEASUREMENT: Give me a number that represents the quality of the work that the user is doing.

    And EVALUATION: Given the measurement, is the employee doing a good job or a bad job?  In other words, you need to assign a value to the metric – how relevant is the metric to your performance.

    He went on to discuss the fact that any metric is worthless unless it is reevaluated every time it is used to determine how relevant the metric is – a metric is only as good as its validity.

    One other comment that was made was that absolute bug count metrics cannot be a measure of the worth of a tester.  The tester that spends two weeks and comes up with four buffer overflow errors in my code is likely to be more valuable to my team than the tester that spends the same two weeks and comes up with 20 trivial bugs.  Using the severity field of the bug report was suggested as a metric, but Alan pointed out that this only worked if the severity field actually had significant meaning, and it often doesn’t (it’s often very difficult to determine the relative severity of a bug, and often the setting of the severity field is left to the tester, which has the potential for abuse unless all bugs are externally triaged, which doesn’t always happen).

    By the end of the discussion, we had all agreed that bug counts were an interesting metric, but they couldn’t be the only metric.

    Edit: To remove extra <p> tags :(

  • Larry Osterman's WebLog

    Choosing a C runtime library


    Yesterday a developer in my group came by asking about a failure he saw when running the application verifier on his component.  The app verifier was reporting that he was using a HEAP_NO_SERIALIZE heap from a thread other than the one that created the heap.

    I looked a bit deeper and realized that he was running with the single threaded statically linked C runtime library.  An honest mistake, given that it’s the default version of the C runtime library.

    You see, there are 3 different versions of the C runtime library shipped (and 3 different versions of the ATL and MFC libraries too). 

    The first is the statically linked single-threaded library.  This one can be used only on single threaded applications, and all the object code for the C runtime library functions used is included in the application binary.  You get this with the /ML compiler switch.

    The second is the statically linked, multi-threaded library.  This one’s the same as the first, but you can use it in a multithreaded application.  You get this one with the /MT compiler switch.

    The third is the dynamically linked library.  This one keeps all the C runtime library code in a separate DLL (MSVCRTxx.DLL).  Since the runtime library code’s in a DLL, it also handles multi-threaded issues.   The DLL library is enabled with the /MD switch.

    But I’ve been wondering.  Why on earth would anyone ever choose any option OTHER than the multi-threaded DLL version of the runtime library?

    There are LOTS of reasons for always using the multithreaded DLL:

    1)      Your application is smaller because it doesn’t have the C runtime library loaded into it.

    2)      Because of #1, your application will load faster.  The C runtime library is almost certainly in memory, so the pages containing the library don’t have to be read from disk.

    3)      Using the multithreaded DLL future-proofs your application.  If you ever add a second thread to your application (or call into an API that creates multiple threads), you don’t have to remember to change your C runtime library.  And unless you’re running the app verifier regularly, the only way you’ll find out about the problem is if you get a heap corruption (if you’re lucky).

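    4)      If your application has multiple DLLs and you link the runtime statically, then you need to be VERY careful about allocation – each DLL will have its own copy of the C runtime library and its own heap, as will the application.  If you allocate a block in one DLL, you must free it in the same DLL.  Modules that all use the same runtime DLL share one copy of the runtime and one heap instead.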

    5)      If a security bug is ever found in the C runtime library, you don’t have to release an update to your app.

    The last one’s probably the most important IMHO.  Just to be clear - There haven’t been any security holes found in the C runtime library.  But it could happen.  And when it happens, it’s pretty ugly.  A really good example of this can be seen with the security vulnerability that was found in the zlib compression library. This library was shipped in dozens of products, and every single one of them had to be updated.  If you do a google search for “zlib library security vulnerability” you can see some of the chaos that resulted from this disclosure.  If your app used the DLL C runtime library, then you’d get the security fix for free from windows update when Microsoft posted the update.
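
    Going back to reason 4 for a moment: if you're stuck with separate runtimes, the usual defensive pattern is to make sure the module that allocates is also the module that frees.  Here's a small sketch of that pattern.  The widget_dll namespace stands in for a DLL, and Widget, CreateWidget, DestroyWidget, LiveCount and MakeWidget are all names invented for the example.

```cpp
#include <cassert>
#include <memory>

// Imagine everything in this namespace is compiled into one DLL.
namespace widget_dll {
    struct Widget { int value = 0; };

    // Tracks live objects so the example can show that nothing leaks.
    inline int& LiveCount() { static int n = 0; return n; }

    // Allocation happens with this module's runtime...
    inline Widget* CreateWidget(int value) {
        Widget* w = new Widget;
        w->value = value;
        ++LiveCount();
        return w;
    }

    // ...so the matching release is exported from the same module.
    inline void DestroyWidget(Widget* w) {
        if (w) { --LiveCount(); delete w; }
    }
}

// Caller side: a unique_ptr whose deleter routes back into the owning module,
// so the caller's own runtime never frees the block.
struct WidgetDeleter {
    void operator()(widget_dll::Widget* w) const { widget_dll::DestroyWidget(w); }
};
using WidgetPtr = std::unique_ptr<widget_dll::Widget, WidgetDeleter>;

inline WidgetPtr MakeWidget(int value) {
    return WidgetPtr(widget_dll::CreateWidget(value));
}
```

    The caller holds a WidgetPtr and never calls delete itself; when the pointer goes out of scope the block goes back to the module that allocated it.  It works, but it's exactly the kind of bookkeeping the shared runtime DLL lets you stop worrying about.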

    The only arguments I’ve been able to come up with for using the static C runtime libraries are:

    1)      I don’t have to distribute two binaries with my application – If I use the DLL, I need to redistribute the DLL.  This makes my application setup more complicated.

    Yes, but not significantly (IMHO).  This page lists the redistribution info for the C runtime library and other components.

    2)      If I statically link to the C runtime library, I avoid DLL hell.

    This is a red herring IMHO.  Ever since VC6, the C runtime library has been tightly versioned, as long as your installer follows the rules for version checking of redistributable files (found here) you should be ok.

    3)      My code is faster since the C runtime library doesn’t have to do all that heap synchronization stuff.

    Is it really?  How much checking is involved in the multithreaded library?  Let’s see.  The multithreaded library puts some stuff that was kept in global variables into thread local storage.  So there’s an extra memory indirection involved on routines like strtok etc.  Also, the single threaded library creates its heap with HEAP_NO_SERIALIZE (that’s what led to this entire post :)).  But all that the multithreaded library does differently is wrap each heap access with an EnterCriticalSection/LeaveCriticalSection pair.  Which is very very fast if there’s no contention.  And since this is a single threaded application, by definition there’s no contention for the critical section.

    Using the multithreaded DLL C runtime library is especially important for systems programmers.  First off, if your system component is a DLL, it’s pretty safe to assume that you’ll be called from multiple threads, so at an absolute minimum, you’re going to want to use the multithreaded static C runtime library.  And if you’re using the multithreaded static C runtime library, why NOT use the DLL version?

    If you’re not writing a DLL, then it’s highly likely that your app does (or will) use multiple threads.  Which brings me back to the previous comment – why NOT use the DLL version? 

    Your app will be smaller, more secure, future-proof, and no slower than if you don’t.


  • Larry Osterman's WebLog

    Should I check the parameters to my function?


    I just had an interesting discussion with one of the testers in my group.

    He had just finished filing a series of bugs against our components because they weren’t failing when he passed bogus pointers to the API.  Instead, they raised a 0xC0000005 exception and crashed his application.

    The APIs did fail if he passed a null pointer in, with E_POINTER. 

    But he felt that the API should check all the bogus pointers passed in and fail with E_POINTER if the pointer passed in didn’t point to valid memory.

    This has been a subject of a lot of ongoing discussion over the years internally here at Microsoft.  There are two schools of thought:

    School one says “We shouldn’t crash the application on bogus data.  Crashing is bad.  So we should check our parameters and return error codes if they’re bogus”.

    School two says “GIGO – if the app hands us garbage, then big deal if it crashes”.

    I’m firmly in the second camp (not surprisingly, if you know me).  There are a lot of reasons for this.  The biggest one is security.  The way you check for bad pointers on Win32 is by calling the IsBadReadPtr and IsBadWritePtr APIs.  Michael Howard calls these APIs “CrashMyApplication” and “CorruptMemoryAndCrashMySystem” respectively.  The problem with IsBadReadPtr/IsBadWritePtr is that they do exactly what they’re advertised as doing:  They read and/or write to the memory location specified, with an exception handler wrapped around the read/write.  If an exception is thrown, they fail; if not, they succeed.

    There are two problems with this.  The first is that the only thing that IsBadReadPtr/IsBadWritePtr verifies is that at the instant that the API is called, there was valid memory at that location.  There’s nothing to prevent another thread in the application from unmapping the virtual address passed into IsBadReadPtr immediately after the call is made.  Which means that any error checks you made based on the results of this API aren’t valid (this is called out in the documentation for IsBadWritePtr/IsBadReadPtr).

    The other one is worse.  What happens if the memory address passed into IsBadReadPtr is a stack guard page (a guard page is a page kept at the bottom of the stack – when the system top level exception handler sees a fault on a guard page, it will grow the thread’s stack (up to the thread’s stack limit))?  Well, the IsBadReadPtr will catch the guard page exception and will handle it (because IsBadReadPtr handles all exceptions).  So the system exception handler doesn’t see the exception.  Which means that when that thread later runs, its stack won’t grow past the current limit.  By calling IsBadReadPtr in your API, you’ve turned an easily identifiable application bug into a really subtle stack overflow bug that may not be encountered for many minutes (or hours) later.

    The other problem with aggressively checking for bad parameters on an API is what happens if the app doesn’t check the return code from the API.  This means that they could easily have a bug in their code that passes a bogus pointer into IsBadWritePtr, thus corrupting memory.  But, since they didn’t check the return code, they don’t know about their bug.  And, again, much later the heap corruption bug that’s caused by the call to IsBadWritePtr shows up.  If the API had crashed, then they’d find the problem right away.

    Now, having said all this, if you go with school two, you’ve still got a problem – you can’t trust the user’s buffers.  At all.  This means you’ve got to be careful when touching those buffers to ensure that you’re not going to deadlock the process (for instance, by holding onto a critical section while writing to the user’s buffer).
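
    Here's a rough sketch of what "being careful" can look like.  NameStore, GetName and SetName are invented for the example, and std::mutex stands in for a critical section; the only real things are the numeric values of E_POINTER and E_INVALIDARG.  The idea is to do the cheap null check, copy what you need under the lock into memory you own, and only touch the caller's buffer after the lock has been released.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <mutex>
#include <string>

constexpr std::uint32_t kOk = 0;                     // stands in for S_OK
constexpr std::uint32_t kEPointer = 0x80004003u;     // value of E_POINTER
constexpr std::uint32_t kEInvalidArg = 0x80070057u;  // value of E_INVALIDARG

// Hypothetical component that hands a string back through a caller's buffer.
class NameStore {
public:
    void SetName(const std::string& name) {
        std::lock_guard<std::mutex> lock(mutex_);
        name_ = name;
    }

    // Copies the name into a caller-supplied buffer of 'size' bytes.
    std::uint32_t GetName(char* buffer, std::size_t size) const {
        if (buffer == nullptr) return kEPointer;  // cheap check only, no IsBadWritePtr
        if (size == 0) return kEInvalidArg;

        std::string snapshot;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            snapshot = name_;  // touch only our own memory while the lock is held
        }
        // The lock is released here, so a fault on the caller's buffer
        // cannot leave it held.
        const std::size_t n = std::min(snapshot.size(), size - 1);
        std::memcpy(buffer, snapshot.data(), n);
        buffer[n] = '\0';
        return kOk;
    }

private:
    mutable std::mutex mutex_;
    std::string name_;
};
```

    If the caller hands in garbage, the write still blows up, which is the school two position, but it blows up on the caller's thread with none of our locks held, so the rest of the process isn't left wedged behind it.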

    The other thing to keep in mind is that there are some situations where it’s NOT a good idea to crash the user’s app.  For example, if you’re using RPC, then RPC uses structured exception handling to communicate RPC errors back to the application (as opposed to API return codes).  So sometimes you have no choice but to catch the exceptions and return them.  The other case is if someone has written and shipped an existing API that uses IsBadReadPtr to check for bad pointers on input, it may not be possible to remove this because there may be applications that depend on this behavior.

    So in general, it’s a bad idea to use IsBadXxxPtr on your input parameters to check for correctness.  Your users may curse you for crashing their app when they screw up, but in the long term, it’s a better idea.

  • Larry Osterman's WebLog

    What are Known DLLs anyway?


    In my previous post about DLLs and how they work, I commented that winmm.dll was a KnownDLL in Longhorn.  It turns out that this is a bug in an existing KnownDLL. But what in the heck ARE Known DLLs in the first place?

    Well, it turns out that it’s in the KB, and I’ll summarize.

    KnownDLLs is a mechanism in Windows NT (and win9x) that allows the system to “cache” commonly used system DLLs.  It was originally added to improve application load time, but it also can be considered a security mechanism, since it prevents people from exploiting weak application directory permissions by dropping in Trojan horse versions of system DLLs (since the key system DLLs are all known DLLs, the version of the file in the application directory will be ignored).  As a security mechanism it's not particularly strong (if you can write to the directory that contains a program, you can create other forms of havoc), but it still counts as one.

    If you remember from my previous article, when the loader finds a DLL import record in an executable, it opens the file and tries to map the file into memory.  Well, that’s not ENTIRELY the case.  In fact, before that happens the loader looks for an existing section called \KnownDlls\<dll filename>.  If that section exists, then instead of opening the file, the loader simply uses the existing section.   It then follows all the “normal” rules for loading a DLL.
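
    The lookup order described above can be modeled in a few lines.  This is only a sketch of the ordering, not the real loader: resolve_dll, known_sections and open_file are hypothetical names, and the case-insensitive name match is an assumption of the model.

```python
def resolve_dll(name, known_sections, open_file):
    """Model of the lookup order described above (not the real loader).

    known_sections: dict mapping lowercase DLL file names to pre-created
                    section objects (stands in for the KnownDlls sections).
    open_file:      callable that opens and maps the file from disk.
    """
    # Assumption of this model: names are matched case-insensitively.
    section = known_sections.get(name.lower())
    if section is not None:
        return section           # reuse the existing section, never open the file
    return open_file(name)       # not a known DLL: fall back to the file on disk
```

    Either way, whatever comes back then goes through the same “normal” DLL loading rules; only the source of the section differs.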

    When the system boots, it looks in the registry at HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\Session Manager\KnownDLLs and creates a \KnownDlls\<dll filename> section for every DLL listed under that registry key.

    If you compare the HKLM\System\CCS\Control\Session Manager\KnownDLLs registry key with the sections under \KnownDlls (using a viewer like winobj), you’ll notice that the \KnownDlls object container always has more entries in it than the registry key.  This is because the \KnownDlls sections are computed as the transitive closure of the DLLs listed in KnownDLLs.  So if a DLL is listed in KnownDLLs, all of the DLLs that it statically links with ALSO get sections under \KnownDlls.
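
    For anyone who hasn't run into the term, “transitive closure” here just means “keep following static imports until nothing new turns up.”  A minimal sketch, assuming the import graph is available as a plain dictionary (the function name and the DLL names in the usage note are made up for illustration):

```python
def known_dll_sections(registry_list, static_imports):
    """Return every DLL reachable from registry_list via static imports.

    registry_list:  DLL names listed under the KnownDLLs registry key.
    static_imports: dict mapping a DLL name to the DLLs it statically
                    links with (hypothetical input for this sketch).
    """
    seen = set()
    pending = list(registry_list)
    while pending:
        dll = pending.pop()
        if dll in seen:
            continue                 # already handled; also guards against cycles
        seen.add(dll)
        pending.extend(static_imports.get(dll, ()))
    return seen
```

    If the registry lists only a.dll, and a.dll imports b.dll, which in turn imports c.dll, the result contains all three, which is why the object container ends up with more entries than the registry key.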

    Also, if you look in the KnownDLLs registry key, you’ll notice that there’s no path listed for the KnownDLLs.  That’s because all KnownDLLs are assumed to be in the directory pointed to by the HKLM\System\CCS\Control\Session Manager\KnownDLLs\DllDirectory registry value.  Again, this is an aspect of KnownDLLs being a security feature: by requiring KnownDLLs to be in the same directory, it makes it harder for someone to inject their own Trojan version of one of the KnownDLLs.

    Oh, and if the KnownDLLs processing causes any issues, or if for some other reason you don't want the system to load a DLL as a KnownDll, then you can set HKLM\System\CCS\Control\Session Manager\ExcludeFromKnownDlls to exclude a DLL from the KnownDll processing.  So in my example, until the bug is fixed in the existing KnownDLL, I’m adding winmm.dll to my ExcludeFromKnownDlls list.


