Larry Osterman's WebLog

Confessions of an Old Fogey
  • Larry Osterman's WebLog

    Why should I even bother to use DLL's in my system?

    • 9 Comments

    At the end of this blog entry, I mentioned that when I drop a new version of winmm.dll on my machine, I need to reboot it.  Cesar Eduardo Barros asked:

    Why do you have to reboot? Can't you just reopen the application that's using the dll, or restart the service that's using it?

It turns out that in my case, it’s because winmm’s listed in the “Known DLLs” for Longhorn.  And Windows treats “KnownDLLs” as special – if a DLL is a “KnownDLL” then it’s assumed to be used by lots of processes, and it’s not reloaded from the disk when a new process is created – instead the pages from the existing DLL are simply remapped into the current process.

    But that and a discussion on an internal alias got me to thinking about DLL’s in general.  This also came up during my previous discussion about the DLL C runtime library.

    At some point in the life of a system, you decide that you’ve got a bunch of code that’s being used in common between the various programs that make up the system. 

    Maybe that code’s only used in a single app – one app, 50 instances.

    Maybe that code’s used in 50 different apps – 50 apps, one instance.

    In the first case, it really doesn’t matter if you refactor the code into a separate library or not.  You’ll get code sharing regardless.

    In the second case, however, you have two choices – refactor the code into a library, or refactor the code into a DLL.

    If you refactor the code into a library, then you’ll save in complexity because the code will be used in common.  But you WON’T gain any savings in memory – each application will have its own set of pages dedicated to the contents of the shared library.

    If, on the other hand, you decide to refactor the library into its own DLL, then you will still save in complexity, and you get the added benefit that the working set of ALL 50 applications is reduced – the pages occupied by the code in the DLL are shared between all 50 instances.

    You see, NT's pretty smart about DLL's (this isn’t unique to NT btw; most other operating systems that implement shared libraries do something similar).  When the loader maps a DLL into memory, it opens the file, and tries to map that file into memory at its preferred base address.  When this happens, memory management just says “The memory from this virtual address to this other virtual address should come from this DLL file”, and as the pages are touched, the normal paging logic brings them into memory.

    When another process loads the same DLL at the same base address, the memory manager checks whether the DLL’s pages are already resident in memory.  If they are, it doesn’t go to disk to get the pages; it just remaps the pages from the existing file into the new process.  It can do this because the relocation fixups have already been applied (the relocation fixup table is basically a table within the executable that contains the address of every absolute jump in the code for the executable – when an executable is loaded in memory, the loader patches up these addresses to reflect the actual base address of the executable), so absolute jumps will work in the new process just like they would in the old.  The pages are backed by the file containing the DLL – if a page containing the DLL’s code is ever discarded from memory, the system will simply go back to the DLL file to reload the code pages. 

    If the preferred address range for the DLL isn’t available, then the loader has to do more work.  First, it maps the pages from the DLL into the process at a free location in the address space.  It then marks all the pages as copy-on-write so it can perform the fixups without messing up the pristine copy of the DLL (it wouldn’t be allowed to write to the pristine copy of the DLL anyway).  It then proceeds to apply all the fixups to the DLL, which causes a private copy of each page containing fixups to be created – and thus those pages can no longer be shared.

    This causes the overall memory consumption of the system to go up.  What’s worse, the fixups are performed every time the DLL is loaded at an address other than its preferred address, which slows down process launch time.
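
    To get a feel for how big the fixup table actually is, here's a minimal sketch (my own illustration, nothing from the loader itself) that walks a loaded module's relocation directory and counts the entries the loader would have to patch if the module missed its preferred base.  Error handling is omitted for brevity.

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        // Any loaded module will do for illustration; an HMODULE is just the
        // module's actual base address.
        BYTE *base = (BYTE *)GetModuleHandleW(L"kernel32.dll");
        IMAGE_DOS_HEADER *dosHeader = (IMAGE_DOS_HEADER *)base;
        IMAGE_NT_HEADERS *ntHeaders = (IMAGE_NT_HEADERS *)(base + dosHeader->e_lfanew);

        IMAGE_DATA_DIRECTORY relocDir =
            ntHeaders->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_BASERELOC];
        if (relocDir.Size == 0)
        {
            printf("No relocation directory.\n");
            return 0;
        }

        // The relocation data is a series of blocks, one per page that needs
        // patching; each block holds 16-bit entries identifying the offsets
        // within that page that contain absolute addresses.
        BYTE *cursor = base + relocDir.VirtualAddress;
        BYTE *end = cursor + relocDir.Size;
        unsigned int pages = 0, fixups = 0;
        while (cursor < end)
        {
            IMAGE_BASE_RELOCATION *block = (IMAGE_BASE_RELOCATION *)cursor;
            if (block->SizeOfBlock == 0)
                break;
            pages++;
            fixups += (block->SizeOfBlock - sizeof(IMAGE_BASE_RELOCATION)) / sizeof(WORD);
            cursor += block->SizeOfBlock;
        }
        printf("%u page(s) contain %u fixup entries.\n", pages, fixups);
        return 0;
    }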

    One way of looking at it is to consider the following example.  I have a DLL.  It’s a small DLL; it’s only got three pages in it.  Page 1 is data for the DLL, page 2 contains resource strings for the DLL, and page 3 contains the code for the DLL.  Btw, DLL’s this small are, in general, a bad idea.  I was recently enlightened by some of the Office guys as to exactly how bad this is, at some point I’ll write about it (assuming that Raymond or Eric don’t beat me to it).

    The DLL’s preferred base address is at 0x40000 in memory.  It’s used in two different applications.  Both applications are based starting at 0x10000 in memory, the first one uses 0x20000 bytes of address space for its image, the second one uses 0x40000 bytes for its image.

    When the first application launches, the loader opens the DLL and maps it into its preferred address.  It can do this because the first app only uses the range between 0x10000 and 0x30000 for its image.  The pages are marked according to the protections in the image – page 1 is marked copy-on-write (since it’s read/write data), page 2 is marked read-only (since it’s a resource-only page) and page 3 is marked read+execute (since it’s code).  When the app runs, as it executes code in the 3rd page of the DLL, that page is brought into memory.  The instant that the DLL writes to its data segment, the first page of the DLL is forked – a private copy is made in memory and the modifications are made to that copy. 

    If a second instance of the first application runs (or another application runs that can also map the DLL at 0x40000), then once again the loader maps the DLL into its preferred address.  And again, when the code in the DLL is executed, the code page is loaded into memory.  And again, the page doesn’t have to be fixed up, so memory management simply maps the physical page that’s already in memory (from the first instance) into the new application’s address space.  When the DLL writes to its data segment, a private copy is made of the data segment.

    So we now have two instances of the first application running on the system.  The DLL is consuming 4 pages (roughly; there’s overhead I’m not counting).  Two of the pages are the code and resource pages.  The other two are the two copies of the data page, one for each instance.

    Now let’s see what happens when the second application (the one that uses 0x40000 bytes for its image) launches.  The loader can’t map the DLL to its preferred address (since the second application occupies from 0x10000 to 0x50000).  So the loader maps the DLL into memory at (say) 0x50000.  Just like the first time, it marks the pages for the DLL according to the protections in the image, with one huge difference: since the code pages need to be relocated, they’re ALSO marked copy-on-write.  And then, because it knows that it wasn’t able to map the DLL into its preferred address, the loader patches all the relocation fixups.  The fixups cause the page that contains the code to be written to, and so memory management creates a private copy of the page.  After the fixups are done, the loader restores the page protection to the value marked in the image.  Now the code starts executing in the DLL.  Since it’s been mapped into memory already (when the relocation fixups were done), the code is simply executed.  And again, when the DLL touches the data page, a new copy is created for the data page.

    Once again, we start a second instance of the second application.  Now the DLL’s using 5 pages of memory – there are two copies of the code page, one for the resource page, and two copies of the data page.  All of which are consuming system resources.

    One thing to keep in mind is that the physical memory page that backs the resource page in the DLL is going to be kept in common among all the instances, since there are no relocations to the page, and the page contains no writable data – thus the page is never modified.

    Now imagine what happens when we have 50 copies of the first application running.  There are 52 pages in memory consumed by the DLL – 50 pages for the DLL’s data, one for the code, and one for the resources.

    And now, consider what happens if we have 50 copies of the second application running.  Now we get 101 pages in memory, just from this DLL!  We’ve got 50 pages for the DLL’s data, 50 pages for the relocated code, and still the one remaining for the resources.  Twice the memory consumption, just because the DLL wasn’t rebased properly.

    This increase in physical memory isn’t usually a big deal when it happens only once.  If, on the other hand, it happens a lot, and you don’t have the physical RAM to accommodate it, then you’re likely to start to page.  And that can result in “significantly reduced performance” (see this entry for details of what can happen if you page on a server).

    This is why it's so important to rebase your DLL's – it guarantees that the pages in your DLL will be shared across processes.  This reduces the time needed to load your process, and means your process working set is smaller.  For NT, there’s an additional advantage – we can tightly pack the system DLL’s together when we create the system.  This means that the system consumes significantly less of the application’s address space.  And on a 32 bit processor, application address space is a precious commodity (I never thought I’d ever write that an address space that spans 2 gigabytes would be considered a limited resource, but...).
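
    If you're curious whether this is happening in a process of yours, the check is straightforward: a module's HMODULE is its actual base address, and its preferred base is recorded in the ImageBase field of its PE header.  Here's a rough sketch of mine (link with psapi.lib) that flags every module in the current process that didn't land at its preferred base:

    #include <windows.h>
    #include <psapi.h>
    #include <stdio.h>

    int main(void)
    {
        HMODULE modules[1024];
        DWORD bytesNeeded = 0;
        if (!EnumProcessModules(GetCurrentProcess(), modules,
                                sizeof(modules), &bytesNeeded))
            return 1;

        for (DWORD i = 0; i < bytesNeeded / sizeof(HMODULE); i++)
        {
            char moduleName[MAX_PATH];
            GetModuleFileNameA(modules[i], moduleName, MAX_PATH);

            // Compare the actual base address with the preferred base
            // recorded in the module's optional header.
            BYTE *base = (BYTE *)modules[i];
            IMAGE_DOS_HEADER *dosHeader = (IMAGE_DOS_HEADER *)base;
            IMAGE_NT_HEADERS *ntHeaders =
                (IMAGE_NT_HEADERS *)(base + dosHeader->e_lfanew);

            if ((ULONG_PTR)base != (ULONG_PTR)ntHeaders->OptionalHeader.ImageBase)
                printf("RELOCATED: %s (preferred %p, actual %p)\n",
                       moduleName,
                       (void *)(ULONG_PTR)ntHeaders->OptionalHeader.ImageBase,
                       (void *)base);
        }
        return 0;
    }

    The build-time fix is equally simple: give each DLL in your product a distinct preferred base address (the linker’s /BASE option) so that the DLLs don’t collide with each other or with the executables that host them.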

    This isn’t just restricted to NT by the way.  Exchange has a script that’s run on every build that knows what DLLs are used in what processes, and it rebases the Exchange DLL’s so that they fit into unused slots regardless of the process in which the DLL is used.  I’m willing to bet that SQL server has something similar.

    Credits: Thanks to Landy, Rick, and Mike for reviewing this for technical accuracy (and hammering the details through my thick skull).  I owe you guys big time.

     

  • Larry Osterman's WebLog

    Why do people think that a server SKU works well as a general purpose operating system?

    • 70 Comments

    Sometimes the expectations of our customers mystify me.

     

    One of the senior developers at Microsoft recently complained that the audio quality on his machine (running Windows Server 2008) was poor.

    To me, it’s not surprising.  Server SKUs are tuned for high performance in server scenarios; they’re not configured for desktop scenarios.  That’s the entire POINT of having a server SKU – one of the major differences between server SKUs and client SKUs is that the client SKUs are tuned to balance the OS in favor of foreground responsiveness and the server SKUs are tuned in favor of background responsiveness (after all, it’s a server; there’s usually nobody sitting at the console, so there’s no point in optimizing for the console).

     

    In this particular case, the documentation for the MMCSS service describes a large part of the root cause for the problem:  The MMCSS service (which is the service that provides glitch-resilient services for Windows multimedia applications) is essentially disabled on server SKUs.  It’s just one of probably hundreds of settings that are tweaked in favor of server responsiveness on server SKUs. 

     

    Apparently we’ve got a bunch of support requests coming in from customers who are running server SKUs on their desktop and are upset that audio quality is poor.  And this mystifies me.  It’s a server operating system – if you want client operating system performance, use a client operating system.

     

     

    PS: To change the MMCSS tuning options, you should follow the suggestions from the MSDN article I linked to above:

    The MMCSS settings are stored in the following registry key:

    HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Multimedia\SystemProfile

    This key contains a REG_DWORD value named SystemResponsiveness that determines the percentage of CPU resources that should be guaranteed to low-priority tasks. For example, if this value is 20, then 20% of CPU resources are reserved for low-priority tasks. Note that values that are not evenly divisible by 10 are rounded up to the nearest multiple of 10. A value of 0 is also treated as 10.

    For Vista, this value is set to 20; for Server 2008, the value is set to 100 (which disables MMCSS).
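
    For what it’s worth, here’s a minimal sketch that reads the current value of that setting (changing it with RegSetValueEx is just as easy, though you’d need administrator rights and you’d be deliberately overriding the SKU’s tuning):

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        DWORD value = 0;
        DWORD size = sizeof(value);
        LONG result = RegGetValueA(
            HKEY_LOCAL_MACHINE,
            "SOFTWARE\\Microsoft\\Windows NT\\CurrentVersion\\Multimedia\\SystemProfile",
            "SystemResponsiveness",
            RRF_RT_REG_DWORD, NULL, &value, &size);

        if (result == ERROR_SUCCESS)
            printf("SystemResponsiveness = %lu (%% of CPU reserved for low-priority tasks)\n",
                   value);
        else
            printf("RegGetValue failed: %ld\n", result);
        return 0;
    }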

  • Larry Osterman's WebLog

    What comes after Quaternary?

    • 21 Comments

    Valorie asked me this question today, and I figured I'd toss it out to everyone who runs across this post.

    She works in a 5/6 split class, and they're working on a unit on patterns and functions.  They're ordering the data into columns, each of which is derived from the information in the previous column.

    The question is: What do they label the columns?

    The first couple are obvious: Primary, and Secondary.

    Third and fourth: Tertiary, Quaternary.

    But what's the label for the fifth and subsequent columns?

    AskOxford.com suggested that they use quinary (5), senary (6), septenary (7), octonary (8), nonary (9) and denary (10), using the Latin roots.

    But the teacher in the class remembers a different order and thinks that the next one (5) should be cinquainary (using the same root as the poetry form cinquains).

    Valorie also pointed to  http://mathforum.com/dr.math/faq/faq.polygon.names.html for a 2-page history lesson. Coolest fact she found: the "gon" part of the word means "knee" and the "hedron" means "seats" so a polygon means "many knees" and polyhedra means "many seats".

    So does anyone have any suggestions?

     

  • Larry Osterman's WebLog

    Nobody ever reads the event logs…

    • 19 Comments

    In my last post, I mentioned that someone was complaining about the name of the bowser.sys component that I wrote 20 years ago, and that he included a screen shot of the event viewer.

    What was also interesting was the contents of the screen shot.

    “The browser driver has received too many illegal datagrams from the remote computer <redacted> to name <redacted> on transport NetBT_Tcpip_<excluded>.  The data is the datagram.  No more events will be generated until the reset frequency has expired.”

    I added this message to the browser 20 years ago to detect computers that were going wild sending illegal junk on the intranet.  The idea was that every one of these events indicated that something had gone horribly wrong on the machine which originated the event and that a developer or network engineer should investigate the problem (these illegal datagrams were often caused by malfunctioning networking hardware (which was not uncommon 20 years ago)).

    But you’ll note that the person reporting the problem only complained about the name of the source of the event log entry.  He never bothered to look at the contents of this “error” event log entry to see if there was something that was worth reporting.

    Part of the reason that nobody bothers to read the event logs is that too many components log to the eventlog.  The event logs on customers’ computers are filled with unactionable, meaningless events (“The <foo> service has started.  The <foo> service has entered the running state.  The <foo> service is stopping.  The <foo> service has entered the stopped state.”).  And people stop reading the event log because there’s never anything actionable in the logs.

    There’s a pretty important lesson here: Nobody ever bothers reading event logs because there’s simply too much noise in the logs. So think really hard about when you want to write an event to the event log.  Is the information in the log really worth generating?  Is there important information that a customer will want in those log entries?

    Unless you have a way of uploading troublesome logs to be analyzed later (and I know that several enterprise management solutions do have such mechanisms), it’s not clear that there’s any value to generating log entries.

  • Larry Osterman's WebLog

    Does Visual Studio make you stupid?

    • 43 Comments

    I know everyone's talking about this, but it IS a good question...

    Charles Petzold recently gave this speech to the NYC .NET users group.

    I've got to say, having seen Daniel's experiences with Visual Basic, I can certainly see where Charles is coming from.  Due partly to the ease of use of VB, and (honestly) a lack of desire to dig deeper into the subject, Daniel's really quite ignorant of how these "computer" thingies work.  He can use them just fine, but he has no understanding of what's happening.

    More importantly, he doesn't understand how to string functions/procedures together to build a coherent whole - if it can't be implemented with a button or image, it doesn't exist...

     

    Anyway, what do you think?

     

  • Larry Osterman's WebLog

    What's the big deal with the Moore's law post?

    • 19 Comments
    In yesterday's article, Jeff made the following comment:

    I don't quite get the argument. If my applications can't run on current hardware, I'm dead in the water. I can't wait for the next CPU.

    The thing is that that's the way people have worked for the past 20 years.  A little story goes a long way toward describing how this mentality works.

    During the NT 3.1 ship party, a bunch of us were standing around Dave Cutler, while he was expounding on something (aside: Have you ever noticed this phenomenon?  Where everybody at a party clusters around the bigwig?  Sycophancy at its finest).  The topic on hand at this time (1993) was Windows NT's memory footprint.

    When we shipped Windows NT, the minimum memory requirement for the system was 8M, the recommended was 12M, and it really shined at somewhere between 16M and 32M of memory.

    The thing was that Windows 3.1 and OS/2 2.0 both were targeted at machines with between 2M and 4M of RAM.  We were discussing why NT was so big.

    Cutler's response was something like "It doesn't matter that NT uses 16M of RAM - computer manufacturers will simply start selling more RAM, which will put pressure on the chip manufacturers to drive their RAM prices down, which will make this all moot". And the thing is, he was right - within 18 months of NT 3.1's shipping, memory prices had dropped to the point where it was quite reasonable for machines to come out with 32M and more RAM. Of course, the fact that we put NT on a severe diet for NT 3.5 didn't hurt (NT 3.5 was almost entirely about performance enhancements).

    It's not been uncommon for application vendors to ship applications that only ran well on cutting edge machines with the assumption that most of their target customers would be upgrading their machine within the lifetime of the application (3-6 months for games (games are special, since gaming customers tend to have bleeding edge machines since games have always pushed the envelope), 1-2 years for productivity applications, 3-5 years for server applications), and thus it wouldn't matter if their app was slow on current machines.

    It's a bad tactic, IMHO - an application should run well on both the current generation and the previous generation of computers (and so should an OS, btw).  I previously mentioned one tactic that was used (quite effectively) to ensure this - for the development of Windows 3.0, the development team was required to use 386/20's, even though most of the company was using 486s.

    But the point of Herb's article is that this tactic is no longer feasible.  From now on, CPUs won't continue to improve exponentially.  Instead, the CPUs will improve in power by getting more and more parallel (and by having more and more cache, etc).  Hyper-threading will continue to improve, and while the OS will be able to take advantage of this, applications won't unless they're modified.

    Interestingly (and quite coincidentally) enough, it's possible that this performance wall will affect *nix applications more than it will affect Windows applications (and it will especially affect *nix derivatives that don't have a preemptive kernel and fully asynchronous I/O like current versions of Linux do).  Since threading has been built into Windows from day one, most of the high concurrency application space is already multithreaded.  I'm not sure that that's the case for *nix server applications - for example, applications like the UW IMAP daemon (and other daemons that run under inetd) may have quite a bit of difficulty being ported to a multithreaded environment, since they were designed to be single threaded (other IMAP daemons (like Cyrus) don't have this limitation, btw).  Please note that platforms like Apache don't have this restriction since (as far as I know), Apache fully supports threads.
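
    To illustrate the porting problem: an inetd-style daemon is written as if it owns the whole process - inetd accepts the connection and hands it to a freshly launched process on stdin/stdout, so the handler can keep its state in globals and block wherever it likes.  Moving to one process with many threads means every piece of that state has to become per-connection.  Here's a rough sketch of the thread-per-connection shape such a daemon has to grow into - a trivial echo "service" on an arbitrary port, using POSIX sockets and pthreads, purely for illustration:

    #include <pthread.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <arpa/inet.h>
    #include <sys/socket.h>

    // Each connection gets its own thread.  All per-connection state lives in
    // locals - a daemon that kept such state in globals (as inetd-style
    // daemons are free to do) would have to be reworked before this is safe.
    static void *connection_thread(void *arg)
    {
        int sock = (int)(intptr_t)arg;
        char buffer[512];
        ssize_t got;
        while ((got = recv(sock, buffer, sizeof(buffer), 0)) > 0)
            send(sock, buffer, got, 0);        // trivial echo "protocol"
        close(sock);
        return NULL;
    }

    int main(void)
    {
        int listener = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in addr;
        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
        addr.sin_port = htons(5050);           // arbitrary port for the example

        if (bind(listener, (struct sockaddr *)&addr, sizeof(addr)) != 0 ||
            listen(listener, 16) != 0)
            return 1;

        for (;;)
        {
            int sock = accept(listener, NULL, NULL);
            if (sock < 0)
                continue;
            pthread_t thread;
            pthread_create(&thread, NULL, connection_thread,
                           (void *)(intptr_t)sock);
            pthread_detach(thread);            // fire and forget
        }
    }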

    This posting is provided "AS IS" with no warranties, and confers no rights.

  • Larry Osterman's WebLog

    Little Lost APIs

    • 32 Comments
    When you have an API set as large as the Win32 API set, sometimes APIs get "lost".  Either by forgetfulness, or by the evolution of the hardware platform.

    We've got one such set of APIs here in multimedia-land: the "aux" APIs.

    The "aux" APIs (auxGetNumDevs, auxGetDevCaps, auxGetVolume, auxSetVolume, and auxOutMessage) are intended to control the volume of the "aux" port on your audio adapter.

    It's a measure of how little used these are that when I asked around my group what the aux APIs did, the general consensus was "I don't know" (this isn't exactly true, but it's close).  We certainly don't know of any applications that actually use these APIs.

    And that's not really surprising since the AUX APIs are used to control the volume of either the AUX input jack on your sound card or the output volume from a CDROM drive (if connected via the analog cable).

    What's that you say? Your sound card doesn't have an "AUX" jack?  That's not surprising, I'm not sure that ANY sound card has been manufactured in the past 10 years with an AUX input jack (they typically have a "LINE-IN" jack and a "MIC" jack).  And for at least the past 5 years, hardware manufacturers haven't been connecting the analog CD cable to the sound card (it enables them to save on manufacturing costs).

    Since almost every PC system shipped in the past many years (at least 5) has used digital audio extraction to retrieve the CD audio, the analog cable's simply not needed on most systems (there are some exceptions, such as laptop machines, which use the analog connector to save battery life when playing back CD audio).  And even if a sound card were to add an AUX input, the "mixer" APIs provide a more flexible mechanism for managing those inputs anyway.

    So with the "aux" APIs, you have a set of APIs that were designed to support a series of technologies that are at this point essentially obsolete.  And even if your hardware used them, there's an alternate, more reliable set of APIs that provide the same functionality - the mixer APIs.  In fact, if you launch sndvol32.exe (the volume control applet), you can see a bunch of sliders to the right of the volume control - they're labeled things like "wave", "sw synth", "Line in", etc.  If your audio card has an "AUX" line, then you'll see an "Aux" volume control - that's the same control that the auxSetVolume and auxGetVolume APIs control.  Similarly, there's likely to be a "CD Player" volume control - that's the volume for the CD-ROM drive (and it works for both digital and analog CD audio).  So all the "aux" API functionality is available from the "mixer" APIs, but the mixer version works in more situations.

    But even so, the "aux" APIs still exist in the system in the event that someone might still be calling them...  Even if there's no hardware on the system which would be controlled by these APIs, they still exist.

    These APIs are one of the few examples of APIs where it's actually possible that we might be able to end-of-life the APIs - they'll never be removed from the system, but a time might come in the future where the APIs simply stop working (auxGetNumDevs will return 0 in that case indicating that there are no AUX devices on the system).
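
    If you're curious what (if anything) the aux APIs report on your machine, the check is only a few lines (my sketch; link with winmm.lib):

    #include <windows.h>
    #include <mmsystem.h>
    #include <stdio.h>

    int main(void)
    {
        UINT deviceCount = auxGetNumDevs();    // 0 on most modern systems
        printf("%u aux device(s) reported.\n", deviceCount);

        for (UINT i = 0; i < deviceCount; i++)
        {
            AUXCAPSA caps;
            if (auxGetDevCapsA(i, &caps, sizeof(caps)) == MMSYSERR_NOERROR)
            {
                DWORD volume = 0;
                auxGetVolume(i, &volume);      // left channel in the low word,
                                               // right channel in the high word
                printf("  %u: %s (volume 0x%08lX)\n", i, caps.szPname, volume);
            }
        }
        return 0;
    }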

    Edit: Clarified mixer and aux API relationship a bit to explain how older systems would continue to work.

  • Larry Osterman's WebLog

    Psychic Perf Analysis, or "RegFlushKey actually DOES flush the registry key"

    • 19 Comments
    One of Raymond's more endearing features is what he calls "Psychic Debugging"; it even made his Wikipedia entry (wow, he even has a Wikipedia entry, complete with picture :))

    There's a variant of Psychic Debugging called "Psychic Perf Analysis".  It works like this:

    I get an IM from Ryan, one of the perf guys. 

    Ryan: "Hey Larry, we just found a great perf bug that caused a 3 second slowdown in Windows boot time"

    Me: "Let me guess, they were calling RegFlushKey in a service startup path."

    <long pause>

    Ryan: "Who told you?"

     

    One of the things people don't realize about RegFlushKey is that it actually flushes the data that backs the registry key (doh!).  Well, flushing the data means that you need to write it to disk, and the semantics of RegFlushKey ensure that the data's actually been committed - in other words, the RegFlushKey API is going to block until all the disk writes needed to ensure that the data backing the key is physically on the disk.  This can take hundreds and hundreds of milliseconds.
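
    If you want to see the cost for yourself, a quick (and admittedly unscientific) sketch like this makes the point - the scratch key name is just something I made up for the test:

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        HKEY key;
        if (RegCreateKeyExA(HKEY_CURRENT_USER, "Software\\FlushTest", 0, NULL,
                            0, KEY_ALL_ACCESS, NULL, &key, NULL) != ERROR_SUCCESS)
            return 1;

        DWORD value = 42;
        RegSetValueExA(key, "Scratch", 0, REG_DWORD,
                       (const BYTE *)&value, sizeof(value));

        // Time how long RegFlushKey blocks; it doesn't return until the data
        // backing the key has physically hit the disk.
        LARGE_INTEGER frequency, start, finish;
        QueryPerformanceFrequency(&frequency);
        QueryPerformanceCounter(&start);
        RegFlushKey(key);
        QueryPerformanceCounter(&finish);

        printf("RegFlushKey blocked for %.1f ms\n",
               (finish.QuadPart - start.QuadPart) * 1000.0 / frequency.QuadPart);

        RegCloseKey(key);
        return 0;
    }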

    In Ryan's case, it was complicated because the service was calling RegFlushKey from a DllMain function (Doh!) which held the loader lock, which meant that all the other services in that process were blocked, and there were other services that depended on those services, and...  You get the picture, it REALLY wasn't pretty.

    The documentation for RegFlushKey explicitly says that "In general, RegFlushKey rarely, if ever, need be used", and it's right.

    Why did I know that this was a problem?  Well, when we first deployed the new audio stack into Vista, we were blocked from RI'ing into winmain because the audio service degraded the boot time of Windows by 3/4 of a second (yes, we measure boot time performance to the millisecond, and changes that degrade the system boot performance aren't allowed in).  When I looked at the perf logs of the boot process, I noticed a significant number of writes occurring during the start of the audiosrv service.  I chased it down further, and realized that the writes correlated almost perfectly with some code that was modifying the registry.  I dug deeper and discovered a call to RegFlushKey that we had mistakenly added.  Removing the call to RegFlushKey fixed the problem.

  • Larry Osterman's WebLog

    Error Code Paradigms

    • 33 Comments

    At some point when I was reading the comments on the "Exceptions as repackaged error codes" post, I had an epiphany (it's reflected in the comments to that thread but I wanted to give it more visibility).

    I'm sure it's just an indication of just how slow my mind is working these days, but I just realized that in all the "error code" vs. "exception" discussions that seem to go on interminably, there are two UNRELATED issues being discussed.

    The first is about error semantics - what information do you hand to the caller about what failed.  The second is about error propagation - how do you report the failure to the caller.

    It's critical for any discussion about error handling to keep these two issues separate, because it's really easy to commingle them.  And when you commingle them, you get confusion.

    Consider the following example classes (cribbed in part from the previous post):

    class Win32WrapperException
    {
        // Returns a handle to the open file.  If an error occurs, it throws an object derived from
        // System.Exception that describes the failure.
        HANDLE OpenException(LPCWSTR FileName)
        {
            HANDLE fileHandle;
            fileHandle = CreateFile(FileName, xxxx);
            if (fileHandle == INVALID_HANDLE_VALUE)
            {
                throw (System.Exception(String.Format("Error opening {0}: {1}", FileName, GetLastError())));
            }
            return fileHandle;
        };
        // Returns a handle to the open file.  If an error occurs, it throws the Win32 error code that describes the failure.
        HANDLE OpenError(LPCWSTR FileName)
        {
            HANDLE fileHandle;
            fileHandle = CreateFile(FileName, xxxx);
            if (fileHandle == INVALID_HANDLE_VALUE)
            {
                throw (GetLastError());
            }
            return fileHandle;
        };
    };

    class Win32WrapperError
    {
        // Returns either NULL if the file was successfully opened or an object derived from System.Exception on failure.
        System.Exception OpenException(LPCWSTR FileName, OUT HANDLE *FileHandle)
        {
            *FileHandle = CreateFile(FileName, xxxx);
            if (*FileHandle == INVALID_HANDLE_VALUE)
            {
                return new System.Exception(String.Format("Error opening {0}: {1}", FileName, GetLastError()));
            }
            else
            {
                return NULL;
            }

        };
        // Returns either NO_ERROR if the file was successfully opened or a Win32 error code describing the failure.
        DWORD OpenError(LPCWSTR FileName, OUT HANDLE *FileHandle)
        {
            *FileHandle = CreateFile(FileName, xxxx);
            if (*FileHandle == INVALID_HANDLE_VALUE)
            {
                return GetLastError();
            }
            else
            {
                return NO_ERROR;
            }
        };
    };

    I fleshed out the example from yesterday and broke it into two classes to more clearly show what I'm talking about.  I have two classes that perform the same operation.  Win32WrapperException is an example of a class that solves the "How do I report a failure to the caller" problem by throwing exceptions.  Win32WrapperError is an example that solves the "How do I report a failure to the caller" problem by returning an error code.

    Within each class are two different methods, each of which solves the "What information do I return to the caller" problem - one returns a simple numeric error code, the other returns a structure that describes the error.  I used System.Exception as the error structure, but it could have just as easily been an IErrorInfo class, or any one of a bazillion other ways of reporting errors to callers.

    But looking at these examples, it's not clear which is better.  If you believe that reporting errors by exceptions is better than reporting by error codes, is Win32WrapperException::OpenError better than Win32WrapperError::OpenException?  Why? 

    If you believe that reporting errors by error codes is better, then is Win32WrapperError::OpenError better than Win32WrapperError::OpenException?  Why?

    When you look at the problem in this light (as two unrelated problems), it allows you to look at the "exceptions vs. error codes" debate in a rather different light.  Many (most?) of the arguments that I've read in favor of exceptions as an error propagation mechanism concentrate on the additional information that the exception carries along with it.  But those arguments ignore the fact that it's totally feasible (and in fact reasonable) to define an error code based system that provides the caller with exactly the same level of information that is provided by exceptions.
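
    To make that concrete, here's a sketch (my own strawman, not from the original discussion) of an error-code-style API whose failure path carries just as much information as the exception versions above - the propagation mechanism is still a plain return value, but the semantics are every bit as rich:

    #include <windows.h>
    #include <string>

    // The error is still "returned", not thrown, but it's a structure rather
    // than a bare scalar, so no context is lost.
    struct RichError
    {
        DWORD code;             // the underlying Win32 error code
        std::wstring context;   // what we were doing when it failed
    };

    // Returns true on success; on failure fills in the caller's RichError.
    bool OpenWithRichError(const wchar_t *fileName, HANDLE *fileHandle,
                           RichError *error)
    {
        *fileHandle = CreateFileW(fileName, GENERIC_READ, FILE_SHARE_READ,
                                  NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL,
                                  NULL);
        if (*fileHandle == INVALID_HANDLE_VALUE)
        {
            error->code = GetLastError();
            error->context = std::wstring(L"Error opening ") + fileName;
            return false;
        }
        return true;
    }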

    These two problems are equally important when dealing with errors.  The mechanism for error propagation has critical ramifications for all aspects of engineering - choosing one form of error propagation over another can literally alter the fundamental design of a system.

    And the error semantic mechanism provides critical information for diagnosability - both for developers and for customers.  Everyone HATES seeing a message box with nothing but "Access Denied" and no additional context.

     

    And yes, before people complain, I recognize that none of the common error code returning APIs today provide the same quality of error semantics that System.Exception does as first class information - the error return information is normally hidden in a relatively unsophisticated scalar value.  I'm just saying that if you're going to enter into a discussion of error codes vs. exceptions, from a philosophical point of view, then you need to recognize that there are two related problems that are being discussed, and differentiate between these two. 

    In other words, are you advocating exceptions over error codes because you like how they solve the "what information do I return to the caller?" problem, or are you advocating them because you like how they solve the "how do I report errors?" problem?

    Similarly, are you denigrating exceptions because you don't like their solution to the "how do I report errors?" problem and ignoring the "what information do I return to the caller?" problem?

    Just some food for thought.

  • Larry Osterman's WebLog

    Ok, what the heck IS the windows audio service (audiosrv) anyway?

    • 12 Comments

    This morning, Dmitry asked what the heck the audio service was for, anyway.

    That's actually a really good question.

    For Windows XP, the biggest reason for the audiosrv service is that without it, applications that linked with winmm.dll would also get setupapi.dll in their address space.  This is a bad thing, since setupapi is relatively large, and 99% of the apps that use winmm.dll (usually to call PlaySound) don't need it until they actually start playing sounds (which is often never). 

    As a part of this, audiosrv monitors for plug and play notifications (again, so the app doesn't have to) and allows the application to respond to plug and play changes without having to burn a thread (and a window message pump) just to detect when the user plugs in their USB speakers.  All that work's done in audiosrv.

    There's a bunch of other stuff related to global audio digital signal processing that audiosrv manages, and some stuff to manage user audio preferences, but offloading the PnP functionality is the "big one".  Before Windows XP, this functionality was actually a part of csrss.exe (the windows client/server runtime subsystem), but in Windows XP it was broken out into its own service.

    For Longhorn, Audiosrv will be doing a lot more, but unfortunately, I can't talk about that :(  Sorry. 

    I really do want to be able to talk about the stuff we're doing, but unfortunately none of it's been announced yet, and since none of it's been announced yet...

    Edit: Corrected title.  Also added a little more about longhorn.

  • Larry Osterman's WebLog

    WMDG loses one of its own.

    • 18 Comments
    So often, you don't hear about the developers who work behind the curtains here at Microsoft.  Today I'd like to talk a bit about one of them.

    One of the key developers on Windows Multimedia at Microsoft is Syon Bhattacharya.  Syon was responsible for many of the internal pieces of the multimedia work on windows, much of the core code was written by him.  If you've ever watched an AVI file, or seen a windows media player visualization, you've been running his code.

    He started at Microsoft in June of 1995, straight out of college, and worked in the multimedia group his entire career at Microsoft.  Coincidentally, he also came from Carnegie-Mellon University (I know a bunch of the other developers from CMU that came at the same time as Syon, but didn't know him until I joined this group).

    Syon was an extraordinary developer, he had an encyclopedic knowledge of the internals of the multimedia code.  When we were doing the code reviews for XP SP2, when I'd see something that I thought was a vulnerability, I'd wander over to Syon's office to ask him.  Syon not only knew the code I was looking at, but he was able to reconstruct (from his head) all of the code paths in which the potentially vulnerable routine was called.  He's the person that the multimedia team went to when they had tough problems - there didn't seem to be a problem that he couldn't solve.

    In addition to being a complete technical wizard, Syon was one of the nicest persons I've ever worked with; his unflagging good humor throughout development cycles was legendary around this group.

    Two years ago, Syon was diagnosed with stomach cancer.

    He continued to work, although it was clear that the treatments were taking their toll on him - I often saw him walking down the hall looking horrible, I'm sure that the treatments were hideously uncomfortable, but he pressed on.

    Over the summer, Syon took a leave of absence to concentrate his energies on fighting the cancer that was eating away at him.

    Unfortunately, yesterday he lost that battle, he passed away at a hospice in Seattle.  His family and friends were with him at the end, and it was apparently very peaceful.  He was 30 years old.

    We will all miss him, the world is a smaller place without him.

    Edit: I'll be adding recollections to this post as they come in...

    I've asked my group (and others) to collect their memories of Syon, here's what they wrote (in no particular order):

    Ji Ma:

    I have known Syon since I was transferred to the DirectShow group almost 6 years ago.  He was a hard-working soul and a low-profile person, and easy to talk to.  He had an amazing ability to solve very tough computer problems and was always willing to go the extra mile to help others, and he knew so much, and so deeply, about computers and programming that he could solve almost anything that no one else could.  I found him to be an indispensable, dependable source of technical resources and ideas.  Whenever I had a difficulty or a lack of ideas, I would look to him for help, and he always lent me a hand.

    He was easygoing and willing to listen, and we had a very good working relationship, since much of the work he developed was tested by me, so we interacted a lot and were a truly perfect match for each other and a great team.  Many times, I just went to his office and we chatted about many things in life, and he was always willing to listen and provide valuable comments, and genuinely enjoyed and appreciated the conversation.  That is very rare in a working environment.  I will always remember him for that.  

    I have several fruit trees and some vegetables growing in our garden.  When it was harvest time, I brought some in to our group and passed them to many colleagues, including him.  He was always admiring of and grateful for whatever he got, an apple or a tomato or anything else.  I could tell he truly enjoyed and appreciated the friendship that we had. 

    He talked very little about his personal life, so it is a mystery to me; I only know he lived around the Green Lake area. 

    Syon, may you rest in peace and we will always remember you.

    Tracy Shew:

    When I first came to Microsoft as a contractor five years ago, it was sometimes difficult and daunting to work with developers.  These were, after all, the people who had written the code for Windows.  Many of them sometimes acted as if they were aware of this fact, and of the distinction between their station and mine – a mere software tester.  The tester – developer relationship can be antagonistic at times, particularly if I had the gall to find a bug or regression in “their” code.  Sometimes, some developers had little time for my questions, and acted as if my concerns were unimportant.  This was discouraging for me, and made me question why I was working at Microsoft at times.

    Syon, more than anyone else, gave me encouragement to continue.  He was a developer, and he was brilliant, but he never – and I mean never – acted as if my concerns were unimportant.  His door was always open, and he always seemed to have the answer ready – or if not, he knew the person to go to.  And he never made me feel ignorant or inferior to him for having to answer a question.  I quickly learned that Syon was a valuable resource, a wealth of information.  But it was much more than that.  Syon taught me, through his example, that I was not a “mere tester” – that I was making an equally valuable contribution to the product.  This encouraged me to continue at Microsoft, eventually becoming a full-time employee in test.

    I had the pleasure to work closely with Syon for almost four years, being the main tester responsible for checking his code.  Syon’s skill was unquestionable; problems were very rare, and, if one was encountered, Syon was extraordinary at quickly locating the difficulty – even if it was outside his area.  I do not know the number of times he has trudged over to the lab to look at one of our machines.  “Why is it doing that?” we would ask, looking at a bizarre error message or a garbled, incomprehensible stack trace.  I sometimes felt that we took advantage of his openness and generosity – not many developers will “dirty their feet” by coming into the lab to look at a sick computer, unless you can first prove it is their code at fault – they would rather have a remote, at the very least, or have you port the bug off to the “owner” – something which is sometimes difficult to determine.  I tried to use Syon as an “avenue of last resort,” lest we overuse the resource – if we absolutely couldn’t determine the issue, and no one else knew what was happening, only then would we bring in Syon.  And, in four years of steady work, day to day, I can count on one hand the number of times we managed to stump him.  And never, not on a single occasion, did Syon refuse help because he was too busy, or because it was not his area, or for any reason at all for that matter.

    Since Syon’s illness took him away from work, there hasn’t been a week go by that this resource hasn’t been missed.  Very frequently, an issue will come up, and someone will say, “If Syon were here, we could figure this out.”  His combination of knowledge, intuition towards problems, and plain generosity in sharing what he knew is unequalled.  People often use the word “irreplaceable” when they lose a colleague, but for us there is no degree of exaggeration in applying it.

    For me, though, Syon was more than a resource.  He demonstrated to me the value that I was contributing to Microsoft, and a vision of the partnership that should exist between development and test, and between teams, where “ownership” should not be used either as a dividing line to avoid issues, nor as a way of assigning responsibility or blame.  Syon simply loved making the best code he could, and he loved solving problems, so he saw all of our contributions, whether development or test, assisting in this process.  He encouraged everyone around him to do their best, and to be excellent.  I wished I could have known him better – losing him is a tremendous blow, certainly professionally, but also personally.  Even though we had a professional rather than social relationship – you would have to call us colleagues rather than friends – I am grateful to him for many different things, and especially for the encouragement he gave.

    Eric Rudolph:

    Syon always was a team player, and he ended up being the backbone of the DirectShow product at Microsoft. After many other people had been reorganized, or had moved on, Syon stuck with DirectShow and not only supported it, but he also supported its customers and all the accompanying hassles. Not only did he do this really well, but he did it with a gracefulness and humbleness that made it seem easy. Syon knew everything about everything, he was the go-to guy when it came to something that nobody else knew. I don't know a single person at Microsoft (myself included) who wouldn't use that kind of responsibility as a bargaining chip to further their career, but not Syon. When I asked him, "why don't you try and promote yourself more?" He would say, "oohhhh, I guess I'm just lazy." But Syon was anything _but_ lazy. Maybe unmotivated for self gain, but that was one of the things that was cool about him. On a personal note, Syon wasn't easy to get too close to, but I'm proud to count myself among his friends, and he was always up for doing anything. He was my personal movie critic, if I wanted to know if a movie was good, Syon was the first person I would ask. He was an amazing guy, and the effects of who he was and what he did to help people, will ripple outwards forever. I respect him immensely, he taught me many things while I had the chance to work with him.

    Martin Puryear (Dev Manager for WMDG): 

    Syon was one of those rare selfless people that willingly took on any task without a complaint, regardless of the task.  Sometimes the most important tasks are the most tedious as well - ensuring that myriad far-flung fixes were ported back and forth between different OSes; painstakingly crawling through very old Windows source code looking for security vulnerabilities.  I'm fairly certain that I never heard him ever utter a complaint - if he did, then I'm sure it was accompanied with a smile that seemed to say "well, these things happen." 

     Syon was a sterling example of the phrase "still waters run deep."  Over the years he built up considerable expertise in the multimedia arena, but you might not know it from watching his actions.  He always made time to help others, answering even the most basic questions.  Upon asking, one quickly discovered that he understood the overall system and how your question related - and he usually knew the technical details that you needed as well.  After RobinSp himself (overall architect for quartz/ActiveMovie/DirectShow), SyonB was the one to which we repeatedly went with hard problems facing that architecture. 

     Syon didn't have the "rough edges" sometimes found in SW engineers (including the stereotypical MS developer).  If you were wrong, he would couch his words with a soft-spoken "I believe the way it works is…."  He didn't have an egotistical bone in his body - in fact it was understood among managers that we needed to make sure that he got the recognition he deserved. 

     Syon was a class act - in this day and age, the industry needs more like him.  Truly, the world needs more like him.  He will be sorely missed as a coworker and a friend. 

    Steve Rowe:

    When I think of Syon, three adjectives come to mind:  quiet, helpful, and intelligent.  Syon was always soft spoken.  I never saw him get angry or snap at anyone.  He was always calm and collected.  Unlike some people who are good at what they do, he didn’t need to prove it.  He didn’t need the limelight.  Syon was always willing to help.  I never asked him a question he didn’t know the answer to and no matter how busy I’m sure he was, he always took the time to answer my questions.  All you had to do was ask and whatever small feature or tool or tweak you needed would be added.  I recall one time I stopped by to ask him to mock up a fix for a particular issue.  We didn’t need the full implementation, just a simple version to prove it would work.  The next day I had the complete version on my desk.  Syon was extremely intelligent.  He knew the system forwards and backwards.  He rarely had to consult the code, he just knew the answer.  We’ll miss his expertise around here but more importantly, we’ll miss him as a person.  It is rare someone so kind, so willing to help, and so smart comes along.

    Tuan Le:

    Hi Larry, please post another one from me.

     

    At work, Syon is simply brilliant. Syon will take on any task, big or small, challenging or tedious, with the same level of enthusiasm (in his own quiet, pleasant demeanor), and always comes through with amazing execution. It is obvious that Syon takes pride in what he works on and sets a very high bar for himself. As a person, Syon is a confident, generous, patient, gentle, and thoughtful person. Syon is simply wonderful to have as a co-worker, and a friend.

     

    Syon is someone I instinctively trust and often share thoughts on things with. My kids love him! Syon always has things or toys to entertain them whenever they stop by his office. It’s hard for us to accept the fact that Syon has moved on; we often talk about Syon as if he is still with us. Of the many things that Syon enjoyed, food and speed were high on his list. We talked often about different cuisines / food blogs / car racing / driving school / traveling / etc., and we would go out and try a new restaurant whenever we got a chance. Syon enjoyed trying and doing new things; he was always eager to join and share with us. We are very fortunate to have had Syon in our lives, and we will miss all the good times we had with him.

    Savvy Dani:

    My first encounter with Syon’s hard-core technical skills was soon after I joined the group. There were some 20 odd high-priority non-trivial bugs that needed immediate attention on a Friday afternoon. I didn’t know the team well enough, but there were many strong voices saying ‘Give it to Syon’ and I decided to play along. I understood why when I came back on Monday and all issues were resolved. When I tried to praise him, he just shrugged it off with a gentle, self-deprecating smile. I became a Syon fan after that. Time only added good things to my list - extremely smart, dedicated, gentle, compassionate, unruffled, good sense of humor and on and on. I don’t think anybody ever found anything negative in their interactions with him unless he was too good to be real.

    But Syon was real enough when I got to know him better. What stands out for me during the two years I have worked with Syon are my 1:1s with him. I usually started my Fridays with his meeting. Since he was very quiet, we could not go beyond 15-20 minutes initially and that with me doing most of the talking. Since technical issues were a no-brainer for him, our meetings dwindled into silence soon. I told him frankly that we have got to do better, so we came up with this idea to talk about personal things and get to know each other in non-work related ways as well. Syon accepted this gamely and we went on for a year or more. There was a lot of laughing and a good number of discussions during this time. We talked about his love for car racing and taking his Audi for a spin on the safe track (?). We would catch up on the latest movies, good restaurants, his unsuccessful experiments with Indian recipes, my fluctuating aspirations to be a literary fiction writer etc. I suddenly realized this summer that our meetings had gotten longer and that he was doing most of the talking. We would go past the allotted hour and then walk down to lunch. When we exchanged hugs as he went on leave, I knew I was going to miss my friend.

    Syon is not a typical Indian name and I asked him about it once. I believe there are two stories behind his name. a) He was named after Sayanacharya, a great Indian philosopher who lived in the 14th century A.D., whose commentaries apparently defined the speed of light to be pretty close to the numbers we have today. b) He was born in London close to Syon park and his parents shortened his name to Sayana and then morphed it to Syon. ‘Acharya’ literally means Master, so Syon definitely lived up to his name.

    Bertrand Lee:

    Syon was from my ECE '95 cohort at CMU, and I remember seeing his name in the CMU newsgroups when he participated in various technical discussions.

     

    However, I only got to know him a bit better when I worked with him in WMDG, and he struck me as one of the most knowledgeable engineers I have ever had the privilege to work with. As many would attest to, he was _the_ DirectShow guru, and any time I had some intractable DirectShow bug that I was making no headway into, I would consult Syon and he would very willingly come over and help me to debug the cause of the problem, which due to his deep expertise took hardly any time at all, even for the most complex problems.

     

    More importantly however, he was one of the most gentle-natured and helpful people I've ever known, and I will always remember and miss him as a great person, coworker and friend.

     Steve Ball:

    Hey Larry -
    Although I barely knew Syon, and only had a very small set of direct interactions with him around DShow, I do have a few small observations from my experience in working with and near him over these past three years. 
     
    Syon was like a Zen master.
     
    While I run around like a headless chicken most of the time, being with Syon in a meeting or even simply passing him in the hall was always like being in the presence of a great Master.  His pace, his interactions, his movements were always intentional, methodical, calming, even charming.   He set an example just in his being who and how he was: collected, positive, responsive, and ready to embrace and solve even the toughest problems. 
     
    Just being near him was calming.  His presence, sincere smile, and the peaceful look in his eyes often felt to me like a gift and provided a simple and wonderful reminder to slow down, collect myself, and be thankful for the amazing resources and opportunities we have at our fingertips every day.
     
    His very presence was a gift, and his absence touches me deeply.
     
    With best wishes to his closest friends and family,
    -Steve
     
    A Co-Worker:

    I think you're experiencing what a lot of other people have: Syon was such a quiet, unassuming guy who didn't really like to talk about himself much that it wasn't easy to get to know him; he would probably have been embarrassed by all this attention.  But everyone who came into contact with him remembers him as the kind, helpful, thoughtful person that he was.  As news of his passing has spread, we've been amazed at the number of people who've come forward with stories about Syon.  Some didn't even know he had been so sick - it just wasn't in his nature to talk about himself.  He died the way he lived: peacefully, with his quiet, inner strength shining through.

    Alex Wetmore (friend from college):

    Syon was always really quiet.  On our last visit together we were trying to remember how we met, but I'm not really sure.  From my freshman to junior years at CMU he spent a lot of time hanging out at my dorm room (my 4th year, his last year, I moved off campus and it wasn't as easy to do so). 
     
    Recollections are hard.  He was so quiet, but with a great sense of humor.  He never wanted to be a burden on anyone.  In his freshman year he had a collapsed lung and didn't even tell anyone -- I saw him every day and never learned about it until I didn't see him for two days and his roommate found out where he was.  He loved food which I think made the stomach cancer even harder.  He and my wife Christine used to go hunting around Seattle for the best fried chicken, hamburgers, or other comfort/junk food.  He also loved really good food and knew all of the best restaurants in town (but was quiet about it...not the normal Belltown foodie type).  I think he was social at heart and liked to be around people, but had a hard time opening up.
     
    He was at CMU from 91 to 95, EE/CE

    Alok Chakrabarti:

    The most I remember about Syon:
    1.  Whenever I had a stress issue to debug involving a whole bunch of threads and random locks taken by components such as DDraw, I would pull my hair out for a while, narrow it down a bit, and then get totally stuck.  The next step was to walk over to Syon's office, ask him to connect the remote (mostly wdeb in those days of Win9x) and go through all those threads, finally figuring out what the problem was, and what to do about it -- mostly assign the bug to someone appropriate.  It became a common thing almost every day, and I am sure he had enough work to do, but he never stopped taking the time to help out.
     
    2.  His calmness and that smile: I have never seen Syon getting upset with anything.  He was so calm always.  And that slice of smile he had on his face -- I still remember so vividly.
     
    3.  His typing speed: That was unbelievable!!! I still can't really type after working with PCs for about 19 years.  But his typing was just out of this world.  And he would be thinking even faster than he was typing.
     
    I always will remember him as a person I wished I could be somewhat like, but knew I didn't even have a chance.  He was much younger than me, but still my hero, not to say just today -- I always thought that way.  Such a brilliant but unassuming person, so helpful and nice.  Syon has been so unique.
    Wendy Liu:

    When I first joined MS, Syon was highly recommended on the list as the go-to guy for any technical question. He was one of the gurus on DShow.

    Syon was not the kind of person who liked to talk much at work. He always had a cool style and you never saw him rush in the hallway. When you chatted with him, he always spoke warmly and slowly and carried a smile; when you asked him for help, you never expected him to say no. From time to time, he brought fresh bagels and cream to put outside his office to share with us.

    Over the last four-plus years, I got a lot of help from Syon. I still clearly remember the time I asked him to take a look at a bug which I had been working on for the whole day. As usual, he didn’t talk much; sitting in front of the machine, his fingers moving quickly on the keyboard, his mind was completely on the bug. He tried various ways to poke at it, and when more than half an hour had passed and we still didn’t have any clue, I began to apologize for taking his time and asked him to stop there. Still in deep thought, he kept working. Then he said “Let’s try this.” Bingo! We found the problem.

    Syon was one of the few people I have known who never showed any impatience or frustration. Even when he was telling me that his family doctor had mistaken his symptoms for years, he just sounded unhappy about it; there was none of the anger in his tone that I felt at that moment. That was the only time I heard him complain, and it was in such a cool way.

    It is a great loss for all of us to lose such a good colleague. We will always remember him.

    Wenhong Liu

    Brent Mills:

    While Syon and I were not close buddies, I always felt comfort in speaking with him, and he always made me curious enough to ask what he’d been up to and how he was doing (before he was sick).  I have not met a more genuinely nice man…Ever! 

    After he left MS, I exchanged a few mails with him and he seemed positive as usual, but I couldn’t help thinking that bad news might be on the horizon….I don’t shed tears easily or often, but I remember thinking to myself that no person, especially not one as good as he, should have to go through something like this; the tears flowed.

    I have missed and will continue to miss Syon, and I hope he is in a better place.     

     Ted Youmans:

    Six years (I think) and I have no anecdotes or stories. What is so surprising about this is that I liked Syon quite a bit. He was one of the nicest and most intelligent people I have had the pleasure of working with here at MS. When I actually came up against something in DShow that I couldn’t find an answer to, he was usually the only one that could answer it. I truly wish I had something to offer for your LJ or for the memory book, but I am coming up with a complete blank. Maybe it’s because I don’t take enough notice of day to day happenings, or maybe it’s because the extraordinary was an everyday occurrence for Syon and none of it sticks out any more than any other day. What I can say is that he will be sorely missed and this place hasn’t really been the same since he left.

    Penelope Broomer:

    Other than building check-ins for Syon during Win2K (he was the point person for the multimedia team), I never got to work with him directly; I therefore consider myself to have been one of Syon’s friends rather than a colleague.  Syon came to our home two or three times; we love our curries and he was very polite about the home-made curry we inflicted upon him during his last visit!

     

    Like many, I have fond memories of Syon.  One that springs to mind is the time that he rescued Stephen (stestrop) from the car park at Barnes & Noble in Bellevue.  I was working in the build lab at the time and it was my turn to be on the ‘late shift’.  Stephen, facing another night in on his own, went off to Barnes & Noble to pass some time.  He must have had a lot on his mind, as it wasn’t until he was in the store browsing through the computer books that he realized that he didn’t have his car keys.  Concerned, as he thought he’d locked them in the car, he returned to the vehicle only to discover that he’d not only left them in the car but that the car was still running!  He called me in the build lab in a state of panic asking me to go and rescue him – this was as we were coming up to shipping Win2K – it was late in the evening, I was on my own, and I couldn’t leave the build lab.  Several calls later Stephen asked me to try calling Syon in his office.  Syon was still at work and without hesitation agreed to go to Stephen’s rescue; that’s just the kind of person he was.

    Soccer Liu:

    I remember him as a soft-spoken and sharp-thinking gentleman. I worked with him on only a few occasions and had a couple of conversations with him. I really miss him.

    Robin Speed (and Eric Rudolph):

    I guess this old email from Eric sums up Syon rather nicely work-wise…

    He was also a really nice guy – sounds bland, but in this case it is true.  He never pushed himself forward – almost to a frustrating level – but always had time for everyone.  People all over knew and respected him.  Someone the word humble truly applied to.  What an unfair world.

     Robin

     _____________________________________________
    From: Eric Rudolph
    Sent: Tuesday, May 04, 1999 8:53 PM
    To: Robin Speed
    Subject: Syon B, Master Brain

    Whatever we're paying Syon, it's not enough. He always knows exactly how to fix any weird compiler, linker, or base class problems I'm having. The man's a genius.

     

  • Larry Osterman's WebLog

    What does style look like, part 7

    • 37 Comments
    Over the course of this series on style, I've touched on a lot of different aspects; today I want to discuss aspects of C and C++ style specifically.

    One of the things about computer languages in general is that there are often a huge number of options available to programmers to perform a particular task.

    And whenever there's a choice to be made while writing programs, style enters into the picture.  As does religion - whenever the language allows for ambiguity, people tend to get pretty religious about their personal preferences.

    For a trivial example, consider the act of incrementing a variable.  C provides four different forms that can be used to increment a variable. 

    There's:

    • i++,
    • ++i,
    • i+=1, and
    • i=i+1.

    These are all semantically identical; the code generated by the compiler should be the same regardless of which you choose as a developer (this wasn't always the case, btw - the reason that i++ exists as a language construct in the first place is that the original C compiler wasn't smart enough to take advantage of the PDP-11's built-in auto-increment addressing mode, and i++ allowed a programmer to write code that used it).

    The very first time I posted a code sample, I used my personal style of i+=1 and got howls of agony from my readers.  They wanted to know why on EARTH I would use such a stupid construct when i++ would suffice.  Well, it's a style issue :)

    There are literally hundreds of these language-specific style issues.  For instance, the syntax of an if statement (or a for statement) is:

    if (conditional) statement

    where statement could be either a single line statement or a compound statement.  This means that it's totally legal to write:

    if (i < 10)
        i = 0;

    And it's totally legal to write:

    if (i < 10)
    {
        i = 0;
    }

    The statements are utterly identical from a semantic point of view.  Which of the two forms you choose is a style issue.  Now, in this case, there IS a fairly strong technical reason to choose the second form over the first - by always putting the braces in, you reduce the likelihood that a future maintainer of the code will screw up when adding a second line to the statement, as the sketch below shows.  It also spaces out the code (which is a good thing IMHO :) (there's that personal style coming back in again)).
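
    To make that maintenance hazard concrete, here's a contrived sketch (the variable names are made up for illustration) - the indentation suggests both assignments are guarded by the if, but only the first one actually is:

    if (i < 10)
        i = 0;
        j = 0;    /* despite the indentation, this runs unconditionally */

    With the braces always present, the new line can only land inside the block, and this bug can't happen.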

    Other aspects of coding that ultimately devolve to style choices are:

    if (i == 10)

    vs

    if (10 == i)

    In this case, the second form is often used to prevent the assignment within an if statement problem - it's very easy to write:

    if (i = 10)

    which is unlikely to be what the developer intended.  Again, this is a style issue - by putting the constant on the left of the expression, you cause the compiler to generate an error when you make this programming error.  Of course, the compiler has a warning, C4706, to catch exactly this situation, so...
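
    If you'd rather keep the more natural i == 10 order and let the compiler do the work, you can promote that warning to an error.  Here's a small sketch (MSVC-specific pragma; C4706 is normally a level-4 warning, so compile with /W4):

    /* Promote "assignment within conditional expression" to an error. */
    #pragma warning(error: 4706)

    void Sample(int i)
    {
        if (i = 10)    /* now fails to compile instead of silently assigning */
        {
            /* ... */
        }
    }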

    Another common stylistic convention that's often found is:

    do {
        < some stuff >
    } while (false);

    This one exists to allow the programmer to avoid using the dreaded "goto" statement.  By putting "some stuff" inside the while loop, it enables the use of the break statement to exit the "loop".  Personally, I find this rather unpleasant; a loop should be a control construct, not syntactic sugar to avoid language constructs.
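
    For completeness, here's the pattern with the break in place - a sketch, where Step1 through Step3 are hypothetical helpers:

    do {
        if (!Step1())
            break;      /* jumps straight past the "loop" */
        if (!Step2())
            break;
        Step3();
    } while (false);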

    Speaking of goto...

    This is another language construct that people either love or hate.  In many ways, Edsger Dijkstra was totally right about goto - it is entirely possible to utterly misuse goto.  On the other hand, goto can be a boon for improving code clarity.  

    Consider the following code:

    HRESULT MyFunction()
    {
        HRESULT hr;

        hr = myCOMObject->Method1();
        if (hr == S_OK)
        {
            hr = myCOMObject->Method2();
            if (hr == S_OK)
            {
                hr = myCOMObject->Method3();
                if (hr == S_OK)
                {
                    hr = myCOMObject->Method4();
                }
                else
                {
                    hr = myCOMObject->Method5();
                }
            }
        }
        return hr;
    }

    In this really trivial example, it's vaguely clear what's going on, but it suffices.  One common change is to move the check for hr outside and repeatedly check it for each of the statements, something like:

        hr = myCOMObject->Method1();
        if (hr == S_OK)
        {
            hr = myCOMObject->Method2();
        }
        if (hr == S_OK)
     

    What happens when you try that alternative implementation?

    HRESULT MyFunction()
    {
        HRESULT hr;

        hr = myCOMObject->Method1();
        if (hr == S_OK)
        {
            hr = myCOMObject->Method2();
        }
        if (hr == S_OK)
        {
            hr = myCOMObject->Method3();
            if (hr == S_OK)
            {
                hr = myCOMObject->Method4();
            }
            else
            {
                hr = myCOMObject->Method5();
            }
        }
        return hr;
    }

    Hmm.  That's not as nice - some of it's been cleaned up, but the Method4/Method5 check still requires that you indent an extra level.

    Now consider what happens if you can use gotos:

    HRESULT MyFunction()
    {
        HRESULT hr;

        hr = myCOMObject->Method1();
        if (hr != S_OK)
        {
            goto Error;
        }
        hr = myCOMObject->Method2();
        if (hr != S_OK)
        {
            goto Error;
        }
        hr = myCOMObject->Method3();
        if (hr == S_OK)
        {
            hr = myCOMObject->Method4();
        }
        else
        {
            hr = myCOMObject->Method5();
        }
        if (hr != S_OK)
        {
            goto Error;
        }
    Cleanup:
        return hr;
    Error:
        goto Cleanup;
    }

    If you don't like seeing all those gotos, you can use a macro to hide them:

    #define IF_FAILED_JUMP(hr, tag) if ((hr) != S_OK) goto tag
    HRESULT MyFunction()
    {
        HRESULT hr;

        hr = myCOMObject->Method1();
        IF_FAILED_JUMP(hr, Error);

        hr = myCOMObject->Method2();
        IF_FAILED_JUMP(hr, Error);

        hr = myCOMObject->Method3();
        if (hr == S_OK)
        {
            hr = myCOMObject->Method4();
            IF_FAILED_JUMP(hr, Error);
        }
        else
        {
            hr = myCOMObject->Method5();
            IF_FAILED_JUMP(hr, Error);
        }

    Cleanup:
        return hr;
    Error:
        goto Cleanup;
    }

    Again, there are no right answers or wrong answers, just choices.

    Tomorrow, wrapping it all up.

  • Larry Osterman's WebLog

    Life in a faraday cage

    • 32 Comments

    There was an internal discussion about an unrelated topic recently, and it reminded me of an early experience in my career at Microsoft.

    When I started, my 2nd computer was a pre-production PC/AT (the first was an XT). The AT had been announced by IBM about a week before I started, so our pre-production units were allowed to be given to other MS employees (since I had to write the disk drivers for that machine, it made sense for me to own one of them).

    Before I got the machine, however, it was kept in a room that we semi-affectionately called "the fishtank" (it was the room where we kept the Salmons (the code name for the PC/AT)).

    IBM insisted that we keep all the pre-production computers we received from them in this room - why?

    Two reasons.  The first was that there was a separate lock on the door that would limit access to the room.

    The other reason was that IBM had insisted that we build a Faraday cage around the room.  They were concerned that some ne'er-do-well would use the RF emissions from the computer (and monitor) to read the contents of the screen and RAM.  I was told that they had technology that would allow them to read the contents of an individual screen from across the street, and they were worried about others being able to do the same thing.

    Someone at work passed along this link to a research paper by Wim van Eck that discusses the technical details behind the technology.

     

  • Larry Osterman's WebLog

    More proof that crypto should be left to the experts

    • 41 Comments

    Apparently two years ago, someone ran a memory-checking tool named "Valgrind" against the source code to OpenSSL in the Debian Linux distribution.  The Valgrind tool reported an issue (a use of uninitialized memory) with the OpenSSL package distributed by Debian, so the Debian team decided that they needed to fix this "security bug".

     

    Unfortunately, the solution they chose to implement apparently removed all entropy from the OpenSSL random number generator.  As the OpenSSL team comments "Had Debian [contributed the patches to the package maintainers], we (the OpenSSL Team) would have fallen about laughing, and once we had got our breath back, told them what a terrible idea this was."

     

    And it IS a terrible idea.  It means that for the past two years, all crypto done on Debian Linux distributions (and Debian derivatives like Ubuntu) has been done with a weak random number generator.  While this might seem to be geeky and esoteric, it's not.  It means that every cryptographic key that has been generated on a Debian or Ubuntu distribution needs to be regenerated (after you pick up the fix).  If you don't, any data that was encrypted with the weak RNG can be easily decrypted.
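
    To see why a weak generator is so catastrophic, here's a minimal sketch - NOT the actual OpenSSL code, just an illustration of the reported effect that the process ID ended up as essentially the only entropy source.  With at most 32,768 possible PIDs, an attacker can simply enumerate every key the generator could ever produce:

    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical "key generation" whose only entropy is a process ID. */
    unsigned int weak_key(unsigned int pid)
    {
        srand(pid);                /* seed space: 0..32767 - tiny */
        return (unsigned int)rand();
    }

    int main(void)
    {
        /* The attacker's job: just try every possible seed. */
        for (unsigned int pid = 0; pid < 32768; pid++)
            printf("candidate key for pid %u: %u\n", pid, weak_key(pid));
        return 0;
    }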

     

    Bruce Schneier has long said that cryptography is too important to be left to amateurs (I'm not sure of the exact quote, so I'm paraphrasing).  That applies to all aspects of cryptography (including random number generators) - even tiny changes to algorithms can have profound effects on the security of the algorithm.   He's right - it's just too easy to get this stuff wrong.

     

    The good news is that there IS a fix for the problem; users of Debian or Ubuntu should read the advisory and take whatever actions are necessary to protect their data.

  • Larry Osterman's WebLog

    It's the platform, Silly!

    • 69 Comments

    I’ve been mulling writing this one for a while, and I ran into the comment below the other day which inspired me to go further, so here goes.

    Back in May, Jim Gosling was interviewed by Asia Computer Weekly.  In the interview, he commented:

    One of the biggest problems in the Linux world is there is no such thing as Linux. There are like 300 different releases of Linux out there. They are all close but they are not the same. In particular, they are not close enough that if you are a software developer, you can develop one that can run on the others.

    He’s completely right, IMHO.  Just like the IBM PC’s documented architecture meant that people could create PC’s that were perfect hardware clones of IBM’s PCs (thus ensuring that the hardware was the same across PCs), Microsoft’s platform stability means that you could write for one platform and trust that it works on every machine running on that platform.

    There are huge numbers of people who’ve forgotten what the early days of the computer industry were like.  When I started working, most software was custom, or was tied to a piece of hardware.  My mother worked as the executive director for the American Association of Physicists in Medicine.  When she started working there (in the early 1980’s), most of the word processing was done on old Wang word processors.  These were dedicated machines that did one thing – they ran a custom word processing application that Wang wrote to go with the machine.  If you wanted to computerize the records of your business, you had two choices: You could buy a minicomputer and pay a programmer several thousand dollars to come up with a solution that exactly met your business needs.  Or you could buy a pre-packaged solution for that minicomputer.  That solution would also cost several thousand dollars, but it wouldn’t necessarily meet your needs.

    A large portion of the reason that these solutions were so expensive is that the hardware cost was so high.  The general purpose computers that were available cost tens or hundreds of thousands of dollars and required expensive facilities to manage.  So there weren’t many of them, which means that companies like Unilogic (makers of the Scribe word processing software, written by Brian Reid) charged hundreds of thousands of dollars for installations and tightly managed their code – you bought a license for the software that lasted only a year or so, after which you had to renew it (it was particularly ugly when Scribe’s license ran out (it happened at CMU once by accident) – the program would delete itself off the hard disk).

    PC’s started coming out in the late 1970’s, but there weren’t that many commercial software packages available for them.  One problems developers encountered was that the machines had limited resources, but beyond that, software developers had to write for a specific platform – the hardware was different for all of these machines, as was the operating system and introducing a new platform linearly increases the amount of testing required.  If it takes two testers to test for one platform, it’ll take four testers to test two platforms, six testers to test three platforms, etc (this isn’t totally accurate, there are economies of scale, but in general the principal applies – the more platforms you support, the higher your test resources required).

    There WERE successful business solutions for the early PCs – Visicalc first came out for the Apple ][, for example.  But they were few and far between, and were limited to a single hardware platform (again, because the test and development costs of writing to multiple platforms are prohibitive).

    Then the IBM PC came out, with a documented hardware design (it wasn’t really open like “open source”, since only IBM contributed to the design process, but it was fully documented).  And with the IBM PC came a standard OS platform, MS-DOS (actually IBM offered three or four different operating systems, including CP/M and the UCSD P-system, but MS-DOS was the one that took off).  In fact, Visicalc was one of the first applications ported to MS-DOS, btw – it was ported to DOS 2.0.  But it wasn’t until 1983ish, with the introduction of Lotus 1-2-3, that the PC was seen as a business tool and people flocked to it. 

    But the platform still wasn’t completely stable.  The problem was that while MS-DOS did a great job of virtualizing the system storage (with the FAT filesystem)  keyboard and memory, it did a lousy job of providing access to the screen and printers.  The only built-in support for the screen was a simple teletype-like console output mechanism.  The only way to get color output or the ability to position text on the screen was to load a replacement console driver, ANSI.SYS.

    Obviously, most ISVs (like Lotus) weren’t willing to live with the performance of console output through ANSI.SYS, so they started writing directly to the video hardware.  On the original IBM PC, that wasn’t that big a deal – there were two choices, CGA or MDA (Color Graphics Adapter and Monochrome Display Adapter).  Two choices, two code paths to test.  So the test cost was manageable for most ISVs.  Of course, the hardware world didn’t stay still.  Hercules came out with their graphics adapter for the IBM monochrome monitor.  Now we have three paths.  Then IBM came out with the EGA and VGA.  Now we have FIVE paths to test.  Most of these were compatible with the basic CGA/MDA, but not all, and they all had different ways of providing their enhancements.  Some had “unique” hardware features, like the write-only hardware registers on the EGA.

    At the same time as these display adapter improvements were coming, disks were also improving – first 5 ¼ inch floppies, then 10M hard disks, then 20M hard disks, then 30M.  And system memory increased from 16K to 32K to 64K to 256K to 640K.  Throughout all of it, the MS-DOS filesystem and memory interfaces continued to provide a consistent API to code to.  So developers continued to write to the MS-DOS filesystem APIs and grumbled about the costs of testing the various video combinations.

    But even so, vendors flocked to MS-DOS.  The combination of a consistent hardware platform and a consistent software interface to that platform was an unbelievably attractive combination.  At the time, the major competition to MS-DOS was Unix and the various DR-DOS variants, but none of them provided the same level of consistency.  If you wanted to program to Unix, you had to choose between Solaris, 4.2BSD, AIX, IRIX, or any of the other variants, each of which was a totally different platform.  Solaris’ signals behaved subtly differently from AIX’s, etc.  Even though the platforms were ostensibly the same, there were enough subtle differences that you either wrote for only one platform, or you took on the burden of running the complete test matrix on EVERY version of the platform you supported.  If you ever look at the source code to an application written for *nix, you can see this quite clearly – there are literally dozens of conditional compilation options for the various platforms.

    On MS-DOS, on the other hand, if your app worked on an IBM PC, your app worked on a Compaq.  Because of the effort put forward to ensure upwards compatibility of applications, if your application ran on DOS 2.0, it ran on DOS 3.0 (modulo some minor issues related to FCB I/O).  Because the platforms were almost identical, your app would continue to run.   This commitment to platform stability has continued to this day – Visicalc from DOS 2.0 still runs on Windows XP.

    This meant that you could target the entire ecosystem of IBM PC compatible hardware with a single test pass, which significantly reduced your costs.   You still had to deal with the video and printer issue however.

    Now along came Windows 1.0.  It virtualized the video and printing interfaces providing, for the first time, a consistent view of ALL the hardware on the computer, not just disk and memory.  Now apps could write to one API interface and not worry about the underlying hardware.  Windows took care of all the nasty bits of dealing with the various vagaries of hardware.  This meant that you had an even more stable platform to test against than you had before.  Again, this is a huge improvement for ISV’s developing software – they no longer had to wonder about the video or printing subsystem’s inconsistencies.

    Windows still wasn’t an attractive platform to build on, since it had the same memory constraints as DOS had.  Windows 3.0 fixed that, allowing for a consistent API that finally relieved the 640K memory barrier.

    Fast forward to 1993 – NT 3.1 comes out providing the Win32 API set.  Once again, you have a consistent set of APIs that abstracts the hardware and provides a constant API set.  Win9x, when it came out, continued the tradition.  Again, the API is consistent.  Apps written to Win32s (the subset of Win32 intended for Win 3.1) still run on Windows XP without modification.  One set of development costs, one set of test costs.  The platform is stable.  With the Unix derivatives, you still had to either target a single platform or bear the costs of testing against all the different variants.

    In 1995, Sun announced that its new Java technology would be introduced to the world.  Its biggest promise was that it would, like Windows, deliver platform-independent stability.  In addition, it promised cross-operating-system stability.  If you wrote to Java, you’d be guaranteed that your app would run on every JVM in the world.  In other words, it would finally provide application authors the same level of platform stability that Windows provided, and it would go Windows one better by providing the same level of stability across multiple hardware and operating system platforms.

    In Jim Gosling’s post, he’s just expressing his frustration with the fact that Linux isn’t a completely stable platform.  Since Java is supposed to provide a totally stable platform for application development, then just as Windows needs to smooth out differences between the hardware on the PC, Java needs to smooth out the differences between operating systems.

    The problem is that Linux platforms AREN’T totally stable.  While the kernel might be the same on all distributions (and it’s not, since different distributions use different versions of the kernel), the other applications that make up the distribution might not be.  Java needs to be able to smooth out ALL the differences in the platform, since its bread and butter is providing a stable platform.  If some Java facilities require things outside the basic kernel, then they’ve got to deal with all the vagaries of the different versions of the external components.  As Jim commented, “They are all close, but not the same.”  These differences aren’t that big a deal for someone writing an open source application, since the open source methodology fights against packaged software development.  Think about it: How many non-open-source software products can you name that are written for open source operating systems?  What distributions do they support?  Does Oracle support Linux distributions other than Red Hat Enterprise?  The reason that there are so few is that the cost of development for the various “Linux” derivatives is close to prohibitive for most shrink-wrapped software vendors; instead they pick a single distribution and use that (thus guaranteeing a stable platform).

    For open source applications, the cost of testing and support is pushed from the developer of the package to the end user.  It’s no longer the responsibility of the author of the software to guarantee that their software works on a given customer’s machine – since the customer has the source, they can fix the problem themselves.

    In my honest opinion, platform stability is the single biggest thing that Microsoft’s monoculture has brought to the PC industry.  Sure, there’s a monoculture, but that means that developers only have to write to a single API.  They only have to test on a single platform.  The code that works on a Dell works on a Compaq, and works on a Sue’s Hardware Special.  If an application runs on Windows NT 3.1, it’ll continue to run on Windows XP.

    And as a result of the total stability of the platform, a vendor like Lotus can write a shrink-wrapped application like Lotus 1-2-3 and sell it to hundreds of millions of users and be able to guarantee that their application will run the same on every single customer’s machine. 

    What this does is allow Lotus to reduce the price of their software product.  Instead of a software product costing tens of thousands of dollars, software product costs have fallen to the point where you can buy a fully featured word processor for under $130.  

    Without this platform stability, the testing and development costs go through the roof, and software costs escalate enormously.

    When I started working in the industry, there was no volume market for fully featured shrink wrapped software, which meant that it wasn’t possible to amortize the costs of development over millions of units sold. 

    The existence of a stable platform has allowed the industry to grow and flourish.  Without a stable platform, development and test costs would rise and those costs would be passed onto the customer.

    Having a software monoculture is NOT necessarily an evil. 

  • Larry Osterman's WebLog

    Anatomy of a software bug, part 1 - the NT browser

    • 20 Comments
    No, I don't mean that the NT browser's a software bug...

    Actually, Raymond's post this morning about the network neighborhood got me thinking about the NT browser and its design.  I've written about the NT browser before here, but never wrote up how the silly thing worked.  While reminiscing, I remembered a memorable bug I fixed back in the early 1990's that's worth writing up because it's a great example of how strange behaviors and subtle issues can appear in peer-to-peer distributed systems (and why they're so hard to get right).

    Btw, the current design of the network neighborhood is rather different from this one - I'm describing code and architecture designed for systems 12 years ago; there have been a huge number of improvements to the system since then, and some massive architectural redesigns.  In particular, the "computer browser" service upon which all this depends is disabled in Windows XP SP2 due to attack surface reduction.  In current versions of Windows, Explorer uses a different mechanism to view the network neighborhood (at least on my machine at work).

     

    The actual original design of the NT browser came from Windows for Workgroups.  Windows for Workgroups was a peer-to-peer networking solution for Windows 3.1 (and continued to be the basis of the networking code in Windows 95).  As such, all machines in a workgroup needed to be visible to all the other machines in the workgroup.  In addition, since you might have different workgroups on your LAN, it needed to be able to enumerate all the workgroups on the LAN.

    One critical aspect of WfW is that it was designed for LAN environments - it was primarily based on NetBEUI, which was a LAN protocol designed by IBM back in the 1980's.  LAN protocols typically scale quite nicely to several hundred computers, after which they start to fall apart (due to collisions, etc).  For larger networks, you need a routable protocol like IPX or TCP, but at the time, it wasn't that big a deal (we're talking about 1991 here - way before the WWW existed).

    As I mentioned, WfW was a peer-to-peer product.  As such, everything about WfW had to be auto-configuring.  For Lan Manager, it was ok to designate a single machine in your domain to be the "domain controller" and others as "backup domain controllers", but for WfW, all that had to be automatic.

    To achieve this, the guys who designed the protocol for the WfW browser decided on a three-tier design.  Most of the machines in the workgroup would be "potential browser servers", some of the machines in the workgroup would be declared as "browser servers", and one of the machines in the workgroup was the "master browser server".

    Clients periodically (every three minutes) sent a datagram to the master browser server, and the master browser would record this in its server list.  If the master browser hadn't heard from a client for three announcements, it assumed that the client had been turned off and removed it from the list.  Backup browser servers would periodically (every 15 minutes) retrieve the browser list from the master browser.

    When a client wanted to browse the network, the client sent a broadcast datagram to the workgroup asking who the browser servers were on the workgroup.  One of the backup or master browser servers would respond after a short random delay (within several seconds).  The client would then ask that browser server for its list of machines, and would display that to the user.

    If none of the browser servers responded, then the client would force an "election".  When the potential browser servers received the election datagram, they each broadcast a "vote" datagram that described their "worth".  If they saw a datagram from another server that had more "worth" than they did, they silently dropped out of the election.

    A server's "worth" was based on a lot of factors - the system's uptime, the version of the software running, and its current role as a browser (backup browsers were better than potential browsers, master browsers were better than backup browsers).
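
    Just to make the election concrete, here's a hypothetical sketch of that comparison - the fields and their ordering are illustrative assumptions, not the actual wire protocol:

    /* Hypothetical "worth" comparison for a browser election. */
    struct BrowserWorth {
        unsigned osVersion;      /* newer software outranks older */
        unsigned browserRole;    /* master > backup > potential */
        unsigned uptimeSeconds;  /* longer uptime breaks ties */
    };

    /* Returns nonzero if mine outranks theirs.  A server that sees a
       more "worthy" vote than its own silently drops out. */
    int OutranksPeer(const struct BrowserWorth *mine,
                     const struct BrowserWorth *theirs)
    {
        if (mine->osVersion != theirs->osVersion)
            return mine->osVersion > theirs->osVersion;
        if (mine->browserRole != theirs->browserRole)
            return mine->browserRole > theirs->browserRole;
        return mine->uptimeSeconds > theirs->uptimeSeconds;
    }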

    Once the master browser was elected, it nominated some number of potential browser servers to be backup browsers.

    This scheme worked pretty well - browsers tended to be stable, and the system was self healing.

    Now once we started deploying the browser in NT, we started running into problems that caused us to make some important design changes.  The biggest one related to performance.  It turns out that in a corporate environment, peer-to-peer browsing is a REALLY bad idea.  There's no way of knowing what's going on on another person's machine, and if the machine is really busy (like if it's running NT stress tests), it impacts the browsing behavior for everyone in the domain.  Since NT had the concept of domains (and designated domain controllers), we modified the election algorithm to ensure that NT server machines were "more worthy" than NT workstation machines, which solved that particular problem neatly.  We also biased the election algorithm towards NT machines in general, on the theory that NT machines were more likely to be reliable than WfW machines.

    There were a LOT of other details about the NT browser that I've forgotten, but that's a really brief overview, and it's enough to understand the bug.  Btw, I'm the person who coined the term "Bowser" (as in "bowser.sys") during a design review meeting with my boss (who described it as a dog) :)

    Btw, Anonymous Coward's comment on Raymond's blog is remarkably accurate, and states many of the design criteria and benefits of the architecture quite nicely.  I don't know who AC is (my first guess didn't pan out), but I suspect that person has worked with this particular piece of code :)

     

  • Larry Osterman's WebLog

    Riffing on Raymond - FindFirst/FindNext

    • 16 Comments

    As I mentioned, I've been Riffing on Raymond a lot - yesterday's post from Raymond got me to thinking about FindFirst and FindNext in MS-DOS.

    As Raymond pointed out:

    That's because the MS-DOS file enumeration functions maintained all their state in the find structure. The FAT file system was simple enough that the necessary search state fit in the reserved bytes and no external locking was necessary to accomplish the enumeration. (If you did something strange like delete a directory while there was an active enumeration on it, then the enumeration would start returning garbage. It was considered the program's responsibility not to do that. Life is a lot easier when you are a single-tasking system.)

    The interesting thing about the fact that MS-DOS kept its state in the reserved bytes of the find structure was that a bunch of apps figured this out.  And then they realized that they could suspend and resume their searches by simply saving away the 21 reserved bytes at the start of the structure and later stuffing them back into a fresh FindFirst structure.

    So a program would do a depth-first traversal of the tree, and at each level of the tree, instead of saving the entire 43-byte FindFirst structure, they could save 22 bytes per level of the hierarchy by just saving the first 21 bytes of the structure.  In fact, some of them were even more clever; they realized that they could save just the part of the reserved structure that they thought was important (something like 8 bytes/level).
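
    For reference, here's roughly what that 43-byte find structure (the DTA) looked like, as commonly documented - a sketch for illustration, packed to byte alignment:

    #pragma pack(push, 1)
    struct DosFindData {
        unsigned char  reserved[21];  /* undocumented search state - the
                                         bytes those clever apps saved */
        unsigned char  attributes;
        unsigned short fileTime;
        unsigned short fileDate;
        unsigned long  fileSize;      /* 32 bits on DOS-era compilers */
        char           fileName[13];  /* 8.3 name, NUL-terminated */
    };
    #pragma pack(pop)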

    And that's just what they did...

    Needless to say, that caused kittens when the structures used for search had to change - these apps looked into the internal data structures and assumed they knew what the bytes meant...

     

  • Larry Osterman's WebLog

    Keeping kids safe on the internet

    • 26 Comments

    Joe Wilcox over at Microsoft Monitor recently posted an article about keeping kids safe on the internet.

    It’s a good article, but I’d add one other thing to his suggestions:  If you’ve got more than one computer in your house, disable internet access to all but public computers.  And if you’ve only got one computer put it in a public location, like the kitchen.

    We’ve got six different computers in our household – each kid has their own, I’ve got two, Valorie's got one, and there’s a common computer in the kitchen.  Valorie's and my computers have internet access, as does the common computer, but none of the others are allowed to access the internet – we filter it off access at the firewall.

    The kids also have up-to-date virus scanners on their computer (although their signatures get a smidge out-of-date).

    Once a month, after patch day, I manually enable internet access and go to Windows Update to ensure that they’re fully patched and their virus signatures are updated.  I know I could use SUS to roll my own update server, but it’s not that big a deal.  Similarly, I could set one of the internet-connected machines as the virus update location for the kids' computers, but again, it's not that big a deal.

    This works nicely for me, and the principles can be applied to anyone's computers, even without all the added hoopla I go through.  The first and most important part of the equation is that all internet browsing is done on a public computer – that means the kids aren’t going to be sneaking around the darker corners of the internet with Mom and Dad in the same room.  

    The other part of the equation is that all accounts on the public computer are LUA accounts, which adds an additional level of safety to browsing - nobody can accidentally install ActiveX controls or other software, which again adds a HUGE level of protection.  We have an admin account, but it's password protected and the kids don't know the password. 

    Edit: Addressed Michael Ruck's comment.

     

  • Larry Osterman's WebLog

    Threat Modeling Again, STRIDE

    • 9 Comments

    As has been mentioned elsewhere, when we're threat modeling at Microsoft we classify threats using the acronym STRIDE. 

    STRIDE stands for "Spoofing", "Tampering", "Repudiation", "Information disclosure", "Denial of service", and "Elevation of privilege".

    Essentially the idea is that you can classify all your threats according to one of the 6 STRIDE categories.  Since each category has a specific set of potential mitigations, once you've analyzed the threats and categorized them, you should know how to mitigate them.

    A caveat: as David points out in his "Dreadful" post, STRIDE is not a rigorous classification mechanism - there's a ton of overlap between the various categories (a successful Elevation of Privilege attack could result in Tampering of data, for instance).  But it doesn't change the fact that it's an extremely useful mechanism for analyzing threats to a system.

    So what are each of the STRIDE categories?

    Spoofing

    A spoofing attack occurs when an attacker pretends to be someone they're not.  So an attacker using DNS hijacking and pretending to be www.microsoft.com would be an example of a "spoofing" attack.  Spoofing attacks can happen locally.  For instance, as I mentioned in "Reapplying the decal", one mechanism by which the Decal plugin framework injects itself into the Asheron's Call process is to spoof one of the COM objects that Asheron's Call uses.

    Tampering

    Tampering attacks occur when the attacker modifies data in transit.  An attacker that modified a TCP stream by predicting the sequence numbers would be tampering with that data flow.  Obviously data stores can be tampered with as well - that's what happens when the attacker writes specially crafted data into a file to exploit a vulnerability. 

    Repudiation

    Repudiation occurs when someone performs an action and then claims that they didn't actually do it.  Primarily this shows up on operations like credit card transactions - a user purchases something and then claims that they didn't do it.  Another way that this shows up is in email - if I receive an email from you, you can claim that you never sent it.

    Information disclosure

    Information Disclosure threats are usually quite straightforward - can the attacker view data that they're not supposed to view?  So if you're transferring data from one computer to another, if the attacker can sniff the data on the wire, then your component is subject to an information disclosure threat.  Data Stores are also subject to information disclosure threats - if an unauthorized person can read the contents of the file, it's an information disclosure.

    Denial of service

    Denial of service threats occur when an attacker can degrade or deny service to users.  So if an attacker can crash your component or redirect packets into a black hole, or consume all the CPU on the box, you have a Denial of service situation.

    Elevation of privilege

    Finally, there's Elevation of privilege.  An elevation of privilege threat occurs when an attacker has the ability to gain privileges that they'd not normally have.  One of the reasons that classic buffer overflows are so important is that they often allow an attacker to raise their privilege level - for instance, a buffer overflow in any internet-facing component allows an attacker to elevate their privilege level from anonymous to the local user (or whatever account is hosting the vulnerable component). 

     

    Please note, these are only rough classifications of threats (not vulnerabilities).  And many of them aren't relevant in every circumstance.  For instance, if your component is like PlaySound, you don't need to worry about information disclosure threats to the data flows between the Application and PlaySound.  On the other hand, if you're writing an email server, you absolutely DO care about information disclosure threats.

    UPDATE: Adam Shostack over on the SDL team has posted an enhanced definition of the STRIDE categories on the Microsoft SDL blog.  You can read that list here: http://blogs.msdn.com/sdl/archive/2007/09/11/stride-chart.aspx

    Next: STRIDE mitigations

     

    Edit: Larry can't count to 6.

     

  • Larry Osterman's WebLog

    No sound on a Toshiba M7 after a Vista install (aka: things that make you go "Huh?")

    • 31 Comments

    We recently had a bug reported to us internally.  The user of a Toshiba M7 had installed Vista on his machine (which was previously running XP) and discovered that he didn't get any more sounds from his machine after the upgrade.

    We tried everything we could to figure out his problem - the audio system was sending samples to the sound card, the sound card was updating its internal position register, everything looked great.

    Usually, at this point, we start asking the impolitic questions, like:

    "Sometimes some dirt collects between the plug and the internal connectors on the sound card - could you please unplug the speakers and plug them back in?" (this is the polite way of asking "Did you remember to plug your speakers in?").

    "Sometimes a set of speakers only turn on the speaker when they detect a signal being sent to them, could you try wiggling the volume knob to see if it fixes the problem?" (I actually have one of these in my office, it's excruciatingly annoying).

    "Is it possible there's an external volume control on your speakers?  What's it set to?" (this is the polite question that catches the people who accidentally hit the mute button on their speakers or turned the volume down - we get a surprising number of these).

    Unfortunately, in this case none of these worked.  So we had to dig deeper.  For some reason (I'm not sure why), someone asked the user to boot back to XP and see if he could get sound working on XP.  He booted back to XP and it worked.  He then booted back to Vista, and...

    The sounds worked!

    He mentioned to us that when he'd booted back to XP, the sound driver reported that the volume control was muted, so he un-muted it before booting to Vista.  Just for grins, we asked him to mute the volume control on XP and boot into Vista and yup, the problem had reappeared.  Somehow muting the sound card on XP caused it to be muted in Vista.

    We got on the horn with the manufacturer of the system and the manufacturer of the sound card and they informed us that for various and sundry reasons, the XP audio driver twiddled some hardware registers that were hidden from the OS to cause the sound card to mute.  The Vista driver for the sound card didn't know about those special hardware registers, so it didn't know that the sound card was muted, so Vista didn't know it was muted.

    Needless to say, this is quite annoying - the design of the XP driver for this machine made it really easy for the customer to have a horrible experience when running Vista, which is never good.  It's critical that the OS know what's going on in the hardware (in other words, back doors are bad).  When a customer has this experience, they don't blame their system vendor or their audio driver, they blame Vista.

     

    The good news is that there’s a relatively easy workaround for people with an M7 – make sure that your machine is un-muted before you upgrade.  The bad news is that this is a relatively popular computer (at least at Microsoft), and sufficient numbers of people have discovered the problem that it’s made one of our internal FAQs.

  • Larry Osterman's WebLog

    Windows Error Reporting and Online Crash Analysis are your friends.

    • 31 Comments

    I normally don’t do “me too” posts, since I figure that most of the people reading my blog are also looking at the main weblogs.asp.net/blogs.msdn.com feed, but I felt obliged to chime in on this one.

    A lot of people on weblogs.msdn.com have been posting this, but I figured I’d toss in my own version.

    When you get a “your application has crashed, do you want to let Microsoft know about it?” dialog, then yes, please send the crash report in.  We’ve learned a huge amount about where we need to improve our systems from these reports.  I know of at least three different bug fixes that I’ve made in the audio area that came directly from OCA (online crash analysis) reports.  Even if the bugs are in drivers that we didn’t write (Jerry Pisk commented about Creative Labs' drivers here, for example), we still pass the info on to the driver authors.

    In addition, we do data mining to see if there are common mistakes made by different driver authors and we use these to improve the driver verifier – if a couple of driver authors make the same mistake, then it makes sense for us to add tests to ensure that the problems get fixed on the next go-round.

    And we do let 3rd party vendors review their data.  There was a chat about this in August of 2002 where Greg Nichols and Alther Haleem discussed how it’s done.  The short answer is you go here and follow the instructions.  You have to have a Verisign Class 3 code-signing ID to participate, though.

    Bottom line: Participate in WER/OCA – Windows gets orders of magnitude more stable because of it.  As Steve Ballmer said:

    About 20 percent of the bugs cause 80 percent of all errors, and — this is stunning to me — one percent of bugs cause half of all errors.

    Knowing where the bugs are in real-world situations allows us to catch the high visibility bugs that plague our users that we’d otherwise have no way of discovering.

  • Larry Osterman's WebLog

    The Windows command line is just a string...

    • 30 Comments

    Yesterday, Richard Gemmell left the following comment on my blog (I've trimmed to the critical part):

    I was referring to the way that IE can be tricked into calling the Firefox command line with multiple parameters instead of the single parameter registered with the URL handler.

    I saw this comment and was really confused for a second, until I realized the disconnect.  The problem is that *nix and Windows handle command line arguments totally differently.  On *nix, you launch a program using the execve API (or its cousins execv, execvp, execl, execlp, and execle).  The interesting thing about these APIs is that they allow the caller to specify each of the command line arguments - the signature for execve is:

    int execve(const char *filename, char *const argv [], char *const envp[]);

    In *nix, the shell is responsible for turning the string provided by the user into the argv parameter to the program[1].

     

    On Windows, the command line doesn't work that way.  Instead, you launch a new program using the CreateProcess API, which takes the command line as a string (the lpCommandLine parameter to CreateProcess).  It's considered the responsibility of the newly started application to call the GetCommandLine API to retrieve that command line and parse it (possibly using the CommandLineToArgvW helper function).

    So when Richard talked about IE "tricking" Firefox by calling it with multiple parameters, he was apparently thinking about the *nix model, where an application launches a new application with multiple command line arguments.  But that isn't the Windows model - instead, in the Windows model, the application is responsible for parsing its own command line arguments, and thus IE can't "trick" anything - it's just asking the shell to pass a string to the application, and it's the application's job to figure out how to handle that string.
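
    Here's a minimal sketch of the receiving side - a program retrieving its single command line string and splitting it itself.  This is standard Win32 usage (link with Shell32.lib for CommandLineToArgvW):

    #include <windows.h>
    #include <shellapi.h>
    #include <stdio.h>

    int main(void)
    {
        /* The OS hands the new process exactly one string... */
        LPWSTR commandLine = GetCommandLineW();

        /* ...and the application decides how to carve it up. */
        int argc;
        LPWSTR *argv = CommandLineToArgvW(commandLine, &argc);
        if (argv != NULL)
        {
            for (int i = 0; i < argc; i++)
                wprintf(L"argv[%d] = %ls\n", i, argv[i]);
            LocalFree(argv);    /* the caller frees the returned array */
        }
        return 0;
    }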

    We can discuss the relative merits of that decision, but it was a decision made over 25 years ago (in MS-DOS 2.0).

     

    [1] Yes, I know that the execl() API lets you specify the command line arguments inline as a variadic list, but execl() simply packages those arguments into an argv array before calling execve.

  • Larry Osterman's WebLog

    Things you shouldn't do, part 1 - DllMain is special

    • 5 Comments

    A lot of people have written about things not to do in your DllMain.  Like here, and here and here.

    One other thing not to do in your DllMain is to call LoadLibraryEx.  As others have written, DllMain’s a really special place to be.  If you do anything more complicated than initializing critical sections, or allocating thread local storage blocks, or calling DisableThreadLibraryCalls, you’re potentially asking for trouble.

    Sometimes, however, the interaction is much more subtle.  For example, if your DLL uses COM, you might be tempted to call CoInitializeEx in your DllMain.  The problem is that under certain circumstances, CoInitializeEx can call LoadLibraryEx.  And calling LoadLibraryEx is one of the things that is EXPLICITLY forbidden during DllMain ("You must not call LoadLibrary in the entry-point function").
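
    A sketch of the usual workaround (for a hypothetical DLL, just for illustration): keep DllMain trivial and defer the COM initialization until the first real call into the DLL, when the loader lock is no longer held:

    #include <windows.h>
    #include <objbase.h>

    BOOL WINAPI DllMain(HINSTANCE instance, DWORD reason, LPVOID reserved)
    {
        if (reason == DLL_PROCESS_ATTACH)
            DisableThreadLibraryCalls(instance);  /* safe in DllMain */
        return TRUE;    /* nothing else - no LoadLibrary, no COM */
    }

    /* Exported function - by the time anyone calls this, DllMain has
       long since returned, and CoInitializeEx is safe to call. */
    HRESULT DoWork(void)
    {
        HRESULT hr = CoInitializeEx(NULL, COINIT_MULTITHREADED);
        if (FAILED(hr))
            return hr;
        /* ... use COM here ... */
        CoUninitialize();
        return S_OK;
    }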

     

  • Larry Osterman's WebLog

    Resilience is NOT necessarily a good thing

    • 66 Comments

    I just ran into this post by Eric Brechner, who is the director of Microsoft's Engineering Excellence center.

    What really caught my eye was his opening paragraph:

    I heard a remark the other day that seemed stupid on the surface, but when I really thought about it I realized it was completely idiotic and irresponsible. The remark was that it's better to crash and let Watson report the error than it is to catch the exception and try to correct it.

    Wow.  I'm not going to mince words: What a profoundly stupid assertion to make.  Of course it's better to crash and let the OS handle the exception than to try to continue after an exception.

     

    I have a HUGE issue with the concept that an application should catch exceptions[1] and attempt to correct them.  In my experience, handling exceptions and attempting to continue is a recipe for disaster.  At best, it turns an easily debuggable problem into one that takes hours of debugging to resolve.  At its worst, exception handling can either introduce security holes or render security mitigations irrelevant.

    I have absolutely no problems with fail-fast (which is what Eric suggests with his "Restart" option).  I think that restarting a process after the process crashes is a great idea (as long as you have a way to prevent crashes from spiraling out of control).  In Windows Vista, Microsoft built this functionality directly into the OS with the Restart Manager: if your application calls the RegisterApplicationRestart API, the OS will offer to restart your application if it crashes or is non-responsive.  This concept also shows up in the service restart options in the ChangeServiceConfig2 API (if a service crashes, the OS will restart it if you've configured the OS to restart it).
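
    Opting in is essentially a one-liner.  Here's a minimal sketch (the "/restarted" argument is my own convention for detecting a relaunch, not part of the API):

    #include <windows.h>

    void EnableCrashRestart(void)
    {
        /* Ask the OS to relaunch us if we crash or hang - but not as
           part of a patch-driven restart or a reboot. */
        HRESULT hr = RegisterApplicationRestart(
            L"/restarted",
            RESTART_NO_PATCH | RESTART_NO_REBOOT);
        if (FAILED(hr))
        {
            /* Not fatal; the app simply won't be auto-restarted. */
        }
    }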

    I also agree with Eric's comment that asserts that cause crashes have no business living in production code, and I have no problems with asserts logging a failure and continuing (assuming that there's someone who is actually going to look at the log and can understand its contents; otherwise the logs just consume disk space). 

     

    But I simply can't wrap my head around the idea that it's ok to catch exceptions and continue to run.  Back in the days of Windows 3.1 it might have been a good idea, but after the security fiascos of the early 2000s, any thoughts that you could continue to run after an exception has been thrown should have been removed forever.

    The bottom line is that when an exception is thrown, your program is in an unknown state.  Attempting to continue in that unknown state is pointless and potentially extremely dangerous - you literally have no idea what's going on in your program.  Your best bet is to let the OS exception handler dump core and hopefully your customers will submit those crash dumps to you so you can post-mortem debug the problem.  Any other attempt at continuing is a recipe for disaster.

     

    -------

    [1] To be clear: I'm not necessarily talking about C++ exceptions here, just structured exceptions.  For some C++ and C# exceptions, it's ok to catch the exception and continue, assuming that you understand the root cause of the exception.  But if you don't know the exact cause of the exception you should never proceed.  For instance, if your binary tree class throws a "Tree Corrupt" exception, you really shouldn't continue to run, but if opening a file throws a "file not found" exception, it's likely to be ok.  For structured exceptions, I know of NO circumstance under which it is appropriate to continue running.

     

    Edit: Cleaned up wording in the footnote.

  • Larry Osterman's WebLog

    Why is it FILE_SHARE_READ and FILE_SHARE_WRITE anyway?

    • 19 Comments

    Raymond’s post about FILE_SHARE_* bits reminded me of the story about why the bits are FILE_SHARE_READ in the first place.

    MS-DOS had the very same file sharing semantics as NT does (ok, NT adds FILE_SHARE_DELETE, more on that later).  But on MS-DOS, the file sharing semantics were optional – you had to load in the share.com utility to enable them.  This was because on a single tasking operating system, there was only ever going to be one application running, so the sharing semantics were considered optional.  Unless you were running a file server, in which case Microsoft strongly suggested that you should load the utility.

    On MS-DOS, the sharing mode was controlled by the three “sharing mode” bits.  The legal values for “sharing mode” were:

    • 000 – Compatibility mode.  Any process can open the file any number of times with this mode.  It fails if the file’s opened in any other sharing mode.
    • 001 – Deny All.  Fails if the file has been opened in compatibility mode or for read or write access, even if by the current process.
    • 010 – Deny Write.  Fails if the file has been opened in compatibility mode or for write access by any other process.
    • 011 – Deny Read.  Fails if the file has been opened in compatibility mode or for read access by any other process.
    • 100 – Deny None.  Fails if the file has been opened in compatibility mode by any other process.

    Coupled with the “sharing mode” bits are the four “access code” bits.  Only three values were defined for them: Read, Write, and Both (Read/Write).
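    For the curious, here's a sketch of how those fields were packed into the mode byte passed to the DOS open call (INT 21h, AH=3Dh); the helper and the enum names are mine, purely for illustration:

        #include <stdio.h>

        enum { ACCESS_READ = 0x0, ACCESS_WRITE = 0x1, ACCESS_READWRITE = 0x2 };
        enum { SHARE_COMPAT = 0x0, SHARE_DENYALL = 0x1, SHARE_DENYWRITE = 0x2,
               SHARE_DENYREAD = 0x3, SHARE_DENYNONE = 0x4 };

        /* Bits 0-2 hold the access code; bits 4-6 hold the sharing mode. */
        static unsigned char DosOpenMode(unsigned sharing, unsigned access)
        {
            return (unsigned char)((sharing << 4) | (access & 0x7));
        }

        int main(void)
        {
            /* Deny Write (010) + Read access (000) packs to 0x20. */
            printf("mode byte: 0x%02X\n", DosOpenMode(SHARE_DENYWRITE, ACCESS_READ));
            return 0;
        }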

    The original designers of the Win32 API set (in particular, the designer of the I/O subsystem) took one look at these permissions and threw up his hands in disgust.  In his opinion, there were two huge problems with these definitions:

    1) Because the sharing bits are defined as negatives, it’s extremely hard to understand what’s going to be allowed or denied.  If you open a file for write access in deny read mode, what happens?  What about deny write mode – does it allow reading or not?

    2) Because the default is “compatibility” mode, by default most applications can’t ensure the integrity of their data.  Instead of your data being secure by default, you need to take special action to guarantee that nobody else messes with the data.

    So the I/O subsystem designer proposed that we invert the semantics of the sharing mode bits.  Instead of the sharing rights denying access, they GRANT access.  Instead of the default access mask being to allow access, the default is to deny access.  An application needs to explicitly decide that it wants to let others see its data while it’s manipulating the data.
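    Here's what that inversion looks like at the Win32 level (a minimal sketch; the file name is just an example):

        #include <windows.h>

        int main(void)
        {
            // dwShareMode == 0: while this handle is open, every other
            // attempt to open the file fails.  Deny is the default.
            HANDLE h = CreateFileW(L"data.log", GENERIC_WRITE, 0, NULL,
                                   OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
            if (h == INVALID_HANDLE_VALUE)
                return 1;
            CloseHandle(h);

            // Sharing is an explicit grant: this open lets other processes
            // read the file while we write it, but still blocks other writers.
            h = CreateFileW(L"data.log", GENERIC_WRITE, FILE_SHARE_READ, NULL,
                            OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
            if (h != INVALID_HANDLE_VALUE)
                CloseHandle(h);
            return 0;
        }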

    This inversion neatly solves a huge set of problems that existed when running multiple MS-DOS applications – while one application was using a file, another application could corrupt the data underneath it.

    We can easily explain FILE_SHARE_READ and FILE_SHARE_WRITE as being cleaner and safer versions of the DOS sharing functionality.  But what about FILE_SHARE_DELETE?  Where on earth did that access right come from?  Well, it was added for Posix compatibility.  Under the Posix subsystem, as on *nix, a file can be unlinked while it’s still open.  In addition, when you rename a file on NT, the rename operation opens the source file for delete access (a rename operation, after all, is the creation of a new file in the target directory and a deletion of the source file).

    But DOS applications don’t expect that files can be deleted (or renamed) out from under them, so we needed a mechanism to prevent the system from deleting (or renaming) files if the application cares about them.  So that’s where the FILE_SHARE_DELETE access right comes from – it’s a flag that says to the system “it’s ok for someone else to delete or rename this file while I have it open”.

    The NT loader takes advantage of this – when it opens DLL’s or programs for execution, it specifies FILE_SHARE_DELETE.  That means that you can rename the executable of a currently running application (or DLL).  This can come in handy when you want to drop in a new copy of a DLL that’s being used by a running application.  I do this all the time when working on winmm.dll.  Since winmm.dll’s used by lots of processes in the system, including some that can’t be stopped, I can’t stop all the processes that reference the DLL.  Instead, when I need to test a new copy of winmm, I rename winmm.dll to winmm.old, copy in a new copy of winmm.dll, and reboot the machine.
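    A sketch of that swap in code (the paths are illustrative, and error handling is trimmed).  Because the loader opened the DLL with FILE_SHARE_DELETE, the rename succeeds even while processes have the image mapped:

        #include <windows.h>

        int main(void)
        {
            // The in-use binary can be renamed because the loader granted
            // delete/rename sharing when it mapped the image...
            if (!MoveFileExW(L"C:\\Windows\\System32\\winmm.dll",
                             L"C:\\Windows\\System32\\winmm.old",
                             MOVEFILE_REPLACE_EXISTING))
                return 1;

            // ...freeing the name so the new build can be copied in.  Running
            // processes keep using the renamed image until the reboot.
            if (!CopyFileW(L"D:\\build\\winmm.dll",
                           L"C:\\Windows\\System32\\winmm.dll", FALSE))
                return 1;
            return 0;
        }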

     
