May, 2007

Larry Osterman's WebLog

Confessions of an Old Fogey
  • Larry Osterman's WebLog

    Compatibility works both ways

    • 7 Comments

    Windows is rather famous for its ability to run applications that were written for previous versions of the Windows operating system.  Volumes have been written about Microsoft's backwards compatibility.

    On the other hand, people have long criticized Microsoft because applications developed for the current SDKs sometimes don't work on previous versions of the operating system.  The problem is that as features get added to the operating system, the headers get updated to reflect those new features.  If your application inadvertently uses those new features, then your application isn't going to work on those older versions.

    Recently, reader Nike was having problems with code I posted in one of my blog posts.  Fortunately he posted his errors, and I immediately knew what had happened.  Here's a snippet:

    c:\program files\microsoft sdks\windows\v6.0\include\structuredquery.h(372) : error C2061: syntax error : identifier '__RPC__out'

    c:\program files\microsoft sdks\windows\v6.0\include\structuredquery.h(376) : error C2061: syntax error : identifier '__RPC__in'

    Ok, Larry - I don't get what a compilation problem that some reader's having with your code has to do with compatibility in the SDK.  Well, it turns out that the root cause of Nike's problem was related to an SDK versioning issue.  It turns out that Microsoft HAS built backwards compatibility into its SDKs.  Raymond wrote about this here, but basically there used to be a mishmash of manifest constants that you could set to instruct the SDK which OS you're targeting.

    As Raymond mentioned, at some point recently (I'm not sure when) the SDK guys decided to rationalize the various ways of specifying the target OS into a single manifest constant which sets all the magic constants for whatever version of the OS you're targeting.

    In Nike's case, I realized that the errors related to a number of SAL annotations that were added for Vista.  From that, I used my psychic debugging skills and realized that Nike hadn't set NTDDI_VERSION to NTDDI_LONGHORN, and thus didn't have the new definitions included.
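
    Here's a minimal sketch of the fix (my addition, not from the original exchange - the exact mechanics vary a bit between SDK releases):

    #define NTDDI_VERSION NTDDI_LONGHORN    // target the Vista (Longhorn) definitions; must come before any SDK header
    #include <sdkddkver.h>                  // maps NTDDI_VERSION onto the older version constants
    #include <windows.h>
    #include <structuredquery.h>            // now the __RPC__in/__RPC__out annotations it relies on are defined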

    If you're writing code for really old versions of Windows (pre-Win2K), you're a bit stuck - the SDK guys only defined NTDDI_VERSION back to Win2K.  For operating systems before Win2K, you've got to use the old _WIN32_WINDOWS definitions as described in this MSDN article.

  • Larry Osterman's WebLog

    Is a DoS a valid security problem?

    • 25 Comments

    Time for some more controversy...

    Another Microsoft developer and I recently had a fairly long email discussion about a potential problem.  It turns out that it might be possible to craft a file in such a fashion that would cause some internal applications to crash.  The details aren't really important, but some of the comments that this developer (let's call him "R.") made were rather interesting and worthy of discussion.

    In this case, the problem could not be exploited - the worst that could happen with this problem is a crash, which is considered a local DoS attack.  (A classic buffer overrun can also cause a crash, but because it allows the attacker to execute arbitrary code, it's classified instead as an elevation of privilege attack - the buffer overrun allows the attacker to elevate from unauthenticated to authenticated.)

    The SDL process states that this kind of DoS attack isn't important enough to rate a security bulletin; instead it's classified as being in the "fix in the next service pack" category (IIRC, remote DoS attacks are security bulletin class issues, but I may be wrong on this one).

    So as I mentioned, we're discussing to what level the system should protect itself from this kind of issue.  R. was quite adamant that he considered these kinds of problems to be "non-issues" because the only thing that could happen was that an app could crash.

    The rules of the road in this case are quite simple:  If it's possible to craft a file that would cause a part of the system to hang or crash, it's got to be fixed in the next service pack after the problem's discovered (or RTM if it's found in testing, obviously). 

    David LeBlanc has written a number of articles where he discusses related issues (see his article "Crashes are Bad, OK?" for an example). 

    But R.'s point was essentially the same as David's: Crashes (and hangs) are bad, up to a point, but are they really "security bugs"?  There are tons of ways to trick applications into doing bad things; do all of them qualify as problems that MUST be fixed, even if they're not exploitable?

     

    Just something to think about on a Thursday afternoon.

  • Larry Osterman's WebLog

    The C abstract machine

    • 21 Comments

    I mentioned yesterday that the C/C++ language was defined to operate on an abstract machine.  At the time I didn't know of an online reference to the C or C++ language standard, but a little birdie pointed me to this, which is a draft of the C language specification.

    In section 5.1.2.3, you find:

    1. The semantic descriptions in this International Standard describe the behavior of an
      abstract machine in which issues of optimization are irrelevant.
    2. Accessing a volatile object, modifying an object, modifying a file, or calling a function
      that does any of those operations are all side effects, which are changes in the state of
      the execution environment. Evaluation of an expression may produce side effects. At
      certain specified points in the execution sequence called sequence points, all side effects
      of previous evaluations shall be complete and no side effects of subsequent evaluations
      shall have taken place. (A summary of the sequence points is given in annex C.)
    3. In the abstract machine, all expressions are evaluated as specified by the semantics. An
      actual implementation need not evaluate part of an expression if it can deduce that its
      value is not used and that no needed side effects are produced (including any caused by
      calling a function or accessing a volatile object).
    4. When the processing of the abstract machine is interrupted by receipt of a signal, only the
      values of objects as of the previous sequence point may be relied on. Objects that may be
      modified between the previous sequence point and the next sequence point need not have
      received their correct values yet.
    5. The least requirements on a conforming implementation are:
      * At sequence points, volatile objects are stable in the sense that previous accesses are
      complete and subsequent accesses have not yet occurred.
      * At program termination, all data written into files shall be identical to the result that
      execution of the program according to the abstract semantics would have produced.
      * The input and output dynamics of interactive devices shall take place as specified in
      7.19.3. The intent of these requirements is that unbuffered or line-buffered output
      appear as soon as possible, to ensure that prompting messages actually appear prior to
      a program waiting for input.
    6. What constitutes an interactive device is implementation-defined.
    7. More stringent correspondences between abstract and actual semantics may be defined by
      each implementation.

      <The standard goes on and gives examples and clarifications associated with those examples>

     

    There are more details scattered throughout the document, but this is the section that defines the machine.  A couple of things that I find quite cool in it:

    Section 2 states that side effects of previous evaluations must be complete at sequence points (for example, at the ';' at the end of a statement).  What that means is that a compiler is free to reorder operations in any way it chooses, but it can't reorder them across sequence points IF the operations have side effects.

    So if you have:

    *p++ = p[i++];
    printf("p is: %p", p++);

    the compiler could generate about 4 different versions (depending on the order in which the post-increments occur), but the first statement can't affect the printf statement.

    Section 3 states that a compiler can reorder code if there are no side-effects.  It also says that the compiler can discard code if it figures it's not used.
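
    As a made-up illustration of section 3 (mine, not the standard's): nothing below touches a volatile object, does I/O, or calls a function, and the result is never used, so a conforming compiler is allowed to throw the whole loop away.

    int sum = 0;
    for (int i = 0; i < 1000000; i++)
    {
        sum += i;        // no side effects, and 'sum' is never read afterwards...
    }                    // ...so the implementation may eliminate all of this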

     

    What's also fascinating is that the standard says nothing about concurrency or multiple execution units - the C language definition defines what happens within ONE execution unit.  It is explicitly silent about other execution units.  Why is this important?

    Consider the following code:

    static int myFirstSharedValue, mySecondSharedValue;

    myFirstSharedValue = 5;

    mySecondSharedValue = 42;

    if (myFirstSharedValue == 5)
    {
        <do stuff>
    }

    The compiler could optimize the assignment of mySecondSharedValue to occur AFTER the if test (assuming that nothing in <do stuff> depends on the value of mySecondSharedValue).  The compiler might also reorder the assignments to put the second assignment first!

    What's worse, the processor might choose to reorder how the data is saved to memory.  As long as the read of mySecondSharedValue for that execution unit returns the value of 42, it doesn't actually matter when the value of 42 is saved in memory.  It might easily be before the first value is written.  As long as you've only got one thread running, it doesn't matter.

    On the other hand, if you have multiple threads that read those values, it would be easy to depend on the write to myFirstSharedValue happening before the write to mySecondSharedValue - after all, that's what the code says, and that's what the language defines.

    But the language is defined for the abstract execution unit above, and that might not match the real system.
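
    For what it's worth, here's a minimal sketch (my addition - the names are made up, and in real code you'd more likely just take a lock) of how Win32 code typically forces the store ordering it needs on the writer side; readers need corresponding care on their side:

    #include <windows.h>

    static volatile LONG myFirstSharedValue, mySecondSharedValue;

    void PublishValues()
    {
        myFirstSharedValue = 5;
        MemoryBarrier();               // full fence: the store of 5 becomes visible no later than
        mySecondSharedValue = 42;      // the store of 42, regardless of compiler or processor reordering
    }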

     

    This is why people who try to write correct lock-free code end up tearing their hair out, and why it's so hard to write lock-free code. 

     

    Btw, Herb Sutter's been chairing a working group that's chartered to define the abstract machine for Windows programs; some of his work can be found here.

  • Larry Osterman's WebLog

    Blocking your UI thread with PlaySound

    • 8 Comments

    For better or worse, the Windows UI model ties a window to a particular thread; that has led to a programming paradigm where work is divided between "UI threads" and "I/O threads".  In order to keep your application responsive, it's critically important not to perform any blocking operations on your UI thread, and instead do them on the "I/O threads".

    One thing that people don't always realize is that even asynchronous APIs block.  This isn't surprising - a single processor core can only do one thing at a time (to be pedantic, processor cores can and do perform more than one thing at a time, but the C (or C++) language is defined to run on an abstract machine that enforces various strict ordering semantics, thus the C (or C++) compiler will do what is necessary to ensure that the language's ordering semantics are met[1]).

    So what does an "async" API really do, given that most APIs are written in languages that don't contain native concurrency support[2]?  Well, usually it packages up the parameters to the API and queues them to a worker thread (this is what the CLR does for many of the "async" CLR operations - they're not really asynchronous, they're just synchronous calls made on some other thread).

    For some asynchronous APIs (like ReadFile and WriteFile) you CAN implement real asynchronous semantics - under the covers, the ReadFile API adds a read request to a worker queue and starts the I/O associated with reading the data from disk; when the hardware interrupt occurs indicating that the read is complete, the I/O subsystem removes the read request from the worker queue and completes it[3].

    The critical thing to realize is that even for the APIs that do support real asynchronous activity there's STILL synchronous processing going on - you still need to package up the parameters for the operation and add them to a queue somewhere, and that work happens on the calling thread.  For most operations it doesn't matter - the time to queue the parameters is sufficiently small that you can perform it on the UI thread.

     

    And sometimes it isn't.  It turns out that my favorite API, PlaySound, is a great example of this.  PlaySound provides asynchronous behavior with the SND_ASYNC flag, but it does a fair amount of work before dispatching the call to a worker thread.  Unfortunately, some of the processing done in the application thread can take many milliseconds (especially if this is the first call to winmm.dll).

    I originally wrote down the operations that were performed on the application's thread, but then I realized that doing so would cement the behavior for all time, and I don't want to do that.  So the following will have to suffice:

    In general, PlaySound does the processing necessary to determine the filename (or WAV image) in the application thread and posts the real work (rendering the sound) to a worker thread.  That processing is likely to involve synchronous I/Os and registry reads.  It may involve searching the path looking for a filename.  For SND_RESOURCE, it will also involve reading the resource data from the specified module. 

    Because of this processing, it's possible for the PlaySound(..., SND_ASYNC) operation to take several hundred milliseconds (and we've seen it take as long as several seconds if the current directory is located on an unreliable network).  As a result, even the SND_ASYNC version of the PlaySound API should be avoided on UI threads[4].
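
    One way to sidestep that cost is to push the entire PlaySound call onto a thread-pool thread, so even its synchronous setup work stays off the UI thread.  Here's a rough sketch (mine, not an official recommendation - the file name is made up, and you need to link with winmm.lib):

    #include <windows.h>
    #include <mmsystem.h>

    static DWORD WINAPI PlaySoundWorker(LPVOID /* context */)
    {
        // The path search, registry reads, and file I/O now happen on a thread-pool
        // thread instead of on the UI thread.
        PlaySound(TEXT("MyNotification.wav"), NULL, SND_FILENAME | SND_ASYNC);
        return 0;
    }

    // On the UI thread, queuing the work item is the only synchronous cost:
    QueueUserWorkItem(PlaySoundWorker, NULL, WT_EXECUTEDEFAULT);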

     

     

    [1] I bet most of you didn't know that the C language definition strictly defines an abstract machine on which the language operates.

    [2] Yes, I know about the OpenMP extensions to C/C++, they don't change this scenario.

    [3] I know that this is a grotesque simplification of the actual process.

    [4] For those that are now scoffing: "What a piece of junk - why on earth would you even bother doing the SND_ASYNC if you're not going to really be asynchronous", I'll counter that the actual rendering of the audio samples for many sounds takes several seconds.  The SND_ASYNC flag moves all the actual audio rendering off the application's thread to a worker thread, so it can result in a significant improvement in performance.

  • Larry Osterman's WebLog

    What's wrong with this code, part 20: Yet another reason that named synchronization objects are dangerous, the answers

    • 19 Comments

    Microsoft can be quite obsessive about instrumentation and metrics.  We have a significant body of tools that perform static and dynamic analysis on our operating systems.  Some of these tools (for example prefast and FxCop) are public, some are not.

    Friday I posted a small snippet of code that showed a couple of the issues with using named shared synchronization objects.  The example was abstract, but the problems called out are quite real.

    If you run the Microsoft prefix tool on the example, it will point out the first problem in the snippet:  The calls to CreateEvent are made without specifying a DACL.  This is bad, because the objects get a DACL based on the creator's token, which means you don't get to control it, and it's entirely possible that the DACL you get is too permissive.

    Even if you're not worried about a bad guy (and you always have to worry about the bad guy), if you're using the shared synchronization object in a client/server scenario, it means that the client might not have rights to access the object, because the newly created object will get a security descriptor based on the token of the creator.  Even worse, it's possible that the SD created on the client side grants the server rights but not vice versa.  That means that depending on which side gets to run first, you might get different results.  That kind of timing problem can be a nightmare to debug.

    If the ACL created is too permissive, then things get scary.  If the DACL grants full access, it means that a bad guy can do ANYTHING to your event - they can change its state, they can set a new DACL on it (locking your application out), etc.  Weak DACLs are a recipe for denial of service attacks (or worse - depending on the access rights granted and the circumstances, they can even enable elevation of privilege attacks).

    Of course the fix is simple: Provide a DACL that grants the relevant access rights to the principals that are going to be opening the shared synchronization object.

    That's what we did when our source code analysis tools showed the first problem - they correctly detected that we were using a named object and hadn't specified a DACL for the named object.

     

    A couple of weeks later, I got a new bug report, this time from the security guys.  It seems that they had run a second tool which looked for ACLs that were too permissive.  In their scan, they found our named shared object, which had granted GENERIC_ALL access to the Windows audio service.

    <grumble>

    I fixed the problem by tightening the ACL on the shared object to only grant SYNCHRONIZE access to the audio service (that's all the audio service needed), and I ran afoul of the second problem.

    As the documentation for CreateEvent clearly spells out, if the named synchronization object already exists, the CreateEvent API (and CreateMutex and CreateSemaphore) opens the object for EVENT_ALL_ACCESS (or MUTEX_ALL_ACCESS or SEMAPHORE_ALL_ACCESS).  That means that your DACL needs to grant EVENT_ALL_ACCESS (or...) or the call to CreateEvent will fail.  And I had just changed the DACL to only grant SYNCHRONIZE access.

    The problem is that we used CreateEvent to prevent the race condition that occurs if you try to call OpenEvent followed by CreateEvent - it's possible for the OpenEvent call to fail and have all the consumers of the named event fall through to the CreateEvent API.  To avoid this race, the code simply used CreateEvent.  I wasn't willing to add a call to OpenEvent because it would leave the race condition still present.

     

     

    The good news is that the COSD people had identified this problem in the named synchronization object APIs and in Vista the new CreateEventEx was added to enable applications to work around this deficiency in the APIs: The CreateEventEx API allows you to specify an access mask to be used on the object after it is opened.  So now it's possible to have accurate DACLs on named synchronization objects AND prevent squatting attacks.
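
    Here's roughly what that looks like (a sketch of mine, not the actual service code - the event name is made up and building securityAttributes with the tight DACL is omitted):

    HANDLE eventHandle = CreateEventEx(securityAttributes,                // SECURITY_ATTRIBUTES carrying the tight DACL
                                       L"MyNamedEvent",                   // made-up name for the shared event
                                       CREATE_EVENT_MANUAL_RESET,         // manual-reset, initially non-signaled
                                       SYNCHRONIZE | EVENT_MODIFY_STATE); // just the rights we actually need, so the
                                                                          // DACL no longer has to grant EVENT_ALL_ACCESS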

     

     

     

    NB: The current documentation for CreateEventEx says that it opens the object for EVENT_ALL_ACCESS if the named object already exists; this is a typo in the documentation - I've already pointed it out to the doc people so they can fix the error.

  • Larry Osterman's WebLog

    What's wrong with this code, part 20(!): Yet another reason that named synchronization objects are dangerous...

    • 38 Comments

    When you're doing inter-process communication, it's often necessary to use named synchronization objects to communicate state between the processes.  For instance, if you have a memory section that's shared between two processes, it's often convenient to use a named mutex on both processes to ensure that only one process is accessing the memory at a time.

    I recently had to fix a bug that was ultimately caused by a really common coding error when using named events.  I've taken the bug and stripped it down to just about the simplest form that still reproduced the error.

     

    const LPCWSTR EventName = L"MyEventName"; 
    DWORD WINAPI WorkerThread(LPVOID context) 
    { 
        HANDLE eventHandle = CreateEvent(NULL, TRUE, FALSE, EventName); 
        if (eventHandle == NULL) 
        { 
            return GetLastError(); 
        } 
    
        WaitForSingleObject(eventHandle, INFINITE); 
        // Do Some Work. 
        return 0; 
    } 
    
    int _tmain(int argc, _TCHAR* argv[]) 
    { 
        HANDLE threadHandle = CreateThread(NULL, 0, WorkerThread, NULL, 0, NULL); 
        HANDLE eventHandle = CreateEvent(NULL, TRUE, FALSE, EventName); 
    
        SetEvent(eventHandle); 
        WaitForSingleObject(threadHandle, INFINITE); 
        return 0; 
    }

    There are actually TWO things wrong with this code, and they're both really bad.  The second one won't become apparent until after the first one's found.

     

    Things that are not wrong:

    • Using CreateEvent in the worker thread instead of OpenEvent - remember, in the original incarnation of this code, the two components that deal with the event run in different processes - using CreateEvent allows you to protect against race conditions creating the event (if you call OpenEvent and follow it with a call to CreateEvent if the OpenEvent fails, there's a window where the other side could also call CreateEvent).
    • Calling CreateThread BEFORE calling CreateEvent - this was quite intentional to show off the potential for the race condition above.
    • There's limited error checking in this code.  While this code is not production quality, error handling could obfuscate the problem.
    • The _tmain/_TCHAR - this function is Unicode; VS stuck the _T stuff in via its wizard.

     As always, kudos and explanations on Monday.

  • Larry Osterman's WebLog

    Curvy Volumes

    • 3 Comments

    Silly story.

    Way back when I first joined the audio team, Steve Ball (the PM for the team) showed up for work one day with his SPL meter.  It seems he'd gone home and used his SPL meter to measure the output of the speakers.  Much to his surprise, the volume curve generated by the volume control wasn't linear.  It was worse than that - on some devices it was exponential, on others it was logarithmic.

    The problem ended up being buried deep inside the XP audio stack, and was ultimately caused by an impedance mismatch between the various audio APIs.  Some of them expressed volume in 1/65535th of a dB, others expressed volume in unspecified units between 0 and 65535, still others represented volume as unspecified units between 0 and 255.

    In this case, the volume control used by the UX was using the mixer APIs, which represented volume as a number between 0 and 65535 (with no indication of scaling).  The audio stack converted that number to a dB value, but it was doing the correction inaccurately, resulting in a logarithmic volume curve - the volume control was very hot (active) at the bottom of the volume scale, but cold (inactive) at the top of the scale - basically you got a lot of volume change at the bottom of the curve, and very little at the top. 

    Making matters worse, some audio drivers decided that even though they were being told a specific attenuation value (in dB), they would ignore that value and interpret the dB value as if it represented a linear volume (remember that dB is a logarithmic scale - a 6dB reduction in volume represents a 50% reduction in signal amplitude).  I actually have a set of USB speakers in my office that has an exponential(!) volume curve - it's flat at the bottom and active at the top.

    For XP SP2, we added the ability for an OEM to provide a custom volume curve to be used by the various audio APIs; that helped a great deal.

    For Vista, we decided to fix this once and for all.  First off, we declared the units associated with all of our volume APIs - the stream volume, simple volume, and channel volume all represent volume as a floating point amplitude scalar ranging from 0.0 to 1.0.  Since amplitude scalars map directly to dB (dB = 20*log10(scalar), scalar = 10^(dB/20)), this essentially means that the stream, simple and channel volumes represent volume in dB.
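
    Translating those formulas directly into code (my sketch, not SDK code):

    #include <math.h>

    float ScalarToDecibels(float scalar) { return 20.0f * log10f(scalar); }    // e.g. 0.5 maps to roughly -6 dB
    float DecibelsToScalar(float dB)     { return powf(10.0f, dB / 20.0f); }   // e.g. -6 dB maps back to roughly 0.5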

    For the endpoint volume APIs, there are two variants of each of the APIs - there's a dB version which takes input in dB, and there's a "scalar" version which takes a linear floating point value between 0.0 and 1.0[1].  The "scalar" version is intended for applications that want to implement a master volume control, because it provides a linear function that maps nicely to a position slider.  The other nice thing about the "scalar" version of the endpoint volume APIs is that it provides a volume taper that gives a linear volume experience for audio devices that correctly handle volume in dB. 

    You can see the effect of this taper in Vista.  Start the sounds control panel.  On the sounds control panel, select properties on your default playback device, then select the "levels" tab.  Right click on the number in the volume control (the top slider), and select "decibels".  That instructs the volume control to represent the slider position in dB.  Now start the volume mixer.  Go to the master volume and set the slider position to 50% (there's a convenient tick in the slider at that location).  You'll notice that the slider in the speakers properties page is somewhat below the 50% value[2]. 

     

    Btw, if you're into audio geeky fun, it can be interesting to play with the volume sliders for various applications to see whether each volume control implements a linear or a logarithmic curve.  Now that I'm aware of the issue, I find it fascinating to see how many applications get this "wrong".

     

    [1] For audio geeks out there, yeah, we know this looks stupid - the API indicates that it takes an amplitude scalar but it actually takes a slider position, we know that :(.

    [2] This won't work on some machines - if your machine doesn't have a hardware volume control, then you won't have a volume control on the speakers property page (my laptop is a good example of this - it doesn't have a hardware volume control).

  • Larry Osterman's WebLog

    Volume in Windows Vista, part 3: Capture volume

    • 3 Comments

    As anyone who's read this blog with any regularity knows, my son Daniel is a budding actor.  As such, many of his friends are also budding actors, and that means that we get to see lots of high school shows (we try to go to see every show that his friends are in).

    Last Saturday, we went to see the Kamiak High School production of "Hello Dolly".  It was a very impressive production, with a 51 person cast (I don't know how they all fit on the stage at one time).  The lead (whose name is escaping me at the moment) was quite exceptional, and in general the production was very enjoyable, except for some notable technical issues.

     

    Right now, you're probably saying "???  I thought this was a post about capture volume, what does a high school musical have to do with capture volume?"

    Well, one of the notable technical issues was that the voices of several of the performers were horribly distorted.  Whenever I hear distortion of audio, I start looking for clipping - that's usually what's happening.

    Do you remember my picture from earlier that showed the distortion caused by amplification in the digital realm?

    It turns out that the same thing happens on capture - if the volume on a microphone is set too high, it clips and the input is horribly distorted.

     

    If you'll recall, my last post discussed the 4 types of volume in Vista.  We were really happy with the design, we implemented it for code complete on Vista Beta2, we deployed it and it worked.  Everyone was happy.

    Until the speech and RTC people started testing their stuff on Vista.  At which point, the audio volume team (me) got a little lesson in the realities of capture volume.

     

    When rendering, the clipping I mentioned above is manageable - as long as we keep the magnitude of the signal below 1.0 (0dB), the problem goes away.  The per-application (stream volume, session volume) paradigm works well in this scenario because the only signal that can clip is the final mix, which is limited by the master render volume, so you can have per-application streams that feed into a single master-volume-limited stream without worrying about clipping (you do have to worry about clipping, especially if you're playing multiple full dynamic range streams, but that's out of scope for this discussion).

    But for capture, it's another story.  For capture, clipping happens whenever the volume control at the ADC (analog-digital converter) is set too high.  That means that the only volume control that actually matters for capture is the master volume.  The entire concept of per-application volume doesn't work for capture.

     

    Needless to say, this was a bit embarrassing.  Inside the audio engine, capture and render are essentially identical - the only difference between the two is the order in which the audio graph is built, so my internal mind-set treated them the same.  I'd been so focused on rendering scenarios that I simply didn't think about how capture was basically different from render.

    So how to resolve this?  Well, we turned off per-application volume for capture.  This means that for capture endpoints, the volume controls still control the hardware volume.  All four volume controls still exist for capture, but for capture the session volume and the endpoint volume manipulate the same hardware volume control.  That means that existing and new capture applications (like speech recognition applications and IM applications) should continue to work without modification.
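
    If you want to poke at that shared control programmatically, here's a hedged sketch (mine - COM initialization and error handling are omitted) that reads the master level of the default capture endpoint through IAudioEndpointVolume:

    #include <mmdeviceapi.h>
    #include <endpointvolume.h>

    IMMDeviceEnumerator *enumerator = NULL;
    CoCreateInstance(__uuidof(MMDeviceEnumerator), NULL, CLSCTX_ALL,
                     __uuidof(IMMDeviceEnumerator), (void**)&enumerator);

    IMMDevice *microphone = NULL;
    enumerator->GetDefaultAudioEndpoint(eCapture, eConsole, &microphone);

    IAudioEndpointVolume *endpointVolume = NULL;
    microphone->Activate(__uuidof(IAudioEndpointVolume), CLSCTX_ALL, NULL, (void**)&endpointVolume);

    float level = 0.0f;
    endpointVolume->GetMasterVolumeLevelScalar(&level);    // the same level the speech tuning wizard moves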

    You can see this at work if you bring up the sounds control panel applet, select the recording tab and select your microphone input.  Go to the "levels" tab and look at the master volume slider.  Now run the speech tuning wizard for your favorite capture application (either one that came in Vista or an existing application).  You'll notice that as you run through the speech tuning wizard, you'll see the capture volume change.

  • Larry Osterman's WebLog

    Where on earth did Larry go?

    • 17 Comments

    No, nothing bad happened to me, I just got a bit caught up in work stuff.

     

    I spent the last three weeks volunteering to help another team at Microsoft finish off the final set of work items on a new tool they're building - one of the developers on the project quit for family reasons and they had a major milestone coming up last week and nobody to complete it.  Since the team doing the work knew that the tool would help in an area in which I was passionate, they asked and my manager was gracious enough to let me help them out.

    Actually, it was a bit funny - I've never used Visual Studio as a development environment for anything real before, I hadn't realized how amazing it is as a development environment (the ability to set breakpoints inside an XSL transform seems like magic).  It's also a much more intense development experience than I'm used to.  For most of the past 30 years, my normal development cycle has been (in college I didn't have the "copy it to my test machine" step, but otherwise it's essentially been unchanged):

      1. "write some code",
      2. "compile it",
      3. "copy it to my test machine",
      4. "test the changes",
      5. go to #1.

    Using Visual Studio changes the tempo of the cycle dramatically.  The JIT compiling, edit-and-continue support and the speed of the compiler combine to make a much more rapid turn-around time on changes than I'm used to.  This shows up in subtle ways - normally I have no trouble keeping up with my incoming email flow - I switch to Outlook while I'm in step #2 and #3 to read my email.  But while I was working in VS, I didn't have time - there were no significant slow times in the process - the edit/compile/test cycle was sufficiently quick that I didn't have the ability to keep up with my email.

    Go figure.

     

    Anyway, I'm back :)  I'm in a class all day today, so nothing more interesting, but I AM planning on finishing up the series on volume.  Oh, and I've got a new "API that should be banned" to write about :)
