December, 2005

Larry Osterman's WebLog

Confessions of an Old Fogey

  • And the blog goes dark (just for a while, I've not retired or been fired :))

    • 4 Comments

    I'm about to leave for my much-needed Christmas-time vacation; I'll be back east with the family, so I'm not planning on writing any new posts until next year.

     

    Y'all have a great holiday season, y'hear?

     

  • Why doesn't Vista expose individual pairs of channels as separate endpoints?

    • 18 Comments

    One of the questions asked after EHO's and my Channel 9 video aired was:

    is there an ability in vista to set on which speaker you hear an application? for example i want the media player on the 2 front speakers and the game on the 2 back speakers? if it's possible, is there also a possibility to set the volume for each application on a per speaker base?
     

    With a followup comment (from someone else) of:

    Last I checked most audio cards were either full of stereo outputs, for front, rear, side etc. You are saying that these stereo output pairs, one of which could be utilized as headphone output instead of "side speaker pair 7 & 8" (so many outputs are common in sound cards, but so many speakers are not common in the average home computer), should be utilized only as one end point instead of how the user actually thinks about them?

    The answer to both of these is: It's an interesting idea, but has some technical issues.

    First off, many audio solutions don't have multiple sets of stereo jacks in the back.  Multi-channel solutions come in lots of forms, such as S/PDIF, USB, 1394, etc., most of which support multiple audio channels without separate jacks.

    Secondly, such a solution isn't technically feasible given the way the audio engine is architected - at a minimum, the fact that each endpoint has its own preferred audio format makes this quite difficult.

    The other problem with this idea is discoverability.  Assuming that it was possible to work past the architectural issues, before you enable such a feature, you've got to tell the user about it.

    Clearly, out-of-the-box, each multi-channel audio adapter needs to be its own endpoint.  Otherwise owners of 5.1 audio systems will be a smidge peeved to find out that their 5.1 audio solution can't render 5.1 content.  So this "each channel is an endpoint" idea has to be an opt-in.

    Now how do you describe such a feature to the user?  How do you tell them "Make the rear left and right speakers their own endpoint"?  How do you get a user to hook that up correctly?  And how do you deal with the multi-channel audio solutions I mentioned above?  For those audio solutions, the "make each channel a separate endpoint" option results in a HORRIBLE user experience, so we would want to disable this option when you're running in one of those configurations.

    Adding to this, there are a significant number of audio applications that don't handle multiple audio devices - they simply assume that audio device 0 is the one that they want to use.

    And finally, you've got to consider the number of customers this would benefit.  It's my understanding that the market penetration of multi-channel (more than 2) audio solutions is somewhere around 1%.  That means that of the 500,000,000 Windows customers, only somewhere around 5,000,000 of them could take advantage of this feature.  And of the 5,000,000 or so people with multi-channel sound solutions, how many of them have only 2 speakers plugged into them?  I suspect that number is significantly smaller.

    So this feature is likely to benefit only a tiny fraction of Windows customers.  Adding features, especially extremely complicated features like this one, has huge costs associated with it - you've got to add tests for the feature, documentation, UI, localization, etc.

    There's an overarching thread throughout the Vista audio work - we're trying to make things easier for the vast majority of users, and we're actively trying to reduce the complexity of the system.  This feature doesn't do any of that; instead, it seems to me that it introduces a great deal of complexity that will benefit a very small subset of the Windows user base.

    IMHO, this idea is a complete non-starter - too few people would take advantage of it, and it has the potential of messing up a significant number of people's machines.

  • Volume control in Vista

    • 25 Comments

    Before Vista, all of the volume controls available to applications were system-wide - when you changed the volume using the wave volume APIs, you changed the hardware volume, thus affecting all the applications in the system.  The problem with this is that for the vast majority of applications, this was exactly the wrong behavior.  This behavior was a legacy of the old Windows 3.1 audio architecture, where you could only have one application playing audio at a time.  In that situation, there was only one hardware volume, so the behavior made sense.
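
    To make that concrete, here's a minimal sketch of the pre-Vista pattern, assuming an hWaveOut handle you've already opened with waveOutOpen.  On XP and earlier, this one call turns down every application in the system:

    #include <windows.h>
    #include <mmsystem.h>   // link with winmm.lib

    // Pre-Vista: set the wave volume to 50% on both channels.
    // The low word is the left channel, the high word is the right channel.
    // Because the wave volume maps to the hardware volume, this quiets
    // every application's audio, not just ours.
    void HalveTheVolume(HWAVEOUT hWaveOut)
    {
        waveOutSetVolume(hWaveOut, MAKELONG(0x8000, 0x8000));
    }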

    When the WDM audio drivers were released for Win98, Microsoft added kernel-mode audio mixing, but left the volume control infrastructure alone.  The volume controls available to the Windows APIs remained the hardware volume controls.  The reason for this is pretty simple: volume control really needs to be per-application, but in the Win98 architecture there was no way of associating individual audio streams with a particular application; instead, audio streams were treated independently.

    The thing is, most applications REALLY wanted to control just the volume of their own audio streams.  They didn't want (or need) to mess with other apps' audio streams; that was just an unfortunate side effect of the audio architecture.

    For some applications, there were solutions.  For instance, if you used DirectSound (or DirectShow, which is layered on DirectSound), you could render your audio streams into a secondary buffer; since DSound secondary buffers have their own volume controls, that effectively made their volume control per-application.  But that didn't do anything to help the applications that don't use DSound; they were stuck with manipulating the hardware volume.
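
    Here's a rough sketch of the DSound approach - note that the buffer has to be created with DSBCAPS_CTRLVOLUME or SetVolume will fail (the function name here is made up, and error handling is minimal):

    #include <dsound.h>     // link with dsound.lib

    // Sketch: create a secondary buffer whose volume can be adjusted
    // independently of every other stream in the system.
    HRESULT CreateAdjustableBuffer(IDirectSound8 *pDSound, WAVEFORMATEX *pwfx,
                                   DWORD cbBuffer, IDirectSoundBuffer **ppBuffer)
    {
        DSBUFFERDESC desc = {0};
        desc.dwSize = sizeof(desc);
        desc.dwFlags = DSBCAPS_CTRLVOLUME;      // required for SetVolume
        desc.dwBufferBytes = cbBuffer;
        desc.lpwfxFormat = pwfx;

        HRESULT hr = pDSound->CreateSoundBuffer(&desc, ppBuffer, NULL);
        if (SUCCEEDED(hr))
        {
            // Attenuate this buffer by 10dB.  Volume is in hundredths of a
            // decibel, from DSBVOLUME_MIN (-10000) to DSBVOLUME_MAX (0);
            // only this buffer's audio is affected.
            hr = (*ppBuffer)->SetVolume(-1000);
        }
        return hr;
    }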

     

    For Vista, one of the things that was deployed as part of the new audio infrastructure was a component called "Audio Policy".  One of the tasks of the policy engine is tracking which audio streams belong to which application.

    For Vista, each audio stream is associated with an "audio session", and an audio session is roughly associated with a process (each process can have more than one audio session, and audio sessions can span multiple processes, but by default each audio session is the collection of audio streams being rendered by the process).

    Each audio session has its own volume control, and WASAPI exposes interfaces that allow applications to control the volume of their audio session.  The volume control API also includes a notification mechanism, so an application can find out whenever its volume control changes - in particular, it can track when someone else changes its volume.
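
    To give you an idea of what this looks like, here's a rough sketch using the interface names as they eventually surfaced in the Vista SDK (ISimpleAudioVolume, obtained from the session manager on the default render endpoint) - treat it as illustrative, not definitive:

    #include <mmdeviceapi.h>
    #include <audiopolicy.h>

    // Sketch: set the volume of this process's default audio session to
    // fLevel (0.0 to 1.0).  Only this application's streams are affected.
    // Assumes COM has already been initialized.
    HRESULT SetMySessionVolume(float fLevel)
    {
        IMMDeviceEnumerator *pEnum = NULL;
        IMMDevice *pDevice = NULL;
        IAudioSessionManager *pManager = NULL;
        ISimpleAudioVolume *pVolume = NULL;

        HRESULT hr = CoCreateInstance(__uuidof(MMDeviceEnumerator), NULL,
                                      CLSCTX_ALL, __uuidof(IMMDeviceEnumerator),
                                      (void**)&pEnum);
        if (SUCCEEDED(hr))
            hr = pEnum->GetDefaultAudioEndpoint(eRender, eConsole, &pDevice);
        if (SUCCEEDED(hr))
            hr = pDevice->Activate(__uuidof(IAudioSessionManager), CLSCTX_ALL,
                                   NULL, (void**)&pManager);
        if (SUCCEEDED(hr))
            hr = pManager->GetSimpleAudioVolume(NULL, FALSE, &pVolume);
        if (SUCCEEDED(hr))
            hr = pVolume->SetMasterVolume(fLevel, NULL);

        if (pVolume) pVolume->Release();
        if (pManager) pManager->Release();
        if (pDevice) pDevice->Release();
        if (pEnum) pEnum->Release();
        return hr;
    }

    The notification side works similarly - you register a callback object with the session, and the system calls you back whenever the session's volume changes, no matter who changed it.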

    This is all well and good, but how does this solve the problem of existing applications that are using the hardware volume but probably don't want to?

    Remember how I mentioned that all the existing APIs were plumbed to use WASAPI?  Well, we plumbed the volume controls for those APIs to WASAPI's volume control interfaces too. 

    We also plumbed the mixerLine APIs to use WASAPI.  This was slightly more complicated, because the mixerLine API also requires that we define a topology for audio devices, but we've defined a relatively simple topology that should match existing hardware topologies (so appcompat shouldn't be an issue).
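
    For a sense of why topology matters, here's roughly the dance a legacy mixerLine application goes through to find the speaker volume control - under Vista this same sequence now resolves against the simple per-session topology (a sketch, with error handling omitted):

    #include <windows.h>
    #include <mmsystem.h>   // link with winmm.lib

    // Sketch: the classic mixerLine sequence for locating the volume
    // control on the speaker destination line of mixer device 0.
    void FindSpeakerVolumeControl(void)
    {
        HMIXER hMixer;
        mixerOpen(&hMixer, 0, 0, 0, MIXER_OBJECTF_MIXER);

        // Walk the topology: find the "speakers" destination line...
        MIXERLINE line = {0};
        line.cbStruct = sizeof(line);
        line.dwComponentType = MIXERLINE_COMPONENTTYPE_DST_SPEAKERS;
        mixerGetLineInfo((HMIXEROBJ)hMixer, &line,
                         MIXER_GETLINEINFOF_COMPONENTTYPE);

        // ...then find the volume control on that line.
        MIXERCONTROL control = {0};
        MIXERLINECONTROLS controls = {0};
        controls.cbStruct = sizeof(controls);
        controls.dwLineID = line.dwLineID;
        controls.dwControlType = MIXERCONTROL_CONTROLTYPE_VOLUME;
        controls.cControls = 1;
        controls.cbmxctrl = sizeof(control);
        controls.pamxctrl = &control;
        mixerGetLineControls((HMIXEROBJ)hMixer, &controls,
                             MIXER_GETLINECONTROLSF_ONEBYTYPE);

        // control.dwControlID can now be passed to mixerGetControlDetails
        // or mixerSetControlDetails to read or set the volume.
        mixerClose(hMixer);
    }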

    The upshot of this is that by default, for Vista Beta2, we're going to provide per-application volume control for the first time, for all applications.

    There is a very small set of applications that may be broken by this behavior change, but we have a mechanism to ensure that applications that need to manipulate the hardware volume using the existing APIs will be able to work in Vista without rewriting the application (if you've got one of those applications, you should contact me out-of-band and I'll get the right people involved in the discussion).

  • Another Channel9 Video

    • 2 Comments

    Last month, Elliot Omiya and I sat down with Charles Torre of Channel 9 and did a brain dump of the audio stack and APIs in Windows Vista.

    The Channel9 guys just went live with the video, you can see it here.

     

    Enjoy.

     

  • Heard over the weekend...

    • 7 Comments

    We were chatting with friends about Tahrmina Martely's interview on APM's Weekend America about her first experiences in the US (listen to it here (sorry, Real Media only :())).

    Tahrmina tells the story of when she first came to the US as a high school exchange student.  Before she left Bangladesh, her mother told her to eat anything her hosts offered her, because it would be polite.  Well, she got to the US, and her host family was having a barbecue in her honor.  Her host mom offered her a hot dog, and Tahrmina, remembering her mother's advice, accepted it.  She knew that Americans ate unfamiliar stuff, but she had never realized we ate dogs.  Then she went to the grill and looked at them.  She thought and thought about it, and the only thing she could figure out was that there were entire farms here in the U.S. with dogs that were all barking with really high-pitched voices.  What kind of barbarians would do that to a dog?  And they fed them to their KIDS!

    Anyway, Tahrmina tells the story far better than I do - you should listen to it - but it sets the stage...

    As we were all laughing about this event, Daniel made a comment about "Enoch's". 

    Daniel's a really fast reader, and as a consequence, he doesn't always spend a huge amount of time figuring out the pronunciation of new words.  Normally, this isn't a problem, but...

    "Umm..  Daniel, it's pronounced "u-nik".

    "What's the big deal, Enoch, Eunuch, who cares?"

    "Daniel, Enoch is a mans name, a Eunuch is a castrata.  And Unix is an operating system.".

     

     

    Ok, I guess you had to be there.  Dave Barry has no fear of losing his day job...

    Edit: Fixed spelling issues and more accurately phoneticized "eunuch".

     

  • Da Goat Horns

    • 9 Comments
    Raymond's post the other day about a giant inflatable bunny that showed up in his office finally prompted me to get off my duff and borrow a scanner from the PM in my group so I could scan in this picture:

    [Photo: Larry wearing the goat horns hat]

    The scan's pretty horrible, but you get the idea (I was so skinny back then :)). 

    The "Goat Horns" were the NT 3.1 team's idea of "Whimsical embarrassment" - basically when you broke the build, you had to wear the goat horns for the next couple of days (or until someone else broke the build).

    The original goat horns got lost a year or so after we shipped NT 3.1, but we had a "wake" for Windows NT back in 2000 (when it was renamed Windows 2000), and someone went out and bought a new set of goat horns for me to wear for the party (I was the first recipient of the goat horns, and somehow I managed to become permanently associated with them).  The 2nd generation horns are currently sitting in a box in my office somewhere (I think :))

     

  • Where does WASAPI fit in the big multimedia API picture?

    • 15 Comments

    I've previously mentioned that we're adding a new low-level multimedia audio rendering API for Vista called WASAPI (Windows Audio Session API).  As I'd mentioned, all the existing audio rendering APIs in Windows have been replumbed to use this new API.

    I just got an email from someone asking for more details about WASAPI.

    One of his questions was "How does WASAPI fit in the big picture of multimedia APIs?"  It's actually a good question, because there are so darned many APIs to choose from.

    We're currently targeting WASAPI at people who need to be absolutely as close to the metal as possible.  We're not intending it to be a general purpose audio rendering API; frankly, WASAPI pushes too many requirements onto the application that's rendering audio for it to be useful as one.  The biggest hurdle is that to successfully use WASAPI, you need to be able to write a sample rate converter - using WASAPI to render audio requires that you generate audio samples at the sample rate specified by the audio engine; the engine won't do the SRC for you.
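
    To make that requirement concrete, here's a sketch of shared-mode render initialization, again using the names as they eventually surfaced in the Vista SDK (assume COM is initialized and pDevice is a render endpoint):

    #include <mmdeviceapi.h>
    #include <audioclient.h>

    // Sketch: initialize a shared-mode WASAPI render stream.
    HRESULT InitializeRenderStream(IMMDevice *pDevice, IAudioClient **ppClient)
    {
        IAudioClient *pClient = NULL;
        WAVEFORMATEX *pMixFormat = NULL;

        HRESULT hr = pDevice->Activate(__uuidof(IAudioClient), CLSCTX_ALL,
                                       NULL, (void**)&pClient);
        if (SUCCEEDED(hr))
        {
            // The engine dictates the format.  If your source material is at
            // a different sample rate, *you* have to convert it before
            // handing samples in - the engine won't resample for you.
            hr = pClient->GetMixFormat(&pMixFormat);
        }
        if (SUCCEEDED(hr))
        {
            hr = pClient->Initialize(AUDCLNT_SHAREMODE_SHARED, 0,
                                     10 * 1000 * 1000,  // 1 second, in 100ns units
                                     0, pMixFormat, NULL);
        }
        CoTaskMemFree(pMixFormat);
        if (FAILED(hr) && pClient) { pClient->Release(); pClient = NULL; }
        *ppClient = pClient;
        return hr;
    }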

    On the other hand, there ARE a set of vendors who have a need for a low level, low latency audio rendering API - most of the applications that fit into this category tend to be pro-audio rendering applications, so we're publishing the WASAPI interfaces to allow those ISVs to build their applications.  These ISVs need to run as close to the metal as humanly possible, and WASAPI will let them get pretty darned close.

    For Vista, we're adding a new high-level multimedia API called Media Foundation, or MF.  MF is intended to fix some of the limitations in the DirectShow architecture that really can't be solved without a significant architectural change.  In particular, MF streamlines the ability to render media content, fixing several serious deficiencies in the DShow threading model, and allowing for dynamic multimedia graphs.  In addition, MF filters are self-contained - they can be built and tested outside the multimedia pipeline.  Oh, and MF makes it possible to play back next-generation premium content :).

    The vast majority of applications should just stick with the existing audio rendering APIs - none of the changes we've made for Vista should break any existing APIs (with the exception of the AUX family of APIs).  If you're interested in playing back the next generation of multimedia content, then you should seriously look at MF - it provides some critical infrastructure, necessary for high quality multimedia playback, that simply isn't there in previous versions.

    It's my understanding that full documentation of WASAPI will be available for Vista Beta2 (our team has been doing documentation reviews for the past month or so).

     

  • What's wrong with this code, part 16, the answers

    • 13 Comments
    I intentionally made the last "What's wrong with this code" simple, because it was just intended to exhibit a broken design pattern.

    The bug appears here:

            cWaveDevices = waveOutGetNumDevs();
     

    The problem is that waveOutGetNumDevs is in winmm.dll, and MSDN clearly states:

    The entry-point function should perform only simple initialization or termination tasks. It must not call the LoadLibrary or LoadLibraryEx function (or a function that calls these functions), because this may create dependency loops in the DLL load order. This can result in a DLL being used before the system has executed its initialization code. Similarly, the entry-point function must not call the FreeLibrary function (or a function that calls FreeLibrary) during process termination, because this can result in a DLL being used after the system has executed its termination code.

    Because Kernel32.dll is guaranteed to be loaded in the process address space when the entry-point function is called, calling functions in Kernel32.dll does not result in the DLL being used before its initialization code has been executed. Therefore, the entry-point function can call functions in Kernel32.dll that do not load other DLLs. For example, DllMain can create synchronization objects such as critical sections and mutexes, and use TLS.

    Calling functions that require DLLs other than Kernel32.dll may result in problems that are difficult to diagnose. For example, calling User, Shell, and COM functions can cause access violation errors, because some functions load other system components. Conversely, calling functions such as these during termination can cause access violation errors because the corresponding component may already have been unloaded or uninitialized.

    The problem with waveOutGetNumDevs() (and all the other MME APIs) is that under the covers they call LoadLibrary (to load wdmaud.drv).  They also read from the registry, RPC into the audiosrv service, and do lots of other stuff.  On Windows XP, it appears that a fair number of applications got lucky and didn't hit the deadlocks inherent in these functions, but for Vista, one consequence of the new audio engine architecture is that it is now 100% guaranteed that if you call the MME APIs from your DllMain, you are going to deadlock your application.
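
    The standard fix is to get the call out of DllMain entirely and defer it until the first time it's actually needed, once the loader lock is no longer held.  Here's a minimal sketch of that pattern (the helper names are made up, and InitOnceExecuteOnce is new for Vista - a critical section or an interlocked flag works on older systems):

    #include <windows.h>
    #include <mmsystem.h>   // link with winmm.lib

    HINSTANCE ghInst;
    DWORD g_cWaveDevices;                           // hypothetical global
    INIT_ONCE g_initOnce = INIT_ONCE_STATIC_INIT;

    BOOL WINAPI DllMain(HMODULE hModule, DWORD Reason, PVOID pContext)
    {
        if (Reason == DLL_PROCESS_ATTACH) {
            ghInst = (HINSTANCE)hModule;
            DisableThreadLibraryCalls(hModule);     // Kernel32 only - safe here
        }
        return TRUE;
    }

    BOOL CALLBACK InitWaveDevices(PINIT_ONCE InitOnce, PVOID Parameter,
                                  PVOID *Context)
    {
        g_cWaveDevices = waveOutGetNumDevs();       // safe: no loader lock held
        return TRUE;
    }

    // Call this from your exported functions, NOT from DllMain.
    void EnsureInitialized(void)
    {
        InitOnceExecuteOnce(&g_initOnce, InitWaveDevices, NULL, NULL);
    }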

     

    Kudos:

    The first poster, Mike, asked exactly the right question - waveOutGetNumDevs will trigger the loading of another DLL - as did most of the other commenters.

    Seth McCarus thought it was "interesting" that the last parameter was of type PCONTEXT. Since a PCONTEXT is a PVOID, it's not really that surprising.

    Skywing gave the most complete description of the errors; I'm including it in its entirety because it's well written (I edited out the APC stuff because, to be frank, I have no idea whether it's true or not):

    As for what's wrong with this code:

    - It's dangerous to try and call functions from DllMain that reside in DLLs that your DLL is dependent on. The reason is that in some cases that DLL may not have run its DllMain yet, and if those functions rely on some state initialized during DllMain they may not operate as expected. This situation doesn't always occur, and as far as I know, with the current implementation, you'll only really see it in the case of either circular dependencies or some situations where you are loading a DLL dynamically (i.e. not at process-initialization time). The exceptions to this are DLLs that have only dependencies that you are logically guaranteed to have already initialized by the time your DllMain runs. These include NTDLL (always initializes first) and KERNEL32 for Win32 programs (it only depends on NTDLL, and the main process image or one of its dependencies will always have linked to it).

    - Depending on what dprintf does, this might be bad. Particularly if it does something like make a CRT call and you are using the DLL version of the CRT. I would expect that it probably uses _vsnprintf or something of that sort, so I would flag it as dangerous.

    - Masking off the top 16-bits for waveOutGetNumDevs is curious; there should not be any reason to do this according to the documentation for that function. Perhaps this is an artifact of some code ported from Win16, or maybe there is some implementation detail of waveOutGetNumDevs that this DLL is relying on.

    Mea Culpas:

    My code was wrong - the internal message that's used to get the number of devices returns an error code in the high 16 bits of the returned count of devices.  For some reason, I assumed that error propagation mechanism applied to waveOutGetNumDevs; it doesn't.

  • What's wrong with this code, part 16

    • 15 Comments

    This real-world problem shows up as an appcompat problem about every two or three weeks, so I figured I'd write it up as a "What's wrong with this code" snippet.

     

    BOOL DllMain(HMODULE hModule, ULONG Reason, PCONTEXT pContext)
    {
        ghInst = (HINSTANCE) hModule;

        if (Reason == DLL_PROCESS_ATTACH) {
            DWORD cWaveDevices = 0;
            DisableThreadLibraryCalls(hModule);
            cWaveDevices = waveOutGetNumDevs();
            if ((cWaveDevices & 0xffff0000) != 0)
            {
                dprintf("Error retrieving wave device count\n");
            }
            < Do some other initialization stuff >
        } else if (Reason == DLL_PROCESS_DETACH) {
            < Do cleanup stuff >
        }
        return TRUE;
    }

    This one's really simple, and should be blindingly obvious, but it's surprising the number of times people get it wrong.

     

  • What's the big deal with Vista Betas?

    • 16 Comments
    In this somewhat silly Channel9 post about unsubstantiated rumors of Microsoft canceling Vista Beta 2, "Sabot" wrote the following question:

    What's all the fuss about a Beta? I don't actually get it? It comes when it's ready and I'm happy with that.

    I'm one of those guys that's not really all that fussed about knowing the release date of any product. I always architect systems using existing technology; it's a school-boy error to do otherwise, because the plain truth is there is a 50/50 risk it won't be released by the time you need it in the project plan, and that in terms of risk is too high a percentage to be acceptable.

    As a consumer of Microsoft's operating systems, I'm 100% behind him.  I don't get the hype (although I think seeing the purported screen shots is cool).

    As a developer who is working on Microsoft's operating systems, I have a rather different view.

    You see, the betas of Microsoft's operating systems are really the only opportunity that Microsoft's developers have to see the "real world" outside the ivory towers of Microsoft.

    Within Microsoft, you see, we all tend to have more-or-less the same types of machines.  They come from maybe a dozen different manufacturers, and they have reasonably different capabilities (for example, many don't have any audio), but they tend to be pretty homogeneous - 1.5 to 3 GHz machines with between 256M and 2G of RAM, an "ok" video card, and on-the-motherboard audio and network solutions.  The thing is, they don't even come close to representing the diversity of systems that our customers have.  We typically don't have the $200 low-end computers in our offices, and neither do we have the $7,000 watercooled ultra-high-end gaming rigs.

    We also don't have all the myriad applications that our customers have.  We've got an amazingly huge library of apps to test (I know, I've been looking at appcompat failures), but we still get reports of problems in apps that we've never heard of.  We also don't have the applications that haven't yet shipped, so we have no ability to test them.

    As a result, the OS developers within Microsoft LOVE betas.  We love it when real customers get a chance to use our stuff, because their environments are so much more diverse than any we could possibly have.  In reality, we do a pretty darned good job of weeding out the bugs before we ship (for instance, we've only found a handful of new customer-reported audio bugs since Beta1), but there's no substitute for honest-to-goodness in-the-field tests.

    It's kinda interesting - platform software is very different from just about every other engineering endeavor, because the compatibility matrix is SO complicated.  You can build an automobile totally in isolation - you can solve every engineering problem you're likely to find before it leaves the plant, because you can pretty accurately gauge how your customers are going to use your product.  But you can't do that with a software platform, because the number of potential use scenarios is essentially limitless.  So for software platforms, betas are indispensable tools that provide critical feedback to the developers of the platform.
