Larry Osterman's WebLog

Confessions of an Old Fogey

Volume control in Vista


Before Vista, all of the volume controls available to applications were system-wide: when you changed the volume using the wave volume APIs, you changed the hardware volume, thus affecting every application in the system.  The problem is that for the vast majority of applications, this was exactly the wrong behavior.  It was a legacy of the old Windows 3.1 audio architecture, where you could only have one application playing audio at a time.  In that situation, there was only one hardware volume, so the behavior made sense.

When the WDM audio drivers were released for Win98, Microsoft added kernel mode audio mixing, but it left the volume control infrastructure alone.  The volume controls available to the Windows APIs remained the hardware volume controls.  The reason for this is pretty simple: volume control really needs to be per-application, but in the Win98 architecture, there was no way of associating individual audio streams with a particular application.  Instead, audio streams were treated independently.

The thing is, most applications REALLY wanted to just control the volume for their own audio streams.  They didn't want (or need) to mess with other apps' audio streams; that was just an unfortunate side effect of the audio architecture.

For some applications, there were solutions.  For instance, if you used DirectSound (or DirectShow, which is layered on DirectSound), you could render your audio streams into a secondary buffer.  Since DSound secondary buffers have their own volume controls, that effectively makes the volume control per-application.  But it doesn't do anything to help the applications that don't use DSound; they're stuck with manipulating the hardware volume.


For Vista, one of the things that was deployed as part of the new audio infrastructure was a component called "Audio Policy".  One of the tasks of the policy engine is tracking which audio streams belong to which application.

For Vista, each audio stream is associated with an "audio session", and the audio session is roughly associated with a process (each process can have more than one audio session, and audio sessions can span multiple processes, but by default each audio session is the collection of audio streams being rendered by the process).
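To make the stream/session/process relationship concrete, here is a minimal sketch of the bookkeeping described above. It is a toy model, not the Vista policy engine: the `AudioPolicy` class, its method names, the `default:<pid>` session naming and the example process IDs are all assumptions made up for illustration.

```python
class AudioPolicy:
    """Toy stand-in for a policy engine that tracks which streams belong to which session."""

    def __init__(self):
        # session id -> list of (process id, stream name) pairs
        self._sessions = {}

    def add_stream(self, pid, stream, session_id=None):
        # Default case: every stream a process renders joins that process's own session.
        if session_id is None:
            session_id = f"default:{pid}"  # hypothetical naming scheme
        self._sessions.setdefault(session_id, []).append((pid, stream))
        return session_id

    def streams(self, session_id):
        return [stream for _, stream in self._sessions.get(session_id, [])]

    def processes(self, session_id):
        return sorted({pid for pid, _ in self._sessions.get(session_id, [])})


policy = AudioPolicy()

# Two streams from the same process end up in the same default session.
a = policy.add_stream(100, "music")
b = policy.add_stream(100, "ui-beep")

# A process can also own a second session, and a session can span processes.
policy.add_stream(100, "voice", session_id="chat")
policy.add_stream(200, "voice-helper", session_id="chat")
```

In this model, process 100 participates in two sessions, and the "chat" session spans processes 100 and 200.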

Each audio session has its own volume control, and WASAPI exposes interfaces that allow applications to control the volume of their audio session.  The volume control API also includes a notification mechanism so applications that want to be notified when their volume control changes can implement this - this mechanism allows an application to track when someone else changes their volume.
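The shape of that mechanism (a per-session volume plus a change notification) can be sketched in a few lines. This is only an illustration of the pattern under assumed names; `AudioSession`, `subscribe`, `set_volume` and the `changed_by` argument are hypothetical and are not the WASAPI interfaces.

```python
class AudioSession:
    """Toy per-session volume control with change notifications."""

    def __init__(self, name, volume=1.0):
        self.name = name
        self._volume = volume
        self._listeners = []

    @property
    def volume(self):
        return self._volume

    def subscribe(self, callback):
        # The callback is invoked as callback(session_name, new_volume, changed_by).
        self._listeners.append(callback)

    def set_volume(self, level, changed_by):
        if not 0.0 <= level <= 1.0:
            raise ValueError("volume must be between 0.0 and 1.0")
        self._volume = level
        # Tell every listener, including the owning application, who changed the volume.
        for callback in self._listeners:
            callback(self.name, level, changed_by)


seen = []
session = AudioSession("player")
session.subscribe(lambda name, level, who: seen.append((name, level, who)))

# Someone other than the application (say, a system volume UI) moves the slider.
session.set_volume(0.25, changed_by="volume mixer")
```

After the call, `seen` holds one entry recording the new level and who set it, which is the information an application needs to keep its own slider in sync.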

This is all well and good, but how does this solve the problem of existing applications that are using the hardware volume but probably don't want to?

Remember how I mentioned that all the existing APIs were plumbed to use WASAPI?  Well, we plumbed the volume controls for those APIs to WASAPI's volume control interfaces too. 

We also plumbed the mixerLine APIs to use WASAPI.  This was slightly more complicated, because the mixerLine API also requires that we define a topology for audio devices, but we've defined a relatively simple topology that should match existing hardware topologies (so appcompat shouldn't be an issue).

The upshot of this is that by default, for Vista Beta2, we're going to provide per-application volume control for the first time, for all applications.

There is a very small set of applications that may be broken by this behavior change, but we have a mechanism to ensure that applications that need to manipulate the hardware volume using the existing APIs will be able to work in Vista without rewriting the application (if you've got one of those applications, you should contact me out-of-band and I'll get the right people involved in the discussion).
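A rough sketch of the routing described above: a legacy volume call lands on the caller's session volume by default, and only an application flagged by the compatibility mechanism reaches the hardware volume. Everything here is an assumption for illustration, including the `AudioEndpoint` class, the `legacy_set_volume` function, the `needs_hardware` flag, and the simplification that a stream's gain is the hardware volume multiplied by its session volume.

```python
class AudioEndpoint:
    """Toy model: one hardware volume plus one volume per audio session."""

    def __init__(self):
        self.hardware_volume = 1.0
        self.session_volumes = {}  # session id -> volume, defaulting to 1.0

    def effective_gain(self, session_id):
        # Simplifying assumption: the two volumes simply multiply.
        return self.hardware_volume * self.session_volumes.get(session_id, 1.0)


def legacy_set_volume(endpoint, session_id, level, needs_hardware=False):
    """Hypothetical stand-in for a pre-Vista volume call."""
    if needs_hardware:
        # Compatibility path: behave the old way and touch the hardware volume.
        endpoint.hardware_volume = level
    else:
        # Default path: only the caller's own session is affected.
        endpoint.session_volumes[session_id] = level


endpoint = AudioEndpoint()
legacy_set_volume(endpoint, "player", 0.5)
legacy_set_volume(endpoint, "game", 0.25)
```

At this point the player and the game each hear their own setting, a third application that never called the API is still at full volume, and the hardware volume is untouched. Calling `legacy_set_volume(endpoint, "mixer-tool", 0.5, needs_hardware=True)` would instead scale every session at once, which is the old behavior.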

  • I watched your most recent Channel 9 Vid and I thought the architectural details were really interesting. I thought of two questions about it:

    1. The user-mode mapped DMA memory sits in the svchost which contains WASAPI, right? How much memory copying has to go on to get rendered audio into that, and how does WASAPI efficiently inform the soundcard that it's time to grab that data?

    2. The mixing seems to be done in software. Can WASAPI also take advantage of mixing hardware in the soundcard? Maybe I have some misconceptions and such mixers don't exist for PCM in hardware.

    Thanks for doing the interview.
  • Are the settings of each such "audio sessions" going to be persisted? Or will I be able to control the volume for each app individually but it will reset to some default every time I close and restart that application? If the settings are going to be persisted what is going to be the key to those, process name?
  • Jerry, yes, they are persistent unless the application overrides the volume on load (e.g. making it 100% on each load).
  • I think both kinds of volume controls are needed, because 10% of the time the user really wants to adjust and/or mute their overall volume and the other 90% of the time the user wants each application to remember its own settings (i.e. the user doesn't want to repeat those settings 90% of the time). At least each application should be able to make it clear to the user which kind of setting is being done at each time.

    Compare this to keyboard input. Switching Caps Lock on or off applies to all applications in the PC (except for terminal services clients, and sometimes Virtual PC gets funnied), but switching IME modes applies only to an individual application. Users can get used to this inconsistency and double-check each time they switch applications. It would be better if the user could specify how widely they want each change to be applied.
  • Jerry, Manip's right, they persist.

    nksing: No memory copies are done when using a WaveRT compliant device in exclusive mode - the application renders directly into the DMA buffer on the audio solution. You don't need to inform the sound card, they just render whatever samples are in the DMA buffer.

    And no, we don't take advantage of hardware mixing. Realistically, "mixing" is just adding the samples together. It would be more work to use the hardware, especially since there is processing that needs to be done post-mix (global audio effects, software volume if there's no hardware volume, software metering if there's no hardware metering, loopback processing, etc). A hardware mixer wouldn't help.
  • Btw, Norman, your comment is 100% spot on and consistent with our designs.
  • Many keyboards come with volume up/down/mute buttons. For all but a couple of applications they change the global mixer settings, but when one of those specific apps is active, they control just that application's volume. (WinDVD is such a case, I believe.)

    Are there any changes in the keyboard volume control behaviour?
  • Do you have some specs/SDK for WASAPI? Which functions are available (besides the old mixerxxx functions)?
  • Joku, we've plumbed the HID volume messages handled by the shell to talk to the hardware volume interfaces.

    For WinDVD, in XP it controls the hardware volume - it's a great example of an application that doesn't need to control the master volume but does.

    The SZ: Yes, specs will be forthcoming, they're in review right now and should be available by Beta2.
  • Regarding your last statement --

    Does this imply that there will be some kind of (advanced) UI that will let users pick whether a program will control the "system" volume or the "application" volume? Or is it just like the existing appcompat hack support, where you would have to download something like the appcompat toolkit and create a custom policy for a program that makes the API use the "system" volume and not the "application" volume?
  • Skywing, there's no UI (that would be too hard). You need to install an appcompat shim to have the mixer APIs use the hardware.

    We've not yet found any application that needs to use the hardware volume, btw. It's my expectation that the number of apps that will need to be shimmed could probably be counted on one person's fingers and toes (I may be wrong, but it's not likely to be many).
  • In the December CTP the volume control slider (controls) still needs some work. Try moving it to 50% volume and then using a high sensitivity mouse quickly push it to max, so that mouse pointer goes out from the mixer panel. The volume will be left at somewhere between 50 and 95% quite often instead of the 100% the user intended.
  • Joku, we know. That version of the volume UI used a custom slider control; in the current builds, we've changed to using the slider common control and the behavior should be much more reliable.
  • What about Windows NT 3.x?
  • I find pretty much 100% of the time I want to alter global settings; the individual volume settings on iTunes, WMP, etc are left at 100% all the time because otherwise one ends up having to keep track of two volume settings.

    I guess if there was an app that was consistently loud or soft, the individual app settings would be applicable, but how often does that happen, really? I would think normalization standards (assuming there are some) should take care of that. I'm really curious what these apps are that people would want user-specific settings for. (I guess something like lowering the music volume when a Skype call comes in would work, but I would guess most people would just pause it.)

    I'm really surprised usability testing showed that individual volume controls are the way to go. I would think people would want fewer volume controls, not more. I'm surprised the different volume controls in WMP and Windows haven't been a huge usability red flag.

    I find, in my experience, the #1 problem is that the volume control doesn't have nearly enough sensitivity at the bottom, going from medium loud to silent without much in between... thus resulting in adjusting both the global setting and iTunes to reach a comfortable level. (Or adjusting the volume directly on the external speakers.)

    Especially for things like, say, my Dell laptop, where a comfortable listening volume for speakers will blow your ears out when listening through earphones. If someone corrected for that with the iTunes volume control, and then a loud Flash animation played in IE... yowza!